Shen-yi Liao

As an outsider to this debate, I am just so impressed by everyone's collegiality and impartiality in approaching this replication project. We should all thank David, Josh, and John for setting a model. (In addition to, of course, the earlier model set by Jonathan Weinberg et al. and the replicators in the cross-cultural intuition debate.)

Now, David notes that "The original study and the replication studies were distributed using different means, and also different population demographics. One or both of these differences may explain why the age finding failed to replicate." And these seem like plausible factors.

Psychologist Dan Simons suggests that it is helpful for experimental papers to specify the generalizability target, e.g.

"Limits in scope and generalizability

The results from this study should generalize to experienced bridge players at other duplicate bridge clubs, as well as to other domains in which players regularly compete against the same group of players in games of skill, provided that the outcome of any individual match or session is not determined entirely by skill."

I'm just wondering whether, in this case, the original authors have a suggestion for the conditions under which the age effect might be replicated. For example, is it for all contested knowledge cases, all fake-barn cases, or this particular fake-barn case? I think understanding the intended generalizability target a little better can really help clarify the factors that replication attempts need to pay particular attention to.

(In the paper, sometimes the claim is stated quite cautiously, but other times it is stated quite generally, e.g. "Third, perhaps the most interesting result of this study is that there is a negative relationship between knowledge attribution and age.")

Wesley Buckwalter

For those who are interested in ascription practices in these situations more broadly, several conceptual replications of the main result of David’s paper regarding knowledge ascription in the fake barn case are offered in “Knowledge and Luck” forthcoming in Psychonomic Bulletin and Review. The penultimate draft of that paper can be found here:

I called them conceptual replications because the materials that were used involved an array of different cover stories that nonetheless still retained the same structural features as fake barn cases. For instance, they featured different agents who perceptually detect truth-makers in the presence of salient but ultimately failed threats to the agent’s ability to detect them. The result was that patterns of ascription again closely resembled responses to paradigmatic knowledge.

At this point, it seems beyond doubt that there is a very high rate of knowledge attribution in fake barn and structurally similar cases, and that the conventional philosophical wisdom about this is wrong. I’d also note that age effects were not detected in these studies, although they were also conducted using mturk, and David’s points in the OP again seem to apply.

Joshua Knobe

First of all, hats off to David and his colleagues. As lots of people have already noted, this post really does a lot to advance our study of these questions and, just generally, to establish the right tone for thinking about replication studies.

Shen-yi makes a very good point in his comment above. Basically, he suggests that researchers should do more to be clear about the conditions under which they expect their effects to arise. I think this point is a very helpful one, but I wanted to propose what I hope will be a friendly amendment.

Specifically, I'm not sure that researchers need to say anything directly about the conditions under which they expect their effects to arise. Instead, it seems like researchers can just make hypotheses about the underlying *psychological processes* that give rise to their effects. These hypotheses will then generate predictions about when the effect should show up -- though only in conjunction with a whole lot of auxiliary hypotheses.

To give just one example, in their excellent paper on intuitions about reference, Machery, Mallon, Nichols & Stich suggest that the reason why Western participants give different responses from East Asian participants is that causation is more salient for Western participants. This claim does not directly tell us when the effect is supposed to show up, but it does give us a more indirect way of figuring it out. (Suppose someone asks, 'Will people from this other culture give the same answer that Western participants do?' The hypothesis yields a prediction, which is something like this: 'They should give the same response if they have the same way of thinking about causation.')

Needless to say, I am not trying to suggest that we have some a priori way of knowing that hypotheses about underlying psychological processes are the way to go. Rather, my reason for thinking all of this comes from facts about how research has been proceeding thus far. Looking at what has happened over the past ten years or so, it seems like there has been a lot of productive research that started with hypotheses about psychological processes and then triggered numerous follow-up studies that moved things forward in helpful new directions.


"More plausibly, the older population surveyed in the original study (in person on paper) may be unlike the population surveyed in a replication (online via the m-turk service). If they are different, more research must be done to assess which population, if either, is more representative of older people more generally."

I seriously doubt this claim. One clear possibility is that the original study was underpowered for making the correlation claim you make in the paper, so the presence of outliers spuriously drove your effect.

You seem to think you can ignore the possibility of a type I error, but you should not ignore this possibility, especially considering the study is insufficiently powered.

David Colaco

Thanks, again, everyone, for the comments on the replication so far. Here are my thoughts on some of the comments.

Shen-yi, perhaps the findings would be replicated if the study were performed in person in public places, as was the case in the original study. This would greatly reduce the worry that the population surveyed in the original is different from the population surveyed using M-Turk. I don’t take our findings to generalize to all instances of disputed knowledge, but rather to fake-barn vignettes similar to the version we used. The only way to find out which features are relevant is to run more studies, I think.

Josh, in the paper, we suggest a number of reasons why there might be a relationship between age and knowledge attribution in which age is a relevant factor (age and its relation to conservative thinking, age and its relation to the experience of merely apparent instances of knowledge, etc.). We also gesture at reasons why age might be a spurious factor (people within certain age ranges having had different life experiences, age being correlated with some other factor affecting knowledge attribution). I think one could draw on these speculations when designing future studies if one thinks that any of them is responsible for the effect we found.

Zach, I agree with you that a type I error very well might explain our original findings, given a failure to replicate within a different population from the original. If you are picking up on the ‘more plausibly’ language in the sentences you copied, I meant that language to note that it is comparatively more plausible that the populations are unlike (and one might be more representative than the other) than that the original and replication studies have different age ranges. You are right to point out that it may be that neither of these is correct; I did not intend my post to imply that the only explanation for a failure to replicate is a difference between populations.

Shen-yi Liao

Thanks, David. I definitely agree that more studies can clarify which features are ultimately relevant.

However, I was puzzled by your responses to Josh and me. If the speculation of the underlying psychological mechanisms has to do with the relationship between age and knowledge attribution generally, why would you expect the effect to only be replicated in fake-barn cases, and not other cases of contested knowledge attribution? (The contested part is there to rule out the lack of an age effect due to a ceiling or floor effect.)

David Colaco

Shen-yi, I thought it prudent to not make claims that our findings will extrapolate to other instances of contested knowledge attribution without additional data to support them. The speculations we make in the paper might not be relevant to all cases of contested knowledge. Given that ‘cases of contested knowledge’ is a broad category, I am not currently in a position to state what additional features might be present in other cases, and whether or not these features might confound any process that might result in a correlation between age and knowledge attribution.

