



Shen-yi Liao


Thanks for posting this. It would be great to hear more about how to increase the likelihood that participants read the instructions and respond to the questions carefully. Have you found some ways to successfully do that?

I too have found that participants tend to take a VERY short time in responding, even when offered higher pay ($1, when the usual is more like $0.25, I gather). A survey that tends to take ~10 minutes in lab settings was completed in less than 3 minutes by nearly half of the participants in one study that I ran.

Some other potential problems: I am not sure if there is an option to do otherwise, but it seems that all questions (and vignettes) have to be on the same page. There is also no easy way to counterbalance the order other than manually. As you say, MTurk might be good for piloting, but I have some doubts about its use for a more rigorous study.

At any rate, it'd be nice to hear how people use this tool! One helpful resource page that I've found (through Nina Strohminger) includes demographic information on MTurk workers.

Jonathan Phillips


I think the best way to make sure participants are carefully reading the vignettes and instructions is to include control questions or comprehension checks that can only be answered with careful reading. It is then easy to determine which participants did, in fact, follow the instructions and read carefully.
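As a minimal sketch of how such checks might be scored afterwards (the function and field names here are hypothetical, not part of any MTurk feature):

```python
def passed_comprehension(responses, answer_key):
    """Return True only if every comprehension item was answered correctly.

    `responses` and `answer_key` both map question ids to answers;
    a single miss (or a skipped item) drops the participant.
    """
    return all(responses.get(q) == ans for q, ans in answer_key.items())

# Example: keep only participants who got every check right.
key = {"check1": "b", "check2": "d"}
participants = [
    {"id": 1, "answers": {"check1": "b", "check2": "d"}},
    {"id": 2, "answers": {"check1": "b", "check2": "a"}},
]
kept = [p for p in participants if passed_comprehension(p["answers"], key)]
```

Requiring every check to be correct is the strictest policy; a more lenient cutoff just replaces the `all(...)` with a count.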

When you create a survey, you can also control who is able to participate (e.g., allowing only people whose work has very rarely been rejected). I also find this to be helpful.

You are definitely right, though, that you can't counterbalance the order of questions or break up an experiment into multiple pages. For more rigorous, technical experiments, it may be better to use online services like Cotterweb or Qualtrics.

Thanks for including that link! It really does have a lot of very helpful information.

Eddy Nahmias

I'd like to use this resource, but it sounds problematic. Is there any way to make payment contingent on getting comprehension questions right? One thing Dylan Murray came up with in our recent studies is that, in addition to dropping participants who miss any of the 2-3 comprehension questions, we drop those whose completion time falls more than two standard deviations below the mean (and there's a lot of overlap between these fast-takers and those who miss the comprehension questions).
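That timing cut-off is easy to automate. Here is a minimal sketch (the function name is my own invention; the rule is completion time more than two standard deviations below the mean, as described above):

```python
import statistics

def flag_fast_takers(times, k=2.0):
    """Return indices of participants whose completion time (in seconds)
    falls more than k standard deviations below the mean of `times`."""
    mean = statistics.mean(times)
    sd = statistics.stdev(times)
    cutoff = mean - k * sd
    return [i for i, t in enumerate(times) if t < cutoff]

# A 2-minute completion stands out against ~10-minute norms.
# flag_fast_takers([600, 580, 610, 590, 605, 120]) -> [5]
```

Note that the mean and SD here include the fast-takers themselves; with many outliers, a robust cutoff (e.g., based on the median) would behave better.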

While we're at it, does anyone know if there are any standards in psychology for the best format for comprehension questions (and whether they should appear before or after experimental questions)?

And has anyone discovered any other ways to get diverse subject pools (other than Josh's park roving)? I still use students, though they are at Georgia State, so more diverse in SES, race, and religion than most universities. But I'd like to try some other populations.

Chandra Sripada

Hi guys,

I used MTurk to run two surveys (Sam turned me on to it). I ran 280 people in 18 hours total (in two separate blocks). I chose only people from the USA, assuming that would provide more homogeneity in English competence/idiolect. They averaged close to 3 minutes per survey, which seems brisk to me, but maybe ok.

I directed the Turkers to a U. Michigan secure site that hosts the survey. This allowed me to do multiple pages, full counterbalancing, etc. When the survey is done, it gives the Turker a confirmation code to enter on the MTurk page. If you have Qualtrics access, this would be a perfect way to use MTurk and Qualtrics together.
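That confirmation-code handshake can be sketched in a few lines. This is an illustrative helper (the names are my own), not part of MTurk or Qualtrics:

```python
import secrets

# Alphabet avoids look-alike characters (0/O, 1/I/l) so workers can
# retype the code into the HIT reliably.
ALPHABET = "ABCDEFGHJKMNPQRSTUVWXYZ23456789"

def issue_code(n_chars=8):
    """Generate a hard-to-guess code shown on the survey's final page."""
    return "".join(secrets.choice(ALPHABET) for _ in range(n_chars))

def code_is_valid(submitted, issued_codes):
    """True if the code a worker pasted into the HIT matches one the
    survey actually issued (whitespace and case differences tolerated)."""
    return submitted.strip().upper() in issued_codes
```

The survey site records each issued code; when reviewing submissions, you accept only those whose code checks out.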

In terms of data quality, I had run a subset of these questions with paper and pencil with more than 200 students. The results are very, very similar (all statistical tests for differences are highly non-significant). Also, in my MTurk survey, I asked the same question multiple times in slightly different ways (if this sounds weird, it's a long story why I did this). There was good within-subject consistency even though similar question items were often separated by a dozen other questions (making it hard for someone entering random responses to be consistent).

Interestingly, I also ran a similar survey using Craigslist ads and got about 100 responses. Quality was much worse, with many, many more missing items and strange responses that were hard to believe or inconsistent. And it took a LOT of time and effort to get the ads placed.

All in all, I am pretty satisfied with the MTurk experience and will likely use it a lot in the future, for pilots if not for actual stuff to be published.


Jonathan Phillips


Using MTurk, you can either 'accept' or 'reject' any completed survey, and participants are only paid for accepted surveys. So there is no need to compensate participants who took the survey too quickly or who failed the comprehension questions. You get to choose who gets paid.
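If you script that accept/reject step, the two criteria discussed in this thread (speed and comprehension) can feed a single decision rule. This is a hypothetical sketch with illustrative thresholds, not a recommendation:

```python
def decide(completion_seconds, checks_correct, checks_total,
           min_seconds=120):
    """Return 'accept' or 'reject' for one completed survey.

    Rejects work finished implausibly fast, or work with any missed
    comprehension check; only accepted work gets paid.
    """
    if completion_seconds < min_seconds:
        return "reject"
    if checks_correct < checks_total:
        return "reject"
    return "accept"

# decide(300, 3, 3) -> 'accept'
# decide(60, 3, 3)  -> 'reject'  (too fast)
# decide(300, 2, 3) -> 'reject'  (missed a check)
```

The output of a rule like this could then drive whatever interface or API you use to approve or reject the actual submissions.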

I like Dylan's way of selecting how quickly is too quickly for survey completion.

As far as the standards for comprehension questions, my sense is that they are generally at the end of the survey because you don't want to risk having them affect participants' answers. However, I am definitely not an expert on this.


This is really encouraging, and I really like your approach of combining MTurk and Qualtrics! This definitely seems like the way to go, and I think I'll take that approach in the future for studies that are complex enough to need multiple pages. Thanks for the tip.

Edouard Machery


For the record, Justin Sytsma, Jonathan Livengood, Adam Feltz, and I are running a website called PhilosophicalPersonality (Google it if you do not know it).

We are perfectly willing to add some vignettes, and it costs something between 20 and 25 dollars (if memory serves) per vignette.



Josh May

Thanks for posting this, Jonathan! Sorry I'm sort of late to the discussion.

While I share Sam's worry about the speed at which MTurk subjects work, it might not be such a bad thing. After all, most of the time we want snap, intuitive judgments about simple cases. And these MTurk subjects have a monetary incentive both to move quickly (so they can do more HITs) and to comprehend the material (else they don't get paid). Given Chandra's positive experience comparing the data with that from more usual subjects, this sounds like a good resource for more than just piloting.

A major issue, then, is whether Institutional Review Boards would be at all wary of the use of human subjects on MTurk. My IRB at least is quite picky!
