When I write a sentence, there’s about a 10% chance it will have a typo or grammatical error of some kind. These errors are often painful to find later because, like most people, I tend to “fill in the gaps” or glide over typos when reading my own writing. Fortunately, this kind of editing, unlike, say, reading for structure or consistency, is very parallelizable. In fact, reading each sentence alone, out of order, might even be better than reading the whole document, sentence by sentence.
As an experiment, I wrote a little script that splits a document up into sentences and writes them to a CSV file, with one sentence per line (the script is here). With this CSV, I can use Mechanical Turk to create HITs, with one HIT per sentence. The instructions ask workers to label each sentence as “OK” or “Not OK”, with an optional field to explain their reasoning. The MTurk interface looks like this:
After splitting the sentences, I went through the CSV file to remove blank lines and LaTeX commands by hand, though one could easily add this feature to the script.
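For readers who want to try something similar, here is a rough sketch of what such a splitter might look like. It is not the script linked above. The function names (`split_sentences`, `write_csv`), the CSV column names, and the naive punctuation-based sentence boundary are all my own assumptions, and it assumes a sentence never spans more than one source line. It also does the cleanup automatically by skipping blank lines and any line that starts with a backslash.

```python
import csv
import re
import sys

# Naive boundary (an assumption): ., ! or ? followed by whitespace.
# This will mis-split abbreviations such as "e.g. foo".
SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def split_sentences(text):
    """Return (line_number, sentence) pairs for the prose lines in text.

    Blank lines and lines starting with a backslash (treated here as
    LaTeX commands) are skipped. Line numbers refer to the source file.
    """
    rows = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        line = line.strip()
        if not line or line.startswith("\\"):
            continue
        for sentence in SENTENCE_END.split(line):
            if sentence:
                rows.append((line_no, sentence))
    return rows


def write_csv(rows, handle):
    """Write the pairs as CSV with a header row (column names are hypothetical)."""
    writer = csv.writer(handle)
    writer.writerow(["line", "sentence"])
    writer.writerows(rows)


if __name__ == "__main__":
    # Usage: python split.py paper.tex > sentences.csv
    with open(sys.argv[1], encoding="utf-8") as f:
        write_csv(split_sentences(f.read()), sys.stdout)
```

Keeping the source line number next to each sentence is what makes it easy to jump back to the right spot in the document once the ratings come in.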
I posted the HITs on MTurk this morning, paying 2 cents each, with 4 HITs per sentence (so each sentence will be checked 4 times by different workers). The text was a paper I’m working on. Results started coming in remarkably quickly. Here it is after 30 minutes:
I’m not thrilled with the hourly rate (I try to shoot for $5/hour), but this average is always very sensitive to workers who take a long time. So far, the comments are very helpful, especially since multiple ratings make it easy to find problematic sentences. For example:
The “86” is the line number from the LaTeX document, which is nice because it makes it easier to go find the appropriate sentence to fix. Here are some more samples of the kinds of responses I’m getting:
Overall, I think it’s a successful experiment, though it was already well known from Soylent that MTurk workers can do editing tasks well.
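Since each sentence gets several independent ratings, one simple way to surface the problematic ones is to count the “Not OK” votes per line. The sketch below is only an illustration: the function name `flag_sentences`, the `(line, label, comment)` tuple layout, the threshold of two votes, and the sample ratings are all made up, not taken from the actual results file.

```python
from collections import defaultdict


def flag_sentences(ratings, min_not_ok=2):
    """Return {line_number: [comments]} for lines with enough "Not OK" votes.

    ratings is an iterable of (line_number, label, comment) tuples, where
    label is "OK" or "Not OK" and comment may be an empty string.
    """
    not_ok = defaultdict(int)
    comments = defaultdict(list)
    for line_no, label, comment in ratings:
        if label == "Not OK":
            not_ok[line_no] += 1
            if comment:
                comments[line_no].append(comment)
    return {n: comments[n] for n in sorted(not_ok) if not_ok[n] >= min_not_ok}


# Hypothetical ratings, invented for illustration only.
ratings = [
    (86, "Not OK", "missing article"),
    (86, "Not OK", ""),
    (86, "OK", ""),
    (12, "Not OK", ""),
    (12, "OK", ""),
]
for line_no, notes in flag_sentences(ratings).items():
    print(line_no, notes)
```

Requiring at least two votes filters out sentences that only a single worker objected to. Lowering `min_not_ok` to 1 would catch more errors at the cost of more false alarms.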