Writing Smell Detector (WSD) – a tool for finding problematic writing

tl;dr version: WSD is a Python tool to help find problems in your writing. Here’s the source and here’s example output.

In grad school, I wrote a program that used a series of regular expressions to detect “writing smell” (analogous to code smell), i.e., telltale signs of bad writing and mistakes. The rules for smelliness were loosely based on one of my favorite writing how-tos: Style: Toward Clarity and Grace by Joseph Williams.
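To make the idea concrete, here is a minimal sketch of regex-based smell detection in Python. The two rules are hypothetical examples chosen for illustration (a doubled word and the intensifier “very”); they are not the rule set of my original program or of WSD.

```python
import re

# Hypothetical example rules: each maps a rule name to a compiled regex.
SMELL_RULES = {
    # The same word twice in a row, e.g. "the the" (case-insensitive).
    "doubled word": re.compile(r"\b(\w+)\s+\1\b", re.IGNORECASE),
    # A weak intensifier that rarely adds meaning.
    "intensifier 'very'": re.compile(r"\bvery\b", re.IGNORECASE),
}


def find_smells(text):
    """Return (rule name, matched text, start offset) for every rule match."""
    hits = []
    for name, pattern in SMELL_RULES.items():
        for match in pattern.finditer(text):
            hits.append((name, match.group(0), match.start()))
    # Report in document order rather than rule order.
    return sorted(hits, key=lambda hit: hit[2])


if __name__ == "__main__":
    sample = "This is a very good idea, and the the results agree."
    for name, snippet, offset in find_smells(sample):
        print(f"{offset:4d}  {name}: {snippet!r}")
```

Each rule is just a name plus a pattern, so adding a smell means adding one dictionary entry.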

The program took a text file as input and produced an annotated report with snippets of the offending passages. I used it for all my papers and found it really helpful, but the code was very, um, academic (i.e., written for use by the person who wrote it), and it was written in Mathematica [1], the language I knew best at the time. FWIW, here is my original version.
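The snippet part of such a report is simple to sketch: for each match, show the matched text with a little surrounding context. The function names, the `>>> <<<` markers, and the default 20-character window below are my own assumptions for illustration, not what the original program or WSD actually does.

```python
import re


def snippet(text, start, end, context=20):
    """Return text[start:end] wrapped in >>> <<< markers, with up to
    `context` characters on each side and whitespace runs collapsed."""
    left = text[max(0, start - context):start]
    right = text[end:end + context]
    marked = f"{left}>>>{text[start:end]}<<<{right}"
    # Collapse newlines and repeated spaces so each snippet fits on one line.
    return re.sub(r"\s+", " ", marked)


def annotate(text, pattern, context=20):
    """Yield one context snippet per match of `pattern` in `text`."""
    for match in re.finditer(pattern, text):
        yield snippet(text, match.start(), match.end(), context)


if __name__ == "__main__":
    doc = "We utilize the method.\nIt is utilized widely."
    for line in annotate(doc, r"\butiliz\w*", context=5):
        print(line)
```

With `context=5`, the example prints `We >>>utilize<<< the ` and `t is >>>utilized<<< wide`.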

For a long time, I’ve wanted to port it to another language, make it accessible, and open it up to contributions of new rules and explanations. To this end, I recently commissioned an oDesk contractor (utapyngo) to write a more polished, modular version in Python. I think he totally outdid himself: the new design makes it easy to incorporate new rules, and he greatly improved on my often-flawed regular expressions. Be forewarned: the documentation is non-existent and the rules aren’t explained, but I plan to fix this over time as I use it.
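Roughly, a modular rule model means each rule bundles its name, an explanation, and its patterns, so contributing a rule is a single registration. The sketch below is a hypothetical design (the `Rule` class, `register`, and `check` are invented names), not WSD’s actual code; the nominalization rule is inspired by Williams’s advice to express actions as verbs.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Rule:
    """One writing-smell rule: a name, a human-readable explanation, and the
    regular expressions that trigger it (hypothetical structure)."""
    name: str
    explanation: str
    patterns: list = field(default_factory=list)

    def matches(self, text):
        """Yield every case-insensitive match of any of this rule's patterns."""
        for pattern in self.patterns:
            yield from re.finditer(pattern, text, re.IGNORECASE)


RULES = []


def register(rule):
    """Add a rule to the global registry; contributing a rule is one call."""
    RULES.append(rule)
    return rule


register(Rule(
    name="nominalization",
    explanation="Prefer verbs to abstract nouns: 'decide', not 'make a decision'.",
    patterns=[r"\bmake an? (decision|assumption|attempt)\b"],
))


def check(text):
    """Return (rule name, matched text) pairs for all registered rules."""
    return [(rule.name, m.group(0)) for rule in RULES for m in rule.matches(text)]
```

For example, `check("We make a decision today.")` returns `[('nominalization', 'make a decision')]`. Keeping the explanation next to the patterns is one way to make the rules self-documenting.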

It’s open source (courtesy of oDesk, who paid the bills) and available here on GitHub (live example output). To use it, just clone it, install the Python package jinja2, and then run:

$ python wsd.py -o output_file.html your_masterpiece.tex

Here’s a screenshot of what the HTML output looks like, illustrating the a/an rule (i.e., that it’s “an ox” but “a cat”):

Note the statement of the rule, the patterns that it looks for and the snippets. It also has a hyperlink to the full text, which is available at the bottom of the document.
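A letter-based approximation of the a/an rule shown in the screenshot is easy to write with two regular expressions. This is a sketch, not WSD’s actual pattern: the real rule depends on sound rather than spelling, so words like “hour” and “university” will trip it.

```python
import re

# Letter-based approximation of the a/an rule. The real rule depends on
# pronunciation ("an hour", "a university"), so this heuristic produces
# false positives on words like those.
A_BEFORE_VOWEL = re.compile(r"\ba\s+[aeiou]\w*", re.IGNORECASE)
AN_BEFORE_CONSONANT = re.compile(r"\ban\s+[b-df-hj-np-tv-z]\w*", re.IGNORECASE)


def check_articles(text):
    """Return suspicious article/word pairs in document order."""
    hits = [m for pattern in (A_BEFORE_VOWEL, AN_BEFORE_CONSONANT)
            for m in pattern.finditer(text)]
    return [m.group(0) for m in sorted(hits, key=lambda m: m.start())]
```

For example, `check_articles("I saw a ox and an cat.")` returns `['a ox', 'an cat']`, while `check_articles("an hour")` returns `['an hour']`, a false positive.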

A few thoughts:

  1. If you’re interested in contributing (rules or features), let me know. 
  2. It might be nice to turn this into a web service, though my instinct is that someone interested in algorithmically evaluating their LaTeX or other structured text isn’t going to find cloning the repository and running a script to be a big obstacle. And they probably don’t want to make their writing public.
  3. A few weeks ago, I read this usesthis profile of CS professor Matt Might. In the software section of the interview, he mentioned some shell scripts that do something similar. I haven’t really investigated, but maybe there are ideas there worth incorporating.

[1] When I told the other members of the oDesk Research / Match Team that I had code for doing this writing smell thing, they were impressed and wanted a copy; when I told them it was written in Mathematica, they thought this was hilarious and mocked me for several minutes. I tried to explain that Mathematica actually has great tools for pattern matching, but this fell on deaf ears.