
Would a job by any other name pay as much?

I’m working on a project where it would be useful to know what an oDesk job is likely to pay at the time it is posted. Although there are plenty of structured predictors available (e.g., the category, skills, estimated duration etc.), presumably the job description and the job title contain lots of wage-relevant information. The title in particular is likely to identify the main skill needed, the task to be done and perhaps the quality of the person the would-be employer is looking for (e.g., “beginner”, or “senior”).

Unfortunately, I haven’t done any natural language processing before, so I’m a bit out of my element. However, there are good tutorials online as well as R packages that can guide you through the rough parts. I thought writing up my explorations might be useful to others who want to get started with this approach. The gist of the code I wrote is available here.

What I did:


1) I took 20K recent hourly oDesk jobs where the freelancer worked at least 5 hours. I calculated the log wage over the course of the contract. Incidentally, oDesk log wages, like log wages in the real world, are pretty well approximated by a normal distribution.

2) I used the RTextTools package to create a document term matrix from the job titles (this is just a matrix of 1s and 0s where the rows are jobs and the columns are relatively frequent words that are not common English words; if the job title contains that word, it gets a 1, otherwise a 0).

3) I fit a linear model using the lasso for regularization (using the glmnet package). I used cross validation to select the best lambda. A linear model probably isn’t ideal for this, but at least it gives nicely interpretable coefficients.
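
For readers who want to see the mechanics, here is a minimal sketch of steps 2 and 3. It is plain Python, not the R code behind this post, and it does not use RTextTools or glmnet. The titles, log wages, stop-word list, `min_count` cutoff and the value of `lam` are all made up for illustration. Lambda is fixed by hand here, whereas the analysis above chose it by cross-validation.

```python
import re

# Tiny, made-up stop-word list; a real one would be much longer.
STOP_WORDS = {"a", "an", "the", "for", "and", "to", "of", "in", "with"}


def tokenize(title):
    """Lowercase a title and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", title.lower()) if w not in STOP_WORDS]


def build_vocab(titles, min_count=2):
    """Keep words that appear in at least `min_count` titles, sorted alphabetically."""
    counts = {}
    for title in titles:
        for word in set(tokenize(title)):
            counts[word] = counts.get(word, 0) + 1
    return sorted(w for w, c in counts.items() if c >= min_count)


def doc_term_matrix(titles, vocab):
    """Binary matrix: entry [i][j] is 1 if title i contains vocab word j, else 0."""
    rows = []
    for title in titles:
        present = set(tokenize(title))
        rows.append([1 if w in present else 0 for w in vocab])
    return rows


def soft_threshold(z, gamma):
    """Shrink z toward zero by gamma; this is what lets the lasso zero out coefficients."""
    if z > gamma:
        return z - gamma
    if z < -gamma:
        return z + gamma
    return 0.0


def lasso_fit(X, y, lam, n_iter=200):
    """Cyclic coordinate descent for (1/2n)*RSS + lam*sum(|beta_j|), intercept unpenalized."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = sum(y) / n
    resid = [yi - intercept for yi in y]
    for _ in range(n_iter):
        # Re-center: the best intercept absorbs the mean of the current residuals.
        shift = sum(resid) / n
        intercept += shift
        resid = [r - shift for r in resid]
        for j in range(p):
            col = [row[j] for row in X]
            norm = sum(c * c for c in col) / n
            if norm == 0:
                continue
            # Correlation of column j with the residual that excludes its own contribution.
            rho = sum(c * (r + c * beta[j]) for c, r in zip(col, resid)) / n
            new_beta = soft_threshold(rho, lam) / norm
            delta = new_beta - beta[j]
            if delta != 0.0:
                resid = [r - c * delta for r, c in zip(resid, col)]
                beta[j] = new_beta
    return intercept, beta


# Hypothetical toy data: six job titles and made-up log hourly wages.
titles = [
    "SEO article writer",
    "SEO link building",
    "iPhone application developer",
    "Senior application developer",
    "SEO data entry",
    "Android application developer",
]
log_wages = [1.0, 1.1, 3.0, 3.2, 0.9, 3.1]

vocab = build_vocab(titles)
X = doc_term_matrix(titles, vocab)
intercept, beta = lasso_fit(X, log_wages, lam=0.1)

for word, coef in zip(vocab, beta):
    print(f"{word:12s} {coef:+.3f}")
```

In this toy data "application" and "developer" always appear together, so the lasso cannot tell them apart and the split of the effect between them is arbitrary. That is worth keeping in mind when reading individual coefficients off a model like this.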

So, how does it do? Here is a sample of the coefficients that didn’t get set to zero by the lasso, ordered by magnitude (point sizes are scaled by the log number of times that word appears in the 10K training sample):

The coefficients can be interpreted as % changes from the mean wage in the sample when that corresponding word (or word fragment) is present in the title. Nothing too surprising I think: at the extremes, SEO is a very low paying job, whereas developing true applications is high paying.
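
One caveat on the percent reading: because the outcome is the log wage, a coefficient is measured in log points, and treating it as a percentage is only a good approximation when it is small. The exact conversion is exp(coefficient) - 1. A short Python sketch, with hypothetical coefficient values that are not taken from the fitted model:

```python
import math


def pct_change(coef):
    """Exact percentage change in wage implied by a log-wage coefficient."""
    return 100.0 * math.expm1(coef)


# Hypothetical coefficients, chosen only to show how the approximation drifts.
for coef in (-0.5, -0.1, 0.1, 0.5):
    print(f"{coef:+.2f} log points -> {pct_change(coef):+.1f}%")
```

For coefficients near zero the two readings nearly coincide; for large ones the gap is big enough to matter.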

In terms of out-of-sample prediction, the R-squared was a little over 0.30. I’ll have to see how much of an improvement can be obtained from using some of the structured data available, but explaining 30% of the variation just using the titles is higher than I would have expected before fitting the model.
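
For reference, an out-of-sample R-squared can be computed as in the Python sketch below. It assumes the baseline is the mean of the held-out wages (other conventions use the training mean), and the numbers are invented for illustration rather than drawn from the oDesk data.

```python
def r_squared(y_true, y_pred):
    """1 - SS_res / SS_tot, with SS_tot measured around the mean of y_true."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((a - b) ** 2 for a, b in zip(y_true, y_pred))
    ss_tot = sum((a - mean_y) ** 2 for a in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical held-out log wages and model predictions.
held_out = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(round(r_squared(held_out, predicted), 3))
```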


Image: “Bigger Big Mac” by Simon Miller, on Flickr (Creative Commons Attribution-Noncommercial-Share Alike 2.0 Generic License)

A standard pricing strategy in many industries is bundling goods, e.g., productivity “suites” like Microsoft Office, value meals at fast food restaurants, hotel and flight combos, etc. In the labor market, we also see a kind of bundling, though not by design: each worker is a collection of skills and attributes that can’t be broken apart and purchased separately by the firm. For example, by hiring me, my company gets my writing, meeting attendance, programming, etc.; they can’t choose to not buy my low-quality expense-report-filing service.

Good managers deal with this bundling by keeping workers engaged at their highest value activity. However, every activity has decreasing marginal returns, so even activities that start out as high-value eventually reach the “flat of the curve” where the marginal benefit of more of X gets pretty small. This phenomenon gives large firms an advantage, in that their (generally) larger problems give workers more runway to ply their best skills (by the same token, small firms have to worry much more about “fit” within their existing team).

While pervasive, this flat-of-the-curve dynamic and the resulting small-firm handicap are not fundamental features of organizations or labor markets; they spring from the binary nature of employment. They go away, or at least are diminished, if a worker can instead be partly employed (i.e., freelance) at a number of firms, each paying the worker to do what they do best. To date, the stated value proposition of most freelancing sites has been that they allow for global wage arbitrage. Obviously that’s important, but I suspect this “unbundling” efficiency gain will, in the long term, have a more profound effect on how firms organize and how labor markets function.