
AI, Labor, and the Parable of the Horse

Today I attended the fifth-anniversary celebration for MSR NYC. There was a great group of speakers and panelists, and I’m super impressed by what MSR has accomplished. One topic that came up at several points during the day was the labor market effects of technological developments, particularly the possibility that powerful AI might displace many workers.

Economists have traditionally been sanguine about the effects of technological change on the labor market, viewing widespread technological unemployment as unlikely. This perspective is based on the historical experience of substantial technological change not having persistent disemployment effects. However, it has been pointed out that we have one vivid example where this optimism has not been warranted—what I call the parable of the horse.

The story is that the internal combustion engine came along and horses saw their marginal product decline below the cost of their feed and so horses disappeared, at least in the “labor” market. This is undoubtedly true—the figure below shows the number of horses (and mules) in the US (from The Demographics of the U.S. Equine Population). The implication is that today’s “horses”—low skilled labor or maybe even labor in general—will disappear as AI can do more and more tasks.

I find the horse parable interesting, but unpersuasive—at least with respect to how it is likely to affect relatively low-skilled workers—because I think the analogy misses the reason why horses fared so poorly. The problem was not that they were “low skilled” but rather their extreme specialization. Horses did one job and one job only—exerting physical force, which could be used to pull or push things. That they could be almost entirely displaced by a superior pushing/pulling technology is, in some sense, not surprising. But what I think is important in the human labor market is that being able to do one thing—and one thing only—is typically a characteristic of high skilled labor, not low skilled labor.

Most low-skilled labor is no longer like horse labor: the low-skilled jobs that exist now require some mix of physical, intellectual and even “emotional” skills. This mix makes full automation challenging. But even when some specific job does “fall” to automation, there is still a very large pool of remaining jobs that need to be done and that require relatively little skill or new training, by definition of being low-skilled. In short, one advantage of the low-skilled labor market is that there are lots of jobs you qualify for. The downside, of course, is that precisely because lots of people can qualify for those jobs, wages are low. The workers that are vulnerable to technical change, in the sense that they are likely to experience large declines in income, are those workers with highly specialized skills.

Truck driving as a low-skilled example

To give a more concrete example, consider the job of truck driver, which might be on the automation chopping block. First, many people with the job “truck driver” are actually some combination of sales representative, inventory-taker, first-line mechanic, warehouse worker, forklift operator and so on. As such, even if substantial amounts of driving end up being automated, it is far from obvious that labor demand for “truck drivers” would fall.

Second, even if the truck driver occupation sees a large negative demand shock, what other jobs could a truck driver do that pay about the same? Well, let’s look at the BLS occupational data. The table below shows the most recent BLS occupational data, with US employment totals and average hourly wages, sorted by average hourly wage. I restricted the list to occupations with more than 500K employees. We can see that being a light truck driver (i.e., not driving a heavy truck transporting freight or heavy equipment) pays about $16.40/hour, which is below the median wage in the US but still substantially higher than the minimum wage.

Occupation	US employment	Average hourly wage ($)
Retail Salespersons	4,612,510	12.67
Nursing Assistants	1,420,570	12.89
Landscaping and Groundskeeping Workers	895,600	13.20
Laborers and Freight, Stock, and Material Movers, Hand	2,487,680	13.39
Receptionists and Information Clerks	975,890	13.67
Security Guards	1,097,660	13.68
Substitute Teachers	626,750	14.25
Bus Drivers, School or Special Client	505,560	14.70
Team Assemblers	1,115,510	15.17
Office Clerks, General	2,944,420	15.33
Medical Assistants	601,240	15.34
Shipping, Receiving, and Traffic Clerks	674,820	15.55
First-Line Supervisors of Food Preparation and Serving Workers	884,090	16.02
Light Truck or Delivery Services Drivers	826,510	16.38
Industrial Truck and Tractor Operators	539,810	16.39
Medical Secretaries	530,360	16.50
Customer Service Representatives	2,595,990	16.62
Secretaries and Administrative Assistants, Except Legal, Medical, and Executive	2,281,120	16.92
Construction Laborers	887,580	17.57
Maintenance and Repair Workers, General	1,314,560	18.73
Bookkeeping, Accounting, and Auditing Clerks	1,580,220	18.74
Inspectors, Testers, Sorters, Samplers, and Weighers	508,590	18.95
Automotive Service Technicians and Mechanics	638,080	19.58
Heavy and Tractor-Trailer Truck Drivers	1,678,280	20.43

We can see that both above and below the light truck driver in the wage ranking, there are a number of jobs that are plausible substitute occupations for a displaced truck driver. For example, of the lower-paying jobs, most require no formal education or certification, except perhaps substitute teachers or medical assistants. If we go higher, we start to see jobs that require more skills or that are more physically taxing or dangerous (e.g., construction laborer), but are still reasonable substitutes. For example, many truck drivers are also decent mechanics and, perhaps with some more training, could find work as “Maintenance and Repair Workers, General.”

Not only do displaced truck drivers have lots of “nearby” occupations that pay about the same with little additional human capital requirements, the displaced drivers are not likely to drive down wages very much in their new occupations. There are, of course, lots of truck drivers, but if they split into a reasonably large chunk of other occupations, the new entrants would not be much of a supply shock.
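As a rough back-of-the-envelope sketch of the last two paragraphs, the Python below uses the employment and wage figures from the table above to ask two questions: which occupations pay within some band of the light truck driver wage, and how much would employment in those occupations grow if displaced drivers moved into them? The 15% wage band, the assumption that half of light truck drivers are displaced, and the rule that workers spread across destinations in proportion to current employment are all illustrative assumptions of mine, not anything from the BLS data.

```python
# (US employment, average hourly wage in USD), copied from the table above.
OCCUPATIONS = {
    "Retail Salespersons": (4_612_510, 12.67),
    "Nursing Assistants": (1_420_570, 12.89),
    "Landscaping and Groundskeeping Workers": (895_600, 13.20),
    "Laborers and Freight, Stock, and Material Movers, Hand": (2_487_680, 13.39),
    "Receptionists and Information Clerks": (975_890, 13.67),
    "Security Guards": (1_097_660, 13.68),
    "Substitute Teachers": (626_750, 14.25),
    "Bus Drivers, School or Special Client": (505_560, 14.70),
    "Team Assemblers": (1_115_510, 15.17),
    "Office Clerks, General": (2_944_420, 15.33),
    "Medical Assistants": (601_240, 15.34),
    "Shipping, Receiving, and Traffic Clerks": (674_820, 15.55),
    "First-Line Supervisors of Food Preparation and Serving Workers": (884_090, 16.02),
    "Light Truck or Delivery Services Drivers": (826_510, 16.38),
    "Industrial Truck and Tractor Operators": (539_810, 16.39),
    "Medical Secretaries": (530_360, 16.50),
    "Customer Service Representatives": (2_595_990, 16.62),
    "Secretaries and Administrative Assistants, Except Legal, Medical, and Executive": (2_281_120, 16.92),
    "Construction Laborers": (887_580, 17.57),
    "Maintenance and Repair Workers, General": (1_314_560, 18.73),
    "Bookkeeping, Accounting, and Auditing Clerks": (1_580_220, 18.74),
    "Inspectors, Testers, Sorters, Samplers, and Weighers": (508_590, 18.95),
    "Automotive Service Technicians and Mechanics": (638_080, 19.58),
    "Heavy and Tractor-Trailer Truck Drivers": (1_678_280, 20.43),
}

SOURCE = "Light Truck or Delivery Services Drivers"


def nearby_occupations(source, band=0.15):
    """Occupations whose average wage is within +/- band of the source's wage."""
    source_wage = OCCUPATIONS[source][1]
    return {
        name: (employment, wage)
        for name, (employment, wage) in OCCUPATIONS.items()
        if name != source and abs(wage - source_wage) <= band * source_wage
    }


def supply_shock(source, displaced_share, band=0.15):
    """Fractional employment growth in the nearby occupations.

    Assumes displaced workers spread across nearby occupations in
    proportion to current employment, so each one grows by the same fraction.
    """
    displaced = displaced_share * OCCUPATIONS[source][0]
    pool = sum(emp for emp, _ in nearby_occupations(source, band).values())
    return displaced / pool


neighbors = nearby_occupations(SOURCE)
print(f"{len(neighbors)} occupations pay within 15% of the light truck driver wage")
# Hypothetical scenario: half of all light truck drivers are displaced.
print(f"Employment growth in those occupations: {supply_shock(SOURCE, 0.5):.1%}")
```

Widening the band or raising the displaced share changes the answer mechanically, so the result is only as good as those assumptions. It also ignores geography, licensing and everything else that makes two occupations with similar wages imperfect substitutes.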

A specialized occupation example

Now we’ll look at a more specialized occupation. Let’s consider accountants and auditors (even this one still seems far away from being even remotely automatable). It pays quite nicely and requires substantial specialized skill. If we look at nearby jobs, very few would be open to a displaced accountant without substantial re-training.

Occupation	US employment	Average hourly wage ($)
Lawyers	609,930	65.51
Financial Managers	531,120	64.58
General and Operations Managers	2,145,140	57.44
Software Developers, Applications	747,730	49.12
Management Analysts	614,110	44.12
Computer Systems Analysts	556,660	43.36
Accountants and Auditors	1,226,910	36.19
Business Operations Specialists, All Other	926,610	35.33
Registered Nurses	2,745,910	34.14
Market Research Analysts and Marketing Specialists	506,420	33.67
First-Line Supervisors of Construction Trades and Extraction Workers	517,560	32.13
Sales Representatives, Wholesale and Manufacturing, Except Technical and Scientific Products	1,409,550	32.11
Sales Representatives, Services, All Other	886,580	29.98
Police and Sheriff’s Patrol Officers	653,740	29.45
Secondary School Teachers, Except Special and Career/Technical Education	962,820	*

Accountants are the “horses” here: the workers vulnerable to large drop-offs in earnings because of their specialization. Fortunately, if we care about inequality, this is the “right” group to be affected. Because of their existing financial wealth and their general human capital, they are likely better able to deal with the disequilibria created by technological change.

Data openness by private firms

The New York Times has a story today about social scientists working with company data and being unable or unwilling to make it public. The story begins:

When scientists publish their research, they also make the underlying data available so the results can be verified by other scientists.

I think the first sentence is probably more a description of how we’d like the world to be than how it actually is right now, especially in the social sciences. The main so-what of the story is that private companies are collecting enormous amounts of high quality data that lets you do fascinating social science, but companies are understandably reluctant to make this data public, primarily for privacy reasons (and probably also because they are afraid of giving up some competitive advantage).

I think the options for any organization that does or might do research are:

1) Do research for business purposes. Make neither the findings nor the data public.
2) Do research for business purposes. Make the findings but not the full data public.
3) Do research for business purposes. Make the findings and data public.
4) Do research. Make findings and data public.

Most companies probably aren’t interested in (4) and this is probably academia’s biggest comparative advantage. Barring (4), I think from a social perspective, privacy issues aside, the best outcomes in order are (3) > (2) > (1). I can understand (1) in some cases, but at least in the kind of companies I’m familiar with, the advantages of keeping everything secret probably aren’t that great.

The advantages of (2) or (3) over (1):

a) If you’re a software company and you release a feature that works, it will probably get copied anyway, regardless of whether you publish a paper, so you might as well get the thought leadership credit for coming up with the idea in the first place. This paper is/was the basis for Google’s secret sauce—posting it to the InfoLab servers back in 1999 didn’t doom the company and probably did a lot to increase the perceptions that they were doing something smarter (even though there were antecedents of this idea going back many years—including in Economics, by my academic grandfather).

b) If you give outside academics access and let them publish, you can get them to work on your problems for free (the Netflix prize is an obvious example). You can recruit those academics to come work for you, or at least get their grad students to come work for you.

c) If you let your internal researchers publish, you can get them to work at reduced cost or get researchers you otherwise wouldn’t be able to attract (see Scott Stern’s paper on scientists “paying” to do science).

On (2) versus (3), I think there is a real dilemma: openness and privacy concerns are in tension. Furthermore, just releasing more aggregated or somehow obfuscated versions of the data is not risk free: there’s actually an emerging literature in Computer Science on how to release data in ways that are guaranteed to still have the right privacy properties (~~CMU~~ UPenn professor Aaron Roth recently taught a course on the topic). The fact that smart people are working on it is exciting, since they might figure out provably risk-free ways to release data publicly, but it’s also evidence that this isn’t a trivially easy problem—seemingly innocuous data disclosures would let someone unravel the obfuscation.
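One well-known family of techniques from that literature is differential privacy. As a minimal sketch of the idea (my own illustration, not something taken from the course or the article), the Laplace mechanism below releases a count after adding noise calibrated to a privacy parameter epsilon. The function names, the count of 1,000 and the epsilon values are hypothetical, and the sketch assumes that adding or removing one person changes the count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponentials."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(true_count, epsilon, rng):
    """Release a count with Laplace noise of scale 1 / epsilon.

    Assumes the count has sensitivity 1: one person joining or leaving
    the data changes it by at most 1.
    """
    return true_count + laplace_noise(1 / epsilon, rng)


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
true_count = 1_000      # hypothetical number of users with some attribute
for epsilon in (0.1, 1.0, 10.0):
    released = private_count(true_count, epsilon, rng)
    print(f"epsilon={epsilon}: released count {released:.1f}")
```

Smaller epsilon means more noise and stronger privacy, which is the openness-versus-privacy tension in miniature: the released number gets less useful exactly as it gets safer.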

As a coda, I have a personal anecdote to share about this story. One of the people discussed in the article is Bernardo Huberman:

The chairman of the conference panel — Bernardo A. Huberman, a physicist who directs the social computing group at HP Labs here — responded angrily. In the future, he said, the conference should not accept papers from authors who did not make their data public. He was greeted by applause from the audience.

When I was a grad student, I taught a course to Harvard sophomore economics majors called “Online Labor” (syllabus). I assigned some of Huberman’s papers on motivation. I emailed him to ask for the data from one of his papers. He wrote back:

Dear Dr. Horton:
Thank you for your interest in my work and I certainly feel pleased when I learn that you liked my paper enough to assign it to your class.
As to your request, let me talk with the person who now handles the youtube data (we lately used it to uncover the persistence paradox) and I’ll get back to you.
Incidentally if you are interested in the role that attention and status (its marker) play among people I could send you a paper that reports on a experiment (as opposed to observational data) that elucidates it quite cleanly across cultures.
Best,
Bernardo

I got the data within days—I can state that he privately practices what he preaches publicly.

Update: I incorrectly stated that Aaron Roth was a professor at CMU—he did his PhD at CMU. He’s a professor at UPenn. Apologies.

Economics of the Cold Start Problem in Talent Discovery

Tyler Cowen recently highlighted this paper by Marko Terviö as an explanation for labor shortages in certain areas of IT. The gist of the model is that in hiring novices, firms cannot fully recoup their hiring costs if the novices’ true talents will become common knowledge post-hire. It’s a great paper, but what people might not know is that the theory it proposes has been tested and found to perform very well. For her job market paper, Mandy Pallais conducted a large experiment on oDesk where she essentially played the role of the talent-revealing firm.

Here’s the abstract from her paper:

… I formalize this intuition in a model of the labor market in which positive hiring costs and publicly observable output lead to inefficiently low novice hiring. I test the model’s relevance in an online labor market by hiring 952 workers at random from an applicant pool of 3,767 for a 10-hour data entry job. In this market, worker performance is publicly observable. Consistent with the model’s prediction, novice workers hired at random obtain significantly more employment and have higher earnings than the control group, following the initial hiring spell. A second treatment confirms that this causal effect is likely explained by information revelation rather than skills acquisition. Providing the market with more detailed information about the performance of a subset of the randomly-hired workers raised earnings of high productivity workers and decreased earnings of low-productivity workers.

In a nutshell, as a worker, you can’t get hired unless you have feedback, and you can’t get feedback unless you’ve been hired. This “cold start” problem is one of the key challenges of online labor markets, where there are far fewer signals about a worker’s ability and less common knowledge about what different signals even mean (quick: what’s the MIT of Romania?). I would argue that scalable talent discovery and revelation is the most important applied problem in online labor/crowdsourcing.

Although acute in online labor markets, the problem of talent discovery and revelation is no cakewalk in traditional markets. Not surprisingly, several new start-ups (e.g., smarterer and gild) are focusing on scalable skill assessment, and there is excitement in the tech community about using talent-revealing sites like StackOverflow and Github as replacements for traditional resumes. It is not hard to imagine these low-cost tools or their future incarnations being paired with scalable tools to create human capital, like the automated training programs and courses offered by Udacity, Khan Academy, Codecademy and MITx. Taken together, they could create a kind of substitute for the combined training/signaling role that traditional higher education plays today.
