Category Archives: Uncategorized

How would we know if “The Bay Area Should Levy a 5% Equity Tax on Startups”

In a recent Information article, Sam Lessin proposed a Bay Area 5% equity tax on startups. It's an interesting idea; I don't know whether it's a "good" idea. This blog post will not answer the "good" question, but I'd like to use the proposal to explore some ideas in public policy and economics and to talk about some of my work that bears on the question.

If a 5% equity tax were imposed, what would happen? Ideally, we’d have a true experiment to settle the question: say we had 300 more or less equivalent Silicon Valleys, half of which got the tax, half of which didn’t, and then we’d check in on them in 5 or 10 years. Yeah, so that’s not going to work.

The problem is clear—we don't have that many Silicon Valleys, we don't have that much time, and we certainly don't have the political power to impose such a tax randomly. Further, it is not clear what we should even look at to assess "good"—we could see how much revenue the tax generated, but what we care about is the cost to society of raising that revenue. If the shadow of a 5% tax causes a huge reduction in the number of startups, then whatever is raised could be very costly indeed. Though even saying something strong here would require some notion of the "quality" of the startups the tax displaced or prevented and whether some other startup would have just filled its place (e.g., kill Uber, get Lyft). We'd also care about who ultimately paid the tax, as the incidence is unclear—is it entrepreneurs? VCs? Workers in the tech sector? Landlords? Consumers of what Silicon Valley makes?

To assess the proposal, we’re going to need to be less empirical and more theoretical. I am highly empirical. I’m a card-carrying member of the credibility revolution. Most of my papers are not just empirical but experimental. That being said, there are important policy questions we care about that we need to answer quickly that existing empirical work just does not speak to. That leaves economic theory or guessing.

Screenshot 2017-04-24 11.18.28

My working paper, "A Model of Entrepreneurial Clusters, with an Application to the Efficiency of Entrepreneurship," is a theory paper designed to answer this kind of question (among others). The model is not complex, but it has a few too many moving pieces for a blog post; still, I can sketch out the relevant parts and show how to apply it.

In a nutshell, the paper describes a model with three important markets: the market for venture capital, the market for "engineers," and the product market for what successful startups sell. In the paper, would-be entrepreneurs weigh the expected returns to doing a startup against the "safe" returns to being an engineer/employee. A key feature of the model is the notion that lots of would-be entrepreneurs can pursue the same "idea" but that there is a single winner on each idea. This has some implications for the entrepreneurial system. One less startup does mean one less shot at commercializing some innovation, but if lots of startups were pursuing more or less the same idea, the welfare consequences of "losing" that startup to employment are not so bad. Furthermore, it doesn't have much of a labor market consequence either—there is no "missing" successful startup that is no longer demanding labor.
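To make the "crowded idea" intuition concrete, here is a toy numerical sketch in R. This is my own simplification, not the model in the paper: treat an idea as commercialized if at least one of the startups pursuing it succeeds, with each startup succeeding independently with probability p.

p <- 0.1
prob_commercialized <- function(n, p) 1 - (1 - p)^n

# A crowded idea: dropping from 10 entrants to 9 barely changes the chance
# the idea gets commercialized
prob_commercialized(10, p) - prob_commercialized(9, p)   # about 0.04

# A marginal idea with a single entrant: losing it removes the idea entirely
prob_commercialized(1, p) - prob_commercialized(0, p)    # 0.10

In other words, when entry on an idea is duplicative, the marginal startup adds little to the chance that the idea ever gets commercialized.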

Anyway, getting back to the tax question. We can think of the tax as increasing the cost of doing a startup. The effects of such a shock are worked out in Section 3.8 of the paper. This increase in cost shifts some would-be entrepreneurs back into the labor market, which lowers wages. This, to some extent, offsets the effect of the tax from the entrepreneur's perspective, as it lowers startup labor costs, making startups ex ante more attractive (imagine Google, but getting to pay 3% lower wages—starting Google is more attractive). So some of the tax gets borne by workers. How much? Well, in the model, the effect of a small change in startup costs on wages is

[Figure: the model's expression for the effect of a small change in startup costs on equilibrium wages]

which, uh, may still leave you with some questions. The "g" is the fraction of the labor force that is entrepreneurs. This part just says that when a large fraction of the labor force is entrepreneurs, a tax on entrepreneurship has a big spillover effect on wages, and vice versa when that fraction is small.

The term inside the parentheses has an economic interpretation, in that it captures how large a flow of engineers must leave entrepreneurship to re-establish an equilibrium, with larger flows leading to greater reductions in wages. Suppose that the startup success probability was completely inelastic, meaning that a reduction in the number of startups doesn’t “help” the startups that remain succeed. The increase in startup costs drives engineers from entrepreneurship, but because the startup success probability does not change, there is no compensating increase in success probability that would occur if the success probability was elastic. As such, a larger flow out of entrepreneurship is needed to re-establish the equilibrium, which means that employees see a larger fall off in wages. With a highly elastic success probability, a smaller number of exiting entrepreneurs is needed to establish a new equilibrium, and so there is less downward wage pressure and so less pass through of startup costs.

The model says that the overall surplus of the system is proportional to engineer wages in equilibrium. As such, what we would hope, as a social planner, is that the tax does not lower wages much in equilibrium. This happens when the startup success probability is highly elastic. In the model, a highly elastic startup success probability is the sign of too much entrepreneurship, in the sense that there are lots of entrepreneurs pursuing more or less the same ideas. In the model, ideas differ in their perceived "quality" and obviously good ideas get lots of entrants pursuing them, while only the marginal ideas get the efficient number of entrepreneurs (perhaps the ideas-that-seem-bad-but-are-actually-good). The figure below is the key figure from the paper:

[Figure: key figure from the paper]

Conclusion

To wrap it up: if you think there is lots of duplicative entrepreneurship right now—too many entrepreneurs pursuing more or less the same idea—the model says that Sam's tax is very likely to be a good idea, as it will mostly reduce, on the margin, startups pursuing ideas that were already being pursued, and hence the social welfare consequences will be minimal (interestingly, I think this elasticity question probably can be pursued empirically, using booms and busts in startup funding and/or technological shocks). Is my model the right way to model things? I have no idea, but it's *a* model and we have to make choices. Of course, there are lots of considerations this analysis leaves out, but I think it's a starting point for thinking about the issue, and also potentially the impetus for newer, better models.

 

2SLS in Mathematica

2SLS data setup. Note that there is a random variable u that appears in both x and in the error term e. There is also an instrument, z, that affects x but not u.

n = 10000;
(* the instrument z affects x but not the error term *)
z = Table[Random[NormalDistribution[0, 1]], {n}];
B0 = 1;
B1 = 2;
gamma0 = -2;
gamma1 = 4;
(* u appears in both x and the error term e, making x endogenous *)
u = Table[Random[NormalDistribution[0, 1]], {n}];
x = u + gamma0 + gamma1*z + Table[Random[NormalDistribution[0, 1]], {n}];
e = 5*u + Table[Random[NormalDistribution[0, 1]], {n}];
y = B0 + B1*x + e;

[Output: naive OLS of y on x, showing an upward-biased coefficient on x]

Note that the real coefficient on x is 2, but the estimated coefficient is biased upwards. Now we can do the first stage:

First stage


iota = Table[1, {n}];
Z = Transpose[{iota, z}];
(* first stage: regress x on the instrument to get fitted values *)
Gammahat = Inverse[Transpose[Z].Z].Transpose[Z].x;
xhat = Z.Gammahat;
(* second stage: regress y on the fitted values *)
Xhat = Transpose[{iota, xhat}];
Bhat = Inverse[Transpose[Xhat].Xhat].Transpose[Xhat].y

and now the coefficient estimates are close to the true values:

[Output: 2SLS coefficient estimates, close to the true values]

Panel data in Mathematica

This code constructs the design matrix for a panel with both time and individual fixed effects and then estimates the model.


\[Beta] = 3;
NumIndividuals = 323;
NumPeriods = 4;
NumObs = NumIndividuals * NumPeriods;
x = Table[Random[NormalDistribution[0, 1]], {NumObs}];
y = \[Beta]*x + Table[Random[NormalDistribution[0, 1]], {NumObs}];
XindivFull =
KroneckerProduct[IdentityMatrix[NumPeriods],
Table[1, {NumIndividuals}]] // Transpose;
XtimeFull =
KroneckerProduct[Table[1, {NumIndividuals}],
IdentityMatrix[NumPeriods]];
Xcombo = Join[XindivFull[[All, {2, NumPeriods}]],
XtimeFull[[All, {2, NumPeriods}]], 2];
\[Iota] = Table[1, {NumObs}];
XwIntercept = MapThread[Append, {Xcombo, \[Iota]}];
Xfull = MapThread[Append, {XwIntercept, x}];
\[Beta]hat = Inverse[(Transpose[Xfull].Xfull)].Transpose[Xfull].y

OLS in Mathematica

These are some notes for myself on econometrics in Mathematica.

Set up the data


n = 100;
B0 = 1;
B1 = 2;
x = Table[Random[NormalDistribution[0, 1]], {n}];
\[Epsilon] = Table[Random[NormalDistribution[0, 1]], {n}];
y = B0 + B1*x + \[Epsilon];
ListPlot[Transpose[{x, y}]]

[Plot: scatter of x against y]

Create the model matrix


iota = Table[1, {n}];
X = Transpose[{iota, x}];
k = Dimensions[X][[2]];

Estimate coefficients


Bhat = Inverse[Transpose[X].X].Transpose[X].y

Make predictions


yhat = X.Bhat;
ListPlot[{Transpose[{x, y}], Transpose[{x, yhat}]}]

[Plot: the data and the fitted values]

Compute the variance/covariance matrix


error = y - yhat;
sigma = Sqrt[error.error/(n - k)]  (* divide by n - k degrees of freedom *)
sigma^2*Inverse[Transpose[X].X] // MatrixForm

[Output: sigma and the estimated variance-covariance matrix of the coefficients]

Regression statistics

R squared


rsq = Variance[yhat]/Variance[y]

A Way to Potentially Harm Many People for Little Benefit

Noah Smith has an article proposing some kind of quasi-mandatory national service. The end-goal is not, say, winning WWII, but rather the social cohesion side-effect gained from making young people from different backgrounds work together. For many reasons, I think this is a bad idea, but perhaps the most important is that for it to "work"—to really forge some kind of deep band-of-brothers connection—you'd have to impose terrible costs on the participants.

The reason military service is described as "service" or a "sacrifice" is that it is one, even in peacetime. You risk death and terrible injuries, both mental and physical. You lose a great deal of personal freedom and gain a great deal of worry and anxiety. You risk seeing your friends and people you are responsible for killed and maimed. You spend months and even years away from loved ones. I spent 5 years in the Army as a tank platoon leader & company executive officer, after 4 years at West Point. Of my active duty time, 15 months were spent in Iraq (Baghdad and Karbala). It was, without a doubt, the worst experience of my life—nothing else even comes close, and I got off easy.

One might say, well, this is just the “war” version of military service. Not really. Outside of combat, back in Germany: one soldier in my battalion (slowly) drowned when his tank got stuck in deep mud during a training exercise and the driver’s compartment filled with water; another in my brigade was electrocuted when loading tanks onto rail cars; another young soldier from my brigade was, two weeks after arriving in Germany, promptly robbed & beaten to death by two other privates from his battalion. With our deployment looming, one lieutenant in our brigade went AWOL and later killed himself. And I’m not considering the numerous injuries. This was never summer camp.

When you peel back the superficially appealing aspects of military service—focus on teamwork, training, college benefits, supposed egalitarian design, etc.—you're confronted with the fact that militaries are impersonal bureaucracies that (1) treat soldiers as a means to an end, and (2) are designed to efficiently kill people and destroy things. Both features are necessary, but that does not make them less evil. Participating in those two functions, no matter how just the cause, is mentally damaging for many, and deeply unpleasant for almost everyone.

So that's all cost. Does military service "work" to build cohesion? I would give a qualified "yes," but what it builds isn't the generalized social cohesion Smith is after anyway—I don't feel some deep attachment to the white working class, though I am more familiar with that culture than I otherwise would be. I'm sure I know more Trump supporters than the average (any?) NYU professor, but I don't think I'm any more sympathetic. I have a bond with soldiers from my *platoon* and a deep friendship with some of my fellow officers, but here's the rub—it's based on the shared sacrifice. If we had just spent our time together fixing up trails or building playgrounds, those fellow soldiers would be something I already have lots of—former work colleagues.

To wrap it up, society doesn’t get the cohesion without the costly sacrifice, and creating that sacrifice artificially would be deeply wrong. And if the goal of mandatory service is just to get people to meet people from other backgrounds—say the kind of band-of-brothers level cohesion isn’t needed—surely there are cheaper, less coercive ways to do it.

Performative economics, or how my paper is used in wage negotiations

One sociological critique of economics is that unlike the physical sciences, economic research can affect the thing it studies. I might not be using the jargon the correct way, but the basic idea is that economics is "performative"—it's not just a magnifying glass—it's a magnifying glass that sometimes focuses the light and burns what you're looking at. I have an example of this from my own work that bugs me more than a little bit, but is, ultimately, my own fault. Let me explain.

So back in graduate school, Lydia Chilton and I wrote a paper called "The Labor Economics of Paid Crowdsourcing" (data & code here). In a nutshell, we introduced the labor economics way of thinking about labor supply to the crowdsourcing/computer science crowd. We also did some experiments where we varied earnings to see how workers reacted on MTurk. We thought the really cool "so what" of the paper was that we presented strong evidence of target earning—that workers had a preference for earning amounts of money evenly divisible by 5 (here's the key figure; note the taller black histogram bars):

[Figure: histogram of worker earnings, with taller bars at amounts evenly divisible by 5]

Almost as an afterthought, we estimated the distribution of worker reservation wages for our (*very* unusual) task. We came up with a median value of $1.38/hour, using some strong assumptions about functional form. We put this in the abstract & we even discussed how it could be used to predict how many people would accept a task, because every paper has to make some claim about how it is useful.

Screenshot 2017-04-20 11.03.27

Anyway, every once in a while, I see something on twitter like this (7 *years* later):

[Screenshot: a tweet citing the $1.38/hour figure]

Hmmm, I wonder where that $1.38/hour figure came from. Anyway, mea culpa. If you’re a MTurk worker, my apologies. Feel free to cite this blog post as the authority that $1.38/hour is a silly number that shouldn’t anchor wages.

Causes of the Silicon Valley real estate crunch (and some potential solutions)

I unexpectedly got into two Twitter discussions recently about Silicon Valley (SV) and its effects on the local real estate market. I felt constrained by the 140 character limit, so I thought I’d write a blog post explaining my thinking (and add supply & demand diagrams!).

To understand what is happening in SV, we need to think about three markets:
(1) the product market for what SV tech companies sell
(2) the SV tech labor market and
(3) the SV housing market.

First, what’s obvious: there’s been a huge increase in demand for what Silicon Valley sells: the world is using way more IT than it used to. Someone has to build & program this stuff, and so there’s been a large increase in demand for certain kinds of high-skilled labor—namely software engineers, designers, product managers and so on. Let’s call them “tech people.”


Most tech people are transplants, coming to SV specifically to work in tech. They need a place to live. As such, a demand shock for tech labor is also a demand shock for housing in SV.

How the labor demand shock plays out

In the figure below, the top diagram is the labor market and the bottom diagram is the housing market. The y-axes are wages and real estate prices, respectively. The x-axes are tech people hired and units of housing consumed, respectively. The connection between these two markets is so tight that I assume that changes in tech people employed must be met one for one with changes in housing units consumed. This is why the two diagrams are stacked on top of each other.

Pre-boom equilibrium:

Here comes the iPhone: Tech Boom!

Let’s consider how a product market demand shock leads to a new equilibrium. First, the demand curve for labor shifts out (in red, top panel). If we ignored the housing market, we would just see higher wages and more tech people hired. However, these new tech hires want a place to live, so they shift out the demand curve in the housing market (in bottom panel, also in red).

But the tech people labor supply curve depends on housing costs

At this new higher price for housing, fewer tech people are willing to work at each wage (i.e., "I'll stay in Seattle and work for Jeff Bezos, spending more on tissues and psychological counseling, but spending less on rent"). The higher housing prices shift in the tech-worker labor supply curve. This shift takes some pressure off housing demand, pushing down housing prices a little. This tâtonnement goes back and forth until a new equilibrium is reached (a toy numerical version of the process appears after the list below) with:

(1) more tech employees (but not as many as there would be in the absence of housing effects)
(2) higher wages and
(3) higher real estate prices
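
Here is the toy numerical sketch promised above, in R. The linear curves and the numbers are made up purely for illustration; the point is just that iterating between the two markets settles on more hiring, higher wages, and higher rents.

# Toy tatonnement with made-up linear curves (illustration only):
#   labor demand:   w = a - b*L      (a shifts out in a tech boom)
#   labor supply:   L = c*(w - r)    (workers care about the wage net of housing cost r)
#   housing supply: r = h0 + h1*L    (h1 = 0 would be a perfectly elastic housing supply)
solve_market <- function(a, b = 1, c = 1, h0 = 1, h1 = 0.5, iters = 100) {
  L <- 0
  for (i in seq_len(iters)) {
    r <- h0 + h1 * L             # housing price implied by current employment
    L <- (a - r) / (b + 1 / c)   # labor-market clearing given that housing price
  }
  c(employment = L, wage = a - b * L, rent = h0 + h1 * L)
}

solve_market(a = 10)           # pre-boom
solve_market(a = 14)           # demand shock: more hiring, higher wages, higher rents
solve_market(a = 14, h1 = 0)   # same shock with perfectly elastic housing supply

With a perfectly elastic housing supply (h1 = 0), the same demand shock shows up as more hiring and flat rents, and workers keep more of the wage gain instead of handing it to landlords—the flat-supply-curve case discussed below.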

Where we are now:

The importance of the housing supply elasticity

As you might expect, how this process works out depends a great deal upon how these curves are shaped and how big these shocks are. One critical piece is the slope of that one curve that didn't move around—the housing supply curve. From the perspective of tech and non-tech workers and tech firms, we can say "elastic" = "good" and "inelastic" = "bad" (existing homeowners are another story).

Elastic supply = good. Imagine a better world in which the housing supply is completely elastic. The housing supply curve is flat. This means that no matter how large the positive demand shock in the housing market, house prices stay the same. Here, the demand shock in the tech labor market has no effect on non-tech workers through the housing channel (because housing prices do not rise). Also note that there is no pulling in of the tech worker supply curve—the workers get the “full” benefit in higher wages.

Inelastic supply = bad. Now, let us imagine a world where housing is completely inelastic, making the housing supply curve a vertical line. In this inelastic case, the housing stock is fixed. We already "know" that tech companies aren't going to be able to hire more. Tech wages are going to rise, but the main beneficiaries will be existing owners of housing because of the price increase. They get enormous rents—literally. Of course, the curve is not completely inelastic, because one very controversial "source" of elasticity is displacement. The tech people move in, the non-tech person moves out. This is why people throw yogurt (at best) at tech buses.

How do non-tech people fare?

A more complete analysis might consider the effect of the tech boom on non-tech wages. Presumably they get some benefit from increased demand for their services from tech people. And to some extent, non-tech sectors have to increase wages to get people to still live and work in SV. It seems unlikely to me that this is fully offsetting.

The main adjustment is probably housing displacement, meaning longer commutes. It makes more economic sense for them to move farther away (i.e., travel an hour a day and save $20/day on rent). That being said, they are almost certainly worse off with these horrendously long commutes than they were pre-boom.

What are the solutions?

  1. Do nothing. One "solution" is to do nothing, under the belief that things will run their course and the tech boom will fizzle. To the extent that the boom does not subside, other places in the world will become relatively more appealing for tech as the high cost of labor in SV persists (because of housing). However, to date, SV seems to be becoming more important and tech becoming more centralized in SV, not less, so this might be a slow-acting solution. Further, it seems bad for SV as a region: if I were king of SV, I wouldn't be sanguine about the "Detroit solution" to too much product market demand for what my region specializes in.
  2. Build more housing. Another solution (of course) is to increase the housing stock. This should push prices down. A better solution might be to enact structural changes to make the supply of housing more elastic. Given how much housing prices have risen, it seems that the supply is very inelastic (more on this later).
  3. Let people work remotely. Another solution is radically different, which is to try to sever or at least attenuate the connection between the housing and labor markets. This is the "Upwork" solution in a nutshell, which their CEO outlined in a recent Medium post. If a tech company is open to remote hiring, then those remote hires never enter the local rental market and do not drive up prices. It does not even have to be either/or, as letting your employees work remotely some of the time helps: if I only have to be in San Francisco three days a week, living in Half Moon Bay rather than Cole Valley becomes much more attractive (Uber also helps here and autonomous vehicles would help a lot).

I'm particularly optimistic about (3), the tech-focused solution, as it seems more likely to "work" right away and it requires little political change. Also, somewhat ironically, the increasing maturation of technology for remote collaboration means that this approach should become more attractive over time.

Incidentally, why is the supply of housing in SV so inelastic?
Some of it is surely geography, about which little can be done. The peninsula is just not that wide and there aren’t large, nearby tracts of undeveloped land. I imagine that the, uh, interesting geological properties of the area matter for construction. However, the main cause seems not to be so much the quantity of land, but rather the intensity with which the land is used.

Take a Google Street View walk of Palo Alto or Menlo Park. When you consider how large the demand for housing is and then look at the built environment of those cities, there is a wild disconnect. These cities should be Manhattan-dense, or at least Cambridge, MA-dense, but they are not—it is mostly single family homes, some on quite large lots. They could be nice suburbs more or less anywhere. These are *very* nice places to live, of course, and I can understand the instinct to preserve them as they are. But the unchanging neighborhood character of Palo Alto is part of the reason why tech is having a huge negative externality on non-tech people, through the channel of higher housing costs.

Relevant disclosures: I used to work at Upwork’s predecessor company, oDesk. I still consult with them and I conducted academic research with their data. I also visited Uber as a visiting economist last summer and my wife works for them still. When I worked for Upwork, I lived in Redwood City until my landlord decided to not renew our lease so he could sell the place. We rented somewhere else that was a little cheaper, but my commute got longer. I might go back to SF to work for a bit this summer, if I can find a cheap enough place on Airbnb.

Reputation systems are great for buyers (and good for sellers too)

In a recent NYTimes article about Uber drivers organizing in response to fare cuts, there was a description of the rating system and how it affects drivers:

They [drivers] are also constrained by the all-important rating system — maintain an average of around 4.6 out of 5 stars from customers in many cities or risk being deactivated — to behave a certain way, like not marketing other businesses to passengers.

Using “marketing a side business” as an example of behavior the reputation system curtails is like saying “the police prevent many crimes, like selling counterfeit maple syrup“—technically true, but it gives the wrong impression about what’s typical.

Bad experiences on ride-sharing apps presumably mirror bad experiences in taxis: drivers having a dirty car, talking while driving, being rude, driving dangerously or inefficiently, and so on. I'd wager that "marketing a side business" complaints more or less never happen. If they do happen, it's probably because the driver was particularly aggressive or annoying about promoting their business (or the passenger was idiosyncratically touchy). It certainly doesn't seem to be against Uber's policy—an Uber spokesperson said recently that Uber not only condones it, but encourages it.

Being subject to a reputation system is certainly personally costly to drivers—who likes being accountable?—but it’s not clear to me that even drivers as a whole should dislike them, so long as they apply to every driver. Bad experiences from things like poor driving or unclean vehicles are not just costly to passengers, but are also costly to other drivers, as they reduce the total demand for transportation services (NB: Chris Nosko & Steve Tadelis have a really nice paper quantifying the effects of these negative spillovers on other sellers, in the context of eBay). The problem with quality in the taxi industry historically is that competition doesn’t “work” to fix quality problems.

Competition can't solve quality problems because a passenger only learns someone was bad after already having the bad experience. Because of the way taxi hails work, passengers can't meaningfully harm the driver by taking their business elsewhere in the future, like they could with a bad experience at a restaurant. As such, the bad apple drivers don't have incentives up front to be good or to improve. (The same goes for the other problem—bad passengers, who certainly exist and whom the reputation system also helps deal with.) Reputation systems—while far from perfect—solve this problem.

While reputation systems seem like something only a computer-mediated platform like Uber or Lyft can have, there's no reason (other than cost) why regulated taxis couldn't also start having reputation systems. Taxis could ask for passenger feedback in the car using the touch screen, and then use some of the advertising real estate outside the car to show average driver feedback scores to would-be passengers. This would probably be more socially useful than the usual NYC advertisements on top of yellow cabs, such as for gentlemen's clubs, e-cigarettes, and yellow cabs.

real_new_yorkers

Disclosure: I worked with Uber's data science team in the summer of 2015. However, the direction of causality is that I wanted to work with Uber because they are amazing; I don't think Uber is amazing because I worked for them.


One weird trick for eye-balling a means comparison

Often, I'm in a seminar or reading a paper and I want to quickly see if the difference in two means is likely to be due to chance or not. This comparison requires computing the standard error of the difference in means, which is SE(\Delta) = \sqrt{SE_1^2 + SE_2^2}, where SE_1 is the standard error of the first mean and SE_2 is the standard error of the second mean. (Let's call the difference in means \Delta.)

Squaring and taking square roots in your head (or on paper for that matter) is a hassle, but if the two standard errors are about the same, we can approximate this as SE(\Delta) \approx \frac{3}{2} \times SE_1, which is a particularly useful approximation. The reason is that the width of the 95% CI for \Delta is 4 \times SE(\Delta) = 6 SE_1 (i.e., 6 of our "original" standard errors). As such, we can construct the 95% CI for the difference Greek-geometer style, by taking the original CI, dividing it into fourths, and then adding one more SE to each end.

The figure below illustrates the idea – we’re comparing A & B and so we construct a confidence interval for the difference between them, that is 6 SE’s in height. And we can easily see if that CI includes the top of B.

[Figure: comparing means A and B using a confidence interval for the difference that is 6 SEs tall]

What if the SE’s are different?

Often the means we compare don’t have the same standard error, and so the above approximation would be poor. However, so long as the standard errors are not so different, we can compute a better approximation without any squaring or taking square roots.  One approximation for the true standard error that’s fairly easy to remember is:

\sqrt{SE_1^2 + SE_2^2} \approx \frac{3}{2}SE_1 - \frac{2}{3}(SE_1 - SE_2).

This is just the Taylor series approximation of the correct formula about SE_1 - SE_2 \approx 0 (and using \sqrt{2} \approx 3/2 and 1/\sqrt{2} \approx 2/3).
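
As a quick sanity check, here is a small R snippet (my own, not from the original post) comparing the exact standard error to the equal-SE shortcut and to the first-order correction above:

# exact SE of the difference vs. the two shortcuts
se1 <- 0.10
se2 <- 0.08
exact  <- sqrt(se1^2 + se2^2)                # 0.128
equal  <- (3/2) * se1                        # 0.150, the equal-SE shortcut
taylor <- (3/2) * se1 - (2/3) * (se1 - se2)  # 0.137, the corrected version
c(exact = exact, equal_se = equal, taylor = taylor)

The correction pulls the shortcut back toward the exact value; most of the remaining gap comes from rounding \sqrt{2} to 3/2.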

Monte Carlo Clusterjerk

Chris Blattman recently lamented reviewers asking him to cluster standard errors for a true experiment, which he viewed as incorrect, but he had no citation to support his claim. It seems intuitive to me that Chris is right (and everyone commenting on his blog post agreed), but no one could point to something definitive.

I asked on Twitter whether a blog post with some simulations might help placate reviewers and he replied “beggars can’t be choosers”—and so here it is. My full code is on github.

To keep things simple, suppose we have a collection of individuals that are nested in groups, indexed by g. For some outcome of interest y, there's an individual-specific effect, \epsilon, and a group-specific effect, \eta. This outcome also depends on whether a binary treatment has been applied (status indicated by W), which has an effect size of \beta.

y = \beta \times W + \eta_g + \epsilon

We are interested in estimating \beta and correctly reporting the uncertainty in that estimate.

First, we need to create a data set with a nested structure. The R code below does this, with a few things hard-wired: the \eta and \epsilon are both drawn from a standard normal and the probability of treatment assignment is 1/2. Note that the function takes a boolean parameter randomize.by.group that lets us randomize by group instead of by individual. We can specify the sample size, the number of groups, and the size of the treatment effect.
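
The original snippet was embedded as a gist and isn't reproduced here, so what follows is a minimal reconstruction of what such a function could look like; the argument names n, num.groups, and treatment.effect are my own placeholders, while randomize.by.group is the flag described above.

# Build a data frame of individuals nested in groups:
#   eta     = group-specific effect (standard normal, shared within a group)
#   epsilon = individual-specific effect (standard normal)
#   w       = binary treatment, assigned with probability 1/2
make_data <- function(n, num.groups, treatment.effect, randomize.by.group = FALSE) {
  group <- sample(1:num.groups, n, replace = TRUE)
  eta <- rnorm(num.groups)[group]
  epsilon <- rnorm(n)
  if (randomize.by.group) {
    w.group <- rbinom(num.groups, 1, 0.5)  # randomize at the group level
    w <- w.group[group]
  } else {
    w <- rbinom(n, 1, 0.5)                 # randomize at the individual level
  }
  data.frame(group = factor(group), w = w,
             y = treatment.effect * w + eta + epsilon)
}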

This function returns a data frame that we can analyze: for two individuals with the same group assignment, the \eta term is the same, but the treatment varies within groups.

Now we need a function that simulates us running an experiment and analyzing the data using a simple linear regression of the outcome on the treatment indicator. The function below returns the estimate, \hat{\beta}, and the standard error, SE(\hat{\beta}), from one "run" of an experiment:
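
Again, the original code isn't shown, so here is a stand-in built on the make_data() sketch above (the default parameter values are placeholders, not necessarily those behind the post's reported numbers):

# One "run": draw a data set, regress y on the treatment indicator with plain
# OLS, and return the point estimate and its conventional standard error.
run_experiment <- function(n = 1000, num.groups = 50, treatment.effect = 1,
                           randomize.by.group = FALSE) {
  df <- make_data(n, num.groups, treatment.effect, randomize.by.group)
  fit <- summary(lm(y ~ w, data = df))
  c(beta.hat = fit$coefficients["w", "Estimate"],
    se = fit$coefficients["w", "Std. Error"])
}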

Let's simulate running the experiment 1,000 times. (NB: if the "%>%" notation looks funny to you, I'm using the magrittr package.)
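
A base-R stand-in for that step, using replicate() instead of the pipe and continuing with the sketch functions above:

set.seed(123)
# each column of replicate() is one experiment; transpose into a data frame
sims <- as.data.frame(t(replicate(1000, run_experiment())))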

The standard error also has a sampling distribution but let’s just take the median value from all our simulations:
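
With the sims data frame from the sketch above, that's just:

median(sims$se)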

If we compare this to the standard deviation of our collection of \hat{\beta} point estimates, we see the two values are nearly identical (which is good news):
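
Continuing the sketch:

sd(sims$beta.hat)  # should be close to the median standard error above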

If we plot the empirical sampling distribution of \hat{\beta} and label the 2.5% and 97.5% percentiles as well as the 95% CI (constructed using that median standard error) around the true \beta, the two intervals are right on top of each other:

comparison

Code for the figure above:
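
The original figure code isn't shown; a rough ggplot2 equivalent using the sketch's sims data frame (true \beta is 1 under the sketch's defaults) might look like this:

library(ggplot2)

median.se <- median(sims$se)
ggplot(sims, aes(x = beta.hat)) +
  geom_density() +
  # empirical 2.5% and 97.5% percentiles of the simulated estimates
  geom_vline(xintercept = quantile(sims$beta.hat, c(0.025, 0.975)),
             linetype = "dashed") +
  # 95% CI built from the true beta and the median OLS standard error
  geom_vline(xintercept = 1 + c(-1.96, 1.96) * median.se, colour = "red")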

Main takeaway: Despite the group structure, the plain vanilla OLS run with data from a true experiment returns the correct standard errors (at least for the parameters I’ve chosen for this particular simulation).

What if we randomize at the group level but don’t account for this group structure?

At the end of his blog post, Chris adds another cluster-related complaint:

Reviewing papers that randomize at the village or higher level and do not account for this through clustering or some other method. This too is wrong, wrong, wrong, and I see it happen all the time, especially political science and public health.

Let's redo the analysis, but change the level of randomization to the group and see what happens if we ignore this change when estimating our standard errors. As before, we simulate and then compare the median standard error we observed from our simulations to the standard deviation of the sampling distribution of our estimated treatment effect:
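
With the sketch functions from above, the re-run looks like this:

set.seed(456)
sims.group <- as.data.frame(t(replicate(
  1000, run_experiment(randomize.by.group = TRUE))))
median(sims.group$se)    # the naive OLS standard error
sd(sims.group$beta.hat)  # the actual spread of the estimates, which is much larger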

The OLS standard errors are (way) too small—the median value from OLS is still about 0.08 (as expected), but the standard deviation of the sampling distribution of the estimated treatment effect is about 0.45. The resultant CIs look like this:

[Figure: with group-level randomization, the naive OLS 95% CI is far narrower than the empirical 2.5%/97.5% percentiles]

Eek. Here are two R-specific fixes, both of which seem to work fine. First, we can use a random effects model (from the lme4 package):
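
A version of that fix using the sketch's data would be something like:

library(lme4)

df.group <- make_data(n = 1000, num.groups = 50, treatment.effect = 1,
                      randomize.by.group = TRUE)
# group intercepts enter as a random effect
summary(lmer(y ~ w + (1 | group), data = df.group))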

[Output: lmer estimates]

or we can cluster standard errors. The package I use for this is lfe, which is really fantastic. Note that you put the factor you want to cluster by in the 3rd position following the formula:
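
And a clustered-standard-error version, again with the sketch's data; the parts of the felm formula are covariates | fixed effects | instruments | cluster, so the cluster variable sits in the third slot after the main formula:

library(lfe)

# no fixed effects or instruments here, so those slots are 0
summary(felm(y ~ w | 0 | 0 | group, data = df.group))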

[Output: felm estimates with standard errors clustered by group]

One closing thought, a non-econometric argument for why clustering can't be necessary for a true experiment with randomization at the individual level: for *any* experiment, presumably there is some latent (i.e., unobserved to the researcher) grouping of the data such that the errors within that group are correlated with each other. As such, we could never use our standard tools for analyzing experiments to get the right standard errors if taking this latent grouping into account were necessary.