
Empirics in the very long-run

Much of my empirical work uses proprietary data from firms. I’m fully aware of the problem this creates for science—my work is, by and large, not (easily) reproducible. There are some things I do to try to enhance the credibility of my work despite this limitation, which I’ll save for another blog post, but I have an idea that I want to try, and that I think other similarly situated researchers should try too: ask the data provider to agree to some release date in the future—potentially far in the future.

Researcher: Can I release this data next year?

Lawyer: No way.

Researcher: How about 5 years?

Lawyer: No.

Researcher: How about 15?

Lawyer: Hmm (thinks they might be around in 15 years). Still no.

Researcher: How about 40 years from now?

Lawyer: (50-year-old lawyer contemplates own mortality) Uh, sure, OK.

Given my age, I’m not likely to be the one to work with this 40-year-old data, but I’m pretty sure there will be empirical economists in 40 years who might like to revisit some aspect of the data I’ve worked with, hopefully with much more sophisticated methods and theories.

How could you actually make this happen? Well, obviously picking a storage medium that can stand the test of time could be challenging. I’ve heard good things about M-DISC and I’ve been burning 100 GB back-ups with a burner I bought. The discs supposedly can last for 1,000+ years.

The second part is the software and data. It’s too much of a burden to rewrite all your code in some future-proof language, and picking winners would be hard anyway. I think the most promising approach is to just use whatever you normally use, but save not just your code and data but all the dependencies for the whole OS. In other words, use something like a Docker image of the code and data needed to produce your paper, with a script that orchestrates the production of the paper from raw data to final PDF. I feel pretty confident that 50 years from now, some variant of Linux will still be in use that can run such an image—or easily run a virtual environment that mimics what we use today.
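To make this concrete, here’s a minimal sketch of what such an image definition might look like. The base image, directory layout, and build script name are just illustrative placeholders, not taken from any particular project:

```dockerfile
# Minimal sketch of an archival image (paths and names are illustrative).
# Pinning a specific base image freezes the OS layer and the R version.
FROM rocker/r-ver:4.3.1

# Everything needed to rebuild the paper lives inside the image:
# the raw data, the analysis code, and the manuscript source.
COPY raw-data/ /project/raw-data/
COPY R/        /project/R/
COPY paper/    /project/paper/

WORKDIR /project

# One orchestration script takes the project from raw data to the final PDF.
CMD ["Rscript", "R/build_paper.R"]
```

The built image can then be written out as a single file with docker save (e.g., docker save -o paper-image.tar paper-image) and copied onto whatever long-lived medium you trust.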

In addition to the technical issue, there is also a social one: how do you actually get your data out there in 50 or 100 years? If the timeline is long but not too long, I think adding a section to your will with instructions to your executor could be sufficient. I’ve been working on a “research will,” which describes how I’d like my various projects wrapped up if I were to pass suddenly. If there were some time-related data releases, you could add sections to your will that assign the data release to a younger colleague or perhaps to some institution that will be persistent (e.g., the head of the department at your university). If the timeline is really long (say 100+ years), I don’t have a great answer, but perhaps a university library would be willing to take on this role. I’d be curious to hear other ideas on how to make sure your “time capsule” gets opened.

Celebrating RJZ’s 50 years of teaching at Harvard

This weekend I attended a celebration of Richard Zeckhauser’s 50 years of teaching at Harvard. The event started with two panels focusing on different aspects of Richard’s scholarship, followed by a dinner that included several speeches and a video tribute with testimonials from friends, students, co-authors, and family. It was a wonderful event, and it gave me the urge to write a little bit about Richard. I’ve had the privilege of knowing Richard for over 10 years now, as he was my primary advisor in graduate school and is a co-author. I also TA’d his notoriously challenging course at the Kennedy School, API-301.

When I was a graduate student, Richard gave me the freedom to pursue what I was interested in. It worked in part because he’s basically interested in everything. He had the patience for my bad ideas and the instinct to nurture the maybe-not-terrible ones. What I recall most strongly is just how much he made time for his students. It wasn’t just quality time—it was quantity time. In graduate school, I saw him constantly. We’d get lunch weekly. He’d invite me to a Red Sox or Celtics game. Or we’d just grab a soda and talk in the afternoons.

Our time together was *always* interesting and frequently challenging. I learned at the dinner that I was not the only one subjected to his questions and quizzes. So many of our meetings would start with some version of “suppose you have a coin that comes up heads 45% of the time…” I always felt flattered that he thought I was worth quizzing, even if I probably did not do much better than chance on the quizzes.

When it came time for the job market, I think he spent more time thinking about my career than I did. And his thinking on *anything* is gold. When I face a hard question in life—how to handle some tricky professional or personal decision—I think, “What would Richard do?” He’d collect information. Probe his assumptions. Come up with subjective probabilities. Consider the subjective utility from various states of the world. It’s amazing to experience his thinking and his passion for a rational, reasoned approach to decision-making. His most deeply held belief, I think, is that we can all make better decisions by being more analytical about them.

As one can imagine, the traits described above are part of why he’s such a tremendous scholar. One of the reasons he’s been so prolific is how he operates as a co-author. If you send him a draft, you will, in short order, get 10 pages of typed, detailed, brilliant notes—very likely typed up in the middle of the night. If you can get him a Word document—he’s not a LaTeX devotee—the tracked-changes edits to your writing could be the basis for a composition course on how to be a better writer.

Although we have written several things together, I don’t have the breadth or the quickness of mind to be “Richard” in my research. I’m not sure anyone really can anymore. It’s a cliche, but that mold is probably broken. I think the best I or anyone can hope for is to bring some Zeckhauserian sensibilities to my work.

What are those Zeckhauserian sensibilities? To be clear. To be interesting. To care deeply about writing. To find great examples. To keep working and polishing until your thinking and writing are straight. To take economic theory seriously, but also not ignore what we can see in front of us. To be humble about what we know and don’t know.

I hope to live up to the example he’s set as an academic. There are a lot of joys to academia. But I think one of the greatest is to be a part of a great chain of teacher and student. I cherish the fact that I now have the chance to influence my students and pass on something of him. He’s a model for living a life of the mind—and more generally, for living a life well. 

Why I don’t like going to academic conferences

Academics go to academic conferences. I generally go to a few each year—always the AEA and NBER Summer Institute, usually SOLE (Society of Labor Economists) and INFORMS and ICIS (International Conference on Information Systems). I also typically go to a few one-off, non-annual conferences. I don’t like going to conferences—and have felt that way for a long time—but I never really thought about what it was I didn’t like (beyond the things that everyone dislikes about travel & missing their family).

I think the main reason I don’t like going to conferences is seeing the research in aggregate. Seeing up close the amount of work being done—and knowing that I could know about only a small fraction of it, and that, most likely, only a small fraction of people will ever know about my work—always makes me melancholy. I get this at the AEA meeting especially, which is vast. It always gives me this feeling of how limited our time is, and how much we will never know. Given the scarcity of attention and the somewhat artificial scarcity of journal pages, a conference is a salient reminder of the bad odds most research faces in terms of being noticed.

As the old joke goes, somewhat modified—not only are the portions too large, the food is often terrible. Most presentations are bad—and not because the research is bad, but because giving good presentations is challenging. You have to summarize complex things to a diverse room of people, in a linear fashion. There are dangers everywhere: focus too much on details and robustness and your talk is boring; skip over details and you sow doubts about the work. Just getting the visuals on the slides right is an art that I feel like I’m still far away from mastering.

Part of the general badness of presentations is that the medium makes it harder for them to get better. Presentations seem disadvantaged relative to papers in terms of the potential for improvement. You are never in the audience for your own talk, so you can only reflect ex post on how it went. In contrast, you can read your own paper. With a paper, I can iterate alone: I can write a bit, let it mellow, then read and revise. And so, all by myself, I can improve my writing. Presentations, not so much.

In principle, the downsides of presentations would be balanced by some upsides. There are some—you can use tone of voice, pointing, etc. to convey more information. You can also make jokes and have some fun that you probably should keep out of your papers. One big upside is that they can be interactive—questions and discussions during seminars can be magic—but at big conferences, this rarely happens.

So what’s the solution? Well, I’m going to keep going to conferences – it’s part of the job. One thing I’d like to try is making some really high-quality screencast presentations. I bought Camtasia and have played around with it a bit, and it seems like, in principle, I could bring a paper-like style of iteration and improvement to presentations (e.g., multiple takes, editing out bad parts, and so on). The goal might be just to give better live talks, but it would also be interesting to see if screencasts could be halfway between live talks and papers as a means of scientific communication.

 

Extending geofacet to algorithmically generate grids from lat/long data

This morning I learned about the very cool geofacet R package for making geographic facets with ggplot2. Unfortunately, my set of cities isn’t covered by any of the existing grids. There is a neat web-based tool for arranging cities, but after 5 minutes I was feeling frustrated by my lack of relative geographical knowledge (“Is Chicago really south of Boston? Seems wrong.”).

Anyway, I forked it and wrote a function for generating new grids based on lat/long. The resultant grids are intended to (a) be small and (b) not violate any cardinal ordering (e.g., a city that is south of another city will never be shown as strictly north of it, though they can be on the same row).

To see how it works, suppose we have a collection of 50 cities with lat and long data, plotted at their actual positions.

If we have n cities, we can always put them in an n × n grid such that all cities are in the correct relative position to each other, by sorting east-west and north-south.
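In code, that first step looks roughly like the sketch below. This is a simplified illustration of the idea rather than the exact function in the fork; it assumes a data frame of cities with name, lat, and long columns:

```r
# Sketch only: build the full n x n grid, one city per row and per column.
# Assumes `cities` has columns name, lat, and long.
make_full_grid <- function(cities) {
  data.frame(
    name = cities$name,
    row  = rank(-cities$lat, ties.method = "first"),   # northernmost city gets row 1
    col  = rank(cities$long, ties.method = "first")    # westernmost city gets col 1
  )
}
```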

Now we want to start squishing this matrix to make it smaller. What my algorithm does is pick the long side and then find columns or rows on that side that it can combine without causing two cities to occupy the same cell. Every time it finds such a pair, it combines them and removes a row or column, as the case may be. The algorithm keeps doing this until it cannot squish the matrix down any more in either direction. Here’s an example squished version of the plot from above.

As it runs quickly, I just brute-force this some number of times and take the most compact result. There’s definitely more that could be done, say relaxing some constraints for greater compactness. Even this one I’d hand-tune a bit, but it was already much better than what I could generate by hand starting from scratch.
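In pseudo-code, the squishing step and the brute-force wrapper look roughly like the sketch below. It is a simplified version of the idea rather than the exact code in the fork: it tries adjacent merges in random order in both dimensions (rather than always picking the long side first), and it scores compactness by total grid area, which is just shorthand for this illustration:

```r
# Sketch only: try to merge one adjacent pair of columns (or rows). A merge is
# legal only if no two cities end up in the same cell, which also preserves the
# weak cardinal ordering. Returns NULL when no legal merge exists.
squish_once <- function(grid) {
  for (d in sample(c("col", "row"))) {        # try both dimensions, random order
    ks <- sort(unique(grid[[d]]))[-1]
    for (k in ks[sample.int(length(ks))]) {   # candidate merges, random order
      candidate <- grid
      hit <- candidate[[d]] >= k
      candidate[[d]][hit] <- candidate[[d]][hit] - 1  # fold column/row k into k - 1
      if (!any(duplicated(candidate[, c("row", "col")]))) return(candidate)
    }
  }
  NULL
}

# Keep squishing until no further legal merge exists.
squish_grid <- function(grid) {
  repeat {
    smaller <- squish_once(grid)
    if (is.null(smaller)) return(grid)
    grid <- smaller
  }
}

# The merge order is random, so repeated runs give different grids; brute-force
# a number of runs and keep the most compact (smallest total area) result.
best_grid <- function(cities, tries = 100) {
  grids <- replicate(tries, squish_grid(make_full_grid(cities)), simplify = FALSE)
  areas <- sapply(grids, function(g) max(g$row) * max(g$col))
  grids[[which.min(areas)]]
}
```

Because squish_once accepts the first legal merge it finds, each call removes exactly one row or column, and the loop stops once nothing can be merged without a collision.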

The full code for this example is in my fork of the package.

Bartik instrument exploration in R
