The New York Times has a story today about social scientists working with company data and being unable or unwilling to make it public. The story begins:
When scientists publish their research, they also make the underlying data available so the results can be verified by other scientists.
I think the first sentence is probably more a description of how we'd like the world to be than of how it actually is right now, especially in the social sciences. The main so-what of the story is that private companies are collecting enormous amounts of high-quality data that let you do fascinating social science, but companies are understandably reluctant to make this data public, primarily for privacy reasons (and probably also because they are afraid of giving up some competitive advantage).
I think the options for any organization that does or might do research are:
1) Do research for business purposes. Make neither the findings nor the data public.
2) Do research for business purposes. Make the findings but not the full data public.
3) Do research for business purposes. Make the findings and data public.
4) Do research. Make findings and data public.
On (2) versus (3), I think there is a real dilemma: openness and privacy concerns are in tension. Furthermore, just releasing more aggregated or otherwise obfuscated versions of the data is not risk-free. There's actually an emerging literature in computer science on how to release data in ways that are guaranteed to preserve the right privacy properties (UPenn professor Aaron Roth recently taught a course on the topic). The fact that smart people are working on it is exciting, since they might figure out provably risk-free ways to release data publicly, but it's also evidence that this isn't a trivially easy problem: seemingly innocuous data disclosures can let someone unravel the obfuscation.
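To make the "unraveling" risk concrete, here is a minimal Python sketch built on a hypothetical four-person salary table (the names and numbers are invented for illustration). Each release is only an aggregate sum, yet two such sums together reveal one person's exact value, a so-called differencing attack. The sketch then swaps in the textbook Laplace mechanism from the differential-privacy literature as one illustrative defense; it is not meant to describe what any particular company or researcher actually does.

```python
import random

# Hypothetical toy data: (name, salary). Invented for illustration only.
RECORDS = [
    ("alice", 52000),
    ("bob", 61000),
    ("carol", 58000),
    ("dave", 75000),
]


def total_salary(records):
    """An 'innocuous' aggregate release: the sum over a group."""
    return sum(salary for _, salary in records)


def differencing_attack(records, target):
    """Recover one person's salary from two aggregate releases.

    Release 1 covers everyone; release 2 covers everyone except the
    target (e.g. a subgroup that happens to exclude them). Their
    difference is exactly the target's salary.
    """
    everyone = total_salary(records)
    without_target = total_salary([r for r in records if r[0] != target])
    return everyone - without_target


def noisy_total(records, epsilon, max_salary, rng):
    """Laplace mechanism for a sum query (assumes salaries in [0, max_salary]).

    Values are clamped to [0, max_salary], so adding or removing one person
    changes the sum by at most max_salary. Laplace noise with scale
    max_salary / epsilon then gives epsilon-differential privacy for this
    single query. A Laplace(0, b) draw is generated as the difference of
    two independent exponential draws with mean b.
    """
    scale = max_salary / epsilon
    clamped = sum(min(max(s, 0), max_salary) for _, s in records)
    noise = rng.expovariate(1 / scale) - rng.expovariate(1 / scale)
    return clamped + noise


if __name__ == "__main__":
    # Exact leak: prints 75000, Dave's salary, from two "harmless" sums.
    print(differencing_attack(RECORDS, "dave"))

    # Same two queries, each answered with noise at epsilon = 0.5.
    rng = random.Random(0)
    everyone = noisy_total(RECORDS, 0.5, 100000, rng)
    without_dave = noisy_total(
        [r for r in RECORDS if r[0] != "dave"], 0.5, 100000, rng
    )
    # The difference is now dominated by noise, masking Dave's value.
    print(everyone - without_dave)
```

One design point worth noting: every noisy query spends privacy budget. Under basic composition, the two epsilon = 0.5 queries above together cost epsilon = 1, and on a dataset this small the noise needed to hide one person also swamps the aggregate. That trade-off is a big part of why releasing rich data safely is hard.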
The chairman of the conference panel — Bernardo A. Huberman, a physicist who directs the social computing group at HP Labs here — responded angrily. In the future, he said, the conference should not accept papers from authors who did not make their data public. He was greeted by applause from the audience.

I had emailed Huberman asking for the data behind one of his papers, which I had assigned to my class. His reply:
Dear Dr. Horton:
Thank you for your interest in my work and I certainly feel pleased when I learn that you liked my paper enough to assign it to your class.
As to your request, let me talk with the person who now handles the youtube data (we lately used it to uncover the persistence paradox) and I’ll get back to you.
Incidentally if you are interested in the role that attention and status (its marker) play among people I could send you a paper that reports on an experiment (as opposed to observational data) that elucidates it quite cleanly across cultures.
Best,
Bernardo
I got the data within days, so I can state that he privately practices what he preaches publicly.
Update: I incorrectly stated that Aaron Roth was a professor at CMU; he did his PhD at CMU. He's a professor at UPenn. Apologies.