Monday, December 28, 2015

Secret Data

On replication in economics. Just in time for bar-room discussions at the annual meetings.
"I have a truly marvelous demonstration of this proposition which this margin is too narrow to contain." -Fermat
"I have a truly marvelous regression result, but I can't show you the data and won't even show you the computer program that produced the result" - Typical paper in economics and finance.
The problem 

Science demands transparency. Yet much research in economics and finance uses secret data. The journals publish results and conclusions, but the data and sometimes even the programs are not available for review or inspection.  Replication, even just checking what the author(s) did given their data, is getting harder.

Quite often, when one digs in, empirical results are nowhere near as strong as the papers make them out to be.

  • Simple coding errors are not unknown. Reinhart and Rogoff are a famous example -- which only came to light because they were honest and ethical and posted their data. 
  • There are data errors. 
  • Many results are driven by one or two observations, which at least tempers the interpretation of the results. Often a simple plot of the data, not provided in the paper, reveals that fact. 
  • Standard error computation is a dark art, producing 2.11 t statistics and the requisite two or three stars suspiciously often. 
  • Small changes in sample period or specification destroy many "facts."  
  • Many regressions involve a large set of extra right-hand-side variables, with no strong reason for inclusion or exclusion, and the fact is often quite sensitive to those choices. Just which instruments you use, and how you transform variables, can change the results. 
  • Many large-data papers difference, difference the differences, add dozens of controls and fixed effects, and so forth, throwing out most of the variation in the data in the admirable quest for cause-and-effect interpretability. Alas, that procedure can load the results up on measurement errors, and slightly different, equally plausible variations can produce very different results. 
  • There is often a lot of ambiguity in how to define variables,  which proxies to use, which data series to use, and so forth, and equally plausible variations change the results.
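One cheap check on the "one or two observations" problem is to re-run the regression dropping each observation in turn and see whether the result survives. Below is a minimal Python sketch of that leave-one-out check for a simple one-regressor OLS slope, using only the standard library. The function names and the toy data are made up for illustration; a real paper would apply the same idea to its actual specification and standard errors.

```python
# Leave-one-out influence check for a simple regression slope.
# Standard library only; the toy data below are invented for illustration.


def ols_slope(x, y):
    """Slope from regressing y on a constant and x."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def leave_one_out_slopes(x, y):
    """Return (index, slope re-estimated without that observation) for each i."""
    results = []
    for i in range(len(x)):
        x_minus_i = x[:i] + x[i + 1:]
        y_minus_i = y[:i] + y[i + 1:]
        results.append((i, ols_slope(x_minus_i, y_minus_i)))
    return results


if __name__ == "__main__":
    # No relation between x and y except one extreme observation at the end.
    x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 20.0]
    y = [2.0, 1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 15.0]
    print(f"full sample: slope = {ols_slope(x, y):.3f}")
    for i, b in leave_one_out_slopes(x, y):
        print(f"drop obs {i}: slope = {b:.3f}")
```

On these made-up numbers the full-sample slope is about 0.75, and dropping the last observation takes it to essentially zero, which is exactly what a scatter plot would have shown at a glance.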

I have seen many examples of these problems, in papers published in top journals. Many facts that you think are facts are not facts. Yet as more and more papers use secret data, it's getting harder and harder to know.

The solution is pretty obvious: for their work to be considered peer-reviewed "scientific" research, authors should post their programs and data. If the world cannot see your lab methods, you have an anecdote, an undocumented claim; you don't have research. An empirical paper without data and programs is like a theoretical paper without proofs.


Faced with this problem, most economists jump to rules and censorship. They want journals to impose replicability rules and refuse to publish papers that don't meet those rules. The American Economic Review has followed this suggestion, and other journals, such as the Journal of Political Economy, are following.

On reflection, that instinct is a bit of a paradox. Economists, when studying everyone else, by and large value free markets, demand as well as supply, emergent order, the marketplace of ideas, competition, entry, and so on, not tight rules and censorship. Yet in running our own affairs, the inner dirigiste quickly wins out. In my time at faculty meetings, there were few problems that many colleagues did not want to address by writing more rules.

And with another moment's reflection (much more below), you can see that the rule-and-censorship approach simply won't work.  There isn't a set of rules we can write that assures replicability and transparency, without the rest of us having to do any work. And rule-based censorship invites its own type I errors.

Replicability is a squishy concept -- just like every other aspect of evaluating scholarly work. Why do we think we need referees, editors, recommendation letters, subcommittees, and so forth to evaluate method, novelty, statistical procedure, and importance, but replicability and transparency can be relegated to a set of mechanical rules?


So, rather than try to restrict supply and impose censorship, let's work on demand.  If you think that replicability matters, what can you do about it? A lot:
  • When a journal with a data policy asks you to referee a paper, check the data and program files. Part of your job is to see that they work correctly. 
  • When you are asked to referee a paper, and data and programs are not provided, see if data and programs are on authors' websites. If not, ask for the data and programs. If refused, refuse to referee the paper. You cannot properly peer-review empirical work without seeing the data and methods. 
  • I don't think it's necessary for referees to actually do the replication for most papers, any more than we have to verify arithmetic. Nor, in my view, do we have to dot i's and cross t's on the journal's policy, any more than we pay attention to their current list of referee instructions. Our job is to evaluate whether we think the authors have done an adequate and reasonable job, as standards are evolving, of making the data and programs available and documented. Run a regression or two to let them know you're looking, and to verify that their posted data actually works. Unless of course you smell a rat, in which case, dig in and find the rat. 
  • Do not cite unreplicable articles. If editors and referees ask you to cite such papers, write back "these papers are based on secret data, so should not be cited." If editors insist, cite the paper as "On request of the editor, I note that Smith and Jones (2016) claim x. However, since they do not make programs / data available, that claim is not replicable."  
  • When asked to write a promotion or tenure letter, check the author's website or journal websites of the important papers for programs and data. Point out secret data, and say such papers cannot be considered peer-reviewed for the purposes of promotion. (Do this the day you get the request for the letter. You might prompt some fast disclosures!)  
  • If asked to discuss a paper at a conference, look for programs and data on authors' websites. If not available, ask for the data and programs. If they are not provided, refuse. If they are, make at least one slide in which you replicate a result, and offer one opinion about its robustness. By example, let's make replication routinely accepted. 
  • A general point: Authors often do not want to post data and programs for unpublished papers, which can be reasonable. However, such programs and data can be made available to referees, discussants, letter writers, and so forth, in confidence. 
  • If organizing a conference, do not include papers that do not post data and programs. If you feel that's too harsh, at least require that authors post data and programs for published papers and make programs and data available to discussants at your conference. 
  • When discussing candidates for your institution to hire, insist that such candidates disclose their data and programs. Don't hire secret data artists. Or at least make a fuss about it. 
  • If asked to serve on a committee that awards best paper prizes, association presidencies, directorships, fellowships or other positions and honors, or when asked to vote on those, check the authors' websites or journal websites. No data, no vote. The same goes for annual AEA and AFA elections. Do the candidates disclose their data and programs? 
  • Obviously, lead by example. Put your data and programs on your website. 
  • Value replication. One reason we have so little replication is that there is so little reward for doing it. So, if you think replication is important, value it. If you edit a journal, publish replication studies, positive and negative. (Especially if your journal has a replication policy!) When you evaluate candidates, write tenure letters, and so forth, value replication studies, positive and negative. If you run conferences, include a replication session. 
In all this, you're not just looking for some mess on some website, put together to satisfy the letter of a journal's policy. You're evaluating whether the job the authors have done of documenting their procedures and data rises to the standards of what you'd call replicable science, within reason, just like every other part of your evaluation.

Though this issue has bothered me for a long time, I have not yet started doing all of the above. I will start now.

Here, some economists I have talked to jump to suggesting a call to coordinated action. That is not my view.

I think this sort of thing can and should emerge gradually, as a social norm. If a few of us start doing this sort of thing, others might notice. They think "that's a good idea," and start doing it too. They also may feel empowered to start doing it. The first person to do it will seem like a bit of a jerk. But after you read three or four tenure letters that say "this seems like fine research, but without programs and data we won't really know," you'll feel better about writing that yourself. Like "would you mind putting out that cigarette."

Also, the issues are hard, and I'm not sure exactly what the right policy is. Good social norms will evolve over time to reflect the costs and benefits of transparency in all the different kinds of work we do.

If we all start doing this, journals won't need to enforce long rules. Data disclosure will become as natural and self-enforced a part of writing a paper as proving your theorems.

Conversely, if nobody feels like doing the above, then maybe replication isn't such a problem at all, and journals are mistaken in adding policies.

Rules won't work without demand

Journals are treading lightly, and rightly so.

Journals are competitive too. If the JPE refuses a paper because the author won't disclose data, the QJE publishes it, and the paper goes on to great acclaim and wins its author the Clark Medal and the Nobel Prize, then the JPE falls in stature and the QJE rises. New journals will spring up with more lax policies. Journals themselves are a curious relic of the print age. If readers value empirical work based on secret data, academics will just post their papers on websites, working paper series, SSRN, RePEc, blogs, and so forth.

So if there is no demand, why restrict supply? If people are not taking the above steps on their own -- and by and large they are not -- why should journals try to shove it down authors' throats?

Replication is not an issue about which we really can write rules. It is an issue -- like all the others involving evaluation of scientific work -- for which norms have to evolve over time and users must apply some judgement.

Perfect, permanent replicability is impossible. If replication is done with programs that access someone else's database, those databases change and access routines change. Within a year, if the programs run at all, they give different numbers. New versions of software give different results. The best you can do is to  freeze the data you actually use, hosted on a virtual machine that uses the same operating system, software version, and so on. Even that does not last forever. And no journal asks for it.
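A minimal sketch of the "freeze the data" idea, assuming the data sit in local files: record a SHA-256 fingerprint of each file, along with the Python version and platform, in a manifest that travels with the paper. A replicator can then at least tell whether they have the same bytes the author used. The function names, the file name, and the manifest layout are illustrative assumptions, not any journal's format, and this does nothing to freeze the statistical software itself.

```python
# Sketch: fingerprint the exact data files used, so a replicator can check
# they have the same bytes. Names and manifest layout are illustrative.
import hashlib
import json
import platform
import sys
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """SHA-256 hex digest of a file, read in chunks so large files are fine."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def build_manifest(data_files):
    """Record interpreter version, platform, and a hash for each data file."""
    return {
        "python": sys.version,
        "platform": platform.platform(),
        "files": {str(p): sha256_of(p) for p in data_files},
    }


def changed_files(manifest):
    """Files that are missing or whose contents no longer match the manifest."""
    return [
        p for p, digest in manifest["files"].items()
        if not Path(p).exists() or sha256_of(p) != digest
    ]


if __name__ == "__main__":
    # Hypothetical toy file standing in for the paper's actual data extract.
    data = Path("extract_2015.csv")
    data.write_text("firm,year,ret\nA,2010,0.05\n")
    manifest = build_manifest([data])
    Path("manifest.json").write_text(json.dumps(manifest, indent=2))
    print(changed_files(manifest))  # nothing has changed yet, so an empty list
```

Hashing only tells you the bytes changed, not why; it is a tripwire for the drifting databases described above, not a substitute for documentation.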

Replication is a small part of a larger problem, data collection itself.  Much data these days is collected by hand, or scraped by computer. We cannot and should not ask for a webcam or keystroke log of how data was collected, or hand-categorized. Documenting this step so it can be redone is vital, but it will always be a fuzzy process.

In response to "post your data," authors respond that they aren't allowed to do so, and journal rules allow that response. You have only to post your programs, and then a would-be replicator must arrange for access to the underlying data.  No surprise, very little replication that requires such extensive effort is occurring.

And rules will never be enough.

Regulation invites just-within-the-boundaries games. Provide the programs, but with poor documentation. Provide the data with no headers. Don't write down what the procedures are. You can follow the letter and not the spirit of rules.

Demand invites serious effort towards transparency. I post programs and data. Judging by emails when I make a mistake, these get looked at maybe once every 5 years. The incentive to do a really good job is not very strong right now.

Poor documentation is already a big problem. My modal referee comment these days is "the authors did not write down what they did, so I can't evaluate it." Even without posting programs and data, the authors simply don't write down the steps they took to produce the numbers. The demand for such documentation has to come from readers, referees, citers, and admirers, and posting the code is only a small part of that transparency.

A hopeful thought: Currently, one way we address these problems is by endless referee requests for alternative procedures and robustness checks.  Perhaps these can be answered in the future by "the data and code are online, run them yourself if you're worried!"

I'm not arguing against rules, such as the AER has put in. I just think that they will not make a dent in the issue until we economists show by our actions some interest in the issue.

Proprietary data, commercial data, government data. 

Many data sources explicitly prohibit public disclosure of the data. Disclosing such secret data remains beyond the current journal policies, or policies that anyone imagines asking journals to impose. Journals can require that you post code, but then a replicator has to arrange for access to the data. That can be very expensive, or require a coauthor who works at the government agency. No surprise, such replication doesn't happen very often.

However, this is mostly not an insoluble problem, as there is almost never a fundamental reason why the data needed for verification and robustness analysis cannot be disclosed. Rules and censorship are not strong enough to change things. Widespread demand for transparency might well be.

To substantiate much research, and check its robustness to small variations in statistical method,  you do not need full access to the underlying data. An extract is enough, and usually the nature of that extract makes it useless for other purposes.

The extract needed to verify one paper is usually useless for writing other papers. The terms for using posted data could be, you cannot use this data to publish new original work, only for verification and comment on the posted paper.  Abiding by this restriction is a lot easier to police than the current replication policies.

Even if the slice of data needed to check a paper's results cannot be public, it can be provided to referees or discussants, after signing a stack of non-use and non-disclosure agreements. (That is a less-than-optimal outcome of course, since in the end real verification won't happen unless people can publish verification papers.)

Academic papers take 3 to 5 years or more for publication. A 3 to 5 year old slice of data is useless for most purposes, especially the commercial ones that worry data providers.

Commercial and proprietary (e.g., bank) data sets are designed for paying customers who want up-to-the-minute data. Even CRSP data, a month old, is not much used commercially, because traders need up-to-the-minute data for trading. Hedge fund and mutual fund data is used and paid for by people researching the histories of potential investments. Two-year-old data is useless to them -- so much so that getting the providers to keep old slices of data to overcome survivor bias is a headache.

In sum, the 3- to 5-year-old, redacted, minimalist small slice of data needed to substantiate the empirical work in an academic paper is in fact seldom a substantial threat to the commercial, proprietary, or genuine privacy interests of the data collectors.
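To make "redacted, minimalist small slice" concrete, here is a hedged Python sketch of building such an extract: keep only the columns a paper's regressions use, drop rows newer than a cutoff date, and replace fund, firm, or person identifiers with salted hashes. Everything here (column names, the cutoff, the salt, the 12-character truncation) is a hypothetical choice. Salted hashing is pseudonymization, not anonymization, so whether an extract like this satisfies a given data provider or privacy rule is a legal question the code cannot answer.

```python
# Sketch: build a minimal, older, pseudonymized extract for verification only.
# Column names, cutoff, and salt are hypothetical; hashing is pseudonymization,
# not anonymization.
import hashlib
from datetime import date


def pseudonymize(identifier, salt):
    """Replace an identifier with a truncated salted SHA-256 hash."""
    digest = hashlib.sha256((salt + identifier).encode("utf-8")).hexdigest()
    return digest[:12]


def make_extract(rows, keep_cols, date_col, cutoff, id_col, salt):
    """Keep only needed columns, and only rows dated on or before `cutoff`."""
    extract = []
    for row in rows:
        if date.fromisoformat(row[date_col]) > cutoff:
            continue  # too recent to release under this hypothetical policy
        slim = {col: row[col] for col in keep_cols}
        slim[id_col] = pseudonymize(row[id_col], salt)
        extract.append(slim)
    return extract


if __name__ == "__main__":
    raw = [
        {"fund": "Alpha Partners", "month": "2011-06-30", "ret": 0.012, "aum": 950.0},
        {"fund": "Beta Capital", "month": "2011-06-30", "ret": -0.004, "aum": 310.0},
        {"fund": "Alpha Partners", "month": "2015-11-30", "ret": 0.007, "aum": 1020.0},
    ]
    slice_ = make_extract(raw, keep_cols=["fund", "month", "ret"],
                          date_col="month", cutoff=date(2012, 12, 31),
                          id_col="fund", salt="keep-this-secret")
    for row in slice_:
        print(row)
```

The same identifier always maps to the same hash, so panel structure survives for a verifier, while the dropped columns and recent rows never leave the author's machine.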

The problem is fundamentally about contracting costs. We are in most cases secondary or incidental users of data, not primary customers. Data providers' legal departments don't want to deal with the effort of writing contracts that allow disclosure of data that is 99% useless but might conceivably be of value or cause them trouble.  Both private and government agency lawyers naturally adopt a CYA attitude by just saying no. 

But that can change.  If academics can't get a paper conferenced, refereed, read and cited with secret data,  if they can't get tenure, citations, or a job on that basis, the academics will push harder.  Our funding centers and agencies (NSF)  will allocate resources to hire some lawyers. Government agencies respond to political pressure.  If their data collection cannot be used in peer-reviewed research, that's one less justification for their budget. If Congress hears loudly from angry researchers who want their data, there is a force for change. But so long as you can write famous research without pushing, the apparently immovable rock does not move. 

The contrary argument is that if we impose these costs on researchers, then less research will be done, and valuable insights will not benefit society. But here you have to decide whether research based on secret data is really research at all. My premise is that, really, it is not, so the social value of even apparently novel and important claims based on secret data is not that large. 

Clearly, nothing of this sort will happen if journals try to write rules, in a profession in which nobody is taking the above steps to demand replicability. Only if there is a strong, pervasive, professional demand for transparency and replicability will things change.

Author's interest 

Authors often want to preserve their use of data until they've fully mined it. If they put in all the effort to produce the data, they want first crack at the results.

This valid concern does not mean that they cannot create the redacted slices of data needed to substantiate a given paper. They can also let referees and discussants access such slices, under the strict non-disclosure and non-use agreements described above.

In fact, it is usually in authors' interest to make data available sooner rather than later. Everyone who uses your data is a citation. There are far more cases of authors who gained notoriety and long citation counts from making data public early than there are of authors who jealously guarded data so they would get credit for the magic regression that would appear 5 or more years after data collection.

Yet this property right is up to the data collector to decide. Our job is to say "that's nice, but we won't really believe you until you make the data public, at least the data I need to see how you ran this regression." If you want to wait 5 years to mine all the data before making it public, then you might not get the glory of "publishing" the preliminary results. That's again why voluntary pressure will work, and rules from above will not work.


One  empiricist who I talked to about these issues does not want to make programs public, because he doesn't want to deal with the consequent wave of emails from people asking him to explain bits of code, or claiming to have found errors in 20-year old programs.

Fair enough. But this is another reason why a loose code of ethics is better than a set of rules for journals.

You should make a good-faith effort to document code and data when the paper is published. You are not required to answer every email from every confused graduate student for eternity after that point. Critiques and replication studies can be refereed in the usual way, and must rise to the usual standards of documentation and plausibility.

Why replication matters for economics 

Economics is unusual. In most experimental sciences, once you collect the data, the fact is there or not. If it's in doubt, collect more data. Economics features large and sophisticated statistical analysis of non-experimental data. Collecting more data is often not an option, and not really the crux of the problem anyway. You have to sort through the given data in a hundred or more different ways to understand that a cause and effect result is really robust. Individual authors can do some of that -- and referees tend to demand exhausting extra checks. But there really is no substitute for the social process by which many different authors, with different priors, play with the data and methods.
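As a small illustration of sorting through the data in many ways, the Python sketch below re-estimates the coefficient on a variable of interest under every subset of a list of candidate controls and reports them all, so a reader sees the whole range instead of the one specification in the paper. It uses hand-rolled OLS (normal equations plus Gaussian elimination) to stay within the standard library; that is fine for a toy example but numerically cruder than a real econometrics package. The variable names and data are invented.

```python
# Sketch: coefficient on x under every subset of candidate controls.
# Hand-rolled OLS via normal equations; fine for toy data, not production.
import itertools


def solve(a, b):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [bi] for row, bi in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    sol = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * sol[c] for c in range(r + 1, n))
        sol[r] = (m[r][n] - s) / m[r][r]
    return sol


def ols(y, columns):
    """OLS coefficients of y on a constant plus the given regressor columns."""
    X = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    k = len(X[0])
    xtx = [[sum(row[p] * row[q] for row in X) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * yi for row, yi in zip(X, y)) for p in range(k)]
    return solve(xtx, xty)


def specification_range(y, x, controls):
    """Map each subset of control names to the estimated coefficient on x."""
    results = {}
    names = sorted(controls)
    for r in range(len(names) + 1):
        for subset in itertools.combinations(names, r):
            beta = ols(y, [x] + [controls[name] for name in subset])
            results[subset] = beta[1]  # beta[0] is the constant
    return results


if __name__ == "__main__":
    # Toy data: y is built exactly as 2*x + 3*z; w is irrelevant.
    x = [1.0, 2.0, 3.0, 4.0, 5.0]
    z = [1.0, 0.0, 1.0, 0.0, 2.0]
    w = [0.0, 1.0, 1.0, 0.0, 1.0]
    y = [2 * xi + 3 * zi for xi, zi in zip(x, z)]
    for subset, b in specification_range(y, x, {"z": z, "w": w}).items():
        print(f"controls {subset or '(none)'}: coefficient on x = {b:.3f}")
```

In the toy data, any specification that includes z recovers the coefficient of 2 on x, while leaving z out entirely moves it to 2.6. With real data and dozens of candidate controls, the spread across subsets is the kind of thing a replicator with the author's data and code can check in an afternoon.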

Economics is also unusual in that the practice of redoing old experiments over and over, common in science, is rare in economics. When Ben Franklin stored lightning in a condenser, hundreds of other people went out to try it too, some discovering that it wasn't the safest thing in the world. They did not just read about it and take it as truth. A big part of a physics education is to rerun classic experiments in the lab. Yet it is rare for anyone to redo -- and question -- classic empirical work in economics, even as a student.

Of course everything comes down to costs. If a result is important enough, you can go get the data, program everything up again, and see if it's true.  Even then, the question comes, if you can't get x's number, why not?  It's really hard to answer that question without x's programs and data. But the whole thing is a whole lot less expensive and time consuming, and thus a whole lot more likely to happen, if you can use the author's programs and data.

Where we are 

The American Economic Review has a strong data and programs disclosure policy. The JPE adopted the AER data policy. John Taylor has a good blog post on replication and the history of the AER policy. The QJE has decided not to adopt one; I asked an editor about it and heard very sensible reasons. Here is a very good review article on data policies at journals by Sven Vlaeminck.

The AEA is running a survey about its journals, and asks some replication questions. If you're an AEA member, you got it. Answer it. I added to mine, "if you care so much about replication, you should show you value it by routinely publishing replication articles."

How is it working? The Report on the American Economic Review Data Availability Compliance Project says:
All authors submitted something to the data archive. Roughly 80 percent of the submissions satisfied the spirit of the AER’s data availability policy, which is to make replication and robustness studies possible independently of the author(s). The replicated results generally agreed with the published results. There remains, however, room for improvement both in terms of compliance with the policy and the quality of the materials that authors submit
However, Andrew Chang and Phillip Li disagree, in the nicely titled "Is Economics Research Replicable? Sixty Published Papers from Thirteen Journals Say 'Usually Not'":
We attempt to replicate 67 papers published in 13 well-regarded economics journals using author-provided replication files that include both data and code. ... Aside from 6 papers that use confidential data, we obtain data and code replication files for 29 of 35 papers (83%) that are required to provide such files as a condition of publication, compared to 11 of 26 papers (42%) that are not required to provide data and code replication files. We successfully replicate the key qualitative result of 22 of 67 papers (33%) without contacting the authors. Excluding the 6 papers that use confidential data and the 2 papers that use software we do not possess, we replicate 29 of 59 papers (49%) with assistance from the authors. Because we are able to replicate less than half of the papers in our sample even with help from the authors, we assert that economics research is usually not replicable. 
I read this as confirmation that replicability must come from a widespread social norm, demand, not journal policies.

The quest for rules and censorship reflects a world-view that once we get procedures in place, then everything published in a journal will be correct. Of course, once stated, you know how silly that is. Most of what gets published is wrong. Journals are for communication. They should be invitations to replication, not carved in stone truths.  Yes, peer-review sorts out a lot of complete garbage, but the balance of type 1 and type 2 errors will remain.

A few touchstones:

Mitch Petersen tallied up all papers in the top finance journals for 2001–2004. Out of 207 panel data papers, 42% made no correction at all for cross-sectional correlation of the errors. This is a fundamental error, one that can cut standard errors by a factor of 5 or more: if firm i had an unusually good year, it's pretty likely firm j had a good year as well. Clearly, the empirical refereeing process is far from perfect, despite the endless rounds of revisions referees typically ask for. (Nowadays the magic wand "cluster" is waved over the issue. Whether it's being done right is a ripe topic for a similar investigation.)
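Here is a hedged Python simulation of the point, under made-up assumptions: a panel in which every firm in a given period shares a common shock to both the regressor and the error (a good year for firm i is likely a good year for firm j). It compares the conventional OLS standard error with one clustered by period, which sums the regression score within each period before squaring. The parameter values are arbitrary and there is no finite-sample correction; with these numbers the clustered standard error should come out several times larger, but the exact ratio depends on the draws. This sketch covers only the common-period-shock case described above, not every form of dependence a panel can have.

```python
# Sketch: OLS vs. clustered-by-period standard errors in a simulated panel
# with common period shocks. Parameters are arbitrary illustrations.
import math
import random


def simulate_panel(n_firms, n_periods, beta, seed):
    """Rows of (period, x, y) with a shared period shock in both x and the error."""
    rng = random.Random(seed)
    rows = []
    for t in range(n_periods):
        fx = rng.gauss(0, 1)  # common shock to the regressor in period t
        fe = rng.gauss(0, 1)  # common shock to the error in period t
        for _ in range(n_firms):
            x = fx + rng.gauss(0, 1)
            e = fe + rng.gauss(0, 1)
            rows.append((t, x, beta * x + e))
    return rows


def ols_with_ses(rows):
    """Slope of y on a constant and x, with naive and clustered-by-period SEs."""
    n = len(rows)
    mx = sum(r[1] for r in rows) / n
    my = sum(r[2] for r in rows) / n
    sxx = sum((r[1] - mx) ** 2 for r in rows)
    b = sum((r[1] - mx) * (r[2] - my) for r in rows) / sxx
    a = my - b * mx
    resid = [r[2] - a - b * r[1] for r in rows]
    s2 = sum(e * e for e in resid) / (n - 2)
    naive_se = math.sqrt(s2 / sxx)  # assumes independent errors
    scores = {}  # sum of (x - mean x) * residual within each period
    for (t, x, _), e in zip(rows, resid):
        scores[t] = scores.get(t, 0.0) + (x - mx) * e
    clustered_se = math.sqrt(sum(s * s for s in scores.values())) / sxx
    return b, naive_se, clustered_se


if __name__ == "__main__":
    rows = simulate_panel(n_firms=200, n_periods=20, beta=0.5, seed=1)
    b, naive, clustered = ols_with_ses(rows)
    print(f"slope {b:.3f}  OLS se {naive:.4f}  clustered-by-period se {clustered:.4f}")
    print(f"ratio {clustered / naive:.1f}")
```

With 200 firms but only 20 independent period shocks, the effective sample is closer to 20 than to 4,000, which is why the naive formula is so overconfident.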

"Why Most Published Research Findings are False"  by John Ioannidis. Medicine, but relevant

A link on the  controversy on replicability in psychology

There will be a workshop on replication and transparency in economic research following the ASSA meetings in San Francisco.

I anticipate an interesting exchange in the comments. I especially welcome more links to, and summaries of, existing writing on the subject.

Update: On the need for a replication journal, by Christian Zimmermann
There is very little replication of research in economics, particularly compared with other sciences. This paper argues that there is a dire need for studies that replicate research, that their scarcity is due to poor or negative rewards for replicators, and that this could be improved with a journal that exclusively publishes replication studies. I then discuss how such a journal could be organized, in particular in the face of some negative rewards some replication studies may elicit.
But why is that better than a dedicated "replication" section of the AER, especially if the AEA wants to encourage replication? I didn't see an answer, though it may be a second best proposal given that the AER isn't doing it.

Update 2

A second blog post on this topic, Secret Data Encore


  1. Replies
    1. Regarding the above question why a replication journal is better than a dedicated "replication" section of the AER: The AER has already published a number of replications. Like most journals that do they focus on replications of studies published by themselves. An independent replication journal could serve for replications of important studies published in any journal. Furthermore, it would not be subject to the conflict of interest editors face when deciding whether to publish replications that can be seen as calling into question their own previous editing decisions.

    2. Christian, you mention "poor or negative rewards for replicators" ... can you describe what those are? Thanks.

  2. In the field of psychology they have "The Reproducibility Project: Psychology".

    I don't understand how you can use the same data and the same program and have different outcomes. It seems to violate some sort of law of thermodynamics.

    Even more baffling is how the author can't help reproduce the outcome with their own data and program. Were these papers from the University of East Anglia?

    1. One of my worst experiences in economic consulting was discovering, at about midnight the night before it was due, that our programs and data were unable to reproduce the tables in our expert witness report for a major antitrust litigation case. Ultimately it came down to an error in the data cleaning process from some transactions records, where we had to arbitrarily drop some near-duplicates, and the sort order had changed. We found and fixed it around 5am, I think, but it was quite a scary night till then.

      Having to submit our backup to an adversarial team of economists who would themselves work round the clock to pick it apart, trying to contradict and discredit our analysis, really had a way of sharpening our minds. We made damn sure our analysis was replicable and robust.

      Then again, the opposing experts always seemed to reach the extreme opposite conclusion about the economic facts of the case, using the exact same data and training as our economists, which itself reflects pretty badly on our "science"...

  3. Good musings, John.

    Even a snippet of household tax data or identified trading data may be impossible to provide without a nightmare of NDAs. Data about rapidly evolving economic settings may be closely guarded, and it would be a loss if such data is categorically disallowed under your criteria.

    Perhaps a middle ground can be struck for confidential data. Authors using such data must obtain written (and public) consent for semi-public access of the source data after an embargo period, with the understanding that violation of this policy may trigger a retraction down the road. Whether the novelty of the setting is sufficiently compelling to be granted this exception can be left to the journal editors. Retractions are costly enough that authors would put serious effort into ensuring that these data are made available for replication and extension of the original work.

    1. Trade secrets aren't the barrier for lots of data. For example, in health care, HIPAA bars pretty much any data sharing, no matter how long you wait, because there are serious privacy concerns.

  4. So, how does an economist deal with data when risk is mispriced and bubbles are created by the Fed, and ultimately destroyed by the Fed? This article from Tyler Durden shows that bubbles are created by the Fed (and the housing bubble as well, IMO). And then I show how the Fed ignored the LIBOR crisis and just let LIBOR cross the Swaps line, with the chart on the last page of the article:

    1. Your argument doesn't follow from your premise. You ask how one deals with the data, but a phenomenon like the one you describe (if it exists as you suggest) should show up in the data. The data itself, assuming it is collected properly, is simply a set of numerical series we hope represents a data-generating process in reality. Within financial economics at least, this is in some sense nice, because financial data is in some sense the stock market itself.

      So if there are issues or whatever as you suggest, then find it in the data. Your argument doesn't imply the data itself is somehow wrong.

    2. What I am saying is that fraud renders the data void. What looks like supply and demand is really mispriced risk. So, there is demand, but it is manipulated by mispriced risk.

      By the way, I mentioned "grumpy" in the article. :)

  5. Just wondering - what is the most common program used for academic work? R runs on a single cpu, so large enough data sets usually take way too long. I would think something like SAS or matlab or Stata are the most common, but interested in hearing what the general standard is.

    1. My coauthors and I are dealing with a revision (for an economics paper) in which one reviewer said "The reader does not care what software package is used to get the results." In contrast, in my public health papers, we _always_ include a statement of which package and version was used to produce the results (FYI, Stata 14.1). Omission of that detail would attract criticism.

    2. SAS and Stata are the most common (by far) for econometrics. Many prefer Matlab for quantitative finance & simulation. In the industry, a lot of shops that need high end quant are moving to R for risk simulations, but it depends on where the people were trained. C++ is also reasonably common for quant.

    3. The ReplicationWiki gives information about software used for studies that were replicated or were published in the American Economic Review, the Journal of Political Economy, and the four American Economic Journals. Stata was used 907 times, MATLAB 278, SAS 64. An overview of different software is linked from the main page.

  6. I might add to the list of prescriptions that coauthors check the empirical work of their other coauthors! This is simple and gets around many of the issues with sharing secret data.

  7. It is perhaps a remarkable coincidence that today's quote of the day in The Economist's Espresso app was a quip from Ronald Coase that fits this blog post well: "If you torture the data long enough, Nature will always confess."

  8. I absolutely agree with this and will follow the suggested "demand rules" as much as I can. Also, it's nice to know about the post-ASSA workshop; I will try to attend it.

  9. Good post.

    Here's an even more troubling fact. I can't think of any tax-subsidized "non-partisan" organisations that make available to the public the "models" they use to produce "studies" that profoundly affect public policy and elections. Publishing broad-stroke "methodology" descriptions just does not cut it.

    Case in point:

    Can you replicate the results of the analyses of tax reform proposals, such as those published here?

    The case here is much stronger than for "academic papers". The public investment and interest are too great for these models to be beyond the reach of ordinary citizens. Not only the results but also the tools created with tax-subsidized funds should be a public good. In this case, the election stakes are too high for this material to be "proprietary". A little more sunshine here would be good for everyone, except the narrow entrenched interests behind these "not-for-profit", "public interest" outfits.

    Let's have some real climate change.

  10. I work in operational research, which shares much in common with economics. We have found that developing in R creates an open-source work environment that makes replication much easier. Since migrating to R, our group has become more confident in our results, and asking a colleague for a sanity check has become much less of a hassle. In my conversations with economists, I have noticed that R is slowly making its way into the standard toolset. That trend might make the task of replication somewhat easier in the future.

  11. Bravo!

    And Happy New Year!

  12. All good points; I'd add that these things are doubly and triply true for papers cited in applied policy (e.g., regulatory impact analyses, but really anything the government cites in support of a policy decision). More use of R, definitely seconded; it's great for open-source replication and for disseminating results and procedures.

    For purely academic work, my only concern is variation in incentives. I'm a grad student and there's no incentive to release unique data right away. If it's useful, it will be used, quickly, by smarter scholars with better resources. We don't all have armies of research assistants to clean and code datasets, generate new methodological software, etc. People at tier-1 departments have these things, and will quickly eat up all the value in a dataset.

    Maintaining that property right for a few years can be the difference between getting an academic job and getting a few throwaway citations from the people who extract the main value from the data. The nature of academic publishing is to reward people who specialize in using other people's data, for better or for worse.

    1. That's part of the problem: people are hired on the basis of their publications, not on how often their names are mentioned in footnotes. Also, as you said, we aren't all equally equipped, which can at times be unfair. Free-riding does suck.

  13. Excellent post. You might find the following website of interest: The goal of The Replication Network is to encourage economists and their journals to publish replications.

    1. Prof. Dave Giles pointed me to that when I asked his opinion on this subject.

  14. Thank you professor. Happy New Year!

  15. I agree with most of your points. However, I find myself in a difficult dilemma. Having spent my career promoting the use of field experiments in economics, I made the switch from a tenured university job to a job as a research scientist in a for-profit firm, with the intention of liberating some data to be able to publish results in scientific journals. I decided that some questions were very difficult to answer without going inside a firm.

    I've discovered that the tech companies I've worked for (Yahoo, Google, Pandora) typically are considerably more worried about consumer privacy than they are about corporate secrets. Therefore, I have yet to get approval to share any of my corporate data sets externally after many, many hours of trying. The corporate lawyers are too worried that some privacy violation could accidentally happen with a publicly available dataset (as AOL discovered when it released some search-query logs to academics 10 years ago). I have been able to publish some of my results, and I do think they are good science. I am scrupulous about checking my results for robustness and about being conservative with my standard errors. I discard any experiment that has serious execution errors, which is typically about 80% of experiments. However, I have not been able to share the data, and my employment contracts have all stressed the importance of data-privacy rules. Do you really think this makes my work non-scientific? Can you imagine doing something else in my position?

    Options I see include:
    1. Continue to do science with the understanding that I often will manage to publish results without being able to share the data.
    2. Fight harder with my employers, in what I perceive to be a losing battle. Also, note that this privacy problem is not totally unique to working within a firm. Human-subjects committees in universities often require researchers to have a plan for destroying data after the research is over, for the same privacy reasons that firms worry about. I think these privacy concerns are very overblown, and I think scientific transparency is really important, but I am stuck between a rock and a hard place.
    3. Give up on the idea of studying consumer behavior by doing field experiments within firms, and instead go back to doing small-scale field experiments that I can implement myself as a university professor. This will cause me to give up on contributing to certain areas of inquiry, such as the causal effects of digital advertising.

    1. I'm a research economist in a government agency and I'm in the same boat as David. While it is possible for other researchers to obtain access to the confidential micro data I use for much of my research for replication studies, such access requires a lengthy application process.

      Government agencies are very concerned about personal privacy, confidentiality, and public trust in institutions to protect data collected. The concerns reflect the concerns of the firms and individuals whose data they hold. The statistical agencies in particular cannot (by law) allow access to data that would harm the companies or individuals who have provided the data. These concerns aren't going to go away, and decades of researchers complaining about the difficulties in accessing government micro data for research have not changed these fundamental facts.

      If journals required that papers using government micro data provided this micro data to researchers at no transaction cost (no application process, no security checks, no inconvenience of getting to a secure location to access the data), the most likely outcome is that NSF/NIA/NIH would stop funding research using such data, and these papers would just disappear.

      I agree with your general point that replication is a good thing for the profession. But I think I would rather live in a second-best world in which access to government data had hurdles, but was still possible, than a world in which these rich sources of data for research were completely barred to the profession.

    2. If other researchers also have the data, there is no problem, especially as that group gets larger. One of those other researchers can be (and is in fact already likely to be) a referee of your paper at a journal. You should make it as easy as you possibly can for one of those researchers to be able to replicate your work.

      Not every paper needs to be replicated, but key results should be replicated often and the results shared. Groups of independent researchers can form to replicate (and in a sense authenticate) each other's work. Norms can be developed around coding and data organization to make this replication easier.

      I think it all starts with just wanting this to happen.

  16. tl;dr

    It is the responsibility of the writer to provide access to his data and his methods. It is the responsibility of the reader to reject the work if these are not provided.

  17. Reproducibility is arguably a complex criterion for verifying the credibility of research. Some disciplines have attempted to provide guidelines that could help researchers avoid irreproducibility, such as the five-sigma rule. But we need to remember that a study being reproducible does not mean its claim is correct. I've discussed this issue in detail here:

  18. Federal agencies with more than $100 million in annual research funding are now required to develop plans to increase public access to data and publications. See in general and for a status update with URLs (but in a scanned PDF, so not live)

  19. On only publishing statistically significant results, this recent paper is interesting:

    Brodeur, Lé, Sangnier, and Zylberberg. 2016. "Star Wars: The Empirics Strike Back." AEJ: Applied Economics.

    Abstract: Using 50,000 tests published in the AER, JPE, and QJE, we identify a residual in the distribution of tests that cannot be explained solely by journals favoring rejection of the null hypothesis. We observe a two-humped camel shape with missing p-values between 0.25 and 0.10 that can be retrieved just after the 0.05 threshold and represent 10-20 percent of marginally rejected tests. Our interpretation is that researchers inflate the value of just-rejected tests by choosing "significant" specifications. We propose a method to measure this residual and describe how it varies by article and author characteristics.
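One simple way to look for the kind of bunching this abstract describes is a caliper-style comparison: count reported tests whose p-values land just below 0.05 versus just above it. The Python sketch below is a simpler, swapped-in technique, not the authors' method. It assumes a standard normal approximation for converting t-statistics into two-sided p-values, and the function names (`two_sided_p`, `caliper_counts`) and the sample t-statistics are hypothetical, made up for illustration.

```python
import math


def two_sided_p(t: float) -> float:
    """Two-sided p-value for a reported t-statistic.

    Assumes a standard normal reference distribution (a large-sample
    approximation); small-sample tests would use Student's t instead.
    erfc(|t| / sqrt(2)) equals 2 * (1 - Phi(|t|)).
    """
    return math.erfc(abs(t) / math.sqrt(2))


def caliper_counts(t_stats, threshold=0.05, width=0.01):
    """Count tests whose p-values fall just below vs. just above a threshold.

    Returns (just_below, just_above): the number of p-values in
    [threshold - width, threshold) and in [threshold, threshold + width].
    """
    just_below = just_above = 0
    for t in t_stats:
        p = two_sided_p(t)
        if threshold - width <= p < threshold:
            just_below += 1  # "just significant" at the threshold
        elif threshold <= p <= threshold + width:
            just_above += 1  # "just missed" the threshold
    return just_below, just_above


# Hypothetical t-statistics, made up for illustration only.
reported_t = [1.90, 1.97, 2.00, 2.01, 2.03, 1.80, 3.10]
print(caliper_counts(reported_t))
```

With a narrow window and a smooth underlying distribution of test statistics, the two counts should be roughly similar; a large excess just below the threshold is consistent with, though not proof of, authors choosing "significant" specifications.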

  20. The problem in finance is that it is very top down. If John Cochrane gets asked a question about the code in a 2003 paper that he wrote, who can trust John Cochrane to not be vindictive? It is easy to preach from a perch. As a practical matter, people who question people at the top will remain few and far between.

  21. Cochrane is right. The hidden data problem was remarked on by Rogoff et al. in their book "This Time...". There is also a solution below (require 99.7% confidence rather than the standard 95% confidence limit). - RL

    April 20, 2015 ...the Cross-Section of Expected Returns
    Campbell R. Harvey et al.
    Hundreds of papers and hundreds of factors attempt to explain the cross-section of expected returns. Given this extensive data mining, it does not make sense to use the usual criteria for establishing significance (i.e., t-stat of 2 or 95% confidence). What hurdle should be used for current research? Our paper introduces a new multiple testing framework and provides historical cutoffs from the first empirical tests in 1967 to today. A new factor needs to clear a much higher hurdle, with a t-statistic greater than 3.0 [i.e., about 99.7% confidence]. We argue that most claimed research findings in financial economics are likely false. Keywords: Risk factors, Multiple tests, Beta, HML, SMB, 3-factor model, Momentum, Volatility, Skewness, Idiosyncratic volatility, Liquidity, Bonferroni, Factor zoo.
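The arithmetic behind these t-statistic hurdles can be checked in a few lines. This is a minimal sketch assuming a standard normal approximation; the helper names `two_sided_confidence` and `bonferroni_t_cutoff` are hypothetical. Bonferroni is only one of the multiple-testing adjustments listed in the keywords, and this code does not reproduce the authors' framework or their 3.0 cutoff, which comes from their own analysis. It only shows how the hurdle rises as the number of tests grows.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def two_sided_confidence(t: float) -> float:
    """Confidence level (1 - p) implied by a two-sided test at |t|.

    Uses the standard normal approximation.
    """
    return 2 * STD_NORMAL.cdf(abs(t)) - 1


def bonferroni_t_cutoff(alpha: float, num_tests: int) -> float:
    """|t| required to keep the family-wise error rate at alpha across
    num_tests two-sided tests, using the Bonferroni level alpha / num_tests.
    """
    if num_tests < 1:
        raise ValueError("num_tests must be at least 1")
    return STD_NORMAL.inv_cdf(1 - alpha / (2 * num_tests))


for t in (2.0, 3.0):
    print(f"t = {t}: two-sided confidence {two_sided_confidence(t):.4f}")

for m in (1, 10, 100):
    print(f"{m} tests: |t| cutoff {bonferroni_t_cutoff(0.05, m):.2f}")
```

Under the normal approximation, t = 2 corresponds to roughly 95.4% two-sided confidence and t = 3 to roughly 99.7%, consistent with the bracketed note above. Bonferroni is deliberately conservative when tests are correlated, which is one reason more refined multiple-testing procedures exist.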

  22. Replication is essential, and all journals in economics should adopt a replication policy, as the AER did. But we also need evidence that the results are robust, pass most specification tests, and that the model predicts well on a hold-out sample. Another malpractice (in finance journals) is reporting a table in which one variable is omitted, another added, and so on. This is not even reasonable. There are better ways to examine sensitivity.

