Training on reproducible methods in empirical economics

By Astrid Kunze

27 September 2024 12:31

Reproducing empirical research results is important for validating research findings, which are often used for evidence-based policy advice.

Therefore, some economics journals ask researchers for their data and code. Some also run the code to verify that it reproduces the empirical output reported in the refereed and subsequently published articles.

Astrid Kunze and Mateusz Mysliwski organised two workshop sessions on reproducibility methods, held by Lars Vilhuber, Professor at Cornell University and Data Editor of the American Economic Association journals. The workshop took place on the morning of 5 September 2024 and was funded by the Department of Economics.

Thanks to all colleagues and guests who participated. We have received feedback that participants found the workshop extremely useful and appreciated the material and know-how that Lars so generously shared. All of us picked up many tips and much inspiration for making our own research reproducible.

Here is a short summary of the key takeaways for those who were not able to attend.

The workshop consisted of two sessions. The first session was particularly useful for researchers starting to work on empirical research, with a lot of hands-on material that Lars generously shared in interactive slides. The second session was more advanced and offered food for thought for a future that has already started. Topics included how to work in large research groups, how to share and pin software versions so that results can be replicated with exactly the same version of Stata, and how to ensure that the entire analysis runs at the press of a single button (see the sketch below).
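To make the "one button" idea concrete, here is a minimal sketch of a Stata master do-file; the file names are hypothetical, and the version command is what pins Stata's behaviour to a specific release so that results replicate across installations:

    * main.do -- hypothetical master script: one button runs the whole project
    version 17            // interpret all code as Stata 17, even on newer releases
    clear all
    set more off

    * run every step of the project in order; the file names are illustrative
    do "01_download_data.do"
    do "02_clean_data.do"
    do "03_analysis.do"
    do "04_tables_figures.do"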

Session 1: Reproducibility from Day 1

Journals require that you share your code and data in a replication package upon acceptance. However, efficient reproducibility starts at the beginning of the research project. Following some best practices from day 1 can not only help you prepare a replication package later, but also make you a more productive researcher. In this workshop, we start with an empty folder and finish with a mini-project about public procurement across various European countries, ready for submission to a journal. Together we discuss and document all the choices we make about data collection and analysis, in a way that can help future readers of our research. For advanced topics, see [Session 2].
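As a hedged illustration (not the workshop's exact structure), a submission-ready project folder often ends up looking something like this:

    project/
    ├── README.md        based on the Template README (see links below)
    ├── main.do          master script that runs everything top to bottom
    ├── code/            data cleaning and analysis scripts
    ├── data/
    │   ├── raw/         original data, never modified
    │   └── derived/     datasets produced by the code
    └── output/          tables, figures, and log files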

Session 2: Advanced methods for self-verification of reproducibility

In this second session, data editors describe a few possible steps to self-check replication packages before submitting them to a journal. It is not meant to be exhaustive, and it is not meant to be prescriptive. There are many ways to construct a replication package, and many more to check that it works. Questions from the audience are encouraged. Participants are encouraged to also attend [Session 1].
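One simple self-check along these lines, sketched as a hypothetical Stata session (paths and file names are assumptions): start from a clean state, run the package's master script, and confirm that the expected outputs were produced.

    * hypothetical self-check in a fresh Stata session
    clear all                          // start from a clean state
    cd "replication-package"           // assumed path to the package root
    do "main.do"                       // the package's master script
    confirm file "output/table1.tex"   // illustrative check for an expected output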

Astrid talked to Lars after the morning sessions. There is already a lot of material online, and Lars and his team are continuing to develop the webpages, links, and material. So stay tuned.

His mission is to disseminate material and encourage local researchers to also hold workshops of this type.

Here are Lars's takeaways for any researcher (young, or young in spirit):

  • consider the future replicator, including your future self, from the start
  • keep track of data sources and of data terms of use/permissions/licenses from the start, while it is fresh in your mind; use the Template README to structure your notes on this
  • as your project code evolves, keep it at each point in time as reproducible as possible and as simple as feasible
  • allow for the fact that software and data may change over time, and understand how to insure yourself and your project against such changes (document this in the README as well)
  • keep evidence of successful complete runs of your code, top to bottom (log files), at frequent intervals, and at least each time you share your results with others, even if just internally (see the sketch after this list)
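A minimal sketch of such a logged top-to-bottom run in Stata; the file names are hypothetical:

    * at the top of main.do: open a log that records the complete run
    capture log close
    log using "output/full_run.log", replace text

    * ... all project steps run here, e.g. do "01_download_data.do" ...

    * at the bottom of main.do: close the log
    * a complete, error-free log file documents a successful top-to-bottom run
    log close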

Further useful links

A template README for social science replication packages (GitHub)

A few possible steps to self-check replication packages before submitting them to a journal (GitHub)

 
