Facilitate reproducible research with cookiecutter-research-template

A research template for reproducible research.

A directed acyclic graph representing the workflow of a research project.

A lot of time has passed since this post was written. In the meantime, I wrote a tool called pytask, a replacement for Waf, which lets you execute the workflows of a research project. It also comes with two recommended templates.

In one of my university courses, I was introduced to a Waf-based framework for reproducible research by Hans-Martin von Gaudecker, which is amazingly useful for managing a research project.

The basic idea is that a project is structured as a DAG, a directed acyclic graph. A DAG is a graph with a finite number of nodes and edges, where every edge has a direction, here leading from input to output files. Furthermore, starting at any node $\nu$ and following the directed edges, it is not possible to find a way back to $\nu$.

Take a look at the picture at the top. get_simulation_draws.py is the starting point and the source of initial_locations.csv, which is the input of … You get it.
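To make the DAG idea concrete, here is a minimal sketch in Python. It is not how Waf or the template represent the project internally; it only uses the standard library's graphlib module (Python 3.9+) to show what "inputs before outputs" and "no way back to $\nu$" mean. Only get_simulation_draws.py and initial_locations.csv come from the picture; run_analysis.py and results.csv are hypothetical names made up for illustration.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a node; its value is the set of nodes it depends on.
# Only the first two names appear in the post. The last two are
# hypothetical and exist purely to make the graph a bit longer.
dependencies = {
    "get_simulation_draws.py": set(),
    "initial_locations.csv": {"get_simulation_draws.py"},
    "run_analysis.py": set(),  # hypothetical
    "results.csv": {"initial_locations.csv", "run_analysis.py"},  # hypothetical
}


def build_order(deps):
    """Return the nodes in an order where every input precedes its outputs."""
    return list(TopologicalSorter(deps).static_order())


def has_cycle(deps):
    """Return True if the graph contains a cycle, i.e. it is not a DAG."""
    try:
        build_order(deps)
    except CycleError:
        return True
    return False


order = build_order(dependencies)
print(order)
print("cycle?", has_cycle(dependencies))
```

A build tool needs exactly this kind of ordering: a file can only be produced once everything it depends on exists, and a cycle would mean there is no valid place to start.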

To get the sample project, first install cookiecutter with

$ pip install -U cookiecutter

Then, go to the directory that should contain the project folder and type

$ cookiecutter https://github.com/tobiasraabe/cookiecutter-research-template.git

Answer the prompts that follow so that the project is customized to your needs.

Afterwards, go into the project folder and set up the conda environment.

$ conda env create -n <project-name> -f environment.yml

Finally, run the following command to make sure that the sample project works.

$ python waf.py distclean configure build

For more information on Waf, read von Gaudecker’s documentation and my Waf Tips & Tricks.

To use all features of the template, check out the documentation.

Tobias Raabe

I am a data scientist and programmer living in Hamburg.