A workflow management system for reproducible data analyses

Why do you need pytask?

You’ve probably organized your project into folders, each containing scripts that carry out specific tasks.

But how do you execute all tasks or keep your project in sync?

How does pytask help you?

  • By defining dependencies and products of tasks, you implicitly define an execution order.
  • pytask validates this definition.
  • pytask executes only the tasks that need to be updated.
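
The idea behind these points can be sketched with the standard library alone. This is a minimal illustration, not pytask's implementation: the task names, the file names, and the helpers `build_graph`, `execution_order`, and `tasks_to_rerun` are all hypothetical, and `graphlib.TopologicalSorter` stands in for whatever resolution pytask performs internally.

```python
from graphlib import CycleError, TopologicalSorter

# Hypothetical tasks: each lists the files it depends on and the files it produces.
TASKS = {
    "task_clean_data": {"depends_on": {"raw.csv"}, "produces": {"clean.csv"}},
    "task_fit_model": {"depends_on": {"clean.csv"}, "produces": {"model.pkl"}},
    "task_plot_results": {"depends_on": {"model.pkl"}, "produces": {"figure.png"}},
}


def build_graph(tasks):
    """Map each task to the set of tasks that produce one of its dependencies."""
    producer = {f: name for name, t in tasks.items() for f in t["produces"]}
    return {
        name: {producer[f] for f in t["depends_on"] if f in producer}
        for name, t in tasks.items()
    }


def execution_order(tasks):
    """Return a valid execution order; raises CycleError for cyclic definitions."""
    return list(TopologicalSorter(build_graph(tasks)).static_order())


def tasks_to_rerun(tasks, changed_files):
    """Return the tasks affected by the changed files, in execution order."""
    dirty_files = set(changed_files)
    rerun = []
    for name in execution_order(tasks):
        if tasks[name]["depends_on"] & dirty_files:
            rerun.append(name)
            # Products of a rerun task count as changed for downstream tasks.
            dirty_files |= tasks[name]["produces"]
    return rerun
```

In this sketch, changing `clean.csv` would rerun only the model and plotting tasks, while a pair of tasks that depend on each other's products would be rejected with a `CycleError` before anything runs.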

How do you define tasks, dependencies, and products?

Tasks are functions whose names start with task_. Use decorators to specify the dependencies and products of a task. By adding depends_on and produces as function arguments, you can access the paths to the files in the function body.
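
To make the pattern concrete, here is a rough, standard-library-only sketch of how such a task could look. The `depends_on` and `produces` decorators and the `run_task` helper below are hypothetical stand-ins written for this illustration, not pytask's own API, and the file names are made up; in a real project the decorators come from pytask and pytask calls the function for you.

```python
import tempfile
from pathlib import Path


def depends_on(path):
    """Hypothetical stand-in decorator: record the file a task reads."""
    def decorator(func):
        func.depends_on = Path(path)
        return func
    return decorator


def produces(path):
    """Hypothetical stand-in decorator: record the file a task writes."""
    def decorator(func):
        func.produces = Path(path)
        return func
    return decorator


def run_task(func, root):
    """Call a task with its recorded paths resolved against the project root."""
    func(depends_on=root / func.depends_on, produces=root / func.produces)


@depends_on("input.txt")
@produces("output.txt")
def task_uppercase(depends_on, produces):
    # Inside the body, both arguments are plain paths to the files.
    produces.write_text(depends_on.read_text().upper())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "input.txt").write_text("hello pytask")
    run_task(task_uppercase, root)
    result = (root / "output.txt").read_text()
```

The point of the design is that the task body never hard-codes a path: the paths are declared once in the decorators and handed to the function as arguments.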

Execute a task

Type pytask in your terminal, and it will automatically collect and execute all tasks.

What are the benefits?

👉 Automation reduces errors and increases reproducibility.

👉 The build process is documented in code.

👉 You can iterate faster and be more productive.

Research

Is pytask used for actual research? Yes!

Here is a Covid-19 forecast project with an agent-based model, 10+ datasets, many different policy scenarios and 1,000+ simulations.

👉 https://arxiv.org/abs/2106.11129

Teaching

pytask is also part of a graduate course, teaching economists programming and best practices for research projects.

👉 https://github.com/OpenSourceEconomics/econ-project-templates

What else does pytask have to offer?

Scale your project by repeating tasks! 🚀

For example, create ten different datasets with randomly generated data.
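
The underlying idea can be sketched in plain Python, independent of how pytask itself expresses repetition. The `make_task` factory, the file names, and the choice of 100 normally distributed values per dataset are assumptions made for this illustration only.

```python
import random
import tempfile
from pathlib import Path


def make_task(seed, path):
    """Return a task function that writes random data for one seed."""
    def task_create_random_data():
        rng = random.Random(seed)  # seeded, so each dataset is reproducible
        rows = [f"{rng.gauss(0, 1):.6f}" for _ in range(100)]
        path.write_text("\n".join(rows))
    return task_create_random_data


with tempfile.TemporaryDirectory() as tmp:
    # One task per seed: the loop repeats the same task definition ten times.
    tasks = [make_task(seed, Path(tmp) / f"data_{seed}.csv") for seed in range(10)]
    for task in tasks:
        task()
    created = sorted(p.name for p in Path(tmp).iterdir())
    first = (Path(tmp) / "data_0.csv").read_text()
    second = (Path(tmp) / "data_1.csv").read_text()
```

Each repetition differs only in its seed and its product, so adding an eleventh dataset means changing one number rather than copying a script.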

Customize pytask with plugins!

Templating

Start a new project from a template!

A minimal template: https://github.com/pytask-dev/cookiecutter-pytask-project

A template for reproducible economics projects: https://github.com/OpenSourceEconomics/econ-project-templates

Debugging

Enter the debugger if one of your tasks fails and you want to find out why! 🏗️

Documentation

You can find out more about pytask in the documentation: https://pytask-dev.readthedocs.io/.

Follow the tutorials for a step-by-step introduction: https://pytask-dev.readthedocs.io/en/stable/tutorials/index.html

Ecosystem

pytask is also part of a more extensive ecosystem of research tools developed at @open_econ.

We will soon write about tools like estimagic, a package for complex numerical optimization and for the estimation and calibration of scientific models.

Acknowledgements

Thanks for staying with me until the end! Finally, some shout-outs to amazing people and projects.

Thanks to @kroehrl, @JanosGabler, and @econ_hmg, who helped me build this tool in endless and fruitful discussions! 🙇

pytask stands on the shoulders of these projects. Thank you!🙏