Latest Posts:

  • Sep, 2020
  • Sep 30, 2020: What I have been doing lately
  • Hi everybody,

    I have not written a post in a long time, so I am resuming this blog by keeping you posted on some of the projects I have been working on.

    Over the last year, I have been developing or contributing to several research applications. I learned a lot about software engineering and designing applications. Maybe I will find the time to write a post about some of the things I learned along the way.

  • Aug, 2019
  • Aug 15, 2019: Matplotlib for publications

    This article shows how to create matplotlib plots for publications in which fonts and font sizes match the LaTeX document and graphics are vector-based, so they stay sharp instead of blocky at any zoom level.
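    One ingredient of matching a figure to a LaTeX document is sizing it to the document's text width. Below is a minimal sketch, assuming you know \textwidth in TeX points (for example from \showthe\textwidth in your document); the helper name figure_size and the golden-ratio default are illustrative choices, not necessarily what the article uses:

```python
import math
from typing import Optional, Tuple

TEX_POINTS_PER_INCH = 72.27  # TeX points (pt) per inch, the unit LaTeX reports lengths in


def figure_size(width_pt: float, fraction: float = 1.0,
                ratio: Optional[float] = None) -> Tuple[float, float]:
    """Return (width, height) in inches for a figure spanning `fraction` of a LaTeX width.

    Hypothetical helper: `ratio` is height / width and defaults to the golden ratio.
    """
    if ratio is None:
        ratio = (math.sqrt(5) - 1) / 2  # about 0.618, a common aesthetic default
    width_in = width_pt * fraction / TEX_POINTS_PER_INCH
    return width_in, width_in * ratio


# Example: a figure half as wide as a text block that LaTeX reports as 345pt
print(figure_size(345.0, fraction=0.5))
```

    The returned tuple can be passed as figsize to plt.subplots. Combined with font settings such as plt.rcParams.update({"font.family": "serif", "font.size": 10}) (adjust to your document) and saving with fig.savefig("figure.pdf"), this yields a vector graphic whose text size matches the surrounding document.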

  • Apr, 2019
  • Apr 14, 2019: Numba - @vectorize and @guvectorize

    In this post, I will explain how to use the @vectorize and @guvectorize decorators from Numba. Use the former to write a function on scalars that is applied element-wise to arrays, and the latter to write a function on (sub-)arrays that is applied across arrays of higher dimensions.

  • Mar, 2019
  • Mar 25, 2019: Numpy - Views vs. Copies

    In one of my recent projects, I needed to accelerate a discrete choice dynamic programming model. After I changed part of the implementation, the program was indeed faster. But according to profiling with snakeviz, the most expensive operation was now ~:0(<method 'copy' of 'numpy.ndarray' objects>). I was puzzled, because I was sure that the code did not call np.copy() anywhere. After reading some StackOverflow posts and blog entries, it became clear that some operations and, more importantly, some indexing methods return copies instead of views. The difference between the two is that a view refers to the same underlying data in memory, whereas a copy creates a new object with its own data. The disadvantages of a copy are:

    • takes more time
    • takes more memory

    But which operations return copies?
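    A quick way to check whether an operation returned a view or a copy is np.shares_memory. The sketch below, with an arbitrary example array, contrasts basic slicing, which returns a view, with integer-array and boolean indexing, which return copies:

```python
import numpy as np

a = np.arange(10)

view = a[2:5]          # basic slicing: a view on a's buffer
fancy = a[[2, 3, 4]]   # integer-array ("fancy") indexing: a copy
masked = a[a > 6]      # boolean indexing: also a copy

print(np.shares_memory(a, view))    # True
print(np.shares_memory(a, fancy))   # False
print(np.shares_memory(a, masked))  # False

view[0] = 100          # writing through the view changes a
fancy[0] = -1          # writing to the copy leaves a untouched
print(a[2])            # 100
```

    Similarly, reshape and ravel return views whenever the memory layout allows it, while flatten always returns a copy.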

  • Jan, 2019
  • Jan 19, 2019: The Roy Model

    The Roy model (Roy, 1951) provides the modern framework for modeling occupational choices as an earnings maximization problem.
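    To make the selection mechanism concrete, here is a minimal simulation of a two-sector Roy model, assuming jointly normal log earnings; the function name and all parameter values are hypothetical. Each individual chooses the sector with the higher earnings, so observed earnings are the maximum of the two potential earnings:

```python
import numpy as np


def simulate_roy(n=100_000, mu=(1.0, 1.0), sigma=(0.5, 0.8), rho=0.3, seed=0):
    """Simulate a two-sector Roy model (hypothetical parameters).

    Log potential earnings in both sectors are jointly normal with means `mu`,
    standard deviations `sigma`, and correlation `rho`.
    """
    rng = np.random.default_rng(seed)
    cov = rho * sigma[0] * sigma[1]
    cov_matrix = np.array([[sigma[0] ** 2, cov], [cov, sigma[1] ** 2]])
    log_earnings = rng.multivariate_normal(mu, cov_matrix, size=n)
    earnings = np.exp(log_earnings)      # potential earnings, shape (n, 2)
    choice = earnings.argmax(axis=1)     # 0 = sector 1, 1 = sector 2
    observed = earnings[np.arange(n), choice]
    return earnings, choice, observed


earnings, choice, observed = simulate_roy()
for sector in (0, 1):
    chosen = choice == sector
    print(f"sector {sector + 1}: share {chosen.mean():.2f}, "
          f"mean earnings of its workers {earnings[chosen, sector].mean():.2f} "
          f"vs. population {earnings[:, sector].mean():.2f}")
```

    Comparing the two means per sector shows how self-selection makes the average earnings observed in a sector differ from what the whole population would earn there.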

  • Oct, 2018
  • Oct 17, 2018: A time series course with Julia
  • Last semester, I took a time series course in which we implemented models such as the Hodrick-Prescott filter and structural vector autoregressive processes in Julia. The whole course is available online, with the notebooks running on Binder, which lets you go through the programming examples in your browser. If you plan to use Julia yourself and want to play around, it might be a good place to start.
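    To illustrate the idea behind the Hodrick-Prescott filter, here is a minimal sketch in Python (the course itself used Julia). The trend τ minimizes the squared deviations from the series y plus λ times the squared second differences of τ, which leads to the linear system (I + λK'K)τ = y, where K is the second-difference matrix. The parameter is called lamb because lambda is a Python keyword, λ = 1600 is the conventional choice for quarterly data, and the dense solve is only meant for short series:

```python
import numpy as np


def hp_filter(y, lamb=1600.0):
    """Split y into (trend, cycle) with the Hodrick-Prescott filter (dense sketch)."""
    y = np.asarray(y, dtype=float)
    n = y.size
    # Second-difference matrix: (K @ tau)[t] = tau[t] - 2 * tau[t + 1] + tau[t + 2]
    K = np.zeros((n - 2, n))
    for t in range(n - 2):
        K[t, t:t + 3] = [1.0, -2.0, 1.0]
    # First-order condition of the penalized least-squares problem
    trend = np.linalg.solve(np.eye(n) + lamb * K.T @ K, y)
    return trend, y - trend


rng = np.random.default_rng(1)
series = np.linspace(0.0, 5.0, 40) + rng.normal(scale=0.3, size=40)
trend, cycle = hp_filter(series)
```

    A quick sanity check: a linear series passes through unchanged, because its second differences are zero.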

  • Aug, 2018
  • Aug 27, 2018: Facilitate reproducible research with cookiecutter-research-template
  • This DAG is produced by a sample project for reproducible research. I extended the project's template with the templating engine cookiecutter and various other software engineering tools.

  • Aug 21, 2018: Identifying Software Patents
  • In 2015, I wrote my Bachelor's thesis on identifying software patents. The project mattered in two ways. First, there is no official system that sorts patents this way. The main classification system used by the USPTO focuses on technological and functional form; for example, a subclass dealing with dispensing solids contains both manure spreaders and toothpaste tubes. Researchers, in contrast, are more interested in topics such as automation or software. Second, I learned Python and took my first steps into the world of machine learning.

    You can find the whole project, as well as the paper, on GitHub. There is also a script to download different kinds of data sets. The raw data takes up approximately 90 GB of disk space, whereas the data needed to replicate previous results based on a simple algorithm currently takes up less than 1 GB.

    Now, let us see what has been done so far.
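    As a purely hypothetical illustration of what a simple keyword-based classifier can look like, the sketch below flags a patent as software if its description mentions "software", or both "computer" and "program", unless its title contains hardware terms. The function names and keyword lists are assumptions for illustration and not necessarily the rule used in the project:

```python
import re

# Hypothetical keyword list, chosen for illustration only
HARDWARE_TITLE_TERMS = {"chip", "semiconductor", "circuit", "circuitry"}


def words(text):
    """Lower-case word tokens of a text (crude tokenizer, no stemming)."""
    return set(re.findall(r"[a-z]+", text.lower()))


def is_software_patent(title, description):
    """Flag a patent as software with a simple keyword rule (illustrative sketch)."""
    desc = words(description)
    mentions_software = "software" in desc or {"computer", "program"} <= desc
    hardware_title = bool(words(title) & HARDWARE_TITLE_TERMS)
    return mentions_software and not hardware_title


print(is_software_patent("Method for scheduling jobs",
                         "A computer program assigns jobs to workers."))  # True
print(is_software_patent("Semiconductor chip layout",
                         "Software is used to design the chip."))         # False
```

    Rules like this are transparent and cheap to run on millions of patents, which is why they make a natural baseline before trying machine learning classifiers.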

  • Jun, 2018
  • Jun 11, 2018: How to download files with Python

    This is a short Python script to download files, resume interrupted downloads, and validate them with hash values. It is useful for distributing projects and data separately. You can find the script at the end of the article, and an interactive version of the notebook is available on Binder.
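    A minimal sketch of such a script using only the standard library is shown below; the function names and the SHA-256 default are assumptions, not necessarily what the article uses. Resuming relies on an HTTP Range request, which only works if the server supports it: a 206 Partial Content response means the partial file can be extended, otherwise the download starts over:

```python
import hashlib
import os
import urllib.error
import urllib.request

CHUNK_SIZE = 2**20  # read and hash 1 MiB at a time


def file_hash(path, algorithm="sha256"):
    """Return the hex digest of a file, reading it in chunks to limit memory use."""
    digest = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(CHUNK_SIZE), b""):
            digest.update(chunk)
    return digest.hexdigest()


def download(url, path, expected_hash=None, algorithm="sha256"):
    """Download url to path, resuming a partial file and validating its hash."""
    if expected_hash and os.path.exists(path) and file_hash(path, algorithm) == expected_hash:
        return path  # already complete and valid
    start = os.path.getsize(path) if os.path.exists(path) else 0
    request = urllib.request.Request(url, headers={"Range": f"bytes={start}-"})
    try:
        with urllib.request.urlopen(request) as response:
            # 206 = server honored the Range header, so append; otherwise overwrite
            mode = "ab" if response.status == 206 else "wb"
            with open(path, mode) as f:
                while chunk := response.read(CHUNK_SIZE):
                    f.write(chunk)
    except urllib.error.HTTPError as error:
        if error.code != 416:  # 416 = range starts past the end, i.e. nothing left to fetch
            raise
    if expected_hash and file_hash(path, algorithm) != expected_hash:
        raise ValueError(f"Hash mismatch for {path}")
    return path
```

    Publishing the expected hash next to the download link lets users verify that a resumed download was stitched together correctly.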

  • Mar, 2018
  • Mar 21, 2018: How to compile and distribute an R package with conda
  • This article shows how to compile and distribute R packages with conda for use in your data science projects. This is useful because R lacks a neat dependency-pinning tool like requirements.txt in Python or environment.yml in conda, and R itself can be installed with conda anyway. However, if you want to use the MKL-accelerated Microsoft R Open instead of plain R, some packages are currently not provided in conda's default channels or on conda-forge. Here is how to overcome this obstacle.

  • Oct, 2017
  • Oct 22, 2017: The Tragedy of Titanic
  • Analysis of survival rates on the Titanic, with a placebo test of whether traveling in couples increased the likelihood of survival. An interactive version of the notebook is available on Binder.