This article shows how to compile R packages and distribute them on anaconda.org for use in your data science projects. This is useful because R lacks a neat dependency-pinning tool comparable to Python's requirements.txt or conda's environment.yml, and conda ships R anyway. However, if you want to use the MKL-accelerated Microsoft R Open (MRO) instead of plain R, some packages are currently not provided in conda's default channels or on conda-forge. Here is how to overcome this obstacle.

Introduction to Anaconda and R

I like to manage my research projects with conda, the package manager of Anaconda, a popular Python distribution for data science. For one of my recent projects I also needed R, and I was happy to find out that R is available through conda as well.

First, create a regular Python environment for your new project:

$ conda create --name project python=3.6 anaconda

This installs a complete Anaconda distribution with Python 3.6 in an environment named project. If you only want the bare Python interpreter, run

$ conda create -n project python=3.6

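Activate the environment with the following command (on Linux and macOS, use source activate project instead; conda 4.4 and later also accept conda activate project):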

$ activate project

Hint: Since my OS is Windows, I prefer to use PowerShell. Unfortunately, activating and deactivating environments there usually fails. Try out pscondaenvs, which installs corrected PowerShell scripts.

If we also need an R distribution for our project, we have to choose between R from the R Project and MRO from Microsoft. The latter is the default with Anaconda, but the former is still provided if you are running a 32-bit operating system or an older macOS version. The advantage of MRO is that it uses the Intel Math Kernel Library (MKL) and enables multithreading by default (here are some benchmark reports and information on how to set the number of threads used).

Installing a basic R interpreter from MRO is as simple as typing

$ conda install --channel r mro-base

If you want R from the R Project instead, type conda install -c r r-base.

Note that you cannot have both R interpreters in one environment, and packages built for one of them do not work with the other. You can find more information here.
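Because the two interpreters cannot be mixed, it can help to check a list of package specs before creating an environment from it. The following is a minimal sketch in Python. The function names `package_name` and `check_single_interpreter` are hypothetical helpers for illustration and are not part of conda. The parsing only handles simple specs such as `mro-base`, `python=3.6` or `r::r-base>=3.4`.

```python
"""Hypothetical pre-flight check: refuse package lists that mix MRO and R Project."""

import re
from typing import List, Optional

# The two mutually exclusive R interpreter packages discussed above.
INTERPRETERS = {"mro-base", "r-base"}


def package_name(spec: str) -> str:
    """Strip an optional channel prefix and version constraint, e.g. 'r::r-base>=3.4' -> 'r-base'."""
    spec = spec.split("::")[-1].strip()
    # The name ends at the first whitespace or version operator character.
    return re.split(r"[\s=<>!]", spec, maxsplit=1)[0]


def check_single_interpreter(specs: List[str]) -> Optional[str]:
    """Return the R interpreter requested by specs (or None), raising if both are present."""
    found = {package_name(spec) for spec in specs} & INTERPRETERS
    if len(found) > 1:
        raise ValueError("mro-base and r-base cannot live in the same environment")
    return found.pop() if found else None


if __name__ == "__main__":
    print(check_single_interpreter(["python=3.6", "mro-base", "r-essentials"]))
```

The check works on package names only. It does not catch R packages that were built against the other interpreter.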

There is also the option to install a whole R distribution called r-essentials. It bundles many useful packages along with IRkernel, which lets you use R in Jupyter notebooks. Again, the command depends on which R interpreter you use: the first one is for MRO, the second one for R from the R Project.

$ conda install -c r r-essentials
$ conda install -c r r-essentials r-base

To update all of your R packages, run

$ conda update r-essentials
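This brings us back to the dependency pinning mentioned in the introduction. Once the environment works, its packages can be recorded in an environment.yml file and recreated elsewhere with conda env create --file environment.yml. conda env export writes a fully pinned file automatically. The minimal Python sketch below instead writes a short, hand-curated file for the MRO setup above. The helper name `render_environment`, the channel list and the package list are assumptions; you may want to pin versions more tightly.

```python
"""Write a minimal environment.yml for the project (hypothetical helper)."""

from typing import List


def render_environment(name: str, channels: List[str], dependencies: List[str]) -> str:
    """Render a conda environment file as YAML text without third-party libraries."""
    lines = ["name: " + name, "channels:"]
    lines += ["  - " + channel for channel in channels]
    lines.append("dependencies:")
    lines += ["  - " + dep for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    text = render_environment(
        name="project",
        channels=["r", "defaults"],  # assumed channel order
        # MRO variant; replace mro-base with r-base for R from the R Project.
        dependencies=["python=3.6", "mro-base", "r-essentials"],
    )
    with open("environment.yml", "w", encoding="utf-8") as handle:
        handle.write(text)
    print(text, end="")
```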

If your desired package is not included in r-essentials, you can use the search on https://anaconda.org to find a channel that offers it. But what if no channel provides it?

Building and Distributing an R package

I ran into exactly this issue when I wanted to use mice, a well-known package for multiple imputation by chained equations.

To build the package for your conda distribution, first generate a conda recipe with

$ conda skeleton cran r-mice

This creates a folder called r-mice containing three files: bld.bat, build.sh and meta.yaml. meta.yaml is the important file, since it controls the build. An example can be found here. Note the requirements key: its host and run sections define the R interpreter. By default, this is r-base. If you are using MRO, you have to change both entries to mro-base.

requirements:
  build:
    - {{ compiler('c') }}          # [not win]
    - {{ compiler('cxx') }}        # [not win]
    - {{native}}toolchain          # [win]
    - {{posix}}filesystem          # [win]
    - {{posix}}make

  host:
    - r-base
    - r-mass
    - r-rcpp
    - r-lattice
    - r-nnet
    - r-rpart
    - r-survival

  run:
    - r-base
    - {{native}}gcc-libs           # [win]
    - r-mass
    - r-rcpp
    - r-lattice
    - r-nnet
    - r-rpart
    - r-survival
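You do not have to edit the two r-base entries by hand; the switch to mro-base can also be scripted. The following minimal Python sketch assumes that the recipe lists the interpreter as a plain `- r-base` entry, optionally followed by a selector comment, which is the form shown above. The script name `use_mro.py` and the function `use_mro` are hypothetical. Entries with version constraints are deliberately left untouched.

```python
"""Switch a conda skeleton CRAN recipe from r-base to mro-base (hypothetical helper)."""

import re
import sys
from pathlib import Path

# Matches list entries that are exactly "r-base" (optionally followed by a comment),
# so that packages such as r-base64enc are left untouched.
R_BASE_ENTRY = re.compile(r"^([ \t]*-[ \t]*)r-base([ \t]*(?:#.*)?)$", re.MULTILINE)


def use_mro(recipe_text: str) -> str:
    """Return the recipe text with every plain r-base requirement replaced by mro-base."""
    return R_BASE_ENTRY.sub(r"\1mro-base\2", recipe_text)


if __name__ == "__main__":
    # Usage (assumed): python use_mro.py r-mice/meta.yaml
    meta = Path(sys.argv[1])
    meta.write_text(use_mro(meta.read_text(encoding="utf-8")), encoding="utf-8")
```

Anchoring the pattern to whole lines is the main design choice. A plain string replacement would also rewrite packages whose names merely start with r-base.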

Next, we compile the package. If you are on Windows, install Rtools beforehand and let the installer add its binaries to your system's PATH. To compile the package, type

$ conda build r-mice
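A missing tool on PATH is a common cause of failed builds, especially on Windows. A quick check before running conda build can catch this early. The sketch below uses Python's shutil.which. The tool names are assumptions: which executables Rtools puts on PATH depends on its version, so adjust the lists to your setup. The function `missing_tools` is hypothetical.

```python
"""Check that build tools are reachable on PATH before running conda build (hypothetical)."""

import shutil
import sys
from typing import List, Optional

# Assumed tool names; the executables provided by Rtools vary between versions.
WINDOWS_TOOLS = ["make", "gcc"]
COMMON_TOOLS = ["conda"]


def missing_tools(tools: List[str], path: Optional[str] = None) -> List[str]:
    """Return the tools that shutil.which cannot find (path=None means the current PATH)."""
    return [tool for tool in tools if shutil.which(tool, path=path) is None]


if __name__ == "__main__":
    required = COMMON_TOOLS + (WINDOWS_TOOLS if sys.platform == "win32" else [])
    missing = missing_tools(required)
    if missing:
        sys.exit("Not found on PATH: " + ", ".join(missing))
    print("All build tools found; you can run: conda build r-mice")
```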

Once the build has finished successfully, you can install the package with

$ conda install --use-local r-mice

You can also upload the package to your own channel on Anaconda.org and make it accessible to everyone.

That is what I did. I created a repository that builds the package for Linux and macOS with Travis CI and for Windows with AppVeyor. The compiled packages are uploaded to my account and can be installed via

$ conda install -c brimborium r-mice

Compile your own package

If you want to use my solution yourself, fork my repository. Then replace the recipe in the conda-recipe folder with your own recipe, created with

$ conda skeleton cran <package name>
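conda skeleton cran writes the recipe into a folder named after the conda package. As with r-mice above, this is the r- prefix plus the lower-cased CRAN name. Copying it into conda-recipe can be scripted as in the minimal Python sketch below. Both helper names are hypothetical. The sketch assumes the fork keeps meta.yaml, build.sh and bld.bat directly inside conda-recipe and that the skeleton was generated in the repository root.

```python
"""Replace the recipe in a fork's conda-recipe folder with a freshly generated one (hypothetical)."""

import shutil
from pathlib import Path


def recipe_dir_name(cran_name: str) -> str:
    """Conda naming convention for R packages: 'r-' prefix plus the lower-cased CRAN name."""
    name = cran_name.lower()
    # CRAN names cannot contain hyphens, so an existing 'r-' prefix means a conda name was given.
    return name if name.startswith("r-") else "r-" + name


def replace_recipe(generated: Path, target: Path) -> None:
    """Delete the old recipe folder and copy the generated recipe files in its place."""
    if target.exists():
        shutil.rmtree(target)
    shutil.copytree(generated, target)


if __name__ == "__main__":
    # Assumes the skeleton for the CRAN package "mice" was generated in the current directory.
    replace_recipe(Path(recipe_dir_name("mice")), Path("conda-recipe"))
```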

Push the repository to your GitHub account and create accounts on Travis CI, AppVeyor and Anaconda.org.

Next, go to Settings, then Access on Anaconda.org and create a token. Make sure to check "Allow read and write API access"; the other options are optional.

Add the resulting token as a secret environment variable to your projects on Travis CI and AppVeyor. Name it CONDA_UPLOAD_TOKEN, because the build scripts look for a variable with exactly this name.
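The Python sketch below shows how a CI step might use this variable. It is not the script from my repository. It assumes that conda-build and anaconda-client are installed in the CI environment. It uses conda build --output to ask conda-build for the package path and anaconda -t TOKEN upload to push the package. The function `build_upload_command` is hypothetical.

```python
"""CI upload step: push the built package to Anaconda.org (hypothetical sketch)."""

import os
import subprocess
import sys
from typing import List, Mapping


def build_upload_command(package_path: str, env: Mapping[str, str]) -> List[str]:
    """Build the anaconda-client command, reading the token from CONDA_UPLOAD_TOKEN."""
    token = env.get("CONDA_UPLOAD_TOKEN")
    if not token:
        raise RuntimeError("CONDA_UPLOAD_TOKEN is not set")
    # The token ends up on the command line, so make sure the CI log does not echo it.
    return ["anaconda", "-t", token, "upload", package_path]


if __name__ == "__main__":
    recipe = sys.argv[1] if len(sys.argv) > 1 else "conda-recipe"
    # `conda build --output` prints the path of the package the recipe produces.
    package = subprocess.check_output(["conda", "build", "--output", recipe], text=True).strip()
    subprocess.check_call(build_upload_command(package, os.environ))
```

The function takes the environment as a parameter so it can be checked without a real token. On Windows, calling conda through subprocess may require shell=True if only conda.bat is on PATH.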

I know this was a fair amount of information, and if you are not familiar with automation tools like Travis CI and AppVeyor, it may look even more daunting. That is exactly why I created my repository, which is reduced to the minimum amount of code needed for the task.

I will extend this post and the repository description in the future. Since I only have a rudimentary understanding of what happens behind the scenes, there is a lot of room to grow :).

References