In early 2015, Microsoft announced its successful acquisition of Revolution Analytics, which made R available as an enterprise-ready statistical and data science solution. Initially, Microsoft stated that it would use the acquisition to integrate the power of the R language into Microsoft SQL Server and the Azure product line. Throughout the R community, there were worries about the fate of Revolution Analytics’ Revolution R Open.
Luckily, in January 2016, those fears were assuaged when Microsoft announced that, in addition to its enterprise R offerings, it would continue to offer Microsoft R Open, the enhanced R distribution formerly known as Revolution R Open.
For both new and experienced R users, Microsoft R Open adds functionality on top of the R language. What are these enhancements, and in what situations are they useful?
First, Microsoft R Open contains a set of libraries that enable certain analyses to run in a multithreaded way. Microsoft states that just by enabling these libraries, your machine will automatically use all available cores and see improvement for many common R operations, as well as for any functions that use matrix operations. Depending on the operation and number of cores, you could see speed-ups of anywhere from 5x to 45x.
When R was initially conceived, it was built with single-threaded processing in mind, but as the amount of data has grown, that approach has reached its limits. With Microsoft R Open, you can run various matrix computations across your machine’s multiple cores to see performance enhancements. These enhancements can be used directly, but they also speed up common analyses like regression that rely on matrix operations internally. In the past, if you wanted to run an analysis multithreaded, you had to write custom code. Now, you can leave your code unchanged and still see the performance benefits.
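To get a feel for the matrix operations described above, here is a minimal timing sketch. It uses only base R, so it runs on any distribution; the point is that the same script can be timed under both and compared. The matrix size is arbitrary, and the final thread check assumes the `RevoUtilsMath` package that ships with Microsoft R Open when its math library component is installed, so it is skipped if that package is missing.

```r
# Time two matrix-heavy operations. The code is plain base R; the only
# thing that differs between distributions is the BLAS/LAPACK library
# doing the work underneath.
set.seed(42)
n <- 500  # arbitrary size, large enough to be measurable
x <- matrix(rnorm(n * n), nrow = n)

# crossprod(x) computes t(x) %*% x.
cross_time <- system.time(xtx <- crossprod(x))[["elapsed"]]

# Adding n to the diagonal keeps the matrix well conditioned before inverting.
shifted <- xtx + diag(n) * n
solve_time <- system.time(inv <- solve(shifted))[["elapsed"]]

cat(sprintf("crossprod: %.3fs, solve: %.3fs\n", cross_time, solve_time))

# Assumption: on Microsoft R Open, RevoUtilsMath reports the thread count.
# On other distributions this branch is simply skipped.
if (requireNamespace("RevoUtilsMath", quietly = TRUE)) {
  cat("Math library threads:", RevoUtilsMath::getMKLthreads(), "\n")
}
```

Running the script once under each distribution on the same machine gives a rough comparison. Actual gains will depend on the operation, the matrix size, and the number of cores.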
One frustration with R is its lack of built-in version control for packages. Since R is an open source project with packages changing rapidly, you can be left with code that worked last week but not today. For R users in a research or enterprise setting, reproducibility of analyses is key. You might want to pass a script off to a colleague or move the script to run on a server, but the script’s behavior might change from your initial runs due to changes in the libraries you used.
Microsoft helps to remedy this issue in two ways:
- Fixed CRAN Snapshot – With every release of Microsoft R Open, Microsoft takes a snapshot of CRAN at a specific date. For example, the current release of Microsoft R Open is 3.2.3 and its CRAN snapshot was taken on January 1st, 2016. This means that every user of Microsoft R Open 3.2.3 has access to the same package versions.
- The checkpoint package – This package allows you to take custom snapshots of your R configuration at any point in time. Instead of relying on the fixed snapshot that is already provided, you have the ability to move the clock forward or backward as needed.
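As a rough illustration of the two mechanisms above, the sketch below prints the repository your session installs from and wraps a call to `checkpoint()` from the checkpoint package. The helper names `is_valid_snapshot_date` and `use_snapshot` are hypothetical, introduced here only for illustration, and the wrapper does nothing if checkpoint is not installed.

```r
# Snapshot date quoted above for the 3.2.3 release.
snapshot_date <- "2016-01-01"

# Hypothetical helper: accept only strict YYYY-MM-DD dates.
is_valid_snapshot_date <- function(x) {
  parsed <- as.Date(x, format = "%Y-%m-%d")
  !is.na(parsed) && format(parsed, "%Y-%m-%d") == x
}

# Hypothetical wrapper: pin the session to a snapshot if checkpoint is available.
use_snapshot <- function(date) {
  stopifnot(is_valid_snapshot_date(date))
  if (!requireNamespace("checkpoint", quietly = TRUE)) {
    message("checkpoint is not installed; packages come from getOption('repos')")
    return(invisible(FALSE))
  }
  # checkpoint() scans the project for library()/require() calls and installs
  # those packages as they existed on the snapshot date.
  checkpoint::checkpoint(date)
  invisible(TRUE)
}

# Show which repository install.packages() would use right now.
print(getOption("repos"))
```

Calling `use_snapshot(snapshot_date)` at the top of a script is one way to pin a project to that date. It is defined but not called here because `checkpoint()` needs network access to a snapshot server and installs packages into its own library.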
CRAN vs. Microsoft R Open
The big question is when you should use Microsoft R Open over CRAN, the most popular R distribution. To be clear, there is no functionality lost if you use Microsoft R Open over CRAN. When you use Microsoft R Open, you retain the full functionality of a CRAN distribution and have access to all available libraries and to development environments like RStudio or Jupyter. Microsoft states that anywhere you use CRAN, you can use Microsoft R Open.
Along with the added functionality described above, a benefit of Microsoft R Open is that you retain the flexibility of an open source project while still having a large corporation stand behind the reliability of the product. This kind of confidence is key for companies that rely on certain analyses to make business decisions and want to guarantee that an analysis will work today, tomorrow, and months down the road.
With all these benefits, why would you not use Microsoft R Open over CRAN? First of all, CRAN always has access to the latest package code. In Microsoft R Open, you have to wait for a new release for a guarantee that new and updated packages will work with your distribution. For the casual R user, the additional benefits come with a learning curve and without the established community of CRAN. Since Microsoft R Open is still new to most users, you may not find the same level of community support or knowledge that you do with CRAN.
If you or your company is looking for additional performance improvements with R or better reproducibility of your R scripts, then Microsoft R Open may be a good solution. Just like CRAN, Microsoft R Open is open source and available for download.