Over a million developers have joined DZone.

Building a Statistical Significance Testing Web Service with R

DZone's Guide to

Building a Statistical Significance Testing Web Service with R

· Big Data Zone
Free Resource

Learn how you can maximize big data in the cloud with Apache Hadoop. Download this eBook now. Brought to you in partnership with Hortonworks.

R is a programming language focused on solving statistical and mathematical calculations. R programs often operate on largein-memory data sets, which feels somewhat similar to database programming. Examples in the R Cookbook bear a resemblence to functional programming in clojure, as others have noted.

I’ve been exploring the language to gain insight into related, but disparate technologies that I use with regularity (e.g. Postgres), but for this to be really useful, I’d like to see R behind a webservice. Looking through the official website, there are many defunct attempts at using R in this manner, often abandoned once the maintainer finishes their masters.

A couple have survived, notably Rook and rApache. Rook is a web server inside of R, and rApache, as you might guess, is an Apache module that calls R. I’ve chosen rApache, as I’d like to have a battle-tested front-end for this – while R seems to have very committed maintainers, there do not seem to be very many of them, and I have yet to find examples of anyone running this as a production application.

Inspired by WolframAlpha’s APIs, I built a small web service to test statistical significance. In the future I intend to do tests on performance and security, as well as available JSON libraries.

Here is the installation procedure:

apt-get upgrade
apt-get update
apt-get install r-base r-base-dev
apt-get install apache2-mpm-prefork apache2-prefork-dev
apt-get install git-core
git clone https://github.com/jeffreyhorner/rapache.git
cd rapache
make test
make install
vi /etc/apache2/httpd.conf

Apache configuration settings:

LoadModule R_module /usr/lib/apache2/modules/mod_R.so
<Location /RApacheInfo>
SetHandler r-info
<Directory /var/www/R>
        SetHandler r-script
        RHandler sys.source
/etc/init.d/apache2 restart

And these are the contents of ws.R:

zscore<-function(p, pc, N, Nc){ (p-pc)
     / sqrt(p * (1-p) / N + pc * (1-pc) / Nc) }
significant<-function(p, pc, N, Nc){
     zscore(p, pc, N, Nc) > 1.65 }
valid<-function(x){ nchar(x) < 10 }
if (!valid(GET$pc)
 || !valid(GET$p)
 || !valid(GET$N)
 || !valid(GET$Nc)) {
  cat('error:arg length')
} else {

For instance, the output of http://localhost:8080/R/ws.R?p=.15&pc=.10&N=1000&Nc=1100
is “TRUE”

Hortonworks DataFlow is an integrated platform that makes data ingestion fast, easy, and secure. Download the white paper now.  Brought to you in partnership with Hortonworks


Published at DZone with permission of Gary Sieling, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

The best of DZone straight to your inbox.

Please provide a valid email address.

Thanks for subscribing!

Awesome! Check your inbox to verify your email so you can start receiving the latest in tech news and resources.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}