Over a million developers have joined DZone.
Platinum Partner

Building a Statistical Significance Testing Web Service with R

· Big Data Zone

The Big Data Zone is brought to you in partnership with Hortonworks.  Learn more about Connected Data Platforms that power the creation of modern data applications and how they deliver actionable intelligence. 

R is a programming language focused on solving statistical and mathematical calculations. R programs often operate on largein-memory data sets, which feels somewhat similar to database programming. Examples in the R Cookbook bear a resemblence to functional programming in clojure, as others have noted.

I’ve been exploring the language to gain insight into related, but disparate technologies that I use with regularity (e.g. Postgres), but for this to be really useful, I’d like to see R behind a webservice. Looking through the official website, there are many defunct attempts at using R in this manner, often abandoned once the maintainer finishes their masters.

A couple have survived, notably Rook and rApache. Rook is a web server inside of R, and rApache, as you might guess, is an Apache module that calls R. I’ve chosen rApache, as I’d like to have a battle-tested front-end for this – while R seems to have very committed maintainers, there do not seem to be very many of them, and I have yet to find examples of anyone running this as a production application.

Inspired by WolframAlpha’s APIs, I built a small web service to test statistical significance. In the future I intend to do tests on performance and security, as well as available JSON libraries.

Here is the installation procedure:

apt-get upgrade
apt-get update
apt-get install r-base r-base-dev
apt-get install apache2-mpm-prefork apache2-prefork-dev
apt-get install git-core
git clone https://github.com/jeffreyhorner/rapache.git
cd rapache
./configure
make
make test
make install
vi /etc/apache2/httpd.conf

Apache configuration settings:

 
LoadModule R_module /usr/lib/apache2/modules/mod_R.so
 
<Location /RApacheInfo>
SetHandler r-info
</Location>
 
ROutputErrors
 
<Directory /var/www/R>
        SetHandler r-script
        RHandler sys.source
</Directory>
/etc/init.d/apache2 restart

And these are the contents of ws.R:

 
setContentType("application/json")
 
zscore<-function(p, pc, N, Nc){ (p-pc)
     / sqrt(p * (1-p) / N + pc * (1-pc) / Nc) }
significant<-function(p, pc, N, Nc){
     zscore(p, pc, N, Nc) > 1.65 }
 
valid<-function(x){ nchar(x) < 10 }
 
if (!valid(GET$pc)
 || !valid(GET$p)
 || !valid(GET$N)
 || !valid(GET$Nc)) {
  cat('error:arg length')
} else {
cat(significant(as.numeric(GET$p),
                as.numeric(GET$pc),
                as.numeric(GET$N),
                as.numeric(GET$Nc)))
}
 
OK

For instance, the output of http://localhost:8080/R/ws.R?p=.15&pc=.10&N=1000&Nc=1100
is “TRUE”

The Big Data Zone is brought to you in partnership with Hortonworks.  Learn, Collaborate, and Thrive with Hortonworks Community Connection

Topics:

Published at DZone with permission of Gary Sieling , DZone MVB .

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}