
Building a Statistical Significance Testing Web Service with R



R is a programming language focused on statistical and mathematical computation. R programs often operate on large in-memory data sets, which feels somewhat similar to database programming. As others have noted, examples in the R Cookbook bear a resemblance to functional programming in Clojure.
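Here is a small illustration of both comparisons. It is a sketch only, and the data frame is invented for the example. Base R's `aggregate` reads much like a SQL `GROUP BY`. The higher-order functions `Filter`, `Map` and `Reduce` play the same roles as Clojure's `filter`, `map` and `reduce`:

```r
# A small in-memory data frame (hypothetical example data)
pages <- data.frame(
  page   = c("home", "home", "pricing", "pricing", "signup"),
  visits = c(120, 80, 40, 60, 25)
)

# Roughly: SELECT page, SUM(visits) FROM pages GROUP BY page
totals <- aggregate(visits ~ page, data = pages, FUN = sum)
print(totals)

# Functional style, comparable to Clojure's filter / map / reduce
busy    <- Filter(function(v) v > 50, pages$visits)  # keep the large counts
doubled <- Map(function(v) v * 2, busy)              # returns a list
total   <- Reduce(`+`, doubled)                      # fold the list into one number
print(total)
```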

I’ve been exploring the language to gain insight into related but disparate technologies that I use regularly (e.g. Postgres). For this to be really useful, though, I’d like to see R behind a web service. The official website lists many defunct attempts at using R this way, often abandoned once the maintainer finished a master's degree.

A couple have survived, notably Rook and rApache. Rook is a web server inside of R, and rApache, as you might guess, is an Apache module that calls R. I’ve chosen rApache because I’d like a battle-tested front end. R seems to have very committed maintainers, but there do not seem to be many of them, and I have yet to find examples of anyone running R this way as a production application.
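For comparison, a Rook application is an R closure that takes the request environment and returns a list with `status`, `headers` and `body` elements. The sketch below shows that interface. The `/hello?name=...` endpoint and the `name` parameter are hypothetical. The query string is parsed by hand so the app can be called directly, without Rook installed or a server running:

```r
# A Rook-style application: a closure taking the request environment and
# returning list(status, headers, body). Endpoint and `name` are hypothetical.
hello_app <- function(env) {
  query <- if (is.null(env$QUERY_STRING)) "" else env$QUERY_STRING
  # Naive query-string parsing for the sketch; real code would use Rook's Request class
  pairs  <- strsplit(strsplit(query, "&", fixed = TRUE)[[1]], "=", fixed = TRUE)
  params <- setNames(lapply(pairs, function(p) if (length(p) > 1) p[2] else ""),
                     vapply(pairs, `[`, character(1), 1))
  name <- if (is.null(params$name)) "world" else params$name
  list(
    status  = 200L,
    headers = list("Content-Type" = "text/plain"),
    body    = paste0("hello, ", name)
  )
}

# Call the app with a fake environment instead of a live server
res <- hello_app(list(QUERY_STRING = "name=rapache"))
cat(res$body, "\n")
```

Rook runs this function inside the R process, so R itself serves HTTP. rApache takes the other route: Apache handles the connection and calls into R.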

Inspired by WolframAlpha’s APIs, I built a small web service that tests for statistical significance. In the future I intend to test its performance and security, and to evaluate the available JSON libraries.

Here is the installation procedure:

```shell
# Run as root; refresh the package index before upgrading
apt-get update
apt-get upgrade
apt-get install r-base r-base-dev
apt-get install apache2-mpm-prefork apache2-prefork-dev
apt-get install git-core
git clone https://github.com/jeffreyhorner/rapache.git
cd rapache
./configure
make
make test
make install
vi /etc/apache2/httpd.conf
```

Apache configuration settings:
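```apache
LoadModule R_module /usr/lib/apache2/modules/mod_R.so

# Status page with rApache/R information
<Location /RApacheInfo>
    SetHandler r-info
</Location>

# Include R errors in responses (handy while developing)
ROutputErrors

# Run every file under /var/www/R as an R script
<Directory /var/www/R>
    SetHandler r-script
    RHandler sys.source
</Directory>
```

Then restart Apache:

```shell
/etc/init.d/apache2 restart
```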
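With `RHandler sys.source`, rApache evaluates each requested `.R` file using base R's `sys.source`, and it supplies request objects such as `GET`. You can try the same mechanism locally. In this sketch a temporary file stands in for a script under `/var/www/R`. The `GET` list and the `setContentType` stub are hand-made stand-ins for what rApache provides, and the exact environment rApache uses is not reproduced:

```r
# Write a tiny stand-in script to a temporary file
# (in the real setup this would be a file under /var/www/R)
script <- tempfile(fileext = ".R")
writeLines(c(
  'setContentType("text/plain")',
  'cat(paste0("p = ", GET$p, "\\n"))'
), script)

# Hand-made stand-ins for objects rApache provides at request time
req_env <- new.env()
req_env$GET <- list(p = ".15")
req_env$setContentType <- function(type) invisible(type)

# Evaluate the script the way a sys.source handler would, capturing its output
out <- capture.output(sys.source(script, envir = req_env))
print(out)
```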

And these are the contents of ws.R:

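```r
# ws.R: sourced by rApache (RHandler sys.source); GET and setContentType come from rApache
setContentType("application/json")  # note: the body below is still plain text, not JSON

# Two-proportion z statistic (unpooled variance).
# The operator must end the line; otherwise R treats "(p - pc)" as a complete statement.
zscore <- function(p, pc, N, Nc) {
  (p - pc) /
    sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc)
}

# One-sided test at roughly the 95% level (qnorm(0.95) is about 1.645)
significant <- function(p, pc, N, Nc) {
  zscore(p, pc, N, Nc) > 1.65
}

# Reject missing parameters as well as overly long ones
valid <- function(x) { !is.null(x) && nchar(x) < 10 }

if (!valid(GET$pc) || !valid(GET$p) ||
    !valid(GET$N)  || !valid(GET$Nc)) {
  cat('error:arg length')
} else {
  cat(significant(as.numeric(GET$p),
                  as.numeric(GET$pc),
                  as.numeric(GET$N),
                  as.numeric(GET$Nc)))
}
```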
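The `zscore` function computes the two-proportion z statistic with unpooled variances. Here p and pc are the observed proportions in the test and control groups, and N and Nc are their sample sizes. `significant` compares the statistic with 1.65, which is close to the one-sided 95% critical value of the standard normal (about 1.645):

```latex
z = \frac{p - p_c}{\sqrt{\dfrac{p(1-p)}{N} + \dfrac{p_c(1-p_c)}{N_c}}},
\qquad \text{significant} \iff z > 1.65 \approx \Phi^{-1}(0.95)
```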
 

For instance, requesting http://localhost:8080/R/ws.R?p=.15&pc=.10&N=1000&Nc=1100 returns “TRUE”.
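To see why that request returns TRUE, plug the example URL's parameters into the same formula. The sketch also uses base R's `qnorm` and `pnorm` to show the exact one-sided cutoff and a one-sided p-value. The service itself only reports the TRUE/FALSE comparison:

```r
# Same formula as zscore() in ws.R, repeated so this block runs on its own
zscore <- function(p, pc, N, Nc) {
  (p - pc) / sqrt(p * (1 - p) / N + pc * (1 - pc) / Nc)
}

z <- zscore(0.15, 0.10, 1000, 1100)   # inputs from the example URL
cutoff <- qnorm(0.95)                 # one-sided 95% critical value, about 1.645

print(round(z, 2))                    # about 3.46
print(z > cutoff)                     # TRUE, matching the service's answer

# One-sided p-value: chance of a z at least this large if the proportions were equal
p_value <- pnorm(z, lower.tail = FALSE)
print(p_value)
```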



Published at DZone with permission of

