R vs. SAS vs. SPSS

A comparison of the SAS and SPSS analytics tools vs. the R programming language.

Titles like this are, in many cases, just an invitation to a flame war. But not on this blog.

Today we are going to illustrate some subtle differences among three statistical packages: R, SAS, and SPSS. The differences are small, but sometimes even a very small difference can have large consequences, so they are worth knowing.

In statistics it is not uncommon for different estimators to be used for the same parameter. A typical example is the standard deviation, which has two widely used estimators (biased and unbiased). But did you know that there are three common estimators for skewness and for kurtosis? And that for quantiles there are even more, namely nine?

And the bizarre thing is that different statistical packages select different estimators as their defaults.
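The standard deviation case can be checked in a few lines of base R. This is only a minimal sketch on a made-up vector (the data and the names sd_n1 and sd_n are assumptions for illustration): R's sd() divides by n - 1, while the other common estimator divides by n.

```r
# Made-up sample, chosen so that the numbers come out round
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

# sd() in base R uses the n - 1 denominator
sd_n1 <- sd(x)

# The other common estimator uses the n denominator
sd_n <- sqrt(sum((x - mean(x))^2) / n)

print(c(n_minus_1 = sd_n1, n = sd_n))
```

For this sample the n-denominator estimate is exactly 2, while sd() returns a slightly larger value (about 2.14).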

Let’s have a more detailed look.

Skewness / Kurtosis

To calculate these two statistics in R, one can use the functions skewness and kurtosis from the package e1071. Both functions have an additional parameter, type, that selects which estimate of skewness or kurtosis is calculated.

In R the default is type=3, but SAS and SPSS by default calculate the equivalent of type=2.

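library(e1071)  # provides skewness() and kurtosis()
x <- runif(101)  # random sample, so the numbers below change from run to run
sapply(1:3, function(type) skewness(x, na.rm = TRUE, type = type))
# [1] 0.1245367 0.1264220 0.1226917
sapply(1:3, function(type) kurtosis(x, na.rm = TRUE, type = type))
# [1] -1.116490 -1.111956 -1.153602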
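To see where the three numbers come from, the estimators can be written out in base R. The sketch below uses the formulas as we read them from the e1071 documentation; the helper names skewness_types and kurtosis_types and the sample vector are assumptions for illustration and are not part of any package.

```r
# Hand-rolled versions of the three estimators (illustrative helpers, not part of e1071)
skewness_types <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)            # second central moment (denominator n)
  m3 <- mean(d^3)            # third central moment (denominator n)
  g1 <- m3 / m2^(3 / 2)      # type 1
  c(type1 = g1,
    type2 = g1 * sqrt(n * (n - 1)) / (n - 2),
    type3 = g1 * ((n - 1) / n)^(3 / 2))
}

kurtosis_types <- function(x) {
  n  <- length(x)
  d  <- x - mean(x)
  m2 <- mean(d^2)
  m4 <- mean(d^4)            # fourth central moment (denominator n)
  g2 <- m4 / m2^2 - 3        # type 1 (excess kurtosis)
  c(type1 = g2,
    type2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
    type3 = (g2 + 3) * (1 - 1 / n)^2 - 3)
}

# Made-up sample with one large value
x_small <- c(1, 2, 3, 4, 10)
print(skewness_types(x_small))
print(kurtosis_types(x_small))
```

All three types are rescalings of the same moment ratios, so the gap between them shrinks as the sample grows; with only five observations it is large.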

Quantiles

In R, quantiles are calculated with the function quantile. It has an additional argument, type, which takes values from 1 to 9; each value selects a different quantile estimator. By default R uses definition 7. For SAS you should expect results equivalent to type=3, while for SPSS you should expect results equivalent to type=6.

sapply(1:9, function(q) quantile(x, 0.01, type=q))
#         1%         1%         1%         1%         1%         1%         1%         1%         1% 
# 0.02272536 0.02272536 0.01426692 0.01435151 0.01858073 0.01443609 0.02272536 0.01719918 0.01754457 
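Because x above is random, the same comparison is easier to follow on a fixed vector. A minimal sketch, assuming the integers 1 to 10 as data and the first quartile as the target (both chosen only for illustration):

```r
# Fixed data, so the result is reproducible
x_fixed <- 1:10

# First quartile under each of the nine definitions
q25 <- sapply(1:9, function(t) quantile(x_fixed, probs = 0.25, type = t, names = FALSE))
names(q25) <- paste0("type", 1:9)
print(q25)

# The three defaults discussed in the text: R (7), SAS (3), SPSS (6)
print(q25[c("type7", "type3", "type6")])
```

Even on ten evenly spaced values the three defaults disagree: type 7 interpolates to 3.25 and type 6 to 2.75, while type 3 returns one of the observed values.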

Contrasts

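In R, a linear model is usually fitted with the lm function. Its contrasts argument specifies which contrasts are used for qualitative variables (factors) and must be given as a named list with one entry per factor. The default contrasts in R are contr.treatment, which take the first factor level as the reference, while in SAS you should expect results equal to those obtained with contr.SAS, which take the last level as the reference.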

# Default contrasts (contr.treatment): setosa is the reference level
lm(Sepal.Width ~ Species, data = iris)$coef
#      (Intercept) Speciesversicolor  Speciesvirginica 
#            3.428            -0.658            -0.454 
# SAS-style contrasts (contr.SAS): virginica is the reference level
lm(Sepal.Width ~ Species, data = iris, contrasts = list(Species = contr.SAS))$coef
#(Intercept)    Species1    Species2 
#      2.974       0.454      -0.204 
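The two sets of coefficients describe the same model in different coordinates. The sketch below, using only base R and the built-in iris data, prints both coding matrices and checks that the fitted values agree; the variable names fit_r, fit_sas, same_fit, and group_means are assumptions for illustration.

```r
# Coding matrices for a factor with three levels
print(contr.treatment(3))  # first level is the reference (row of zeros)
print(contr.SAS(3))        # last level is the reference (row of zeros)

fit_r   <- lm(Sepal.Width ~ Species, data = iris)
fit_sas <- lm(Sepal.Width ~ Species, data = iris,
              contrasts = list(Species = contr.SAS))

# Different coefficients, same fitted values
same_fit <- isTRUE(all.equal(fitted(fit_r), fitted(fit_sas)))
print(same_fit)

# Each intercept is the mean of its reference group
group_means <- tapply(iris$Sepal.Width, iris$Species, mean)
print(group_means)
print(c(R_intercept = coef(fit_r)[[1]], SAS_intercept = coef(fit_sas)[[1]]))
```

So the choice of contrasts does not change predictions, only the interpretation of each coefficient: under the R default the intercept is the setosa mean, and under contr.SAS it is the virginica mean.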

Take Home

Even basic statistics such as skewness or kurtosis may be calculated differently by different statistical packages.

If we are building an analytical solution based on R, SAS, or SPSS, we should be aware that the default settings of different packages may lead to different results for the same statistic.

Published at DZone with permission of deepsense.io Blog, DZone MVB. See the original article here.
