# R vs. SAS vs. SPSS

### A comparison of the SAS and SPSS analytics tools vs. the R programming language.

Titles like this one are, in many cases, just an invitation to a flame war. But not on this blog.

Today we are going to illustrate some subtle differences among three statistical packages: R, SAS, and SPSS. The differences are small, but sometimes even a very small difference can have large consequences, so they are worth knowing.

In statistics, it is not uncommon for several estimators to be in use for the same parameter. A typical example is the standard deviation, which has two widely used estimators (biased and unbiased). But did you know that skewness and kurtosis each have three common estimators? And for quantiles there are even more: nine different estimators.

And the bizarre thing is that different statistical packages select different estimators as their defaults.
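To make the standard deviation case concrete, here is a minimal sketch in base R. The sample values and variable names are made up for illustration; `sd()` divides by n - 1, while the divide-by-n version has to be computed by hand.

```r
# Made-up sample, chosen so the arithmetic is easy to follow
x <- c(2, 4, 4, 4, 5, 5, 7, 9)
n <- length(x)

sd_unbiased <- sd(x)                      # base R: divides by n - 1
sd_biased   <- sqrt(mean((x - mean(x))^2))  # divides by n

print(c(unbiased = sd_unbiased, biased = sd_biased))

# The two estimators differ only by a factor that depends on n
print(all.equal(sd_biased, sd_unbiased * sqrt((n - 1) / n)))
```

With only eight observations the gap is visible in the first decimal place; it shrinks as n grows.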

Let’s have a more detailed look.

**Skewness / Kurtosis**

To calculate these two statistics in R, one can use the functions *skewness* and *kurtosis* from the package *e1071*. Both functions take an additional parameter, *type*, which selects the estimator of skewness / kurtosis to be calculated.

In R the default is *type=3*, but SAS and SPSS calculate the equivalent of *type=2* by default.

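```
library(e1071)  # provides skewness() and kurtosis()

# Random sample with no seed set, so the values below change from run to run
x <- runif(101)
# One estimate per type (1, 2, 3)
sapply(1:3, function(t) skewness(x, na.rm = TRUE, type = t))
# [1] 0.1245367 0.1264220 0.1226917
sapply(1:3, function(t) kurtosis(x, na.rm = TRUE, type = t))
# [1] -1.116490 -1.111956 -1.153602
```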
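The three *type* values differ only in how the sample moments are scaled. The sketch below recomputes them in base R, following the formulas given in the *e1071* documentation. The helper names `skew_by_type` and `kurt_by_type` and the sample data are hypothetical, introduced here only for illustration.

```r
# Hypothetical helpers: the three skewness / kurtosis estimators by hand.
skew_by_type <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  g1 <- mean(d^3) / mean(d^2)^1.5          # moments divide by n
  c(type1 = g1,
    type2 = g1 * sqrt(n * (n - 1)) / (n - 2),
    type3 = g1 * ((n - 1) / n)^1.5)        # same as m3 / sd(x)^3
}

kurt_by_type <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  d <- x - mean(x)
  g2 <- mean(d^4) / mean(d^2)^2 - 3        # excess kurtosis
  c(type1 = g2,
    type2 = ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
    type3 = (g2 + 3) * (1 - 1 / n)^2 - 3)  # same as m4 / sd(x)^4 - 3
}

x_small <- c(1, 2, 3, 4, 10)  # made-up data
s <- skew_by_type(x_small)
k <- kurt_by_type(x_small)
print(s)
print(k)
```

For a sample of size 101, as in the example above, the correction factors are close to 1, which is why the three values differ only in the second or third decimal place. For very small samples the choice of type matters much more.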

**Quantiles**

In R, quantiles are calculated with the function *quantile*. It has an additional argument, *type*, which takes values from 1 to 9; each value selects a different quantile estimator. R uses definition 7 by default, but from SAS you should expect results equivalent to *type=3*, and from SPSS results equivalent to *type=6*.

```
# 1% quantile of x under each of the nine definitions
sapply(1:9, function(q) quantile(x, 0.01, type = q))
#         1%         1%         1%         1%         1%         1%         1%         1%         1%
# 0.02272536 0.02272536 0.01426692 0.01435151 0.01858073 0.01443609 0.02272536 0.01719918 0.01754457
```
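A small deterministic sample shows how far the three defaults can diverge. This is only a sketch with made-up data; the package labels attached to each type follow the mapping stated above and are not checked against SAS or SPSS here.

```r
# Made-up sample of five values
x_small <- c(10, 20, 30, 40, 50)

# 25th percentile under the three definitions discussed in the text
q25 <- sapply(c(SAS = 3, SPSS = 6, R = 7),
              function(t) quantile(x_small, probs = 0.25, type = t, names = FALSE))
print(q25)
# type 3 picks an order statistic:           10
# type 6 interpolates at position (n + 1)p:  15
# type 7 interpolates at position (n - 1)p + 1: 20
```

On five observations the same "25th percentile" ranges from 10 to 20, so any threshold derived from it should state which definition was used.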

**Contrasts**

In R, a linear model is usually fitted with the *lm* function. Its *contrasts* argument specifies which contrasts are used for qualitative variables. The default contrasts in R are *contr.treatment*, while from SAS you should expect results equal to those obtained with *contr.SAS*.

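```
# Default contrasts (contr.treatment): the first level, setosa, is the reference.
# Note: a bare `contrasts = contr.SAS` is ignored by lm; it must be a named list.
coef(lm(Sepal.Width ~ Species, data = iris))
# (Intercept) Speciesversicolor  Speciesvirginica
#       3.428            -0.658            -0.454
# SAS-style contrasts: the last level, virginica, is the reference.
coef(lm(Sepal.Width ~ Species, data = iris, contrasts = list(Species = contr.SAS)))
# (Intercept)    Species1    Species2
#       2.974       0.454      -0.204
```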
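The difference between the two sets of coefficients comes down to which level serves as the reference. The following sketch, which uses only base R and the built-in `iris` data, prints both contrast matrices and checks that the two parameterizations describe the same fitted model; the variable names are chosen here for illustration.

```r
# Treatment contrasts take the first level as the reference (R's default)
print(contr.treatment(3))
# contr.SAS takes the last level as the reference instead
print(contr.SAS(3))
same_matrix <- isTRUE(all.equal(contr.SAS(3), contr.treatment(3, base = 3)))

fit_r   <- lm(Sepal.Width ~ Species, data = iris)
fit_sas <- lm(Sepal.Width ~ Species, data = iris,
              contrasts = list(Species = contr.SAS))

# Different coefficients, identical fitted values
same_fit <- isTRUE(all.equal(fitted(fit_r), fitted(fit_sas)))
print(c(same_matrix = same_matrix, same_fit = same_fit))
```

In both fits the intercept is the mean of the reference group, and the remaining coefficients are differences from that group. Only the interpretation of the coefficients changes, not the model's predictions.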

**Take Home**

Even basic statistics like skewness or kurtosis may be calculated in a different way in different statistical packages.

If we are building an analytical solution based on R, SAS, or SPSS, we should be aware that the default settings of different packages may lead to different results for the same statistic.

Published at DZone with permission of deepsense.io Blog, DZone MVB. See the original article here.
