### Keeping up with all the R packages out there is hard, but this one is pretty interesting. Take some standard censored lifetime data and see how to turn it into a nice visual.

There are now more than 10,000 R packages available from CRAN, and many more if you include those available only on GitHub. So, to be honest, it is hard to know all of them. But sometimes you discover a nice function in one of them, and that is really awesome. Consider, for instance, some (standard) censored lifetime data:

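```r
n=10000
idx=sample(1:4,size=n,replace=TRUE)   # product index, 1 to 4
pd=LETTERS[idx]                       # product label, A to D
lambda=1+(idx-1)/3                    # exponential rate, one per product
t=rexp(n,lambda)                      # true lifetime
x=rexp(n)                             # censoring time
c=t<=x                                # TRUE when the lifetime is observed, FALSE when censored
y=pmin(t,x)                           # observed time
df=data.frame(time=y,status=c,product=pd)
```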
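To make the simulation design explicit: lifetimes are exponential with a product-specific rate 1 + (k - 1)/3 for k = 1, ..., 4, and the censoring time is exponential with rate 1. For two independent exponential variables, the lifetime is the smaller one (so it is observed rather than censored) with probability lambda / (lambda + 1). Here is a rough numerical check of that share; the seed, the larger sample size, and the names `lifetime`, `censoring`, `observed_share`, and `theory` are assumptions of this sketch, not part of the code above.

```r
set.seed(123)  # assumption: fixed seed so the check is reproducible
n <- 100000
idx <- sample(1:4, size = n, replace = TRUE)
lambda <- 1 + (idx - 1) / 3        # rates 1, 4/3, 5/3, 2 for products A to D
lifetime  <- rexp(n, lambda)       # true lifetime
censoring <- rexp(n)               # censoring time, rate 1

# Share of lifetimes that are actually observed, by product
observed_share <- sapply(split(lifetime <= censoring, LETTERS[idx]), mean)

# Theoretical value: P(lifetime <= censoring) = lambda / (lambda + 1)
rates  <- 1 + (0:3) / 3
theory <- rates / (rates + 1)

round(rbind(simulated = observed_share, theory = theory), 3)
```

The higher the rate, the shorter the lifetimes, and the more often the event is seen before censoring.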

Yes, the data are simulated here. Consider the Kaplan-Meier estimator of the survival function:

```r
library(survival)
km.base = survfit( Surv(time,status) ~ 1  , data = df )
plot(km.base)
```
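For intuition about what the estimator does: at each observed event time, Kaplan-Meier multiplies the current survival probability by (1 - d / n), where d is the number of events at that time and n is the number of subjects still at risk. Below is a minimal base-R sketch on made-up toy data; `km_by_hand` is a hypothetical helper written for illustration, not a function from the survival package.

```r
# Hand-rolled Kaplan-Meier estimate (illustration only)
# time:  observed times
# event: TRUE when the event was observed, FALSE when censored
km_by_hand <- function(time, event) {
  event_times <- sort(unique(time[event]))
  # number of events at each event time
  d <- sapply(event_times, function(s) sum(time == s & event))
  # number still at risk just before each event time
  n_risk <- sapply(event_times, function(s) sum(time >= s))
  data.frame(time = event_times, n.risk = n_risk, n.event = d,
             surv = cumprod(1 - d / n_risk))
}

# Toy data, invented for this example
toy_time  <- c(1, 2, 2, 3, 4, 5)
toy_event <- c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)

res <- km_by_hand(toy_time, toy_event)
res
```

On these six observations the survival estimate steps through 5/6, 2/3, 4/9 and 0. Censored observations never trigger a step, but they do leave the risk set, which is why the at-risk counts matter.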

Recently, Anat (currently finishing the Data Science for Actuaries program) helped me discover a nice R function, `ggsurvplot` from the survminer package, that adds information to that graph (well, not that exact graph, since this is a ggplot version, but the same survival curve):

```r
library(ggplot2)
library(survminer)
ggsurvplot(km.base, main = "", color = "blue", censor = FALSE, xlim = c(0,3),
           risk.table = TRUE, risk.table.col = "blue", risk.table.height = 0.2,
           risk.table.title = "", legend.labs = "All", legend.title = "",
           break.time.by = 1, xlab = "", ylab = "")
```

This is more interesting when lifetimes differ across groups, here with one curve per product:

```r
km.prod = survfit( Surv(time,status) ~ product  , data = df )
ggsurvplot(km.prod, main = "", censor = FALSE, xlim = c(0,3),
           risk.table = TRUE, risk.table.col = "strata", risk.table.height = 0.3,
           risk.table.title = "", legend.labs = LETTERS[1:4], legend.title = "",
           break.time.by = 1, xlab = "", ylab = "")
```

Or a different time granularity:
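A minimal sketch of what this could look like, assuming a step of 0.5 (the value `step = 0.5`, the seed, the smaller sample size, and the slightly taller risk table are choices made for this example, and `status` is coded TRUE when the lifetime is observed). The grid that `break.time.by` sets on the plot is also the grid of the risk table, so the same at-risk counts can be read from `summary()` on the fitted object. The plotting call is guarded so the block still runs where survminer is not installed.

```r
library(survival)

set.seed(1)  # assumption: seed added only to make the sketch reproducible
n <- 1000
idx <- sample(1:4, size = n, replace = TRUE)
lifetime  <- rexp(n, 1 + (idx - 1) / 3)
censoring <- rexp(n)
df <- data.frame(time = pmin(lifetime, censoring),
                 status = lifetime <= censoring,   # TRUE = event observed
                 product = LETTERS[idx])
km.prod <- survfit(Surv(time, status) ~ product, data = df)

step <- 0.5  # assumed granularity; the earlier plots use 1

# Numbers at risk per product on the finer time grid
risk <- summary(km.prod, times = seq(0, 3, by = step), extend = TRUE)
data.frame(product = risk$strata, time = risk$time, n.risk = risk$n.risk)

# Same grid on the plot and its risk table, if survminer is available
if (requireNamespace("survminer", quietly = TRUE)) {
  p <- survminer::ggsurvplot(km.prod, censor = FALSE, xlim = c(0, 3),
                             risk.table = TRUE, risk.table.col = "strata",
                             risk.table.height = 0.4,
                             legend.labs = LETTERS[1:4], legend.title = "",
                             break.time.by = step, xlab = "", ylab = "")
  print(p)
}
```

A finer step adds columns to the risk table, so it usually needs a bit more vertical room, hence the larger `risk.table.height`.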

Nice, isn't it?


Published at DZone with permission of Arthur Charpentier, DZone MVB. See the original article here.
