Over a million developers have joined DZone.
{{announcement.body}}
{{announcement.title}}

dplyr – Update Rows with Earlier/Previous Row Values

DZone's Guide to

dplyr – Update Rows with Earlier/Previous Row Values

· Database Zone ·
Free Resource

Running out of memory? Learn how Redis Enterprise enables large dataset analysis with the highest throughput and lowest latency while reducing costs over 75%! 

Recently I had a data frame which contained a column which had mostly empty values:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA))
  col1 col2
1    1    a
2    2 <NA>
3    3 <NA>
4    4    b
5    5 <NA>

I wanted to fill in the NA values with the last non NA value from that column. So I want the data frame to look like this:

1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

I spent ages searching around before I came across the na.locf function in the zoo library which does the job:

library(zoo)
library(dplyr)

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA)) %>% 
    do(na.locf(.))
  col1 col2
1    1    a
2    2    a
3    3    a
4    4    b
5    5    b

This will fill in the missing values for every column, so if we had a third column with missing values it would populate those too:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    do(na.locf(.))

  col1 col2 col3
1    1    a    A
2    2    a    A
3    3    a    B
4    4    b    B
5    5    b    B

If we only want to populate ‘col2′ and leave ‘col3′ as it is we can apply the function specifically to that column:

> data.frame(col1 = c(1,2,3,4,5), col2  = c("a", NA, NA , "b", NA), col3 = c("A", NA, "B", NA, NA)) %>% 
    mutate(col2 = na.locf(col2))
  col1 col2 col3
1    1    a    A
2    2    a <NA>
3    3    a    B
4    4    b <NA>
5    5    b <NA>

It’s quite a neat function and certainly comes in helpful when cleaning up data sets which don’t tend to be as uniform as you’d hope!

Running out of memory? Never run out of memory with Redis Enterprise databaseStart your free trial today.

Topics:
database ,r language ,dplyr

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}