R: dplyr -- Segfault Cause 'memory not mapped'

By Mark Needham · Jun. 24, 15

In my continued playing around with web logs in R I wanted to process the logs for a day and see what the most popular URIs were.

I first read in all the lines using the read_lines function in readr and put the vector it produced into a data frame so I could process it using dplyr.

library(readr)
dlines = data.frame(column = read_lines("~/projects/logs/2015-06-18-22-docs"))

In the previous post I showed some code to extract the URI from a log line. I extracted this code out into a function and adapted it so that I could pass in a vector of log lines instead of a single value:

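library(stringr)

# Pull the quoted request out of each log line, then capture the URI between GET and HTTP
extract_uri = function(log) {
  parts = str_extract_all(log, "\"[^\"]*\"")
  return(lapply(parts, function(p) str_match(p[1], "GET (.*) HTTP")[2] %>% as.character))
}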

Next I ran the following pipeline to count the number of times each URI appeared in the logs:

library(dplyr)
pages_viewed = dlines %>%
  mutate(uri  = extract_uri(column)) %>% 
  count(uri) %>%
  arrange(desc(n))

This crashed my R process with the following error message:

segfault cause 'memory not mapped'

I narrowed it down to a problem when doing a group by operation on the ‘uri’ field and came across this post, which suggested that it was handled more cleanly in more recent versions of dplyr.

I upgraded to 0.4.2 and tried again:

## Error in eval(expr, envir, enclos): cannot group column uri, of class 'list'

That makes more sense. extract_uri is built on lapply, which always returns a list, so mutate ends up with a list column rather than a character vector that would fit nicely back into the data frame. That’s fixed easily enough by unlisting the result:

extract_uri = function(log) {
  parts = str_extract_all(log, "\"[^\"]*\"")
  return(unlist(lapply(parts, function(p) str_match(p[1], "GET (.*) HTTP")[2] %>% as.character)))
}
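
A quick way to confirm the diagnosis is to compare the return types of the two versions on a couple of log lines. The sketch below is not from the original session: the sample lines are made up, and extract_uri_list and extract_uri_vec are hypothetical names for the first and second definitions above.

```r
library(stringr)

# Made-up log lines in a combined-log-like format (an assumption for illustration)
sample_lines = c(
  '1.2.3.4 - - [18/Jun/2015:22:00:01 +0000] "GET /docs/a HTTP/1.1" 200 512 "-" "curl/7.40"',
  '1.2.3.5 - - [18/Jun/2015:22:00:02 +0000] "GET /docs/b HTTP/1.1" 200 128 "-" "curl/7.40"'
)

# First definition: lapply() always returns a list, one element per log line
extract_uri_list = function(log) {
  parts = str_extract_all(log, "\"[^\"]*\"")
  lapply(parts, function(p) as.character(str_match(p[1], "GET (.*) HTTP")[2]))
}

# Second definition: unlist() flattens that list into a character vector
extract_uri_vec = function(log) {
  unlist(extract_uri_list(log))
}

class(extract_uri_list(sample_lines))  # "list"
class(extract_uri_vec(sample_lines))   # "character"
extract_uri_vec(sample_lines)          # "/docs/a" "/docs/b"
```

Only the second form gives mutate a plain character column, which is what count needs in order to group.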

And now when we run the count function it’s happy again, good times!
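
For reference, the whole pipeline can be exercised on a handful of made-up log lines. This is only a sketch under stated assumptions: the sample data is invented, and it swaps unlist(lapply(...)) for vapply, which is an alternative rather than what this post uses. vapply declares the expected return type up front, so a result that is not a single string per line fails immediately instead of quietly producing a list column.

```r
library(stringr)
library(dplyr)

# Invented log lines; the real input came from read_lines() on a log file
dlines = data.frame(
  column = c(
    '1.2.3.4 - - [18/Jun/2015:22:00:01 +0000] "GET /docs/a HTTP/1.1" 200 512',
    '1.2.3.5 - - [18/Jun/2015:22:00:02 +0000] "GET /docs/b HTTP/1.1" 200 128',
    '1.2.3.6 - - [18/Jun/2015:22:00:03 +0000] "GET /docs/a HTTP/1.1" 200 512'
  ),
  stringsAsFactors = FALSE
)

# Alternative to unlist(lapply(...)): vapply() must return one string per line
extract_uri = function(log) {
  parts = str_extract_all(log, "\"[^\"]*\"")
  vapply(parts, function(p) str_match(p[1], "GET (.*) HTTP")[2], character(1))
}

# Same pipeline as above: add the uri column, count occurrences, sort descending
pages_viewed = dlines %>%
  mutate(uri = extract_uri(column)) %>%
  count(uri) %>%
  arrange(desc(n))

pages_viewed
```

With this sample data, /docs/a comes out on top with a count of 2, followed by /docs/b with 1.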




