
Python Style Logging in R

Logging in R? Not much fun, at least not without the right setup. Read on to make your R logging more effective.


We are increasingly using R in “operational” settings that require robust error handling and logging. In this post, we describe a quick-and-dirty way to get Python-style, multi-level log files by wrapping the futile.logger package.

Making Messy Data Pretty

Our real-world scenario involves R scripts that process raw smoke monitoring data that is updated hourly. The raw data comes from a variety of instruments, set up by different agencies, and is transmitted over at least two satellites before eventually arriving on our computers.

Data can be missing, delayed or corrupted for a variety of reasons before it gets to us. And then our R scripts perform QC based on various columns available in the raw (aka “engineering level”) data.

Ultimately, the cleaned up data gets displayed in a user interface here.

[Image: monitoring_v3_map]

No one needs to know all of the hard work that goes into cleaning and QC’ing the data that goes into this map. Our goal, after all, is to make it easier for stakeholders to make decisions without having to worry about the fussy details of raw data.

Logging When Things Go Wrong

However, neither the incoming data, nor the R scripts we have written to process it, are perfect. Things occasionally go wrong in a variety of ways and, when decision makers depend on the data being available, phone calls and text messages are next.

Buddhist teacher Pema Chödrön’s book When Things Fall Apart offers several categories of advice that, perhaps surprisingly, apply in this situation. We will focus on one:

  • Using painful emotions to cultivate wisdom, compassion, and courage
  • Communicating so as to encourage others to open up rather than shut down
  • Practices for reversing habitual patterns
  • Methods for working with chaotic situations
  • Ways for creating effective social action

One of the methods for working with chaotic situations in operational software is to have lots and Lots and LOTS of logging.

Python has the brilliant logging module, which lets us quickly set up separate output files for log statements at different levels: DEBUG, INFO, WARNING, ERROR, CRITICAL. Inside our operational code we only need to issue single calls like:

logger.info('Ingesting data file: %s', data_file)

and this information will automatically be written out to both the DEBUG and the INFO log files if we have set them up.

In our situation it makes sense to write out three log files for quick perusal by authorized personnel:

  • ~_DEBUG.log — for programmers needing to debug what went wrong
  • ~_INFO.log — for managers trying to understand the overall situation
  • ~_ERROR.log — (hopefully zero length!) for programmers and managers

These files get written to a uniform directory every time a data processing script runs, with the ‘~’ replaced by the name of the processing script.
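The article does not show how these file paths get assembled. One plausible sketch (the `scriptName` and `logDir` variables are hypothetical names, not from the original code) that produces the naming scheme above:

```r
# Hypothetical sketch: derive the per-script log file paths.
# 'scriptName' and 'logDir' are illustrative names only.
scriptName <- "process_data"   # takes the place of the '~' in the names above
logDir     <- "/home/logs"     # the uniform output directory

DEBUG_LOG <- file.path(logDir, paste0(scriptName, "_DEBUG.log"))
INFO_LOG  <- file.path(logDir, paste0(scriptName, "_INFO.log"))
ERROR_LOG <- file.path(logDir, paste0(scriptName, "_ERROR.log"))
```

These are the same `DEBUG_LOG`, `INFO_LOG`, and `ERROR_LOG` variables referenced in the setup code below.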

Wrapping futile.logger

Luckily, most of the functionality we need is provided by the futile.logger package (CRAN or GitHub). Unfortunately, this package does not currently support multiple “appenders” per “logger,” so we found ourselves initializing multiple named loggers to mimic the natural behavior of Python’s logging module:

# Only FATAL messages go to the console
dummy <- flog.threshold(FATAL)

# Set up new log files
if ( file.exists(DEBUG_LOG) ) dummy <- file.remove(DEBUG_LOG)
dummy <- flog.logger("debug", DEBUG, appender.file(DEBUG_LOG))

if ( file.exists(INFO_LOG) ) dummy <- file.remove(INFO_LOG)
dummy <- flog.logger("info", INFO, appender.file(INFO_LOG))

if ( file.exists(ERROR_LOG) ) dummy <- file.remove(ERROR_LOG)
dummy <- flog.logger("error", ERROR, appender.file(ERROR_LOG))

# Silence other warning messages
options(warn=-1) # -1=ignore, 0=save/print, 1=print, 2=error

Then, inside the code, we created logging and error handling blocks like this:

if ( class(result)[1] == "try-error" ) {
  err_msg <- geterrmessage()
  flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='debug')
  flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='info')
  flog.error('Error creating SpatialPointsDataFrame: %s', err_msg, name='error')
}

flog.info('Using rgdal::writeOGR to create geojson file %s', filename, name='debug')
flog.info('Using rgdal::writeOGR to create geojson file %s', filename, name='info')


After writing just a few lines like this, we tried a different approach and came up with a simple py_logging.R script that wraps futile.logger calls and gives us the Python-style behavior we want. Setup is a cinch:

source('py_logging.R') # python style logging

# Set up logging
logger.setup(debugLog, infoLog, errorLog)

# Silence other warning messages
options(warn=-1) # -1=ignore, 0=save/print, 1=print, 2=error

And logging is much more natural:

if ( class(result)[1] == "try-error" ) {
  err_msg <- geterrmessage()
  logger.error('Error creating SpatialPointsDataFrame: %s', err_msg)
}

logger.info('Using rgdal::writeOGR to create geojson file %s', filename)
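The article doesn’t reproduce py_logging.R itself, but given the function names used above and the named-logger trick from the earlier setup code, a minimal sketch might look like this (the exact contents of the real script may differ):

```r
# Minimal sketch of a py_logging.R-style wrapper around futile.logger.
# Assumes the function names used in the article; not the actual script.

logger.setup <- function(debugLog, infoLog, errorLog) {
  # Only FATAL messages go to the console
  futile.logger::flog.threshold(futile.logger::FATAL)
  # Start each run with fresh log files
  for (logFile in c(debugLog, infoLog, errorLog)) {
    if (file.exists(logFile)) file.remove(logFile)
  }
  # One named logger per output file, mimicking Python's multi-handler setup
  futile.logger::flog.logger("debug", futile.logger::DEBUG,
                             appender = futile.logger::appender.file(debugLog))
  futile.logger::flog.logger("info", futile.logger::INFO,
                             appender = futile.logger::appender.file(infoLog))
  futile.logger::flog.logger("error", futile.logger::ERROR,
                             appender = futile.logger::appender.file(errorLog))
  invisible(NULL)
}

# Each call fans out to the appropriate named loggers, so (as in Python)
# an ERROR message lands in the debug, info, and error logs.
logger.debug <- function(msg, ...) {
  futile.logger::flog.debug(msg, ..., name = "debug")
}
logger.info <- function(msg, ...) {
  futile.logger::flog.info(msg, ..., name = "debug")
  futile.logger::flog.info(msg, ..., name = "info")
}
logger.error <- function(msg, ...) {
  futile.logger::flog.error(msg, ..., name = "debug")
  futile.logger::flog.error(msg, ..., name = "info")
  futile.logger::flog.error(msg, ..., name = "error")
}
```

The fan-out in `logger.info` and `logger.error` is what replaces the repetitive triple `flog.*` calls shown earlier.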

Output from our logger looks great!

INFO [2016-08-24 10:11:54] Running process_data.R version 0.1.5
INFO [2016-08-24 10:11:54] Working on /home/data/location/datafile_1.RData
INFO [2016-08-24 10:11:55] Found 1232 data columns
INFO [2016-08-24 10:12:03] Working on /home/data/location/datafile_2.RData
INFO [2016-08-24 10:12:03] Found 103 data columns
INFO [2016-08-24 10:12:03] Working on /home/data/location/datafile_3.RData
INFO [2016-08-24 10:12:03] Found 44 data columns

At the end of the day, we can only hope we have succeeded at one or another of:

  • Using painful emotions to cultivate wisdom, compassion, and courage
  • Communicating so as to encourage others to open up rather than shut down

Best wishes for excellent logging!



Published at DZone with permission of Jonathan Callahan, DZone MVB. See the original article here.

