Analyzing the Oroville Dam Spillover With the ELK Stack



Natural disasters and other dramatic events taking place around the globe sometimes provide good opportunities for those interested in exercising their data analysis and visualization pipelines.

These days, publicly available datasets such as the one explored below can be easily retrieved from a variety of open repositories, stored in a datastore, and then analyzed with your tool of choice.

This article will show how to ingest the data collected during the recent Oroville Dam incident into the ELK Stack via Logstash and then visualize and analyze the information in Kibana. This same process can be used for virtually any public dataset.

The Oroville Dam Data

The California Department of Water Resources publishes rich datasets on all the water sources in the state, so it was extremely easy to query the website for the required data.

The query results in a dataset showing the following metrics collected by the sensors deployed at the dam:

  • Reservoir elevation (feet).
  • Reservoir storage (acre-feet).
  • Dam outflow (cubic feet per second).
  • Dam inflow (cubic feet per second).
  • Spillway outflow (cubic feet per second).
  • Rain (inches).
  • Volts.

All that is required is to copy the data into a text file so it can be ingested into Elasticsearch via Logstash.
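
If you prefer scripting the retrieval over copying and pasting, a few lines of Python will do. This is a minimal sketch only; the query URL and parameters below are assumptions you will need to adapt from the actual query you ran on the Department of Water Resources site:

import requests

# Hypothetical query URL and parameters -- adapt these to the actual
# query you ran on the Department of Water Resources site.
URL = "https://cdec.water.ca.gov/dynamicapp/QueryF"
PARAMS = {"s": "ORO", "d": "13-Feb-2017", "span": "2weeks"}

response = requests.get(URL, params=PARAMS, timeout=30)
response.raise_for_status()

# Save the raw readings where the Logstash file input will pick them up.
with open("/logs/oroville.log", "w") as f:
    f.write(response.text)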

The Logstash Configuration

When analyzing logs, parsing and filtering the data during ingestion is a critical step in the pipeline. This process is what gives the logs the context needed so that they can later be analyzed more easily in Kibana.

The same rule applies in our case. We want the time-ordered readings from the dam sensors to be parsed so that each field is mapped correctly. The way to do this is, of course, with grok.

A sample measurement is:

02/06/2017 03:00     849.47    2801248   29703     31700     29792     29.68     13.4

So, the configuration for Logstash in our case looks like this:

input {
  file {
    # The text file holding the copied sensor readings
    path => "/logs/oroville.log"
    start_position => "beginning"
    type => "log"
  }
}

filter {
  if [type] == "log" {
    grok {
      # Tab-delimited readings; decimal columns (elevation, rain, volts)
      # are converted to floats, the rest to integers
      match => [
        "message", '%{DATA:timestamp}\t%{DATA:Reservoir_Elevation:float}\t\t%{DATA:Reservoir_Storage:int}\t\t%{DATA:Dam_Outflow:int}\t\t%{DATA:Dam_Inflow:int}\t\t%{DATA:Spillway_Outflow:int}\t\t%{DATA:Rain:float}\t\t%{GREEDYDATA:Volts:float}'
      ]
    }
    date {
      match => [ "timestamp", "MM/dd/yyyy HH:mm" ]
      remove_field => [ "timestamp" ]
      timezone => "PST8PDT"
    }
  }
}

output {
  tcp {
    host => "listener.logz.io"
    port => 5050
    codec => json_lines
  }
}

We are using the file input plugin to pull the readings, a grok filter to parse the message, and an output that points to our Logz.io listeners. If you are using your own ELK Stack, you will need to change the output to point to your Elasticsearch instance.
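
To sanity-check the pattern before shipping anything, here is a minimal Python sketch (not part of the pipeline itself) that parses the sample reading the same way the grok filter does:

import datetime

# A tab-delimited reading as it appears in /logs/oroville.log
# (double tabs separate the numeric columns, matching the grok pattern).
line = "02/06/2017 03:00\t849.47\t\t2801248\t\t29703\t\t31700\t\t29792\t\t29.68\t\t13.4"

fields = ["Reservoir_Elevation", "Reservoir_Storage", "Dam_Outflow",
          "Dam_Inflow", "Spillway_Outflow", "Rain", "Volts"]

# Split on tabs, drop the empty strings produced by the double tabs.
timestamp, *values = [v for v in line.split("\t") if v]
reading = dict(zip(fields, (float(v) for v in values)))
reading["@timestamp"] = datetime.datetime.strptime(timestamp, "%m/%d/%Y %H:%M")

print(reading)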

Analyzing the Data

When you open Kibana, the sensor readings will look as follows:

[Image: sensor readings]

All the metrics measured by the different sensors at the dam are mapped, so the data is much easier to slice and dice.

Let’s start with visualizing the average amount of rain measured. To do this, we will use a bar chart in which each bar represents a day within the time period analyzed:

[Image: average amount of rain]

We can see a gradual rise in rainfall (in inches) as the event grew worse.
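
Under the hood, a bar chart like this is backed by an Elasticsearch date histogram aggregation. Here is a rough sketch using the official Python client (the logstash-* index pattern and the host below are assumptions; adapt them to your own setup):

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")  # assumed local instance

# Average rain per day -- roughly the query behind the Kibana bar chart.
response = es.search(index="logstash-*", size=0, body={
    "aggs": {
        "per_day": {
            # use "interval" instead on Elasticsearch versions before 7.x
            "date_histogram": {"field": "@timestamp", "calendar_interval": "day"},
            "aggs": {"avg_rain": {"avg": {"field": "Rain"}}}
        }
    }
})

for bucket in response["aggregations"]["per_day"]["buckets"]:
    print(bucket["key_as_string"], bucket["avg_rain"]["value"])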

How about the storage level of the reservoir held back by the dam? In this case, we’re using a line chart visualization to see how the reservoir levels rise over time:

[Image: reservoir levels]

The max capacity of the reservoir is just under 3.54 million acre-feet, and so we can see how this capacity was surpassed during the crisis. We can also add a traffic light visualization to display the breach:

[Image: breach display]
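
The same check can be run outside Kibana with a max aggregation compared against the capacity threshold. Again, a sketch; the index pattern follows the earlier assumption and the capacity figure is approximate:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

CAPACITY_AF = 3537577  # approximate maximum capacity in acre-feet

response = es.search(index="logstash-*", size=0, body={
    "aggs": {"peak_storage": {"max": {"field": "Reservoir_Storage"}}}
})

peak = response["aggregations"]["peak_storage"]["value"]
print("Peak storage: %.0f acre-feet (%s)"
      % (peak, "capacity breached" if peak > CAPACITY_AF else "within capacity"))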

The data for the spillover (measured in cubic feet per second) — the emergency drainage mechanism to handle overflows — is a bit more erratic and can be explained by how the event developed, with various tactics applied and failing one after the other:

[Image: spillover data]

The total amount of water spilled over during the incident, as measured by the dam sensors, can be displayed using a Kibana metric visualization:

[Image: amount of water spilled]
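
The metric itself is a simple sum aggregation over the spillway readings. One caveat: the readings are flow rates in cubic feet per second, so a plain sum is only a relative figure; to approximate total volume you would multiply by the sampling interval (hourly, as far as the sample line suggests). A sketch with the same assumed index pattern:

from elasticsearch import Elasticsearch

es = Elasticsearch("http://localhost:9200")

response = es.search(index="logstash-*", size=0, body={
    "aggs": {"total_spill": {"sum": {"field": "Spillway_Outflow"}}}
})

# Sum of cfs readings; multiply by 3,600 seconds per (assumed hourly)
# sample to approximate the total volume in cubic feet.
total = response["aggregations"]["total_spill"]["value"]
print("Sum of spillway readings: %.0f cfs" % total)
print("Approximate volume: %.0f cubic feet" % (total * 3600))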

Adding all of these into one dashboard gives us a more comprehensive picture of the event:

[Image: the full dashboard]
This is just an example of how the ELK Stack can easily be used to ingest and visualize data, but of course, the stack is designed for much larger datasets.

As an endnote, I'd like to express my sympathy for the 200,000 people evacuated from their homes during this event. Thank you to the State of California for making this data available.


