Processing and analysing sensor data: a DIY approach

By comSysto GmbH · May 16, 2014

This post was written by Mario Koppen at the comSysto blog.

Motivated by a current customer project (and the interesting nature of Big Data projects from industry in general), we decided to get our hands on sensor data. We wanted to learn how to handle, store and analyze it and what specific challenges sensor data presents.

To get sensor data, we decided to generate our own by putting sensors into our office. We found Tinkerforge’s system of bricks and bricklets quite nice and easy to start with, so we went for that option.

We got the following four sensor bricklets:

  • Sound intensity (basically a small microphone)
  • Temperature
  • A multi-touch bricklet (12 self-made touch pads, cut from aluminium foil, can be connected)
  • A motion detector

The four bricklets are connected to a master bricklet, which is in turn connected to a Raspberry Pi.

We put the temperature sensor into a central place in the office. We set up the motion detector in the hallway leading to the kitchen and the bathrooms. We put the sound intensity sensor next to the kitchen door and placed touch sensors on the coffee machine, the fridge door and the door handle for the men’s bathroom.
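For reference, the bricklets are read out via Tinkerforge's language bindings. The following is a minimal sketch using the Python bindings for the temperature bricklet; HOST, PORT and UID are placeholders for your own setup, and the other bricklets are read analogously through their respective classes (e.g. BrickletSoundIntensity, BrickletMotionDetector, BrickletMultiTouch).

# Minimal sketch: reading the temperature bricklet through the Tinkerforge
# Python bindings. HOST, PORT and UID are placeholders for your own setup.
from tinkerforge.ip_connection import IPConnection
from tinkerforge.bricklet_temperature import BrickletTemperature

HOST = "localhost"  # the Raspberry Pi running brickd
PORT = 4223         # default brickd port
UID = "abc"         # UID of the temperature bricklet (placeholder)

def on_temperature(temperature):
    # The bricklet reports the temperature in units of 1/100 degrees Celsius.
    print("Temperature: %.2f C" % (temperature / 100.0))

ipcon = IPConnection()
temp = BrickletTemperature(UID, ipcon)
ipcon.connect(HOST, PORT)

# Push a value every 100 ms (the highest resolution mentioned below). Note
# that the callback only fires if the value has changed since the last
# measurement, an issue we come back to at the end of this post.
temp.register_callback(BrickletTemperature.CALLBACK_TEMPERATURE, on_temperature)
temp.set_temperature_callback_period(100)

input("Press Enter to exit\n")
ipcon.disconnect()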

Although this is clearly a toy setup (and you will have to wait a long time for the data to become big), we quickly ran into some key issues that also arise in real-world situations involving sensors.

As a storage solution we chose MongoDB, mainly because it was also used in the customer project that motivated the lab.

The data generated by the four sensors can be grouped into two categories: While the temperature and sound intensity sensors output a constant stream of data, the motion detector and multi-touch sensor are triggered by events that typically don’t occur with a fixed frequency.

This gave rise to two different document models in MongoDB. For the first category (streaming), we used the model that MongoDB suggests as best practice for such a situation, which could be called the “Time Series Model” (see http://blog.mongodb.org/post/65517193370/schema-design-for-time-series-data-in-mongodb). It consists of one collection with nested documents in it. The number of nesting levels and the number of subdocuments on each level depend on the time granularity of the data. In our case, the highest time resolution of the Tinkerforge sensors is 100 ms, which gives rise to the following document structure:

  • One document per hour
  • Fields: timestamp of the hour, sensor type, values
  • Values: a nested set of subdocuments: 60 subdocuments for the minutes, each holding 60 subdocuments for the seconds, each holding 10 subdocuments for the tenths of a second
{
    "_id" : ObjectId("53304fcd74fece149f175975"),
    "timestamp_hour" : ISODate("2014-03-24T16:00:00Z"),
    "type" : "SI",
    "values" : {
        "10" : {
            "05" : {
                "00" : -500,
                "01" : -500,
                "02" : -500,
                "03" : -500,
                "04" : -500,
                "05" : -500,
                "06" : -500,
                "07" : -500,
                "08" : -500,
                "09" : 0
            }
        }
    }
}

The documents are pre-allocated in MongoDB, initializing all data fields to a value that is outside the range of the sensor data. This is done to avoid constantly growing documents, which MongoDB would otherwise have to keep moving around on disk.
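A minimal pymongo sketch of this pre-allocation; the collection name ("measurements") is an assumption, and -500 is used as the out-of-range sentinel, as in the example document above:

# Sketch: pre-allocate one hour-document per sensor, with every slot set to
# a sentinel value outside the sensor's range.
from datetime import datetime
from pymongo import MongoClient

SENTINEL = -500

def preallocate_hour(coll, hour, sensor_type):
    values = {
        "%02d" % minute: {
            "%02d" % second: {"%02d" % tenth: SENTINEL for tenth in range(10)}
            for second in range(60)
        }
        for minute in range(60)
    }
    coll.insert_one({"timestamp_hour": hour, "type": sensor_type, "values": values})

coll = MongoClient()["sensors"]["measurements"]
preallocate_hour(coll, datetime(2014, 3, 24, 16), "SI")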

Data of the second type (event-driven/triggered) is stored in a “bucket”-like document model. For each sensor type, a number of documents with a fixed number of entries for the values (a bucket of size 100, say) are pre-allocated. Events are then written into these documents as they occur. Each event corresponds to a subdocument in an array of 100 entries and carries the start time, end time and duration of the event. When the first event is written into a document, the document as a whole gets a timestamp corresponding to that start date/time. On each write, the application checks whether the current record is the last one fitting into the current document; if so, it sets the document's end date/time and starts directing writes to the next document. A sketch of this write logic follows the example document below.

{
    "_id" : ObjectId("532c1f9774fece0aa9325a13"),
    "end" : ISODate("2014-03-21T12:18:12.648Z"),
    "start" : ISODate("2014-03-21T12:16:39.047Z"),
    "type" : "MD",
    "values" : [
        {
            "start" : ISODate("2014-03-21T12:16:44.594Z"),
            "length" : 5,
            "end" : ISODate("2014-03-21T12:16:49.801Z")
        },
        {
            "start" : ISODate("2014-03-21T12:16:53.617Z"),
            "length" : 5,
            "end" : ISODate("2014-03-21T12:16:59.615Z")
        },
        {
            "start" : ISODate("2014-03-21T12:17:01.683Z"),
            "length" : 3,
            "end" : ISODate("2014-03-21T12:17:05.147Z")
        },
        {
            "start" : ISODate("2014-03-21T12:17:55.223Z"),
            "length" : 5,
            "end" : ISODate("2014-03-21T12:18:00.470Z")
        }, 
        {
            "start" : ISODate("2014-03-21T12:18:04.653Z"),
            "length" : 7,
            "end" : ISODate("2014-03-21T12:18:12.648Z")
        }
    ]
}
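A pymongo sketch of this write logic; the collection name is an assumption, and for brevity the array is grown with $push and a count field rather than writing into pre-allocated slots:

# Sketch of the bucket model: append events to the current bucket document
# and close it (set its end time) once it holds BUCKET_SIZE entries.
from pymongo import MongoClient

BUCKET_SIZE = 100
events = MongoClient()["sensors"]["events"]

def write_event(coll, sensor_type, start, end):
    event = {"start": start, "end": end,
             "length": int((end - start).total_seconds())}
    doc = coll.find_one({"type": sensor_type, "count": {"$lt": BUCKET_SIZE}})
    if doc is None:
        # First event opens a fresh bucket; the document-level start is the
        # start of its first event.
        coll.insert_one({"type": sensor_type, "start": start, "end": None,
                         "count": 1, "values": [event]})
        return
    update = {"$push": {"values": event}, "$inc": {"count": 1}}
    if doc["count"] + 1 == BUCKET_SIZE:
        # Last slot: close the bucket by setting the document-level end time.
        update["$set"] = {"end": end}
    coll.update_one({"_id": doc["_id"]}, update)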

These two document models represent the edge cases of a trade-off that seems to be quite common with sensor data.

The “Time Series” model suggested by MongoDB is great for efficient writing and has the advantage of having a nice, consistent schema: every document corresponds to a natural unit of time (in our case, one hour), which makes managing and retrieving data quite comfortable. Furthermore, the “current” document to write to can easily be inferred from the current time, so the application doesn’t have to keep track of it.
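To illustrate that last point, here is a sketch of routing a single write from its timestamp alone, continuing with the measurements collection assumed above: the document is found via its hour timestamp, and the field path for an in-place $set update is derived from minute, second and tenth of a second.

# Sketch: derive the target document and field path from the timestamp and
# write the value in place with $set (no document growth: the slot exists).
from datetime import datetime
from pymongo import MongoClient

coll = MongoClient()["sensors"]["measurements"]

def write_measurement(coll, sensor_type, ts, value):
    hour = ts.replace(minute=0, second=0, microsecond=0)
    path = "values.%02d.%02d.%02d" % (ts.minute, ts.second,
                                      ts.microsecond // 100000)
    coll.update_one({"timestamp_hour": hour, "type": sensor_type},
                    {"$set": {path: value}})

write_measurement(coll, "SI", datetime(2014, 3, 24, 16, 10, 5, 900000), 0)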

The nested structure allows for the easy aggregation of data at different levels of granularity – although you have to put up with the fact that these aggregations have to be done “by hand” in your application. This is because in this document model there are no single keys for “minute”, “second” and “millisecond”; instead, every minute, second and millisecond has its own key.
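As an example of such by-hand aggregation, a sketch that reduces one hour-document to per-minute averages, skipping slots that still hold the pre-allocation sentinel:

# Sketch: aggregate one hour-document to per-minute averages in application
# code, ignoring slots that still contain the sentinel.
from datetime import datetime
from pymongo import MongoClient

coll = MongoClient()["sensors"]["measurements"]

def minute_averages(doc, sentinel=-500):
    averages = {}
    for minute, seconds in doc["values"].items():
        samples = [v for tenths in seconds.values()
                   for v in tenths.values() if v != sentinel]
        if samples:
            averages[minute] = sum(samples) / float(len(samples))
    return averages

doc = coll.find_one({"type": "SI", "timestamp_hour": datetime(2014, 3, 24, 16)})
print(minute_averages(doc))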

This model runs into issues as soon as the data can be sparse. That is obviously the case for the data coming from the motion and multi-touch sensors: there is simply no natural frequency for this data, since events can happen at any time. For the Time Series document model this means that a certain fraction of the document fields would never be touched, which is a waste of disk space.

Sparse data can also arise in situations where the sensor data does not seem event-driven at first: many sensors, although they measure with a fixed frequency, only output a value when it has changed compared to the last measurement. This is a challenge one has to deal with. Sticking with the time series document model would mean constantly checking whether values were omitted by the sensor and filling the corresponding slots in the database with the last value the sensor sent, which of course introduces a lot of redundancy in the database.
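What such forward-filling could look like on the write path, reusing the hypothetical write_measurement helper from the sketch above: every 100 ms slot between the previous and the current report is filled with the previous value.

# Sketch: forward-fill the slots between two change-only sensor reports with
# the last known value, introducing the redundancy discussed above.
from datetime import timedelta

def write_with_forward_fill(coll, sensor_type, last_ts, last_value, ts, value):
    slot = last_ts + timedelta(milliseconds=100)
    while slot < ts:
        write_measurement(coll, sensor_type, slot, last_value)
        slot += timedelta(milliseconds=100)
    write_measurement(coll, sensor_type, ts, value)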

Continue reading the original post on the comSysto blog.

Published at DZone with permission of comSysto GmbH, DZone MVB.
