
HDF 3.0 for Utilities (Part 1)

Lessons from my time at a real-time utilities monitoring startup show how utilities can benefit from new streaming technologies.

Tim Spann · Jun. 28, 17 · Opinion

Utility companies have unique needs based on their infrastructure, equipment, and large SAP installations. This is common across electric, gas, water, solar, and steam. When I worked at a real-time utilities monitoring startup, we saw first-hand the difficulties of installing devices at various types of plants, field locations, offices, warehouses, and other exotic locations.

Often, the most valuable data is locked away in specialized SCADA systems, unavailable to central IT for machine learning, predictive analytics, or even basic reporting. Frequently, a manual export is the only source of occasional, disconnected summary data.

Today's enterprises, including utilities, need real-time access to all streams of data. No one wants to wait until Amazon or Google becomes a utility. It's time to get your data, analyze it mid-stream, and land it for machine learning and deep learning.

One use case is ingesting drone data as drones inspect various pieces of hard infrastructure. This data consists of images, the GPS coordinates and metadata encoded inside those images, and often readings from additional sensors for LIDAR, temperature, air pressure, and humidity. For a detailed solution, see my talk at a recent Oracle conference and this meetup presentation. For one part of the flow using Apache NiFi, I analyze each image with TensorFlow image recognition. This could be expanded to use im2txt from TensorFlow, which produces a descriptive caption for the image. This is good both for reporting and for anomaly detection.
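My flow invokes TensorFlow from within NiFi, but as a rough standalone illustration of the classification step, here is a minimal sketch using a pretrained Keras model. MobileNetV2 stands in for whatever network the flow actually uses, and the image path is hypothetical:

```python
import numpy as np
import tensorflow as tf

# Pretrained ImageNet classifier; stands in for the model the NiFi flow calls.
model = tf.keras.applications.MobileNetV2(weights="imagenet")

def classify_drone_image(path):
    # Load and resize the drone photo to the network's expected input size.
    img = tf.keras.preprocessing.image.load_img(path, target_size=(224, 224))
    x = tf.keras.preprocessing.image.img_to_array(img)
    x = tf.keras.applications.mobilenet_v2.preprocess_input(x[np.newaxis, ...])
    preds = model.predict(x)
    # Return the top three human-readable labels with confidence scores.
    return tf.keras.applications.mobilenet_v2.decode_predictions(preds, top=3)[0]

print(classify_drone_image("drone_photo.jpg"))  # hypothetical file
```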

Another use case, where the data can be integrated and correlated via location, is Twitter and social media data. This data is often used only for sentiment analysis. For utilities, it can also be used for real-time alerting and for replying to reported outages. Utilizing machine learning, you can determine how much confidence to place in a tweet's validity, as social media streams are full of noise, liars, fakers, bots, and garbage.

One method I recommend is using supervised machine learning to create white, gray, and black lists of tweeters. Tweets from the black list would be ignored or filtered out entirely. The white list contains high-priority tweeters such as first responders, government agencies, professional news media, and other reputable sources. The gray list is for ordinary people who fit the profile, location, and characteristics of a legitimate reporter of an issue. You can also match social media user IDs and profile information against internal customer data to see whether your own customers are reporting an issue, and route those reports straight to customer service. I would recommend that all utilities let customers, both corporate and residential, add social media account information to their billing/status portal profiles.
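As a rough sketch of such a supervised classifier: the article does not prescribe a library or feature set, so scikit-learn, TF-IDF text features, and the tiny labeled sample below are all assumptions for illustration only:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical training data: tweet text labeled with the list its author belongs to.
tweets = [
    "Power line down on Elm St, sparks visible",   # plausible outage report
    "County EMA: outage confirmed in zone 4",      # official source
    "win a free iphone click here",                # spam/bot
]
labels = ["gray", "white", "black"]

# TF-IDF text features feeding a simple logistic-regression classifier.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(tweets, labels)

# Score a new tweet; predict_proba gives the confidence measure discussed above.
print(clf.predict(["lights out on my whole block"]))
print(clf.predict_proba(["lights out on my whole block"]))
```

In practice, you would train on thousands of labeled tweets and add non-text features such as account age, location, and follower profile.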

A third use case is ingesting sensor data directly from edge devices utilizing Apache MiNiFi.
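MiNiFi agents are configured with flow definitions rather than code, so there is nothing to program on the NiFi side. As a sketch of what an edge device might emit for a MiNiFi agent to pick up, assuming a hypothetical JSON-lines file watched by a TailFile processor configured in the agent (configuration not shown):

```python
import json
import random
import time

# Hypothetical sensor read; a real device would poll GPIO, Modbus, etc.
def read_sensor():
    return {
        "ts": time.time(),
        "temp_c": 20 + random.random() * 5,
        "pressure_kpa": 101 + random.random(),
    }

# Append one JSON record per line; a MiNiFi TailFile processor can tail
# this file and ship each record to the central NiFi cluster.
with open("/var/log/sensors/readings.jsonl", "a") as f:
    while True:
        f.write(json.dumps(read_sensor()) + "\n")
        f.flush()
        time.sleep(10)
```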

In HDF 3, there are several features that enable all of these use cases and more for utilities and all enterprises.   

  • HDF 3's Apache NiFi 1.2 supports running SQL queries on live data streams for easy filtering (see the sketch after this list).

  • HDF 3 Streaming Analytics Manager supports real-time streaming with live queries and stream joins for complex event processing.

  • HDF 3's Schema Registry allows easy conversion between formats and record-level manipulation of thousands of different types of data with different, evolving schemas, without a long code-and-deploy cycle.
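For the live-stream queries in the first item, NiFi 1.2's QueryRecord processor lets you attach SQL to a flow as a dynamic property. FLOWFILE is the processor's standard table name for the incoming records; the column names and thresholds below are hypothetical sensor fields, not from the article:

```python
# A sketch of the SQL you might attach to NiFi's QueryRecord processor as a
# dynamic property, routing only out-of-range voltage readings onward.
OUTAGE_FILTER = """
SELECT sensor_id, voltage, ts
FROM FLOWFILE
WHERE voltage < 108 OR voltage > 132
"""
```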

By utilizing these techniques, you can have a flexible, agile, real-time streaming Big Data solution that does not require the laborious, error-prone hand-coding and manual deploy cycles that lead to delays and issues. You can now visually develop streaming microservices utilizing this next-generation streaming platform. Utilities will leap past the manual coding of big data in MapReduce, Spark, and other second-generation tools.

Opinions expressed by DZone contributors are their own.
