Real-time Data Analytics: The Way Forward
An introduction to the usefulness and use cases of real-time analytics.
Businesses are still working out how to make the most of the data generated at their customer touch-points and through the resulting transactions and interactions. Meanwhile, another kind of data is emerging: real-time streaming data, which may hold as much value as stored, historical data is expected to, if not more.
Some who have not yet used big-data analytics would call it a vision without reality, pointing to the small number of proven use cases with clear results and to how much remains to be done before real value appears. Even so, real-time streaming data seems to hold a lot of promise for businesses and people in general.
Use Cases of Real-time Analytics
Many industries and activities would benefit greatly from real-time stream processing. The situation is quickly changing from managing data that was generated some time ago to acting on continuous data streams. If businesses could act as fast as data streams in from sources such as sensors (GPS, temperature), cameras, news feeds, satellites, stock tickers, web crawls, server logs, Flume, Twitter, traditional databases, and even Hadoop systems, real-time analytics could become a decision-making tool that improves business performance.
That said, each industry has its own use cases for real-time data analysis, which let management or the responsible personnel act at a moment’s notice when risks and opportunities emerge from patterns in the streaming data. Real-time data analytics has the potential to optimize decisions, speed up responses to critical events and, in general, extract knowledge that leads to vastly improved business insights. What’s more, because the insights are captured in real time, most of them are likely to be actionable, and all stakeholders can see what is happening with the business as it happens.
- Healthcare and Life Sciences:
- ICU Monitoring – Effective supervision allowing for proactive and timely attention for patients in critical care.
- Remote healthcare monitoring – Easy access to and insight into patients’ medical history, helping ensure they get proper, quality treatment on time while reducing unnecessary costs.
- Clinical trials and medical device data – Instrument data stream analysis could point to unusual or disconcerting behavior, or reveal hitherto unknown aspects that could be used for better diagnosis and treatment. In other words, it would help detect early signs of disease, identify correlations among multiple patients, and measure the efficacy of treatments given.
- Epidemic early warning system – Real-time sensor data analysis could help detect likely epidemic outbreaks, acting as an early warning system that aids prevention and preparedness.
- Fraud detection
- Better case management
- Susceptibility check during policy enrollment
- Policy performance evaluation
- Better prediction of future events, and designing, developing, and offering products accordingly
- Segmentation for a proper product-price mix
- Sales forecasting analysis based on current sales trends
- Real-time due diligence on prospects to weed out undesirable insureds
- Telecom providers can track mobile sessions and analyze usage, preferences, and trends to gain insight into customer behavior, offer customized and relevant services (e.g., location-based services, offers, or recommendations), and earn loyalty and patronage. This also helps improve billing, quality of service, security, and fraud prevention.
- A proven use case of real-time data analytics in the energy sector is the Smart Grid. More use cases would evolve in due time, making for a very energy-efficient world.
- Predicting the behavior of a device under a specific set of conditions
- Detecting when devices cross threshold levels in order to mitigate the impact of fault conditions
- Risk mitigation for personnel deployed at sites through real-time analysis of exploration and production data
- Customer profiling
- Social Media sentiment analysis for damage control or course-correction, if need be.
- Real-time tracking through GPS
- Intelligent traffic management to ease congestion on busy routes during peak hours
- Instant and automatic telematics to enable connected vehicles
- Speculation Market
- Sentiment analysis
- Momentum calculator
- Impact of weather on bourses & stock prices
- Market Data analysis of ultra-low latency periods.
- Law Enforcement
- Smart policing (sensors and CCTV cameras linked to a centralized cloud database, license-plate recognition, voice recognition, GPS tracking of known suspects and offenders, etc.)
- Surveillance to spot unusual activity, behavior, or incidents, enabling faster decision-making to both prevent and reduce criminal incidents.
- Criminal investigation
- Monitoring interactions between law enforcement and the public
- Website traffic analysis and engagement (most viewed pages, pages with the most time spent, visitor behavior, user navigation patterns, etc.)
- Mobile applications – Downloads, sessions, preferences, transactions, usage patterns, etc., for effective customer profiling and designing services to better serve them
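Several of the use cases above, such as ICU monitoring and device threshold detection, come down to watching a stream of readings and raising an alert when a limit is crossed. The following is only a minimal sketch of that idea, assuming readings arrive as plain numbers; the function name `threshold_alerts`, the window size, and the limit are hypothetical values chosen for illustration.

```python
from collections import deque
from statistics import mean


def threshold_alerts(readings, window_size=3, limit=100.0):
    """Yield (index, rolling_mean) whenever the rolling mean exceeds `limit`.

    `readings` can be any iterable, including a generator fed by a live
    source, so values are handled one at a time as they arrive.
    """
    window = deque(maxlen=window_size)  # keeps only the newest readings
    for index, value in enumerate(readings):
        window.append(value)
        if len(window) == window_size:  # wait until the window is full
            rolling = mean(window)
            if rolling > limit:
                yield index, rolling


if __name__ == "__main__":
    sample = [90, 95, 100, 110, 120, 80]  # made-up sensor values
    for index, rolling in threshold_alerts(sample):
        print(f"alert at reading {index}: rolling mean {rolling:.1f}")
```

Averaging over a small window rather than alerting on single readings is one way to avoid reacting to a lone spike; whether that trade-off is right depends on the use case.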
Challenges of Real-time Data Processing
Handling the velocity of streaming data is a herculean task: the data generated by real-time events, transactions, and interactions can stream in at a rate of millions of events per second, and the system must collect all of it. Even as the data is being collected, the system should be robust and capable enough to process it in parallel. Then comes the complex task of correlating events to extract meaningful information from the data. As if that were not daunting enough, all this needs to happen in a fault-tolerant and distributed manner, and the system must also be low-latency, computing fast enough to deliver near real-time responses to events.
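To make event correlation concrete, here is a small single-process sketch that groups events into fixed (tumbling) time windows per key and reports keys seen from more than one source within the same window. The event shape `(timestamp, key, source)`, the function name `correlate`, and the 10-second window are assumptions made for this example; a production system would run equivalent logic on a distributed stream processor rather than in one Python process.

```python
from collections import defaultdict


def correlate(events, window_seconds=10):
    """Report keys that appear from several sources in the same time window.

    Each event is a (timestamp, key, source) tuple (an assumed shape).
    Returns a sorted list of (window_start, key, sources) tuples.
    """
    buckets = defaultdict(set)
    for timestamp, key, source in events:
        # Map the timestamp to the start of its tumbling window.
        window_start = int(timestamp // window_seconds) * window_seconds
        buckets[(window_start, key)].add(source)

    return sorted(
        (window_start, key, sorted(sources))
        for (window_start, key), sources in buckets.items()
        if len(sources) > 1  # correlated: more than one source in the window
    )


if __name__ == "__main__":
    sample = [
        (1, "card-1", "atm"),
        (4, "card-1", "web"),   # same card, same window, different source
        (12, "card-2", "atm"),
        (25, "card-2", "web"),  # same card, but a different window
    ]
    print(correlate(sample))
```

Tumbling windows are the simplest choice; events that fall just either side of a window boundary are not correlated here, which is why real systems often use sliding or session windows instead.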
How to Go About It
To address this complex requirement, a combination of tools has to be employed: Apache Kafka to collect the incoming stream, Apache Storm or Apache Spark (depending on the needs of the system) to route it to Hive/HDFS, and an analytics engine to extract insights and send them to the dashboard.
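The division of labor among these tools can be pictured as a chain of stages, each consuming the output of the previous one. The sketch below uses plain Python generators as in-memory stand-ins for each layer; it does not call Kafka, Storm, Spark, or Hive, and every function name and the JSON event shape are hypothetical.

```python
import json
from collections import Counter


def collect(raw_lines):
    """Stand-in for the collection layer: parse each raw JSON line."""
    for line in raw_lines:
        yield json.loads(line)


def process(events):
    """Stand-in for the stream processor: drop malformed events, normalize the rest."""
    for event in events:
        if "page" in event:
            yield {"page": event["page"].lower()}


def store_and_summarize(events, storage):
    """Stand-in for storage plus analytics: persist events and count page views."""
    counts = Counter()
    for event in events:
        storage.append(event)  # a real system would write to durable storage
        counts[event["page"]] += 1
    return counts


if __name__ == "__main__":
    raw = ['{"page": "/Home"}', '{"page": "/home"}', '{"user": "x"}', '{"page": "/cart"}']
    storage = []
    summary = store_and_summarize(process(collect(raw)), storage)
    print(summary.most_common())  # what a dashboard might display
```

Because each stage is a generator, an event flows through the whole chain as soon as it is read, which loosely mirrors how the real components hand data to one another continuously rather than in one large batch.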
How It Can Be Done Simply
Data from any source, including web crawl data, sensor data (GPS, temperature sensors), server logs, Flume, or Twitter, is collected and stored temporarily in the Kafka cluster, where it is organized through the ZooKeeper -> Broker -> Topic mechanism. It is then sent through either Storm or Spark Streaming to Hive/HDFS, and from there to the analytics engine (e.g., SAS VA) for processing. All of this happens in real time, and the results are pushed to the dashboard for users to take stock of things and act.
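Within a Kafka topic, messages are spread across partitions, and messages that share a key land in the same partition so their relative order is preserved. The toy model below illustrates only that idea: a dictionary of lists stands in for a topic, and CRC32 is used purely as a convenient stable hash. It is not the hash Kafka itself uses, and `partition_for` and `publish` are hypothetical helpers, not part of any Kafka client.

```python
import zlib
from collections import defaultdict


def partition_for(key, num_partitions):
    """Pick a partition from a stable hash of the key (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish(topic, key, value, num_partitions=3):
    """Append (key, value) to the partition chosen for `key`; return its number."""
    partition = partition_for(key, num_partitions)
    topic[partition].append((key, value))
    return partition


if __name__ == "__main__":
    topic = defaultdict(list)  # partition number -> ordered list of messages
    publish(topic, "sensor-7", 20.5)
    publish(topic, "sensor-9", 18.0)
    publish(topic, "sensor-7", 21.0)
    for partition, messages in sorted(topic.items()):
        print(partition, messages)
```

Keying by sensor or customer ID in this way means downstream processing can work on partitions in parallel while still seeing each key's events in arrival order.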
Ultimately, to get the best out of real-time data analysis, businesses need able hands. Proven and demonstrated ability in big data is the starting point for choosing your analytics partner. Then comes the partner’s proficiency with the tools, technologies, and mechanisms involved in making real-time data analytics truly effective. Finally, the partner’s wide exposure to various industry verticals should reflect a deep understanding of the respective businesses and their intricacies.
Therefore, choosing a partner with these attributes would be like having half of the job done. Leave the other half for the technology partner to take care of!
Summary: For many businesses with a multitude of end-user touch points, and for state organizations entrusted with public welfare and safety, every passing second is a defining moment. Their efforts will be all the more productive if they have instant knowledge of what’s happening and of what is likely to happen. Real-time data analytics gives them the option to have that power, and much more!