Big Data Use Cases


The most frequently cited industries are financial services, retail, and healthcare, while the most frequent applications are security and a 360-degree view of the customer.


To understand the current and future state of big data, we spoke to 31 IT executives from 28 organizations. We asked them, "What are a couple of big data use cases you’d like to highlight? What is the business problem being solved?" Here's what they told us:


  • Healthcare master data management (MDM): a large appointment migration from one EMR (NextGen) to another (Cerner). MDM plugs into both EMRs and moves the data in real-time, so the team can watch the migration as it happens and identify issues before they become issues for the patient. 
  • Expert Finder – mining available content (documents, email) to identify SMEs dynamically rather than relying on the author of the template: what people said, and how they collaborate with other people. It also preserves the evidence. Used in pharma for drug repurposing; a law firm with 21 offices around the world is accelerating the creation of teams and its ability to litigate. Related use cases: a 360-degree view of a product you are building (bill of materials, supply-chain data, production) and fraud detection – circuits and connections otherwise hidden.
  • Ensco Cancer Link connects independent cancer clinics around the US and integrates their data to run analytics: billions of records brought into one place so that advanced math can run across all of the data at one time. Security is critical.
  • GSK supports advanced analytics around clinical trials and drug discovery: moving data to analysts and leveraging capabilities around parameterization and APIs as they move toward self-service, giving data scientists access to the data they need. RingCentral is doing interesting things around UX on the platform as well as fraudulent activity: ingesting as fast as possible to understand the real-time customer experience, and spotting anomalous patterns to get in front of denial-of-service attacks.
  • Fortune 25 customers with massive financial and insurance data: helping them manage large datasets and keep the data highly available and accessible from any node, anywhere.
  • 1) Financial services, because there’s money involved: large credit card customers use it for fraud detection with a 2.0-millisecond response time. 2) A large German supermarket retailer uses it to track transactions, where everything that happens in the store is considered a transaction, to determine the entire COGS – a broader perspective on the full impact of what is transpiring. 3) In healthcare, Liaison Technologies treats medical records as an immutable stream: from the second you walk into an affiliate’s building, an immutable electronic stream determines where you are and what you are doing via streaming technology.
  • Allstate is using natural language generation to train 10,000 agents nationwide. These agents are managed by 250 field sales managers, who in turn report to territory managers. We provide 14 dashboards customized for each person, adding role-based narratives for agents. Narratives allow us to pull out the insights relevant to that reader: here are the five things you need to know, here’s what to do, here’s the data. The same dashboard communicates at different levels – real-time to insight, with prescriptive insights.
  • 1) A bank in Europe with a data lake implemented tag-based infrastructure to automatically tag and secure its data and to establish access policies as well as data-quality rules. Data automatically gets tags, rules are applied, and the result can be visualized by business analysts – governed, controlled, and secured. 2) McDonald’s had a data lake with inventory data. They wanted to reduce their inventory so they could serve more fresh food; data from millions of suppliers was profiled and made available to analysts and data-inventory software.
  • 1) One fun example is with the European Space Agency. They use our technology in the Gaia mission to store an enormous amount of data about the Milky Way. The mission aims to create a three-dimensional map by monitoring a billion stars, hundreds of thousands of new celestial objects, and asteroids over a five-year period. Gaia collects and stores data on each star – including location, distance, changes in brightness, and other information – about 70 times per star, and stores this enormous amount of data for a variety of analytic processing. The most important elements for this use case are super-fast data ingestion, reliable storage and transactions, and the ability to do analysis over trillions of records. 2) Another powerful use case we serve for multiple customers is in electronic trading. Increasing trade volumes and periods of high market volatility are creating significant technology challenges for financial services. Sell-side firms, in particular, can experience extremely high transaction volumes, since they partition already high volumes of incoming orders into even more child orders for execution. At the same time, they must support a high number of real-time, concurrent analytic queries to provide risk management, surveillance, order status, and other information for internal and external clients. This requirement for multi-workload processing at very high scale with the highest levels of performance and reliability has traditionally been difficult and expensive to satisfy. This is a big data use case for sure, and we’re quite proud of how well we’ve been able to handle it and serve demanding customers with a uniquely powerful solution.
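The tag-based governance pattern in the European bank example above – tag data automatically, then enforce access policies against the tags rather than against individual datasets – can be sketched briefly. This is a hypothetical illustration in plain Python; the function names, the `pii` tag, the patterns, and the roles are all invented for this example and are not part of any product mentioned here.

```python
# Hypothetical sketch of tag-based data governance:
# columns are tagged by rule, and access is decided per tag, not per column.
SENSITIVE_PATTERNS = {"email", "ssn", "phone"}

def tag_columns(columns):
    """Attach a 'pii' tag to any column whose name matches a sensitive pattern."""
    return {col: ({"pii"} if any(p in col.lower() for p in SENSITIVE_PATTERNS)
                  else set())
            for col in columns}

def can_read(role, tags):
    """Policy: analysts may not read PII-tagged columns; stewards may."""
    return "pii" not in tags or role == "steward"

tags = tag_columns(["customer_email", "order_total", "ssn_hash"])
visible = [col for col, t in tags.items() if can_read("analyst", t)]
```

The point of the pattern is that new datasets inherit governance automatically: as soon as a column is tagged, every existing policy applies to it with no per-dataset configuration.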


  • 1) Fraud is big – anomaly detection in healthcare and financial services. 2) Customer 360: the more you know about a customer, the more trends you can identify, and the more you can reduce the waste of fraud in insurance-payer industries. Patterns are defined through the graph, and you then look for new patterns. 3) Improving the cost of care: finding similarities in how treatment is being done, or giving a person looking for new insurance better recommendations. We see use cases exploding: over the last five years companies collected data; now a knowledge graph is applied to those use cases to find patterns and apply pattern matching.
  • We provide managed security covering the threat kill chain end to end. We identify and prevent attacks against your business. Preventing and reacting to events is a responsibility with real business impacts – such as downtime, data loss, and theft of sensitive and protected data. Mid-sized businesses cannot afford to provide effective multi-layer defense security environments. Staffing and operating an internal SOC and Incident Response Team is cost-prohibitive, but we reduce cost, simplify security management, improve threat management, decrease vulnerabilities, and help achieve regulatory compliance. Additionally, most IT departments spend over 50 percent of their time on repetitive maintenance tasks. When you spend too much time fighting fires, herding cats, and keeping the lights on, it’s time to start thinking differently. When automation tools are applied to big data sets, tasks that took weeks can now be delivered in about an hour.
  • 1) GDPR in Europe: look at documents, build categories, and identify GDPR-sensitive data. Then, across the universe of GDPR documents, search for the name of the person who wants to be deleted. 2) Entry-level digital hoarding: clean up digital hoarding in organizations with catalogs, last-access dates, and categorization. 3) Security: because information is categorized, DLP engines can look at security-classification tags to determine how to protect it, and AIP rights-management tags govern rights usage – security tags are used to manage access rights.
  • We use IBM Spark to process billable transaction logs for our customer invoicing. 
  • A periodic look back at existing customers to see trends in risk – missed payments, address changes. The benefit of doing this more often, rather than waiting until someone defaults, is that you can be more proactive in reaching out to customers, and you can process huge amounts of data without impacting the production system.
  • Bank security analytics, where the volume and variety of the data were not cost-effective or performant on what they had. We preprocess the data before it goes into the simulation, bringing the data together in one place and then feeding it into a single system. Security operations becomes an overarching layer of integration and analysis across the different tools available.
  • Aeroporti di Roma and AMG Petronas Motorsport are both using our enterprise big data platform. Aeroporti di Roma is using AI-driven insight to optimize airport operations. AMG Petronas gets 18,000 channels of data per car per race and analyzes the data for smart set-up of the F1 car for the next race – tires, passing strategy, in-race decisions. AMG just won its fifth F1 drivers’ championship. 
  • Cassandra in active multi-data-center support: a single database with nodes in different regions all over the world, delivering low-latency services on every continent – financial services. 
  • A recommendation engine using Spark’s collaborative filtering. Recommendation engines have become a very popular use case for the application of Spark. You see recommendation engines everywhere nowadays, such as when you get those lists of product suggestions “you might also like.” If the recommendation engine has been trained properly, you will get a list of useful hints and recommendations. In this case, you get something you want/like, and the company sells you one more product. It is a win-win situation. Other popular use cases involve demand prediction, customer intelligence, and security. 1) Demand prediction tries to predict the number of requests for a certain offer, be it taxi transportation or electrical energy in kWh. Quantifying the exact future demand helps with today’s planning. 2) Customer intelligence is a collection of many different, smaller predictive and descriptive use cases. Predictive use cases try to forecast the future behavior of customers, such as whether a customer will churn or refer a new customer; descriptive use cases try to describe the customers from many different points of view. This is why we talk about a 360-degree view of the customer, encompassing loyalty, buying power, shopping habits, satisfaction, and more. 3) Security use cases are also abundant. A classic one is fraud detection on credit card transactions. Detecting fraudulent usage of credit cards can potentially save a considerable amount of money. 
  • 1) Internal use cases: we build our own system, taking all data sources – CRM, logs, third-party. We use our own product to build a customer 360 and run ML on it for lead scoring, segmentation, and account scoring. 2) Logs from products: ML identifies customer configurations that aren’t optimal and provides a recommendation to improve performance. 3) Metastream at Rush University Medical Center: searching through patient information – 7 million records, 10 years of clinical data – in 1.5 days. 4) Komatsu mining equipment: the challenge is for equipment to dig deeper amid environmental concerns. Predictive analysis of how to make equipment better, hardier, more efficient, and able to meet requirements; redesigning a gearbox to improve delivery. 
  • 1) Understanding dozens of systems: it takes time to build a data warehouse, and new data keeps emerging in new systems not already integrated into it. Being able to bring different data sets together quickly provides a 360-degree view of the customer. 2) The challenge of making data in the data lake more accessible to a wider group of people and integrating it with other systems on the fly. 3) Data science, where the quality of data is critical to effective models and to eliminating bias. Helps data engineers become more adaptive and fast-moving with data from different systems. 
  • Computer vision: a camera collects image data on humans, and the system makes decisions about objects in the picture – getting information from images naturally. You add targets to the visual data for what you’re trying to learn and provide the right answers, which are used to train the ML algorithm. Examples include autonomous vehicles learning to self-drive and analyzing drone footage from flights over crops or forests. 
  • Here are a few use cases that are focused on the real-time analytics aspect of big data: 1) Real-Time Personalization – Personalizing web pages for media based on preference and changing titles based on real-time feedback. 2) Real-Time A/B Testing and Offers – In mobile gaming, testing new features, making the game easier for new users, and sharing offers at the right time all need real-time analytics and decision making. 3) Fraud Prevention for Mobile Roaming and Credit Cards – Detecting credit card fraud through Machine Learning-based rules that get enacted in real time on an incoming stream of credit card events. 4) Compliance Management on Trade Data – Regulatory compliance for traditional and bitcoin exchanges to make sure the risk for/against a security trade is within control. 5) Dynamic Policy and Charging Rules Function (PCRF) for Telcos – All customer phone calls need to be validated against a dynamic set of policies for proper operations and billing. 6) Telecom Application Servers (TAS) – Customer applications that allow users to effectively manage their phone and data plans by restricting usage or enabling features based on real-time analytics (and not month-end billing analytics). 
  • 1) In the financial services vertical, new regulations have imposed real-time reporting requirements on many companies. For example, the Fundamental Review of the Trading Book (FRTB) regulations require financial services companies to calculate their portfolio value and risk exposure in real-time. To enable this, many companies are moving to in-memory computing, which provides hybrid transactional/analytical processing (HTAP) capabilities: companies can both record transactions in the in-memory layer and run the required regulatory calculations in real-time on their operational dataset. 2) In the online services space, many online travel websites respond to travel inquiries by site visitors by pulling price and availability data from multiple online sources such as hotel and airline sites. Margins are then calculated, results are sorted, and the available options are displayed for the site visitor. Many online travel companies use in-memory computing solutions to meet their speed and scalability requirements and produce results quickly enough to keep visitors satisfied with their website’s responsiveness. 3) In the transportation and logistics space, major airlines have to respond to each change in flight status by recalculating the impact on variables such as airplane and crew availability, passenger connections, luggage handling, and gate availability in real-time. A single delayed flight may impact flights and passengers throughout the airline’s network. Many major airlines and logistics companies use in-memory computing solutions to maintain the relevant data in RAM, recalculate the impact of each change on the entire chain in real-time, and drive real-time responses to minimize the impact of the change. 
  • The most common big data analytical use cases that we see are 1) Communication and network analytics to optimize network performance and capacity planning so that businesses can lower infrastructure costs and prevent disruption of service. 2) Customer behavior analytics to better understand and engage with customers so that businesses can predict and prevent churn and gain a 360-degree view of the business to uplift sales and increase Net Promoter Scores. 3) Fraud monitoring and risk management to identify, detect, and prevent fraudulent behavior and to understand and manage risk for regulatory compliance. 4) IoT analytics to tap the business potential of a range of use cases from predictive maintenance to vehicle telematics to smart buildings to smart metering and more, so that companies can reduce service costs, differentiate based on connected products, and generate entirely new revenue sources based on the value of the data. 
  • 1) A $10B legacy media organization automated the consolidation of IT asset data dispersed across hundreds of silos to enable 360° analytics. What previously required too much manpower, technical expertise, cost, and time (years) to execute was accomplished in a matter of days. 2) A Fortune 10 retailer migrated a complex, ML-driven, near-real-time business process that synchronizes with Teradata every 10 minutes under a 15-minute data-availability SLA – a six-month project that two engineers implemented in 19 days. 3) A large oil and gas company implemented a predictive analytics solution for exploration in just two weeks. The solution streams data from oil wells in near real-time, integrating with historical data to deliver an up-to-the-minute drilling dashboard with preventive-maintenance analytics. 4) A Fortune 100 retailer enables a highly flexible, data-driven loyalty program that required management of data across three environments. The company is able to migrate a large amount of historical legacy data from Azure to Google while streaming in new data from other on-premises systems like Oracle Apps. The customer was able to combine all this disparate, complex data into a data lake and then move it out to Google Cloud, where it conducts ongoing analysis on Google BigQuery to upsell and cross-sell loyal customers. 
  • One use case that has the highest positive ROI impact for our clients is improving acquisition program performance through customer modeling. A primary business objective of clients is to acquire customers at an established cost per acquisition. In many cases, prior to coming to us, clients will have achieved their acquisition volume and cost goals but experience high churn rates from those recently acquired customers. We work with these clients to analyze their customer data to determine MVC (most valuable customer) and LTV (lifetime value) models. We then apply these models to our prospect data and modeling to reach and convert new customers that are also profitable. The result is a decrease in the client’s CPA, while also decreasing churn rates as our clients are acquiring higher quality/profitable new customers.
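Several responses above mention recommendation engines built on Spark's collaborative filtering. As a minimal sketch of the underlying idea – written in plain Python for self-containment, and using item-based cosine similarity rather than the ALS matrix factorization that Spark's MLlib provides at scale; the toy ratings and function names are invented for illustration:

```python
from math import sqrt

# Toy user -> {item: rating} matrix; a real system would hold millions of rows.
ratings = {
    "ann": {"book": 5, "lamp": 3, "mug": 4},
    "bob": {"book": 4, "mug": 5},
    "eve": {"lamp": 4, "mug": 2},
}

def cosine(a, b):
    """Cosine similarity between two items over the users who rated both."""
    users = [u for u in ratings if a in ratings[u] and b in ratings[u]]
    if not users:
        return 0.0
    dot = sum(ratings[u][a] * ratings[u][b] for u in users)
    na = sqrt(sum(ratings[u][a] ** 2 for u in users))
    nb = sqrt(sum(ratings[u][b] ** 2 for u in users))
    return dot / (na * nb)

def recommend(user):
    """Rank unrated items by similarity-weighted ratings of the user's items."""
    rated = ratings[user]
    candidates = {i for r in ratings.values() for i in r} - set(rated)
    scores = {i: sum(cosine(i, j) * r for j, r in rated.items())
              for i in candidates}
    return sorted(scores, key=scores.get, reverse=True)
```

With the toy data, `recommend("bob")` surfaces the one item bob has not rated. The "you might also like" lists mentioned above come from exactly this kind of ranking, just computed over far larger matrices.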
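Real-time fraud prevention on a stream of credit card events, as described in the list above, typically checks each incoming event against per-rule thresholds and a sliding window of recent history per card. A hypothetical, minimal sketch – the rule names, thresholds, and event shape are all assumptions for illustration, not any vendor's actual rule set:

```python
from collections import defaultdict, deque

WINDOW = 60          # seconds of history kept per card
MAX_AMOUNT = 5000    # single-transaction amount limit
MAX_VELOCITY = 3     # transactions allowed per window

history = defaultdict(deque)   # card -> timestamps of recent events

def check(event):
    """Apply the rules to one event; return the names of violated rules."""
    card, ts, amount = event["card"], event["ts"], event["amount"]
    q = history[card]
    while q and ts - q[0] > WINDOW:   # evict events outside the window
        q.popleft()
    q.append(ts)
    flags = []
    if amount > MAX_AMOUNT:
        flags.append("amount")
    if len(q) > MAX_VELOCITY:
        flags.append("velocity")
    return flags
```

In production, the same per-key windowed state lives inside a stream processor so the decision is made in milliseconds, before the transaction is approved rather than at month-end.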


  • One of the hottest areas for fast distributed transactional databases right now is distributed ledgers and the related subcategory of blockchain. How does that relate to big data? Again, big data isn’t confined to analytics — companies have a lot of customer data they are using for interactions with those customers, and every time you call them or interact with them online, data is changing, i.e., it is being updated. Or at least it should be. How many times have you called your bank or internet service provider and realized they didn’t have any information on the last complaint you logged online or transaction you had at their local office? Those are recorded somewhere, but they aren’t made consistent across all their different data repositories for immediate benefit and action. There is a strong need for modern enterprises to move away from centralized ledgers (systems of record) toward distributed ledgers across the world, a shift driven by the economic advantages of distributed computing on low-cost commodity servers, whether provided by public cloud platforms or through a private cloud infrastructure. 



