Big Data Use Cases


Big data is being successfully implemented across at least ten verticals and use cases, including legal, retail, financial management, fraud detection, healthcare, and more.


To gather insights on the state of big data in 2018, we talked to 22 executives from 21 companies who are helping clients manage and optimize their data to drive business value. We asked them, "What are real-world problems you, or your clients, are solving with data?"

Here's what they told us:


  • We are archiving voicemail for a large insurance company in the U.K. We scan as we ingest against their policies for CSRs and regulatory requirements. They used to monitor one in 28 calls. We monitor 100% and use AI/ML to flag conversations that fail to meet performance or regulatory standards. In legal, we are capturing emails, documents, and contracts, and using ML to search for e-discovery. A client may have 100,000 files to categorize, highlight keywords or phrases, and flag those that meet a specific requirement. These 100,000 files used to be reviewed by ten paralegals over the course of a month. Today, we’re using ML to understand how we think, pre-categorize the 100,000 documents, and use predictive analytics to go through all of the documents in about an hour. We streamline processes so we are using people as efficiently as possible. 
  • A retailer was running an old on-premise IBM DB2 system and moved to microservices on Cassandra and GCP. They could not go to production without the backup and recovery we provided. An Internet dining site was running an on-premise database to reduce cost and complexity; since cloud-based managed backup services were very expensive, we provided on-premise backup.
  • A financial management client has data going back to the 1970s that they want to use for financial modeling with MapR on a Hadoop cluster running on Lustre.
  • Technical use cases: data lake enablement and data warehouse re-platforming, broad connectivity, and accelerators for the data warehouse. Some are cloud-oriented, serving as a streaming enabler to migrate data to the cloud. IoT at the energy, oil, and gas wellhead. Retail brick-and-mortar malls trying to attract tenants by providing information about who’s shopping at their malls and what the patterns are. Manufacturing is pushing data ingestion out to the edge for real-time analytics with fog computing and ML decisioning. Cybersecurity with rapid-pace forensics detecting what happened, so that you can build more robust models more quickly, as well as anomaly detection.
  • A European retailer looking to transform the online customer experience (CX), customized based on real-time order and inventory information, collapsing many silos to enable real-time responses. IoT with the connected car: a fabric extends to individual cars, with processing in the trunk and data shared regionally for traffic and globally for performance. Oil and gas, medical equipment, and manufacturing: learning globally so we can inject intelligence back out to the edge to act locally.
  • Data with sensitive time value for financial fraud detection: Spark within the main data grid provides the organization with a real-time data pipeline for anomalies and fraud. Finance using medium data — structured and transactional consistency with a real-time analytic risk-results store, ingesting market data from many sources. Transportation: analysis and simulation of railways with IoT, correlating live information from the field for preventive maintenance.
  • A chip manufacturer has many different systems and relational databases; we made it possible for more than 1,000 non-technical users to leverage that data. A financial services company wanted to provide access to data anywhere. A Hadoop user with thousands of end users and many applications across many databases enabled employees to access data from different parts of the bank, empowering a broader community of users to be independent. Mercedes is collecting data from its self-driving vehicles to inform ML models. A large cloud provider in Europe with more than one million customers and 200,000 servers is tracking login data and using it for predictive maintenance.
  • Apache Flink processes data in real-time and can be applied to unbounded datasets to address a variety of real-world problems and use cases, including optimization of e-commerce search results in real-time. Alibaba’s search infrastructure team uses Flink to update product detail and inventory information in real-time, improving relevance for users. The business impact that Flink delivers is substantial: Alibaba saw a 30% increase in conversion during Singles Day in China on Nov. 11, 2016, which generated $17.8 billion worth of gross merchandise volume (GMV) in one day. Alibaba is running Flink on many thousands of machines in production.
    • Stream processing-as-a-service for data science teams: King (the creators of Candy Crush Saga) makes real-time analytics available to its data scientists via a Flink-powered internal platform, dramatically shortening the time to insights from game data. Flink powers streaming jobs that consume more than 40 billion events per day, maintaining over 10 TBs of user state with strong consistency guarantees that Flink provides. Uber and Netflix have also built Flink-powered stream processing-as-a-service platforms for their organizations.
    • Fraud and anomaly detection: ING uses Flink to deploy new fraud detection machine learning models with zero downtime, ensuring that its detection methods are up-to-date and that its system is running 24/7.
    • ETL for business intelligence infrastructure: Zalando uses Flink to transform data for easier loading into its data warehouse, converting complex payloads into relatively simple ones, and ensuring that analytics end users have faster access to data. Zalando also uses Flink for business process monitoring, automatically detecting customer shipments that are delayed and alerting logistics teams when appropriate. 
  • We have clients in many industries, ranging from financial services to healthcare, and the use cases for natural language generation (NLG) are diverse. Overall, they can all be connected by using our Wordsmith NLG engine to help solve the problem of having tons of data but underutilizing it, or limiting its ability to inform business actions by requiring a certain level of data expertise.
  • We provide a data hub for financial services companies, as well as McKesson, helping them refine data from telemetry and products (vibration and heat sensors) and share it with their data warehouse systems for business service opportunities. We provide the ability to understand the data landscape.
  • We improve performance in real-time use cases like disease-outbreak tracking. We work with Oxford University Medical, helping with clinical trials of 500,000 patients with 5,000 data points each, tracking responses in real time. Fraud detection analytics for credit cards before processing. We facilitate trading for eBay and for financial services clients who need purchasing decisions on millions of records in less than ten milliseconds.
  • Know your customer. We’re able to provide an understanding of the financial situation of each customer and automatically tailor the recommendation for their next best offer. We are the basis of recommendation systems that transform the financial services industry.
    • A client builds hard hats with electronic sensors for geo-fencing and carbon-monoxide detection for worker safety: real-time stream processing for actionable alerts across 10,000 data sources with many dimensions. We solve so many distinct problems that it would be very hard to enumerate them all in this conversation, but some of the most interesting are around:
      • Identifying fraud that relies on collusion to avoid detection (our ability to build and manipulate social graphs with the open-source platform makes these problems easily tractable).
      • Risk scores built from hundreds or thousands of linear and non-linear features (think, for example, of driving behavior or the risk of trips in autonomous cars).
      • Prediction problems when you are trying to foresee with a certain degree of confidence if something will or will not happen in the future.
    • Several of our applications are progressively requiring real-time updates and analyzing data while streaming. It has also become an interesting challenge that the open-source platform is well-equipped to address. 
  • We help operational professionals successfully deal with time constraints. Through bi-temporal indexing and analytics, along with a temporal data structure, we help businesses to understand what is happening at the moment, what has just occurred, and what can potentially happen in the future. By navigating naturally in the time continuum, they can discover and understand precursory warning signals.
    • In the financial services area, we help customers improve payments processing by using analytics to meet customer demand for fast digital payments; identify unusual processing issues, such as the risk of missing cut-off times or decreases in response times; empower payment operations staff to take corrective actions, such as injecting funds, before the business and customers are impacted; and avoid penalties and interest claims.
    • In the retail industry, we help improve visibility into omnichannel order fulfillment with analytics dashboards: meet customer expectations for a coherent, seamless experience across digital and physical sales and delivery channels; proactively identify and resolve process issues before they impact business and customers; track the overall process of order fulfillment by capturing, counting, evaluating, and measuring against service level agreements (SLAs), no matter the customer’s path to purchase; and rigorously measure labor costs, productivity, and outcomes to gain business insight that drives digital-business success.
    • In healthcare, we improve the efficiency and accuracy of claims processing, reducing errors and delays by providing end-to-end monitoring and traceability for all protected data exchanges; eliminating erroneous billing through real-time eligibility, claims status, electronic funds transfer (EFT), healthcare payment, and electronic remittance advice (ERA); and effectively onboarding and managing the trading-partner community to reduce costs and improve cash-flow management through integrated care pathways and paperless flow.
  • Predictive maintenance of data centers to meet SLAs; ensuring accurate telecom billing to eliminate revenue loss at major U.S. and European CSPs; hyper-personalization of the in-game player experience in mobile games; and detecting and preventing DDoS attacks at a leading ISP in Japan are some examples from our customers. One problem we are particularly proud of having solved is with our customer White Ops, who detected and stopped the Methbot operation that was defrauding advertisers of millions of dollars per day.
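The e-discovery triage described in the first item above — pre-categorizing thousands of documents and flagging those that meet a specific requirement — can be sketched in miniature. This is an illustrative keyword-based pass, not the vendor's actual ML pipeline; all phrases and document contents are hypothetical:

```python
# Illustrative stand-in for ML-driven e-discovery triage:
# flag documents containing phrases of interest. Phrases and
# documents below are hypothetical examples.
FLAG_PHRASES = {"breach of contract", "non-disclosure", "termination clause"}

def triage(documents):
    """Partition documents into flagged and unflagged lists of IDs."""
    flagged, unflagged = [], []
    for doc_id, text in documents.items():
        lowered = text.lower()
        if any(phrase in lowered for phrase in FLAG_PHRASES):
            flagged.append(doc_id)
        else:
            unflagged.append(doc_id)
    return flagged, unflagged

docs = {
    "doc-001": "This agreement includes a Termination Clause effective May 1.",
    "doc-002": "Quarterly revenue summary and staffing notes.",
}
flagged, unflagged = triage(docs)
```

A production system would replace the phrase match with a trained classifier, but the flag-and-route structure is the same.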
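The keyed-stream model behind the Flink examples above boils down to maintaining per-key state over an unbounded stream of events. Below is a minimal Python simulation of that idea — not actual Flink code — with made-up user IDs and event types:

```python
from collections import defaultdict

# Hypothetical event stream: (user_id, event_type) pairs.
events = [
    ("u1", "search"), ("u2", "click"), ("u1", "click"),
    ("u1", "purchase"), ("u2", "search"),
]

def keyed_count(stream):
    """Maintain a running count per (key, event_type),
    as a keyed streaming job would in its managed state."""
    state = defaultdict(int)
    for user, kind in stream:
        state[(user, kind)] += 1
    return dict(state)

counts = keyed_count(events)
```

In a real Flink job, this state would be partitioned across machines by key and checkpointed for the strong consistency guarantees mentioned above.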
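The collusion-detection use case above — building and manipulating social graphs to surface fraud rings — often reduces to finding connected components among accounts linked by shared attributes (a device, an address). A minimal sketch, with hypothetical account links:

```python
from collections import defaultdict

# Hypothetical shared-attribute links between accounts.
edges = [("a", "b"), ("b", "c"), ("d", "e")]

def components(edge_list):
    """Group accounts into connected components via iterative DFS;
    unusually large components may indicate collusion rings."""
    adj = defaultdict(set)
    for u, v in edge_list:
        adj[u].add(v)
        adj[v].add(u)
    seen, comps = set(), []
    for node in adj:
        if node in seen:
            continue
        stack, comp = [node], set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            comp.add(n)
            stack.extend(adj[n] - seen)
        comps.append(comp)
    return comps

rings = components(edges)
suspicious = [c for c in rings if len(c) >= 3]  # threshold is illustrative
```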
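Bi-temporal indexing, as described above, tracks both when a fact was true in the world (valid time) and when the system learned it (transaction time), so you can ask "what did we believe on day X about day Y?" A minimal sketch with hypothetical payment states and day numbers:

```python
from dataclasses import dataclass

@dataclass
class Version:
    value: str
    valid_from: int   # valid time, as simple day numbers
    valid_to: int
    recorded_at: int  # transaction time: when the system learned it

history = [
    Version("pending", valid_from=1, valid_to=5,  recorded_at=1),
    Version("settled", valid_from=5, valid_to=99, recorded_at=6),
    # Late correction: on day 8 we learn it actually settled on day 4.
    Version("settled", valid_from=4, valid_to=99, recorded_at=8),
]

def as_of(history, valid_day, known_by):
    """What did we believe, as of `known_by`, about the state at `valid_day`?"""
    candidates = [v for v in history
                  if v.recorded_at <= known_by
                  and v.valid_from <= valid_day < v.valid_to]
    # Among overlapping versions, the latest recording wins.
    return max(candidates, key=lambda v: v.recorded_at).value if candidates else None
```

Querying `as_of(history, 4, known_by=6)` returns the old belief, while `as_of(history, 4, known_by=8)` reflects the correction — the kind of time-travel query that makes precursory warning signals discoverable.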


  • Hybrid cloud and on-prem deployments get a consistent way to monitor two sets of data via an API in a central location, using SQL skills to ensure the data stays consistent.
  • Clients who move to streaming analytics ingest trillions of data points from large, complex datasets in real time, breaking large datasets into subsections. Use cases vary by industry: in marketing, responses to an ad campaign drive market segmentation at scale, and A/B tests on website logs optimize performance.
  • Communicating data to constituents and stakeholders: we give clients tools to go from 30-to-40-page PDFs to engaging web content.
  • More targeted sales and marketing activities. Improved customer satisfaction through proactive churn detection. Reduced fraud through more precise detection of fraudulent activity. Improved compliance with regulations through proactive identification of non-compliant activity. 
  • Machine learning, deep learning, and advanced analytics for trading, anti-fraud, customer success, and healthcare.
  • Data warehouses are not built to manage the larger datasets required for machine learning (ML) and deep learning (DL). Data warehouses are where BI tools run, but if the warehouse is too slow, there will be a problem fulfilling the expectations of the C-suite. Accelerated analytics means advanced analytics with geospatial (IoT) data and ML. High-speed ingestion with real-time analytics is needed in healthcare and retail right now; ultimately, it will be needed in every industry. Business analysts are in one silo and data scientists are in another, and they need to be integrated in a single place, bringing AI to BI.
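The hybrid-cloud consistency check mentioned in the first item of this list can be approximated with plain SQL: compare row counts and aggregates across the two copies of the data. The sketch below uses SQLite in place of the two real systems, and the table and column names are hypothetical:

```python
import sqlite3

# Two tables standing in for the on-prem and cloud copies of the same data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE onprem_orders (id INTEGER PRIMARY KEY, amount REAL);
    CREATE TABLE cloud_orders  (id INTEGER PRIMARY KEY, amount REAL);
    INSERT INTO onprem_orders VALUES (1, 10.0), (2, 25.5);
    INSERT INTO cloud_orders  VALUES (1, 10.0), (2, 25.5);
""")

def tables_match(conn, a, b):
    """Cheap consistency check: row counts and summed amounts must agree."""
    query = "SELECT COUNT(*), COALESCE(SUM(amount), 0) FROM {}"
    return (conn.execute(query.format(a)).fetchone()
            == conn.execute(query.format(b)).fetchone())

consistent = tables_match(conn, "onprem_orders", "cloud_orders")
```

A real deployment would compare per-partition checksums rather than a single global sum, but the SQL-skills-only workflow is the point.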
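The A/B testing of website logs mentioned above starts with computing per-variant conversion rates from raw events. An illustrative sketch with made-up log data:

```python
# Hypothetical website log: (variant, converted) per session.
logs = [
    ("A", True), ("A", False), ("A", True), ("A", False),
    ("B", True), ("B", True), ("B", False), ("B", True),
]

def conversion_rates(logs):
    """Compute conversion rate per variant from raw (variant, outcome) events."""
    totals, wins = {}, {}
    for variant, converted in logs:
        totals[variant] = totals.get(variant, 0) + 1
        wins[variant] = wins.get(variant, 0) + int(converted)
    return {v: wins[v] / totals[v] for v in totals}

rates = conversion_rates(logs)
```

At scale, these rates would feed a significance test before any decision; the aggregation step shown here is the same either way.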

Here’s who we spoke to:

  • Emma McGrattan, S.V.P. of Engineering, Actian
  • Neena Pemmaraju, VP, Products, Alluxio, Inc.
  • Tibi Popp, Co-founder and CTO, Archive360
  • Laura Pressman, Marketing Manager, Automated Insights
  • Sébastien Vugier, SVP, Ecosystem Engagement and Vertical Solutions, Axway
  • Kostas Tzoumas, Co-founder and CEO, Data Artisans
  • Shehan Akmeemana, CTO, Data Dynamics
  • Peter Smails, V.P. of Marketing and Business Development, Datos IO
  • Tomer Shiran, Founder and CEO, and Kelly Stirman, CMO, Dremio
  • Ali Hodroj, Vice President Products and Strategy, GigaSpaces
  • Flavio Villanustre, CISO and V.P. of Technology, HPCC Systems
  • Fangjin Yang, Co-founder and CEO, Imply
  • Murthy Mathiprakasam, Director of Product Marketing, Informatica
  • Iran Hutchinson, Product Manager and Big Data Analytics Software/Systems Architect, InterSystems
  • Dipti Borkar, V.P. of Products, Kinetica
  • Adnan Mahmud, Founder and CEO, LiveStories
  • Jack Norris, S.V.P. Data and Applications, MapR
  • Derek Smith, Co-founder and CEO, Naveego
  • Ken Tsai, Global V.P., Head of Cloud Platform and Data Management, SAP
  • Clarke Patterson, Head of Product Marketing, StreamSets
  • Seeta Somagani, Solutions Architect, VoltDB
Topics: big data, business intelligence, data analytics, data science, machine learning, predictive analytics, real-time data

    Opinions expressed by DZone contributors are their own.
