Hadoop Opens New Doors for Big Data in the Retail Industry
Over the past few years, global retailers have been trying to capitalize on big data. Many of their efforts have been delayed by limitations in their big data analytics infrastructure. Enter Hadoop to save the day!
Hadoop has opened new doors for these retailers. It addresses many of the big data challenges they have faced over the last few years.
Hadoop: a Multi-Language Solution to Bridge Gaps in Big Data
Hadoop is based on the MapReduce and Google File System designs that Google published in the early 2000s. The core code is written primarily in Java, with some native components in C. Jobs follow a programming model known as MapReduce, and utilities such as Hadoop Streaming let developers write the map and reduce steps in other languages.
Because MapReduce jobs can be written in different programming languages, the platform is very versatile. It can extract, analyze and manipulate big data from many different sources. Libraries that run on Hadoop supply algorithms for association rule learning, clustering, classification and regression, including Naïve Bayes, Expectation Maximization and FP-Growth.
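To make the MapReduce model more concrete, here is a minimal sketch in Python that runs the map, shuffle and reduce steps locally, without a cluster. The sales records, field layout and function names are all assumptions made up for illustration; a real Hadoop job would distribute these same steps across many machines.

```python
from collections import defaultdict

# Hypothetical sales records: (store_id, product, amount). Not from any real data set.
SALES = [
    ("s1", "stapler", 12.50),
    ("s2", "paper", 4.00),
    ("s1", "paper", 8.00),
    ("s2", "stapler", 12.50),
]


def map_sale(record):
    """Map step: emit one (key, value) pair for an input record."""
    _store_id, product, amount = record
    yield product, amount


def shuffle(pairs):
    """Shuffle step: group all values that share the same key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_sales(product, amounts):
    """Reduce step: collapse the values for one key into a single result."""
    return product, sum(amounts)


def run_job(records):
    """Run map, shuffle and reduce in sequence on a single machine."""
    pairs = (pair for record in records for pair in map_sale(record))
    grouped = shuffle(pairs)
    return dict(reduce_sales(key, values) for key, values in sorted(grouped.items()))


totals = run_job(SALES)
print(totals)  # {'paper': 12.0, 'stapler': 25.0}
```

Because the map step only looks at one record at a time and the reduce step only looks at one key at a time, both can be spread across many workers, which is what lets the model scale to large data sets.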
Mike Olson, the CEO of Cloudera, told O’Reilly Media that Hadoop is still in its infancy, but it is already shaping the way the retail and financial sectors use big data.
“The Hadoop platform was designed to solve problems where you have a lot of data — perhaps a mixture of complex and structured data — and it doesn’t fit nicely into tables. It’s for situations where you want to run analytics that are deep and computationally extensive, like clustering and targeting… In online retail, if you want to deliver better search answers to your customers so they’re more likely to buy the thing you show them, that sort of problem is well addressed by the platform Google built.”
Aashish Chandra, divisional vice president of Sears Holding Company, has said that Hadoop has helped the company reduce operating costs and boost sales considerably. Chandra said that previous big data extraction tools lacked the functionality the company needed.
Mining Point-of-Sale (POS) Big Data With Hadoop
Point-of-sale data plays a very important role in the retail industry. Companies rely on big data from point-of-sale purchases to forecast future sales, manage inventory and project staffing needs.
There are many point-of-sale tools that aggregate sales information and store it in large data sets. However, retailers have had difficulty mining PoS data with conventional tools, even when it is stored in a SQL database.
Hadoop makes it much easier for retailers to access information from the central customer database. This data can be converted into a new format and merged with data sets in other files.
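As a rough illustration of converting PoS data and merging it with another data set, the sketch below parses pipe-delimited transaction lines and joins them to a small product catalog by SKU, in the style of a reduce-side join. The line format, field names and sample values are assumptions invented for this example, and it runs locally in plain Python rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical inputs; the layouts and field names are assumptions for illustration.
POS_LINES = [
    "2016-03-01|sku-1|2|9.98",
    "2016-03-01|sku-2|1|4.00",
    "2016-03-02|sku-1|1|4.99",
]
CATALOG = [
    {"sku": "sku-1", "category": "office"},
    {"sku": "sku-2", "category": "paper"},
]


def parse_pos(line):
    """Convert one pipe-delimited PoS line into a dictionary."""
    date, sku, quantity, total = line.split("|")
    return {"date": date, "sku": sku, "quantity": int(quantity), "total": float(total)}


def tag_records(pos_lines, catalog):
    """Map step: key every record by SKU and tag it with its source."""
    for line in pos_lines:
        sale = parse_pos(line)
        yield sale["sku"], ("sale", sale)
    for item in catalog:
        yield item["sku"], ("catalog", item)


def join_by_sku(tagged):
    """Reduce step: for each SKU, attach catalog fields to every sale."""
    groups = defaultdict(lambda: {"sale": [], "catalog": []})
    for sku, (source, record) in tagged:
        groups[sku][source].append(record)
    joined = []
    for sku in sorted(groups):
        for sale in groups[sku]["sale"]:
            for item in groups[sku]["catalog"]:
                joined.append({**sale, "category": item["category"]})
    return joined


merged = join_by_sku(tag_records(POS_LINES, CATALOG))
for row in merged:
    print(row)
```

Tagging each record with its source before grouping is what allows the reduce step to tell sales apart from catalog entries once they arrive under the same key.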
John Soto of New Horizons CLC claims that Hadoop is a major game changer for the retail industry.
“Large retailers could never have done this new analysis with its legacy big data infrastructure. It would have been too expensive to store so much historical data, and the new data is complex and needs considerable preparation to allow it to be combined with the PoS transactions. Hadoop solves both problems, and runs much more sophisticated analyses than were possible in the older system.”
What Big Data and Predictive Analytics Challenges Does Hadoop Address for Retailers?
Hadoop has eliminated some of the barriers retailers faced in their quest to utilize big data. Here are some of the benefits the technology has brought to the table:
- Superior data mining capabilities. Many retailers store terabytes of data. These data sets are often difficult to extract from, because they are so deeply nested. Hadoop distributes storage and processing across a cluster of machines, so it can extract data that was previously out of reach for many big data applications.
- Compatible with different data formats. Retailers store data in many different formats. Internal financial data is often stored in .csv files. Retailers have struggled to conduct audits, because they can’t compare data from structured and unstructured data sets. Hadoop can extract data in multiple formats, conduct an analysis and present it in a more cohesive format. It enables big data analytics experts to look for correlations between data sets from multiple sources.
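The cross-format comparison described above can be sketched in a few lines of Python. This example assumes a finance export in CSV and a set of order events in JSON, normalizes both to the same shape and flags orders whose amounts disagree. The sample data, field names and tolerance are hypothetical, and the code runs on one machine rather than on Hadoop.

```python
import csv
import io
import json

# Hypothetical sample data; the field names are assumptions for illustration.
FINANCE_CSV = "order_id,amount\nA1,20.00\nA2,35.50\n"
WEB_EVENTS_JSON = '[{"order": "A1", "paid": 20.0}, {"order": "A2", "paid": 30.0}]'


def read_finance(text):
    """Normalize CSV rows to a mapping of order id to amount."""
    reader = csv.DictReader(io.StringIO(text))
    return {row["order_id"]: float(row["amount"]) for row in reader}


def read_events(text):
    """Normalize JSON events to a mapping of order id to amount."""
    return {event["order"]: float(event["paid"]) for event in json.loads(text)}


def find_mismatches(finance, events, tolerance=0.005):
    """Return the order ids whose amounts differ between the two sources."""
    shared = finance.keys() & events.keys()
    return sorted(
        order_id
        for order_id in shared
        if abs(finance[order_id] - events[order_id]) > tolerance
    )


mismatches = find_mismatches(read_finance(FINANCE_CSV), read_events(WEB_EVENTS_JSON))
print(mismatches)  # ['A2']
```

Once both sources share one shape, the audit step no longer needs to know which format a record came from.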
Retailers have already found numerous benefits of using Hadoop:
- Staples uses Hadoop to analyze big data and forecast future sales, which helps it allocate resources to personnel and inventory more efficiently. Staples has reportedly reduced its promotional costs by 25% since adopting Hadoop.
- Amazon has used Hadoop to improve its fraud detection models. It has reportedly reduced credit card fraud by 50%, because it can identify red flags more easily.
- Brands have much more detailed information on their customers, which has helped them improve their marketing strategies. Retailers that use Hadoop and predictive analytics have reportedly increased their sales by 73%.
Retailers are only beginning to recognize the potential of Hadoop and big data. According to DeZyre, one of the biggest advantages of Hadoop is that it helps retailers identify and address challenges in real time. It will be particularly important for fraud prevention, since criminals are always developing new scams.
“Manipulators are always inventing novel tools and technologies for fraud, and retailers must employ retail analytics to identify fraudulent activities and prevent them before they take place. With a swarm of big data technologies like Hadoop, MapReduce and Spark it is possible to perform analysis on more than 50 Petabytes of data to accurately predict the risks which was previously impossible.”