What is Data? And What Are You Doing?
Speaker: Russell Foltz-Smith from RFS Productions
Abstract: We all talk about and do business with “data”, but what exactly is data? Is it physical? Is it metaphysical? Is it “a thing” that can be studied unto itself or only in relation to the “stuff” data is about? For example, when a machine performs machine learning what exactly is it learning – just something about the data or about the phenomenon the data represents? While this line of questioning might seem a bit metaphysical, it’s a fundamental question with possible answers that might change how all of us create, collect, and use data. This talk is an interactive, highly visual exploration of the nature of data and its implications.
Disney/ABC Television Group’s Data Security Strategy with Ranger, Kerberos, and Knox
Speakers: Haribalan Raghupathy from eSage Group and Matt Olsen from Disney/ABC Television Group
As the need to store and process large volumes of heterogeneous data has increased, the adoption of a data lake as a centralized repository and a source of truth has risen. Data security has become a key component in deploying enterprise Hadoop platforms. Proper data access management and data protection should be in place to guard against unauthorized access. This presentation will provide an overview of how Disney/ABC Television Group, with eSage Group, leveraged Apache Ranger, Ranger-Key Management Server (KMS), and HDFS Transparent Data Encryption as a part of the Hortonworks Data Platform to address vital security components such as authentication, authorization, audit, and data protection. Of course, no security talk in Hadoop is complete without Kerberos, so we will briefly cover Kerberos and how Knox can make the end user access experience better. We will also share some best practices, as well as some challenges we have experienced during these implementations.
A Data Lake and a Data Lab to Optimize Operations and Safety Within a Nuclear Fleet
Speakers: Marie-Luce Picard and Jean-Marc Rangod from EDF
As the world’s leading electricity company, EDF operates 58 nuclear plants in France. The maintenance policy of its generation fleet is optimized to ensure reliability and safety of equipment and systems (e.g. through better diagnosis) and strengthen competitiveness (by improved performance and availability). This policy is based on the analysis of data and documents so far stored in silos, and not always comprehensively analyzed. The use of Big Data Technology allows centralized, fast, and low-cost access to all this information in order to improve operations and maintenance businesses.
In this talk, we will present the Data Lake built to archive and analyze operating data coming from thousands of sensors, enriched by other data (chemical, test results, etc.). A model has been built on top of HBase in order to provide efficient access to time-series data through Phoenix queries or using specific GUIs or analytics tools. We will also present the added value of data science algorithms analyzing data from the entire fleet for predictive maintenance, or control of contractual agreements within the energy market. Finally, we will present the creation of a Data Lab structure in order to leverage the benefits of the Data Lake already in production.
Prescient Keeps Travelers Safe With Natural Language Processing and Geospatial Analytics
Speaker: Mike Bishop from Prescient
Analysis of traveler safety is a real-time, big data challenge in a constantly changing world, and timely insight can make the difference between safety and tragedy. What constitutes a legitimate threat to a person or company and when is the appropriate time to warn them about it? How is danger to an international traveler influenced by cultural differences? When can inference be used to safely fill gaps in sparse datasets? What insights from clandestine intelligence operations can benefit the private sector? Learn how one startup is leveraging Hadoop, NiFi, SAP HANA, and MongoDB to answer these questions and create a first-of-its-kind Traveler Safety capability.
The Industrial Internet: Big Data, Intelligent Machines, and a Smarter Workforce
Speaker: Uday Tennety from GE Digital
The Industrial Internet is transforming the way people and machines interact by using data and analytics in new ways to drive efficiency gains, accelerate productivity, and achieve overall operational excellence. The advent of networked machines with embedded sensors and advanced analytics tools has greatly influenced the industrial ecosystem. Today, the Industrial Internet allows you to combine data from equipment sensors, operational data, and analytics to deliver valuable new insights that were never before possible. The results of these powerful analytic insights can be revolutionary for your business by transforming your technological infrastructure, helping reduce unplanned downtime, improving performance, and maximizing profitability and efficiency. In this session, we will explore the forces driving the Industrial Internet and the kinds of business problems the Industrial Internet is solving for verticals such as Aviation, Transportation, and Oil and Gas. We will also explore the current state of this ecosystem while understanding its promise for the future.
Customer Journey – Sentiment Analysis for Fashion Retail
Speakers: Eric Thorsen from Hortonworks and Steve Howard from EXPRESS
Express is a specialty apparel and accessories retailer of women’s and men’s merchandise, targeting the 20- to 30-year-old customer. The Company has over 30 years of experience offering a distinct combination of fashion and quality for multiple lifestyle occasions at an attractive value, addressing fashion needs across work, casual, jeanswear, and going-out occasions. Steve Howard, Enterprise Architect for EXPRESS, will explain how Hadoop has supported their journey with sentiment analysis, social listening, and improved customer loyalty.
How Macy’s Operationalized BI Insights On Hadoop and Came Out On Top
Speaker: Seetha Chakrapany from Macy’s
No matter what your industry, you face the dilemma of how to capitalize on the explosion of data across your business. With Hadoop, you can efficiently and effectively manage all your Big Data. But now, analysts and business users are banging at your door for self-service BI access to all that valuable data in Hadoop. What do you do? In this session, Seetha Chakrapany, Director of Marketing Analytics, will share how Macy’s successfully kept the top spot as the largest US department store by innovating with interactive, self-service Business Intelligence directly on Hadoop.
Join to learn how they capitalized on Hortonworks as the central source for online data, including paid search and advertising, and how they let all their existing BI tools, including Tableau, Excel, and SAS, access all that live Hadoop data. Hear how they, and you too, can ‘operationalize insight’ on Big Data to drive immediate, measurable ‘BI on Big Data’ ROI. In this session you will learn how:
You can deliver screaming fast Business Analytics on Hortonworks Hadoop
You can provide self-service BI access, while maintaining control
You can leverage existing skillsets to deliver more value to more users faster
LEGO: Data Driven Growth Hacking Powered By Big Data
Speakers: Kamal Duggireddy and Prashant Gokhale from Salesforce.com
Growth Hackers, Data Scientists, Product Managers, and Executives in most organizations struggle to derive actionable insights from the large volumes of diverse data flowing through the enterprise. LEGO, a big data analysis and visualization platform, was created to help tackle this problem at Salesforce.com. Utilizing Hadoop, Kafka, Splunk, and Salesforce Wave, it ingests and integrates all kinds of data, from unstructured log files to structured dimensional datasets.
Several open-source analytic tools (Hive, Pig, Spark, and pandas) are used to enrich, normalize, and harmonize the data, increase its discoverability, and enable self-service feature creation. Actionable data is made available at different levels in efficient data stores ranging from Hive tables to search indexes. LEGO enables users to explore, discover, and create actionable data features, enabling data-driven decision making at all levels in the organization. Integration with the Salesforce Wave platform enables users to analyze and visualize hundreds of millions of rows per dataset in minutes. This presentation will describe how LEGO works, its architecture, and how it's used at Salesforce.com.
Successes, Challenges, and Pitfalls Migrating a SAAS Business to Hadoop
Speaker: Shaun Klopfenstein from Marketo
As our world becomes increasingly connected, the number of activities that marketers want to track has exploded. Marketo’s platform currently collects and processes billions of activities each day, and this number will continue to increase in coming years. To address this need, we successfully migrated from a traditional LAMP stack to Hadoop while maintaining constant uptime and with little to no impact on customer workflows. This has been a major initiative for Marketo, and with it have come many challenges.
In this talk we will cover some of the difficulties we faced and the solutions we developed: business requirements, lowering COGS while scaling up, encryption, authorization, and authentication, near real-time activity processing, high-level architecture, our version of multi-tenancy and why, brownout protection, fairness, management challenges and solutions, coordination, deployment and job management, migrating customer data with zero downtime, managing application and infrastructure upgrades with zero downtime, future work, back pressure with Spark Streaming, dynamically managing Kafka resources in a multi-tenant environment, and upscaling and downscaling multi-tenant Spark Streaming jobs.
Self-Service Analytics on Hadoop: Lessons Learned
Speaker: Andrew Leamon from Comcast
With complexities in hardware, cloud, Hadoop distributions, installations, configurations, tuning, data ingestion, and curation, many organizations are finding it difficult to deliver on the promise of Big Data. A key reason is that until now, big data has been driven by complex technology and limited to IT specialists and data scientists. In this session, I will share an example of how we overcame these challenges for our IP Telephony team by building processes, tools, and resources to empower our analysts with self-service analytics to operationalize their data. Big data is not just about the technology. Developing internal processes and enabling the analysts are equally important.
I have led our team at Comcast to architect a solution that embodies all three components in our big data strategy and execution plan. I will share the lessons learned as we considered different types of technology architecture and deployment, a centralized or federated analyst resource model, and various internal processes to achieve greater analysis efficiencies. Finally, I will show how our IP Telephony analysts use our Self-Service solution every day to solve challenging problems and improve business results in the areas of network fraud detection, capacity planning, peering, and FCC compliance.