MySQL Applier for Hadoop

By Rozanna Simus · Jun. 07, 13
Enabling Real-Time MySQL to HDFS Integration

Big Data is transforming the way organizations harness new insights from their business, and Apache Hadoop is at the center of that transformation. 
Batch processing delivered by Map/Reduce remains central to Hadoop, but as the pressure to gain competitive advantage from "speed of thought" analytics grows, Hadoop itself is undergoing significant evolution. Technologies enabling real-time queries, such as Apache Drill, Cloudera Impala, and the Stinger Initiative, are emerging, supported by a new generation of resource management in Apache YARN.
To support this growing emphasis on real-time operations, MySQL is releasing a new Hadoop Applier that replicates events from MySQL to Hadoop / Hive / HDFS (Hadoop Distributed File System) as they happen.
The Hadoop Applier complements existing batch-based Apache Sqoop connectivity. This developer article gives you everything you need to get started in implementing real-time MySQL to Hadoop integration.
MySQL in Big Data

MySQL is playing a key role in many big data platforms. Based on estimates from a leading Hadoop vendor, MySQL is a core component of the big data pipeline in over 80% of deployments, including those implemented by the likes of Facebook and Twitter.
Recent research by Intel finds that business transactions stored in relational databases are the leading source of data for big data analytics. This structured data is typically joined with unstructured content such as web logs, clickstreams, social media, email, sensor data, images, and video, enabling business analysts to seek answers to questions that were previously impossible to ask.
As an example, online retailers can use big data from their web properties to better understand site visitors’ activities, such as paths through the site, pages viewed, and comments posted, captured from clickstreams and weblogs.
This data can be combined with user profiles and purchasing history to gain deeper insight into their customers and to enable the delivery of highly targeted offers. Of course, it is not just on the web that big data can make a difference.
Many other business activities can benefit, with other common use cases including:

  • Sentiment analysis
  • Marketing campaign analysis
  • Customer churn modeling
  • Fraud detection
  • Research and development
  • Risk modeling
  • And more

Today many users integrate data from MySQL to Hadoop using Apache Sqoop, allowing bulk transfers of data between MySQL and HDFS, or related systems such as Hive and HBase.
Apache Sqoop is a well-proven approach for bulk data loading, and a typical import is sketched below. However, with the growing number of use cases for streaming real-time updates from MySQL into Hadoop for immediate analysis, we need to look at complementary solutions. In addition, the process of bulk loading to and from tables can place additional demands on production database infrastructure, potentially impacting performance in terms of predictable throughput and latency.
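For reference, a typical Sqoop bulk import of a MySQL table into HDFS looks something like the following; the host, credentials, database, and table names are illustrative, not taken from any particular deployment:

    sqoop import \
      --connect jdbc:mysql://dbhost/sales \
      --username repl_user -P \
      --table orders \
      --target-dir /user/hive/warehouse/orders \
      --fields-terminated-by ','

Each run performs a batch transfer, which is exactly the gap the Hadoop Applier is intended to fill with continuous, event-by-event replication.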
The Hadoop Applier is designed to address these issues by performing real-time replication of events between MySQL and Hadoop.

MySQL Applier for Hadoop

Replication via the Hadoop Applier is implemented by connecting to the MySQL master, reading binary log events as soon as they are committed, and writing them into a file in HDFS. "Events" describe database changes such as table creation operations or changes to table data.

Figure 1: MySQL to HDFS Integration
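Reading row events in this way presupposes that the master has row-based binary logging enabled. A minimal my.cnf sketch along these lines (the server ID and log name are illustrative assumptions):

    [mysqld]
    server-id     = 1
    log-bin       = mysql-bin   # enable the binary log the Applier reads
    binlog_format = ROW         # emit row events rather than statements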
The Hadoop Applier uses an API provided by libhdfs, a C library for manipulating files in HDFS, which comes precompiled with Hadoop distributions. The Applier connects to the MySQL master to read the binary log and then:

  • Fetches the row insert events occurring on the master
  • Decodes these events, extracts the data inserted into each field of the row, and uses content handlers to get it into the required format
  • Appends it to a text file in HDFS

Databases are mapped as separate directories, with their tables mapped as sub-directories within a Hive data warehouse directory. Data inserted into each table is written into text files (named datafile1.txt) in Hive / HDFS. Data can be written in comma-separated format, or any other format configurable via command-line arguments.
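To make the write path concrete, here is a minimal sketch in C of appending one decoded row to a table's datafile through libhdfs. The connection parameters, warehouse path, and CSV payload are illustrative assumptions, not code taken from the Applier itself:

    /* append_row.c - sketch of the libhdfs write path (illustrative) */
    #include <fcntl.h>   /* O_WRONLY, O_APPEND */
    #include <stdio.h>
    #include <string.h>
    #include "hdfs.h"    /* ships precompiled with Hadoop distributions */

    int main(void) {
        /* "default" picks the namenode up from the Hadoop configuration. */
        hdfsFS fs = hdfsConnect("default", 0);
        if (fs == NULL) { fprintf(stderr, "cannot reach HDFS\n"); return 1; }

        /* Database -> directory, table -> sub-directory (see Figure 2). */
        const char *path = "/user/hive/warehouse/db1.db/t1/datafile1.txt";
        hdfsFile f = hdfsOpenFile(fs, path, O_WRONLY | O_APPEND, 0, 0, 0);
        if (f == NULL) { fprintf(stderr, "cannot open %s\n", path); return 1; }

        /* One decoded insert event rendered as a comma-separated line. */
        const char *row = "101,example,2013-06-07\n";
        hdfsWrite(fs, f, row, (tSize)strlen(row));

        hdfsFlush(fs, f);
        hdfsCloseFile(fs, f);
        hdfsDisconnect(fs);
        return 0;
    }

The real Applier keeps the file open and streams events as they arrive rather than connecting per row; the sketch only illustrates the libhdfs calls involved.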
Figure 2: Mapping between MySQL and HDFS Schema

The installation, configuration, and implementation are discussed in detail in the Hadoop Applier blog, and integration with Hive is also documented there; because the Applier writes delimited text files into the Hive warehouse layout shown above, a Hive table can be declared directly over them, as sketched below. We also have a Hadoop Applier video demo which shows the integration. In this first version WRITE_ROW_EVENTS are supported, i.e. only insert statements are replicated; deletes, updates, and DDL statements may be handled in future releases. It would be great to get your requirements - please use the comments section in the Hadoop Applier blog.
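A sketch of what such a Hive definition could look like for the mapping in Figure 2; the table and column names are hypothetical, chosen only to match the comma-separated datafile layout described above:

    -- Hypothetical external table over the Applier's output directory.
    CREATE EXTERNAL TABLE t1 (
      id   INT,
      name STRING,
      dt   STRING
    )
    ROW FORMAT DELIMITED
    FIELDS TERMINATED BY ','
    STORED AS TEXTFILE
    LOCATION '/user/hive/warehouse/db1.db/t1';

With a definition along these lines in place, rows replicated by the Applier become queryable in Hive as soon as they are appended to the datafile.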
Summary

The Hadoop Applier is a major step forward in providing real-time integration between MySQL and Hadoop. With the growth in big data projects and Hadoop adoption, it would be great to get your feedback on how we can further develop the Applier to meet your real-time integration needs.

Tags: Hadoop, MySQL, Big Data, Database, Relational Database

Published at DZone with permission of Rozanna Simus. See the original article here.

Opinions expressed by DZone contributors are their own.
