DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports Events Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
  1. DZone
  2. Data Engineering
  3. Databases
  4. How-to: Use the New Apache Oozie Database Migration Tool

How-to: Use the New Apache Oozie Database Migration Tool

How to use the new a migration tool for the Apache Oozie database with Cloudera Manager.

Robert Kanter user avatar by
Robert Kanter
·
Atiila Sasvari user avatar by
Atiila Sasvari
·
Robert Justice user avatar by
Robert Justice
·
Nov. 12, 16 · Tutorial
Like (1)
Save
Tweet
Share
5.71K Views

Join the DZone community and get the full member experience.

Join For Free

This tool makes Oozie migrations off Apache Derby (or any other supported database) easy, in addition to streamlining upgrades.

The Apache Oozie server is a stateless web application by design, with all information about running and completed workflows, coordinator jobs, and bundle jobs stored in a relational database. Prior to Cloudera Manager 5.4, Oozie was configured to use the embedded Apache Derby database for this purpose by default. However, while Derby can safely be used in very small or test/dev clusters, it is not recommended for production Oozie installations—with the main reason being that Derby suffers from known locking issues at high scale and with large amounts of data.

Furthermore, to provide additional scalability and fault tolerance, Oozie introduced high availability (HA) starting in CDH 5. Unfortunately, Derby cannot be used in an HA setup, as it does not support multiple concurrent connections.

In this post, we’ll describe how Cloudera has addressed these issues for users in a forthcoming Oozie release (and soon to ship in Cloudera Enterprise).

Problem Statement

Here at Cloudera, our support team frequently analyzes customer workloads to help ensure their long-term success. During this process, we discovered that many Oozie customers would benefit from migrating off Derby before they begin to scale up and possibly experience locking problems.

The Oozie database contains binary large object (BLOB) columns, so previous migration attempts using free or open source ETL tools failed either partially or completely. The best fix, in most cases, was to start over with a new empty database. But in complex production installations with numerous bundles, coordinators, and workflows running, starting from scratch and locating the properties to resubmit hundreds of jobs is often impractical.

For those reasons, Cloudera’s support and Oozie engineering teams—working in tandem with other members of the Oozie developer community—on a solution to help customers safely migrate their Oozie databases without history or job loss: The addition of a new database dump and load tool to Oozie, which is accessible to admins via Cloudera Manager. Targeted for the upcoming Oozie 4.3 release (and shipping soon in Cloudera Enterprise), the new Oozie Database Migration Tool (OOZIE-2632) helps users more easily migrate from one database to another (and, as a bonus, makes upgrades easier as well).

Using the Tool

The oozie-setup.sh dump and load utility in OOZIE-2632 solves the problem of dumping and loading the Oozie database in a database-agnostic way: It uses OpenJPA (the library that lets Oozie talk to databases) to dump objects to compressed JSON files using GSON. This approach not only supports data extraction from a Derby database, but from any other database that Oozie supports.

Dumping the Source Database

Prior to dumping the database, the Oozie server (in HA mode: all of the servers) must be stopped and the oozie-site.xml configured to the source database from which the dump will be performed. During the dump process, the database content is fetched and written to a compressed zip of json files using the command:

oozie-setup.sh export /export/path/oozie_db_dump.zip

Loading the Target Database

Once the dump is performed, you can configure the oozie-site.xml to point to a new target database. The stock, initial Oozie database tables should be created using the command:

oozie-setup.sh db create 

To then load the previously dumped data, run the command:

oozie-setup.sh import /export/path/oozie_db_dump.zip

Integration With Cloudera Manager

Oozie and Cloudera Manager engineers also worked together to simplify this process via a few clicks (see screenshots below). As previously explained, you’ll see this feature in a forthcoming release.

cm-oozie-db “Dump Database” and “Load Database” actions in Cloudera Manager

Oozie_Load_Database_outputSample output of the “Load Database” action

Oozie_Dump_Database_output

Sample output of the “Dump Database” action

Conclusion

Dumping and loading the Oozie database is conveniently available to Oozie admins either from command line or via the easy-to-use UI in Cloudera Manager. It expedites the process of moving from one database to another for migration or upgrade projects. Helpfully, workflows, coordinators, and bundles can continue to run with only minimal interruption.

As a caution, keep in mind that migration of an exceptionally large database may require more resources and time. That said, due to the scalability problems described previously, Derby will likely have problems long before it reaches a size that would make this issue a common concern. (We may address it on the roadmap should there be enough demand to do so.)

We hope you will find this tool very helpful. Please give us your feedback!

Acknowledgements

The authors would like to thank Peter Bacsko and Peter Cseh for implementing and reviewing the tool. Special thanks to Aravind Selvan, Paolo Milani, and Darren Lo for their guidance and suggestions throughout the Cloudera Manager integration, and to the rest of the Oozie community for additional reviews and feedback.

Attila Sasvari is a Software Engineer at Cloudera, and is currently working on the Oozie team.

Robert Justice is a Customer Operations Engineer at Cloudera.

Robert Kanter is a Software Engineer at Cloudera, a committer and PMC Member for Apache Hadoop, and the VP/PMC Chair for Oozie.

Database dump Relational database Apache Oozie

Published at DZone with permission of Robert Kanter, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.

Popular on DZone

  • Top Five Tools for AI-based Test Automation
  • How to Quickly Build an Audio Editor With UI
  • DevOps Roadmap for 2022
  • PostgreSQL: Bulk Loading Data With Node.js and Sequelize

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 600 Park Offices Drive
  • Suite 300
  • Durham, NC 27709
  • support@dzone.com
  • +1 (919) 678-0300

Let's be friends: