Adaptive Data Integration and Operations on Oracle Cloud Using StreamSets

A data expert discusses how this partnership will be useful for continuous data flows built with StreamSets's DataOps platform.

by Clarke Patterson · Oct. 26, 18 · Analysis

StreamSets is pleased to announce a new partnership with Oracle Cloud Infrastructure (OCI). As enterprises move their big data workloads to the cloud, it becomes imperative that their Data Operations are more resilient and adaptive to continue to serve the business's needs. This is why StreamSets Data Collector™ is now easily deployable on OCI.

What led us to this point? It starts with some fundamental questions: 'What good is an Enterprise Data Hub (EDH) without the most current data?' and 'What good is the EDH without lots of data sources feeding it?' These lead to follow-up questions: 'How do you manage data engineering as quickly as software development in a fast-paced DevOps world?' and 'How do you manage change data capture (CDC) from Oracle, streaming log files, and batch SFTP dumps without using large and confusing toolsets?'

To answer all of these questions, StreamSets has created the first complete DataOps (DevOps for data integration) platform to complement the fail-fast world of DevOps toolsets that are commonly found in places like a cloud-based EDH deployment. Running StreamSets in the Oracle Cloud to support a Cloudera Enterprise Data Hub (EDH) provides an excellent example of DevOps being applied to data to harness the value of a big data project.

Before we get to what this example looks like and how well the pieces operate together, it might be helpful to explain why this unlikely trio would be assembled in the first place, starting with the question: 'Why the Oracle Cloud to run Cloudera?'

As OCI becomes more popular, a wider range of use cases presents itself, and we see Hadoop deployments becoming a great fit for OCI. This is because the Oracle Cloud has a few significant tricks up its sleeve that are unique to a second-generation cloud provider. The old saying, "Pioneers get the arrows and the settlers get the land," turns out to apply to cloud computing as well. First, there are some serious performance incentives: OCI's combination of bare metal compute and 50TB of local NVMe storage per node (or one petabyte of block storage per node) offers about 40% faster performance when compared to traditional cloud VMs, and OCI is the only cloud provider that offers a guaranteed 25Gbps connection between any two nodes (SLA here). Second, OCI incorporates Oracle's Identity and Access Management (IAM) suite and the unique use of 'compartments' (which are essentially sub-clouds for greater security and billing that scale across regions). Finally, the unique partnership between Oracle and Cloudera is an added bonus, specifically the cloud portion of this partnership: their ongoing support for a repository of Terraform scripts that enable a rapid and supported start-up and management of a large number of nodes for development or production.

The importance of Terraform, and of Oracle's and Cloudera's ongoing support for openly available scripts that rapidly provision environments, cannot be overstated. Terraform gives users the ability to declaratively create immutable infrastructure and is fundamentally different from what you might find in a procedural, agent-based configuration management tool like Chef or Puppet. For those unfamiliar with Terraform, it is an open-source tool with a high-level configuration language that can create and execute plans to build a potentially unlimited amount of infrastructure via APIs in any popular cloud or on-premises environment. Using the Terraform scripts supported by Oracle and Cloudera, deploying a high-performance N-node EDH becomes as simple as making whatever changes to the scripts you deem necessary and typing "terraform init && terraform plan && terraform apply" into the CLI.
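For teams that would rather script the three-stage workflow than type it by hand, here is a minimal sketch in Python. It is an illustration under stated assumptions, not part of the Oracle/Cloudera repository: the function names and the "./terraform-scripts" path are hypothetical, and the sketch defaults to a dry run so nothing is provisioned by accident.

```python
import shlex
import subprocess
from typing import List, Sequence

# The three stages named in the article, in the order they must run.
TERRAFORM_STAGES: Sequence[str] = ("init", "plan", "apply")


def build_commands(auto_approve: bool = False) -> List[List[str]]:
    """Return one argv list per Terraform stage."""
    commands = [["terraform", stage] for stage in TERRAFORM_STAGES]
    if auto_approve:
        # -auto-approve skips the interactive confirmation of `terraform apply`.
        commands[-1].append("-auto-approve")
    return commands


def run_workflow(workdir: str, dry_run: bool = True,
                 auto_approve: bool = False) -> List[str]:
    """Run (or, by default, only print) each stage, stopping at the first failure."""
    executed = []
    for argv in build_commands(auto_approve):
        line = shlex.join(argv)
        print(("[dry-run] " if dry_run else "") + line)
        if not dry_run:
            # check=True raises CalledProcessError on a non-zero exit,
            # mirroring the short-circuit behaviour of `&&` in the shell.
            subprocess.run(argv, cwd=workdir, check=True)
        executed.append(line)
    return executed


if __name__ == "__main__":
    # "./terraform-scripts" is a placeholder path (assumption), not the real layout.
    run_workflow("./terraform-scripts", dry_run=True)
```

Passing dry_run=False executes the stages for real from the given directory, and the first failing stage aborts the rest, just as the chained shell command would.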

Now that we understand how easy it is to provision and deploy to those environments, the next issue is how to move data into and out of the EDH, and how to ingest change data capture or streaming web logs to keep the EDH current.

The answer to that question lies in the value of the partnership between Cloudera and StreamSets. StreamSets makes data ingestion and data movement easy via its DataOps platform. Tools like StreamSets Data Collector (data execution plane) and StreamSets Control Hub™ (control plane) work in tandem so an organization can centrally develop data pipelines and automate a distributed implementation of the same pipelines inside or outside the Hadoop cluster. Additionally, tools like StreamSets Data Protector™ and StreamSets Dataflow Performance Manager™ will discover and protect sensitive data in-stream, or provide service level agreements around streaming data availability and/or quality. Brought together, these tools allow for rapid iteration on data movement in a secure, predictable, and scalable way that ensures the ongoing value of the EDH to business users.
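To make the idea of ingesting change data while protecting sensitive fields in-stream more concrete, here is a small conceptual sketch in plain Python. It does not use any StreamSets API: the event shape ("op", "key", "row"), the function names, and the email masking rule are all assumptions chosen for illustration.

```python
from typing import Dict, Iterable

Record = Dict[str, str]


def mask_email(value: str) -> str:
    """Keep the domain, hide the local part (an illustrative masking rule)."""
    local, sep, domain = value.partition("@")
    return "***" + sep + domain if sep else "***"


def apply_changes(target: Dict[str, Record],
                  events: Iterable[dict]) -> Dict[str, Record]:
    """Apply insert/update/delete events, in order, to a table keyed by primary key."""
    for event in events:
        op, key = event["op"], event["key"]
        if op == "delete":
            target.pop(key, None)
            continue
        row = dict(event["row"])
        if "email" in row:
            # Mask the sensitive field before it lands in the target.
            row["email"] = mask_email(row["email"])
        if op == "insert":
            target[key] = row
        elif op == "update":
            target.setdefault(key, {}).update(row)
        else:
            raise ValueError(f"unknown operation: {op}")
    return target


# Hypothetical change events, as a CDC source might emit them.
events = [
    {"op": "insert", "key": "1", "row": {"name": "Ada", "email": "ada@example.com"}},
    {"op": "update", "key": "1", "row": {"name": "Ada L."}},
    {"op": "insert", "key": "2", "row": {"name": "Bob", "email": "bob@example.com"}},
    {"op": "delete", "key": "2"},
]
print(apply_changes({}, events))
```

The target ends up with a single row for key "1", carrying the updated name and a masked email. A real pipeline would add ordering guarantees, error handling, and schema drift handling on top of this.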

Experience the speed and power of DevOps plus DataOps using this repository as a packaged offering of StreamSets, Oracle, and Cloudera. Once you have provisioned your EDH cluster with StreamSets via Terraform on the Oracle Cloud, the next step can be an adventure of your choosing! You can create data pipelines as a microservice, stream CDC logs to your EDH, or even stream data from Salesforce APIs for visualization in Minecraft. The world of DataOps awaits your exploration!

This is the first of a three-part series. Next, we will do a deep dive into how we got StreamSets up and running on OCI and the value of Terraform, and finally we will look at how StreamSets and Cloudera perform on bare metal OCI vs. other cloud vendors. In the meantime, you can read Oracle's view on the integration here.

Big data, Cloud, Data integration, Integration, Terraform (software), Change data capture

Published at DZone with permission of Clarke Patterson, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
