Deploying Databricks Asset Bundles

This article provides details on the Databricks Asset Bundle, including its benefits and a step-by-step guide for deploying it in the Azure cloud environment.

By Soumya Barman · Dec. 03, 24 · Tutorial

Disclaimer: All the views and opinions expressed in the blog belong solely to the author and not necessarily to the author's employer or any other group or individual. This article is not a promotion for any cloud/data management platform. All the images and code snippets are publicly available on the Azure/Databricks website.

In my other blogs, I have provided details on Databricks, how to create a Unity Catalog, and more in the Azure cloud. In this blog, I will cover the Databricks Asset Bundle: what it is, when to use it, and how to deploy it to a Databricks workspace in Azure using the Databricks CLI.

What Are Databricks Asset Bundles (DABs)?

Databricks Asset Bundles are an Infrastructure as Code (IaC) approach to managing your Databricks objects. Since bundles are defined and managed through YAML templates and files that you create and maintain alongside your source code, they map well to scenarios where IaC is an appropriate approach.

The Databricks Asset Bundle is a tool that makes it easier to move things like code, data pipelines, machine learning models, and settings from one Databricks workspace to another. Imagine you have built some data tools or projects in one place and want to set them up in another; the Asset Bundle helps "package" everything together so you can move and reuse it without having to recreate each part. It's like zipping up a folder with all your work to send somewhere else, making it much faster and more organized to share or deploy. 
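
To make this concrete, a bundle is defined by a top-level databricks.yml file (generated in Step 3 below). The sketch here is only illustrative; the bundle name, workspace URLs, and target names are placeholders, not values from this article.

YAML
 
# Illustrative databricks.yml (all names and URLs are placeholders)
bundle:
  name: demo

# Pull in job/pipeline definitions kept as separate YAML files
include:
  - resources/*.yaml

# Each target maps the bundle to a workspace/environment
targets:
  dev:
    mode: development
    default: true
    workspace:
      host: https://adb-1111111111111111.11.azuredatabricks.net
  prod:
    mode: production
    workspace:
      host: https://adb-2222222222222222.22.azuredatabricks.net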

How Databricks workspaces work (Image source: Databricks)


Why Use Databricks Asset Bundles (DABs)?

Databricks Asset Bundles make it easy to move and manage data projects across different workspaces or environments. 

  • Simplifies Deployment: If you’ve developed code, models, or data workflows in one environment (like a development environment), Asset Bundles let you deploy everything to another (like production) Databricks workspace without redoing the setup, following DevOps best practices.
  • Easy Collaboration: Teams can share complete data projects, including all necessary components, in a structured way. This makes it easy for others to set up and use those projects.
  • Version Control and Consistency: Asset Bundles help ensure that all parts of a project stay consistent and up-to-date across environments so no steps are missed.
  • Reduces Setup Time: By packaging everything into a single bundle, you save time on configuration, making it faster to roll out updates or set up projects in new workspaces.

How to Deploy a Databricks Job Using a Databricks Asset Bundle (DAB)

Prerequisites

  • A Databricks workspace is already configured in the Azure cloud.
  • The user running the CLI commands has access to the workspace and is able to create jobs in it.
  • The Databricks CLI is installed and configured on the local machine. For installation instructions, refer to the Databricks CLI documentation and follow the steps for your OS.

DAB Demo

Step 1: Validate That the Databricks CLI Is Installed Correctly

Run the following command in the command prompt or terminal:

Shell
 
databricks -v


You should see an output similar to the one below (your version could be different):

Shell
 
Databricks CLI v0.221.1


Step 2: Log In to the Workspace

Run the following command to initiate OAuth token management locally for each target workspace:

Shell
 
databricks auth login --host <workspace-url>


The CLI prompts for the Databricks profile name. Press Enter to accept the default, or type a different name to change it. The profile information is saved in the ~/.databrickscfg file in your home directory (on macOS/Linux).
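
To confirm which profiles are configured on your machine, you can list them with the CLI (the names and hosts in your output will differ):

Shell
 
databricks auth profiles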

Step 3: Initialize the Bundle

Run the following command to initiate the bundle creation:

Shell
 
databricks bundle init


Enter a unique name for this project, such as demo. Select "no" for all other questions. After the command runs successfully, you should see the following folders created.

Folders created after the 'databricks bundle init' command is run
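
The exact layout depends on your CLI version and the answers you give during initialization, but a freshly initialized bundle named demo typically contains something like the following (an approximate sketch, not an exact listing):

Shell
 
demo/
├── databricks.yml   # top-level bundle configuration: name, targets, includes
├── resources/       # YAML definitions of the jobs and pipelines in this bundle
└── src/             # notebooks and source code deployed with the bundle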


Step 4: Update the Bundle

Create a new file, demo.yaml, inside the resources folder, and copy and paste the content below. This file contains the Databricks job definition for a Python notebook task. It also contains the job cluster (required compute) definition. Replace the notebook path with the path to an existing notebook in your workspace.

For more asset bundle configuration options, refer to the Databricks Asset Bundle configuration documentation.

YAML
 
resources:
  jobs:
    sb_demo_job:
      name: "sb demo job"
      tasks:
        - task_key: demo_task
          notebook_task:
            notebook_path: <REPLACE THIS BLOCK WITH AN EXISTING NOTEBOOK IN YOUR WORKSPACE>
            source: WORKSPACE
          job_cluster_key: Job_cluster
      job_clusters:
        - job_cluster_key: Job_cluster
          new_cluster:
            spark_version: 15.4.x-scala2.12
            azure_attributes:
              first_on_demand: 1
              availability: ON_DEMAND_AZURE
              spot_bid_max_price: -1
            node_type_id: Standard_D4ds_v5
            spark_env_vars:
              PYSPARK_PYTHON: /databricks/python3/bin/python3
            enable_elastic_disk: true
            data_security_mode: SINGLE_USER
            runtime_engine: PHOTON
            num_workers: 1
      queue:
        enabled: true


Step 5: Validate the Bundle

Run the following command to validate the bundle. Make sure you run this command from within your bundle folder, where the databricks.yml file is present.

Shell
 
databricks bundle validate


Any errors in the bundle configuration will show up in the output of this command.

If you have multiple profiles in your .databrickscfg file, add -p <PROFILE NAME> to the command as a parameter.

Step 6: Deploy the Bundle

Run the following command to deploy the bundle to the Databricks dev workspace using the -t parameter. You can find all the available targets in the databricks.yml file inside your bundle folder.

Shell
 
databricks bundle deploy -t dev


If you have multiple profiles in your .databrickscfg file, add -p <PROFILE NAME> to the command, as shown below.
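
For example, assuming a hypothetical profile named azure-dev (replace it with a name from your own .databrickscfg):

Shell
 
databricks bundle deploy -t dev -p azure-dev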

You should see prompts similar to the ones below:

Prompts shown after running the 'databricks bundle deploy' command


Step 7: Validate the Bundle Deployment

Log into the Databricks workspace and click on the "Workflows" menu in the left navigation panel. Search for the job name defined in your bundle YAML file; the job should appear as in the image below. This validates that you have successfully configured and deployed a Databricks job with a notebook task in the workspace.

The deployed job appearing in the Workflows menu of the Databricks workspace
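
As an optional extra check (not part of the original steps), you can also trigger the deployed job directly from the CLI using its resource key from the bundle YAML (sb_demo_job above); add -p <PROFILE NAME> if you use multiple profiles:

Shell
 
databricks bundle run -t dev sb_demo_job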


Now, you can commit and push the YAML file to your favorite Git repository and deploy it to all your environments using a CI/CD pipeline.

Conclusion

Automating deployment through Databricks Asset Bundles ensures that all jobs and Delta Live Tables pipelines are deployed consistently across Databricks workspaces. It also ensures that the configurations are codified, version-controlled, and migrated across environments following DevOps best practices.

The above steps demonstrate how anyone can easily develop, validate, and deploy a job to a Databricks workspace using the Databricks CLI from a local workstation during the development and unit testing phases. Once validated, the same YAML file and the associated notebook(s) can be pushed to a version control tool (e.g., GitHub), and a CI/CD pipeline can be implemented to deploy the same job to the test and production environments.
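
As one possible starting point, the sketch below shows a hypothetical GitHub Actions workflow that validates and deploys the bundle to a prod target. The databricks/setup-cli action, secret names, working directory, and target name are assumptions to adapt to your own setup, not details from this article.

YAML
 
# Hypothetical CI/CD workflow (action, secrets, paths, and target are illustrative)
name: deploy-databricks-bundle
on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Installs the Databricks CLI on the runner
      - uses: databricks/setup-cli@main
      - name: Validate and deploy the bundle
        working-directory: ./demo   # folder containing databricks.yml
        env:
          DATABRICKS_HOST: ${{ secrets.DATABRICKS_HOST }}
          DATABRICKS_TOKEN: ${{ secrets.DATABRICKS_TOKEN }}
        run: |
          databricks bundle validate -t prod
          databricks bundle deploy -t prod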


Opinions expressed by DZone contributors are their own.
