DZone
Migrating Traditional Workloads From Classic Compute to Serverless Compute on Databricks

This tutorial explains the migration of Databricks workloads from Classic Compute to Serverless Compute for efficiency and cost effectiveness.

By Prasath Chetty Pandurangan · Jul. 17, 25 · Tutorial

This article walks through how to migrate traditional workloads from Classic Compute to Serverless Compute for simpler cluster management, lower cost, better scalability, and optimized performance.

Overview

As data engineering evolves, so do the infrastructure needs of enterprise workloads. With growing demands for agility, scalability, and cost-efficiency, Databricks Serverless Compute provides a compelling alternative to classic clusters. In this article, we explore a practical roadmap to migrate your pipelines and analytics workloads from classic compute (manual clusters or job clusters) to Databricks Serverless Compute, with specific attention to data security, scheduling, costs, and operational resilience.

Why Migrate to Serverless Compute?

Before delving into the migration steps, let's compare the two options to see why serverless compute is often the more efficient choice for workloads:

| Feature                  | Classic Compute            | Serverless Compute                |
|--------------------------|----------------------------|-----------------------------------|
| Cluster Management       | Manual or automated        | Fully managed by Databricks       |
| Cost Control             | Prone to idle costs        | No charge for idle compute        |
| Scalability              | Manual configuration       | Auto-scales per workload needs    |
| Security Isolation       | Shared VMs unless isolated | Secured, runtime-isolated compute |
| Performance Optimization | User-optimized             | Databricks-optimized runtime & IO |

For data pipeline tasks that involve scheduled ETL jobs, monthly reconciliations, or ledger computations, serverless compute offers elasticity and a reduced maintenance burden, which makes it a good fit for small-to-medium batch workloads with predictable patterns.

Pre-Requisites: Assess the Assets of Current Workloads

Let us start by auditing your existing classic cluster workloads:

  • Identify job types: ETL pipelines, reporting scripts, reconciliation logic.
  • Data sources: Delta tables, JDBC, cloud storage (e.g., S3, ADLS).
  • Schedule and frequency: How often do jobs run? Nightly, monthly, ad-hoc?
  • Dependencies: Are there shared libraries, secrets, or initialization scripts?
  • Execution environment: Python, SQL, Scala, or notebooks?

Create an inventory and tag each workload with compute and runtime needs (e.g., memory, cores, run time).
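The inventory can be as simple as a list of records. The following is a minimal sketch, not a prescribed schema: the field names, the `migration_wave` helper, and the 30-minute threshold are illustrative assumptions that you would adapt to your own workloads.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Workload:
    """One row of the migration inventory (all field names are illustrative)."""
    name: str
    job_type: str            # e.g. "etl", "reporting", "reconciliation"
    language: str            # "python", "sql", "scala", or "notebook"
    schedule: str            # "nightly", "monthly", or "ad-hoc"
    avg_runtime_min: float   # typical run time on the classic cluster
    uses_init_scripts: bool = False
    data_sources: list[str] = field(default_factory=list)


def migration_wave(w: Workload) -> int:
    """Assign a wave: 1 = migrate first, 3 = needs rework before migrating.

    The rules and the 30-minute threshold are assumptions for illustration.
    """
    if w.uses_init_scripts:
        return 3  # environment setup has to be reviewed first
    if w.schedule == "ad-hoc" or w.avg_runtime_min <= 30:
        return 1  # short or non-critical jobs are low-risk candidates
    return 2


inventory = [
    Workload("monthly_recon", "reconciliation", "notebook", "monthly", 45.0,
             data_sources=["delta"]),
    Workload("adhoc_report", "reporting", "sql", "ad-hoc", 5.0),
]
for w in sorted(inventory, key=migration_wave):
    print(migration_wave(w), w.name)
```

Sorting by wave gives a first draft of the migration order, starting with the lowest-risk jobs.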

Migration Process Flow Walkthrough

Step 1: Set Up Serverless Compute in Databricks

a. Enable Serverless in Your Workspace

  • Go to Admin Console → Compute.
  • Ensure Serverless Compute is enabled.
  • If required, contact your Databricks support team to enable it in your workspace (may depend on cloud provider and plan).

b. Create a Serverless SQL Warehouse (Optional)

If your workloads are SQL-heavy (e.g., ledger queries, reporting dashboards):

  • Navigate to SQL → SQL Warehouses.
  • Click Create → Choose Serverless → Configure autoscaling, timeouts, and permissions.

For Python/Scala jobs, proceed to the next step. 

Step 2: Migrate Jobs to Serverless Compute

a. Job Migration Steps (Databricks Workflows)

If you're using Job Clusters:

  • Open the existing job from Workflows.
  • Click Edit Job Settings.
  • Under Cluster Configuration → change the cluster type to:
    • "Shared" Serverless Job Cluster, or
    • Use existing serverless pool (if set up).

If you're using notebooks or workflows:

  • Set the attached compute to a Serverless Job Cluster.
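  • Declare libraries in the notebook or job environment, or as workspace libraries, rather than relying on cluster-level installs. Init scripts are generally not available on serverless compute, so check the current documentation before depending on them.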
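If your jobs are defined as JSON or managed through the Jobs API, the same change can be sketched as a transformation of the job settings. This is a rough illustration under one assumption that you should verify against the Jobs API reference for your workspace: a task with no cluster reference is scheduled on serverless compute where the feature is enabled, and some task types additionally need an environment definition. The helper name `to_serverless` is hypothetical.

```python
import copy

# Task-level keys that tie a task to classic compute in Jobs API settings.
CLUSTER_KEYS = ("new_cluster", "existing_cluster_id", "job_cluster_key")


def to_serverless(job_settings: dict) -> dict:
    """Return a copy of the job settings with classic cluster references removed.

    Assumption: tasks without a cluster reference run on serverless compute
    in workspaces where it is enabled. Confirm this for your task types.
    """
    settings = copy.deepcopy(job_settings)
    settings.pop("job_clusters", None)       # shared job cluster definitions
    for task in settings.get("tasks", []):
        for key in CLUSTER_KEYS:
            task.pop(key, None)
    return settings


classic = {
    "name": "monthly_recon",
    "job_clusters": [{"job_cluster_key": "main", "new_cluster": {"num_workers": 4}}],
    "tasks": [{
        "task_key": "reconcile",
        "job_cluster_key": "main",
        "notebook_task": {"notebook_path": "/Repos/finance/reconcile"},
    }],
}
serverless = to_serverless(classic)
print(serverless["tasks"][0])
```

Working on a copy keeps the classic definition available, which makes it easy to roll back if a job does not behave as expected after the switch.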

b. Validate Environment Compatibility

  • Make sure all libraries (e.g., Pandas, PySpark) work under the Databricks Runtime supported by serverless.
  • If using legacy Hive or JDBC connectors, confirm that they work, or migrate to Unity Catalog / native Delta connections.
  • Review any init scripts or file paths that assume a VM or disk context, because they may not behave identically in serverless.
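Part of this review can be automated with a simple text scan of exported notebook or script source. The sketch below is a heuristic only: the pattern list is an assumption about what is worth a manual look (mount paths, local disk paths, and SparkContext or RDD usage), not a complete list of serverless limitations.

```python
import re

# Heuristic patterns; extend or trim this list for your own code base.
CHECKS = {
    "DBFS mount path": re.compile(r"/mnt/|/dbfs/"),
    "local disk path": re.compile(r"file:/|/tmp/|/local_disk0"),
    "SparkContext or RDD API": re.compile(r"\bsc\.|sparkContext|\.rdd\b"),
}


def scan_source(source: str) -> list:
    """Return (line_number, label) pairs for lines that need a manual review."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for label, pattern in CHECKS.items():
            if pattern.search(line):
                findings.append((lineno, label))
    return findings


sample = (
    'df = spark.read.table("Ledger_2024")\n'
    'summary.write.save("/mnt/ledger/summary")\n'
    'rows = df.rdd\n'
)
for lineno, label in scan_source(sample):
    print(f"line {lineno}: {label}")
```

A hit does not mean the job will fail on serverless; it marks a line that someone should read before the job is moved.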

Step 3: Schedule Jobs and Monitor Performance

Databricks allows job scheduling and retry logic via Workflows:

  • Go to Workflows → Create Job.
  • Set the notebook/script path, parameters, and schedule (e.g., "Every first of the month at 3 AM").
  • Configure email/Slack alerts for success/failure.
  • Enable retry policy (e.g., up to 3 retries on failure).
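The same schedule and retry settings can also be written down as job settings, which is useful if you keep job definitions in version control. This is a sketch under assumptions: the job name, notebook path, email address, and time zone are placeholders, and the field names follow the Jobs API settings format, which you should check against the reference for your workspace. Databricks schedules use Quartz cron syntax, where `0 0 3 1 * ?` means 03:00:00 on day 1 of every month.

```python
def monthly_job_settings(name: str, notebook_path: str, alert_email: str) -> dict:
    """Build job settings for a run on the first of each month at 3 AM."""
    return {
        "name": name,
        "schedule": {
            # Quartz fields: second minute hour day-of-month month day-of-week
            "quartz_cron_expression": "0 0 3 1 * ?",
            "timezone_id": "UTC",            # placeholder time zone
            "pause_status": "UNPAUSED",
        },
        "email_notifications": {"on_failure": [alert_email]},
        "tasks": [{
            "task_key": "reconcile",
            "notebook_task": {"notebook_path": notebook_path},
            "max_retries": 3,                      # up to 3 retries on failure
            "min_retry_interval_millis": 60_000,   # wait one minute between tries
        }],
    }


settings = monthly_job_settings(
    "monthly_recon", "/Repos/finance/reconcile", "data-team@example.com"
)
print(settings["schedule"]["quartz_cron_expression"])
```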

Use the job metrics UI to compare performance:

  • CPU and memory usage per task.
  • Runtime per job before and after serverless migration.
  • Cost estimation dashboards (if enabled).
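For the before-and-after runtime comparison, a small helper is enough once you have copied run durations out of the job history. The sketch below assumes you have collected durations in seconds for a handful of runs on each compute type; the numbers shown are made up for illustration.

```python
from statistics import median


def runtime_change_pct(before_s: list, after_s: list) -> float:
    """Percent change in median runtime; negative means the job got faster."""
    base = median(before_s)
    new = median(after_s)
    return (new - base) / base * 100


# Illustrative durations in seconds, not measured results.
classic_runs = [600, 620, 580]
serverless_runs = [450, 470, 430]
print(f"{runtime_change_pct(classic_runs, serverless_runs):+.1f}%")
```

The median is used rather than the mean so that a single slow run, for example one that hit a cold start, does not dominate the comparison.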

Step 4: Secure Access to Data

Most data is sensitive. Make sure to:

  • Enable Unity Catalog for fine-grained access control.
  • Use credential passthrough or service principals for access to cloud storage.
  • Store secrets using Databricks Secrets and access them securely in jobs.

Example:

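Python

import os
import pyspark.sql.functions as F

# Read the database password from a Databricks secret scope.
# dbutils is provided by the Databricks notebook and job runtime.
db_pass = dbutils.secrets.get(scope="-secrets", key="db-password")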
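One way to keep secrets out of notebook code is to pass the secret lookup into a helper that assembles connection options. This is a minimal sketch: `jdbc_options`, the scope name `finance-secrets`, the `db-user` key, and the connection URL are all hypothetical. In a Databricks job you would pass `dbutils.secrets.get` as `get_secret`; here a stand-in function is used so the example runs anywhere.

```python
from typing import Callable


def jdbc_options(get_secret: Callable[[str, str], str], scope: str,
                 url: str, table: str) -> dict:
    """Assemble Spark JDBC reader options without hard-coding credentials."""
    return {
        "url": url,
        "dbtable": table,
        "user": get_secret(scope, "db-user"),          # hypothetical key name
        "password": get_secret(scope, "db-password"),
    }


def fake_get_secret(scope: str, key: str) -> str:
    """Stand-in for dbutils.secrets.get, used only outside Databricks."""
    return f"<{scope}/{key}>"


opts = jdbc_options(fake_get_secret, "finance-secrets",
                    "jdbc:postgresql://db.example.com:5432/ledger", "public.entries")
print(sorted(opts))

# In a Databricks notebook or job, the call would look like:
#   opts = jdbc_options(dbutils.secrets.get, "finance-secrets", url, table)
#   df = spark.read.format("jdbc").options(**opts).load()
```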


Step 5: Optimize and Scale

Once migrated, apply these optimization steps:

  • Use Delta Lake for all tables to benefit from caching and ACID compliance.
  • Apply Z-Ordering on frequently filtered columns (e.g., account_id, period).
  • Use the Photon engine in serverless SQL for faster computation.
  • Monitor for underutilized compute and tune autoscaling thresholds accordingly.
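The Z-Ordering step maps to a Delta Lake `OPTIMIZE ... ZORDER BY` statement. The helper below is a sketch that only builds the statement text; the table name `main.finance.ledger_summary` is a hypothetical Unity Catalog name, and the identifier check is a simple safeguard, not a full SQL parser.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def optimize_statement(table: str, zorder_cols: list) -> str:
    """Build an OPTIMIZE ... ZORDER BY statement for a Delta table."""
    # Reject anything that is not a plain identifier to avoid SQL injection.
    for part in table.split(".") + list(zorder_cols):
        if not _IDENT.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"OPTIMIZE {table} ZORDER BY ({', '.join(zorder_cols)})"


stmt = optimize_statement("main.finance.ledger_summary", ["account_id", "period"])
print(stmt)

# In a notebook or job, the statement would be executed with: spark.sql(stmt)
```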

Step 6: Example Use Case: Monthly Accounting Reconciliation

Suppose your classic cluster runs a notebook like this:

Python

from pyspark.sql import functions as F

# Load entries
df = spark.read.table("Ledger_2024")

# Summarize per account
summary = df.groupBy("account_id").agg(F.sum("debit"), F.sum("credit"))

# Write to delta
summary.write.format("delta").mode("overwrite").save("/mnt/ledger/summary")

To migrate:

  • Move this notebook to a scheduled workflow with a serverless job cluster.
  • Replace paths like /mnt/... with Unity Catalog references if possible.
  • Ensure access to Ledger_2024 via catalog permissions.
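Applied to the notebook above, the migrated logic might look like the following sketch. The three-level names `main.finance.ledger_2024` and `main.finance.ledger_summary` are hypothetical Unity Catalog names, and `reconcile` is an illustrative wrapper, not part of the original notebook. The result is written with `saveAsTable` instead of a `/mnt/...` path, and the aggregate columns are renamed because default names such as `sum(debit)` contain characters that Delta tables reject by default.

```python
def reconcile(spark, source_table: str, target_table: str) -> None:
    """Summarize debits and credits per account into a managed table."""
    summary = (
        spark.read.table(source_table)
        .groupBy("account_id")
        .sum("debit", "credit")
        # GroupedData.sum names its output columns "sum(<col>)"; rename them.
        .withColumnRenamed("sum(debit)", "total_debit")
        .withColumnRenamed("sum(credit)", "total_credit")
    )
    # Write to a catalog table instead of a mount path.
    summary.write.mode("overwrite").saveAsTable(target_table)


# In a Databricks notebook or job, where `spark` is already defined:
#   reconcile(spark, "main.finance.ledger_2024", "main.finance.ledger_summary")
```

Passing `spark` in as an argument keeps the function easy to exercise outside Databricks with a stand-in session object.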

Key Considerations and Limitations

| Consideration          | Notes                                                                                                      |
|------------------------|------------------------------------------------------------------------------------------------------------|
| Cold Start Time        | First request may have slight delay (~10s)                                                                 |
| External Libraries     | Prefer libraries installed via PyPI or workspace libraries                                                 |
| Job Isolation          | No direct access to DBFS root or cluster-local files                                                       |
| Networking Constraints | If you rely on VPC peering or private endpoints, check compatibility with serverless network architecture |


Post-Migration Lookouts

  • Cost Monitoring: Serverless charges are usage-based. Regularly monitor cost via Databricks billing dashboards.
  • Audit Logging: Ensure audit logs are configured to track access and execution.
  • Security Hardening: Apply appropriate workspace controls, token lifetimes, and access levels for production environments.
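For the cost monitoring item, usage records exported from your billing data can be rolled up per job with a few lines of code. This is a sketch under assumptions: the record layout, with `job_id` and `usage_quantity` fields, is hypothetical and should be matched to whatever your billing export actually contains, and the quantities shown are made up.

```python
from collections import defaultdict


def usage_by_job(rows: list) -> dict:
    """Sum usage per job, largest consumer first."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["job_id"]] += float(row["usage_quantity"])
    # Dicts preserve insertion order, so the sorted order is kept.
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


# Illustrative records, not real billing data.
usage_rows = [
    {"job_id": "etl", "usage_quantity": 3.0},
    {"job_id": "recon", "usage_quantity": 5.0},
    {"job_id": "recon", "usage_quantity": 2.5},
]
for job_id, quantity in usage_by_job(usage_rows).items():
    print(job_id, quantity)
```

Reviewing this kind of rollup on a regular schedule makes it easier to spot a job whose usage grows unexpectedly after the migration.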

Conclusion

Migrating from classic compute to serverless compute in Databricks can significantly improve cost efficiency, manageability, and scalability, especially for structured workloads such as accounting. By following a structured migration path that starts with an inventory and continues through compute setup, job conversion, and optimization, you can ensure a smooth transition without sacrificing performance or security.

This migration is a strategic step toward modernizing your data and AI infrastructure. Although the transition introduces architectural and operational changes, the benefits in agility, cost savings, and scalability are significant. By following the prerequisites and adopting a methodical migration strategy, your team can fully leverage the power of Databricks Serverless Compute.

Approach the migration incrementally: start with non-critical workloads, then expand serverless usage to core and critical data pipelines and jobs.


Opinions expressed by DZone contributors are their own.
