Batch Jobs Need Source Control Too
These best practices will help you apply source control processes to batch jobs, ensuring proper planning, testing, and maintenance.
Mission-critical batch processes and automated workloads often fall outside the realm of development. But, as any DevOps professional knows, many of the processes that fall just outside the main codebase – file transfers, ETL steps, and builds, for example – have almost as much impact on the delivery of your application as the application itself.
Batch processes require planning, testing, refinement, and maintenance, in the same way a core application does. Yet many teams leave these batch processes scattered across script files and native schedulers such as Task Scheduler, SQL Agent, and Cron, each of which stores only one version of a job’s definition.
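One way to move past single-version schedulers is to keep job definitions as plain files in a version-controlled repository and treat the scheduler as a deployment target. The sketch below uses git and a cron-style definition purely as an illustration; the file names, paths, and commit identity are hypothetical, and the same idea applies to any scheduler whose jobs can be exported as text.

```shell
set -e
# Hypothetical repo for job definitions (created in a temp dir for the sketch).
repo=$(mktemp -d)
cd "$repo"
git init -q .
mkdir -p jobs

# A cron-style job definition kept as a plain text file instead of living
# only inside the scheduler's own store.
cat > jobs/nightly-etl.cron <<'EOF'
# Run the nightly ETL load at 01:30
30 1 * * * /opt/etl/run_load.sh
EOF

git add jobs/nightly-etl.cron
git -c user.name=ops -c user.email=ops@example.com \
    commit -qm "Add nightly ETL job definition"

# Every revision of the job is now recoverable:
git log --oneline -- jobs/nightly-etl.cron
```

From here, installing the job is a deployment step (e.g. `crontab jobs/nightly-etl.cron` on the target host) rather than a hand edit that nobody can trace.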
You Get a Scheduler! He Gets a Scheduler! She Gets a Scheduler!
Unlike an app development team, you may not have the opportunity to pick a single set of standards. Operations jobs often rely on multiple platforms and multiple applications. You have to work with whatever platforms or applications your enterprise depends upon. The source of batch jobs and workloads can wind up in all sorts of repos that are managed (or unmanaged) by different groups. DBAs keep database jobs in one repo, and developers keep theirs in another. Maybe you even have some job scripts for a legacy ERP system tucked away in a folder full of .txt files.
Leaving jobs’ source in silos creates security risks and makes it difficult to scale. A single, small change, such as a new server name, could create hours of busy work and bring your latest build to a standstill.
By centralizing workload automation, DevOps teams bring critical processes under one roof, in the same way a development team organizes many projects into a centrally managed solution.
The consolidation of source for all your jobs and workloads mitigates risks and costs. Having one central repository for job definitions means you can apply consistent security across every job, regardless of execution method. You can apply “just right” access controls to the job source, specifying exactly which users can modify and view. If a change is made, you have a complete audit trail that lets you know exactly what change was made and who made it. In addition, your enterprise is less vulnerable to staff changes. You no longer have to worry about someone leaving the company and leaving jobs buried on a server that no one knows about. Centralized schedulers provide transparency and a central repo for everything in the environment.
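The audit trail described above falls out naturally once job definitions live in one repository. This sketch (using git as one possible backing store; the file name, authors, and messages are invented for illustration) shows how every change to a job’s source records who made it and what they did:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A DBA creates the original job configuration.
printf 'server=PBest-01\n' > job.conf
git add job.conf
git -c user.name="dba-alice" -c user.email=alice@example.com \
    commit -qm "Initial job config"

# Later, a developer repoints the job at a new server.
printf 'server=RStarr-02\n' > job.conf
git -c user.name="dev-bob" -c user.email=bob@example.com \
    commit -aqm "Point job at new server"

# The complete audit trail: who changed the job, and why.
git log --format='%an: %s' -- job.conf
```

The same history survives staff turnover: the job’s definition and its full change record stay in the repo, not on someone’s forgotten server.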
Scale Smart With Parameters
Having a centralized job repo cuts down the management costs associated with workload automation. By storing commonly used values (e.g., a server name) as parameters, you streamline job creation and job editing. For example, if you have 200 jobs that all use the connection string “DataSource=PBest-01” and you later learn that the data store has been moved to “RStarr-02”, you only need to update the value of one parameter, a task that takes considerably less time than 200 manual edits.
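A minimal sketch of the idea, using a shared parameter file that each job script sources instead of hard-coding the value (file and variable names here are illustrative, not from any particular scheduler):

```shell
set -e
workdir=$(mktemp -d)
cd "$workdir"

# One shared parameter file for all jobs.
printf 'DATA_SOURCE=PBest-01\n' > params.env

# Each job reads the shared parameters rather than embedding the value.
cat > job_example.sh <<'EOF'
#!/bin/sh
. ./params.env
echo "Connecting to DataSource=${DATA_SOURCE}"
EOF
chmod +x job_example.sh

./job_example.sh    # → Connecting to DataSource=PBest-01

# The data store moves: edit one line, not 200 job definitions.
printf 'DATA_SOURCE=RStarr-02\n' > params.env
./job_example.sh    # → Connecting to DataSource=RStarr-02
```

Commercial workload-automation tools expose the same concept as built-in job variables or parameters; the mechanics differ, but the payoff is identical: one edit propagates everywhere.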
Always Be Able to Roll Back to a Previous Working Version
Version control eliminates so much of the overhead of resolving job failure issues. When you can view every change to a job’s source, you know which user made the change, when they made it, and whether they modified its schedule, its parameters, or its dependencies. So, no matter what input prevents a job’s successful execution, you have the ability to roll back to a previous working version.
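With job definitions under version control, a rollback is one command rather than a reconstruction from memory. The sketch below again assumes git and an invented job file; the point is only that the last known-good version is always retrievable:

```shell
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .

# A working job definition goes in first.
printf 'schedule=30 1 * * *\n' > etl.job
git add etl.job
git -c user.name=ops -c user.email=ops@example.com \
    commit -qm "Working job definition"

# A later edit breaks the job.
printf 'schedule=not-a-valid-cron-line\n' > etl.job
git -c user.name=ops -c user.email=ops@example.com \
    commit -aqm "Bad edit that breaks the job"

# Roll the file back to the previous working version:
git checkout HEAD~1 -- etl.job
cat etl.job    # → schedule=30 1 * * *
```

The broken revision stays in the history for the post-mortem, while the job itself is back to a state you know runs.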