From Development to Deployment: Automating Machine Learning

A step-by-step journey of a machine learning model from development to deployment using containers and the Infrastructure-as-Code tool Terraform.

Yashraj Behera

Jul. 29, 25 · Tutorial

Building a machine learning (ML) model is both fascinating and complex, requiring careful navigation through a series of steps. Moving a model from development to deployment is a critical phase in bringing AI to life. Once a well-trained model, built with the right algorithm on relevant data, completes the development stage, the focus shifts to deployment.

Deploying a machine learning model can be tedious. Building APIs, containerizing, managing dependencies, configuring cloud environments, and setting up servers and clusters all require significant effort. But imagine if the entire workflow could be automated. ML deployment automation can unify and simplify these processes using general-purpose tools, preconfigured modules, and easy-to-integrate scripts.

In this article, I’ll walk you through how I trained an ML model, containerized it with Docker, and deployed it to the cloud with Terraform, all driven by automation scripts that make the process reusable and CI/CD-friendly.

What Automating ML Deployment Brings to the Table

Automating ML deployment brings several benefits:

Enables machine learning models to scale efficiently
Pushes models into production within minutes
Removes time-consuming repetitive steps
Reduces human error

Tools Used

To configure the ML model deployment, we need a few essential tools and libraries:

Python 3.9+: the core programming language used to train and serve the model and to write the automation scripts (current FastAPI and scikit-learn releases do not run on Python 3.4)
scikit-learn: Python library for machine learning
FastAPI: Python library to host the ML model as a Web API
Docker: packages the ML model as a container image and runs Terraform in a container
Cloud CLI: the command-line tool for your cloud platform (Azure, AWS, or GCP), used to authenticate and interact with it
Terraform: Infrastructure as Code (IaC) to provision cloud resources
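Because the automation scripts shell out to external command-line tools, a quick preflight check can fail fast when one of them is missing. The sketch below is an illustration, not code from the article’s repository; the helper name `missing_tools` and the tool list are assumptions, with `az` standing in for whichever cloud CLI you use.

```python
import shutil

# Hypothetical list of required CLIs; "az" assumes Azure is the target cloud
REQUIRED_TOOLS = ["docker", "az"]


def missing_tools(tools):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]
```

Calling `missing_tools(REQUIRED_TOOLS)` at the start of a script and exiting when the result is non-empty surfaces a missing Docker or cloud CLI before any training or image build begins.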

Project Setup

Now, let’s set up the project and review each step. The project is divided into three main parts:

ML model training
ML workflow automation
IaC with Terraform

The project is structured as follows:

```text
ml_deploy/
├── src/
│   ├── app.py                  # FastAPI app that serves the ML model
│   ├── train_model.py          # Trains and serializes the model
│   ├── model.pkl               # Serialized ML model (generated by train_model.py)
│   ├── requirements.txt        # Python libraries
│   └── Dockerfile              # Defines the Docker image
│
├── terraform/
│   ├── main.tf                 # Terraform configuration file
│   ├── variables.tf            # Input variable declarations
│   ├── outputs.tf              # Output values
│   └── terraform.tfvars        # Holds dynamic values like image name
│
└── scripts/
    ├── build_model_and_image.py  # Automates model training + Docker
    └── install_terraform.py      # Runs Terraform inside Docker
```

Machine Learning Model Training

The first step is model development: training the model and building an API to serve it.

```python
# train_model.py: trains the model and serializes it to model.pkl
import pickle

from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

# Load the Iris dataset as feature matrix X and labels y
X, y = load_iris(return_X_y=True)

# Initialize and train the model
model = LogisticRegression(max_iter=200)
model.fit(X, y)

# Save the trained model to a file in the current working directory
with open("model.pkl", "wb") as f:
    pickle.dump(model, f)
```

In the above example, we trained a logistic regression model on the classic Iris dataset using scikit-learn and serialized it to model.pkl with Python’s pickle module. The pickle file stores the fitted model object, not its dependencies, so scikit-learn (ideally the same version) must be installed wherever the model is loaded. The FastAPI app in app.py then loads model.pkl and exposes a /predict endpoint that generates predictions:

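```python
# app.py: FastAPI app that loads model.pkl and serves predictions
import pickle
from typing import List

import numpy as np
from fastapi import FastAPI

app = FastAPI()

# Load the serialized model once, when the app starts
with open("model.pkl", "rb") as f:
    model = pickle.load(f)

@app.get("/")
def root():
    return {"message": "Model running"}

# A List[float] parameter is read from the JSON request body, e.g. [5.1, 3.5, 1.4, 0.2]
@app.post("/predict")
def predict(data: List[float]):
    # Reshape the feature list into a single-sample 2D array for scikit-learn
    prediction = model.predict(np.array(data).reshape(1, -1))
    return {"prediction": prediction.tolist()}
```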
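Once the API is running locally, a client can call the /predict endpoint. The following standard-library sketch assumes the endpoint accepts a JSON array of the four Iris measurements as the request body and that the server listens on http://localhost:8000 (Uvicorn’s default port); both are assumptions, as are the helper names `build_predict_request` and `predict`.

```python
import json
import urllib.request

# Assumed local address of the FastAPI server
BASE_URL = "http://localhost:8000"


def build_predict_request(base_url, features):
    """Build a POST request whose JSON body is the list of feature values."""
    body = json.dumps(features).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/predict",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(base_url, features, timeout=10):
    """Send the request and decode the JSON response (a dict with a "prediction" key)."""
    request = build_predict_request(base_url, features)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `predict(BASE_URL, [5.1, 3.5, 1.4, 0.2])` sends one sample; splitting request construction from sending keeps the request shape easy to test without a running server.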

ML Workflow Automation

A trained model delivers value only once it is deployed as a service that can be accessed reliably, in real time, and at scale. Manually training the model, building and pushing Docker images, and updating configuration files for every release is tedious and error-prone. Automation makes these steps faster and repeatable.

We automate these steps with two Python scripts:

build_model_and_image.py: This script combines model training, Docker image building, pushing the image to Docker Hub, and updating the terraform.tfvars file into a single workflow. View the build_model_and_image.py code on GitHub: https://github.com/yraj1457/MLOps/blob/main/scripts/build_model_and_image.py

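```python
# build_model_and_image.py (excerpt): trains the model, then builds the image
import subprocess
import sys
from pathlib import Path

# Placeholder settings for this excerpt; adjust them to your project
src_dir = Path(__file__).resolve().parent.parent / "src"
docker_image = "your-dockerhub-user/ml-model-api:latest"  # hypothetical image name

# Executes the model training script inside src/
def train_model():
    print("Training the Model")
    try:
        subprocess.run([sys.executable, "train_model.py"], check=True, cwd=src_dir)
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        print(f"Error Training the Model: {e}")
        sys.exit(1)

# Builds the image from src/, where the Dockerfile lives, after training the model
def build_image():
    print(f"Building the Docker Image: {docker_image}")
    try:
        subprocess.run(["docker", "build", "-t", docker_image, "."], check=True, cwd=src_dir)
    except (subprocess.CalledProcessError, FileNotFoundError) as e:
        print(f"Error Building the Docker Image: {e}")
        sys.exit(1)

if __name__ == "__main__":
    train_model()
    build_image()
```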
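The excerpt stops after the image build, while the full script also pushes the image and updates terraform.tfvars. The sketch below is an independent illustration of those two steps, not the repository’s code. It assumes the Terraform variable is named `container_image` (matching `var.container_image` in main.tf), that image names contain no double quotes, and it uses the hypothetical helpers `push_image`, `set_tfvar`, and `update_tfvars`.

```python
import re
import subprocess
from pathlib import Path


def push_image(image):
    """Push the built image to Docker Hub (requires a prior `docker login`)."""
    subprocess.run(["docker", "push", image], check=True)


def set_tfvar(text, key, value):
    """Return `text` with `key = "value"` set, replacing an existing assignment or appending one."""
    new_line = f'{key} = "{value}"'
    pattern = re.compile(rf"^[ \t]*{re.escape(key)}[ \t]*=.*$", re.MULTILINE)
    if pattern.search(text):
        # Replace the first existing assignment; a function avoids backslash escapes in re.sub
        return pattern.sub(lambda _match: new_line, text, count=1)
    # Otherwise append a new assignment on its own line
    if text and not text.endswith("\n"):
        text += "\n"
    return text + new_line + "\n"


def update_tfvars(tfvars_path, image, key="container_image"):
    """Rewrite terraform.tfvars so Terraform deploys the freshly pushed image."""
    path = Path(tfvars_path)
    current = path.read_text() if path.exists() else ""
    path.write_text(set_tfvar(current, key, image))
```

Keeping the text transformation in a pure function (`set_tfvar`) leaves other variables in the file untouched and makes the update easy to test without touching disk.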

install_terraform.py: This script provisions the infrastructure by running Terraform in a Docker container, so Terraform does not have to be installed locally. View the install_terraform.py code on GitHub: https://github.com/yraj1457/MLOps/blob/main/scripts/install_terraform.py

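```python
# install_terraform.py (excerpt): runs Terraform inside the official Docker image
import subprocess
import sys
from pathlib import Path

# Placeholder settings for this excerpt; adjust them to your project
terraform_dir = Path(__file__).resolve().parent.parent / "terraform"
terraform_image = "hashicorp/terraform:latest"  # the image's entrypoint is `terraform`

# Azure credentials forwarded from the host (assumes a service principal exported as ARM_* variables)
azure_env_vars = ["ARM_CLIENT_ID", "ARM_CLIENT_SECRET", "ARM_TENANT_ID", "ARM_SUBSCRIPTION_ID"]

# Run the Trio, the three Terraform commands, without interactive prompts
def run_terraform():
    cmd_list = [
        ["init", "-input=false"],
        ["plan", "-input=false"],
        ["apply", "-input=false", "-auto-approve"],
    ]

    for cmd in cmd_list:
        print(f"Running Terraform {cmd[0]}")
        docker_cmd = ["docker", "run", "--rm", "-v", f"{terraform_dir}:/workspace", "-w", "/workspace"]
        for var in azure_env_vars:
            docker_cmd += ["-e", var]  # `-e NAME` copies the host's value of NAME
        try:
            # Passing a list instead of a shell string avoids quoting problems with paths
            subprocess.run(docker_cmd + [terraform_image] + cmd, check=True)
        except (subprocess.CalledProcessError, FileNotFoundError) as e:
            print(f"Error running Terraform {cmd[0]}: {e}")
            sys.exit(1)

if __name__ == "__main__":
    run_terraform()
```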
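Running plan and apply as separate commands means apply computes a fresh plan, which can differ from the one that was reviewed. A common Terraform pattern is to save the plan with `-out` and then apply that file, which also skips the interactive approval prompt. Because the terraform/ directory is mounted into each container, the plan file persists between the separate `docker run` invocations. The helper names `terraform_steps` and `docker_terraform_command`, and the plan filename `tfplan`, are assumptions for this sketch.

```python
from pathlib import Path


def terraform_steps(plan_file="tfplan"):
    """Return the argument lists for init, plan (saved to a file), and apply (of that file)."""
    return [
        ["init", "-input=false"],
        # Save the execution plan so apply runs exactly what was planned
        ["plan", "-input=false", f"-out={plan_file}"],
        # Applying a saved plan file does not prompt for approval
        ["apply", "-input=false", plan_file],
    ]


def docker_terraform_command(terraform_dir, args, image="hashicorp/terraform:latest"):
    """Build a `docker run` argument list that runs Terraform in the mounted directory."""
    workdir = Path(terraform_dir).resolve()
    return ["docker", "run", "--rm", "-v", f"{workdir}:/workspace", "-w", "/workspace", image, *args]
```

Each step would then run as `subprocess.run(docker_terraform_command("terraform", args), check=True)` for `args` in `terraform_steps()`, replacing the plain init/plan/apply arguments in `run_terraform()`.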

These scripts connect training, packaging, and provisioning, and they make the workflow reusable when plugged into a CI/CD pipeline.

Infrastructure as Code With Terraform

Once the service is packaged, it needs to be deployed. We use Terraform to define the entire cloud setup as code, including the container that runs the model. This keeps deployment automated, consistent, and portable across environments.

The infrastructure is defined in four Terraform files: main.tf, variables.tf, outputs.tf, and terraform.tfvars. The install_terraform.py script runs the Terraform commands (init, plan, and apply) through the official hashicorp/terraform Docker image, so there is no local Terraform installation or version to maintain, and development and deployment tooling stay cleanly separated.

The main.tf snippet below is one example: it provisions an Azure resource group and an Azure Container Instances container group to host the machine learning API.

```hcl
# main.tf: Azure resource group and container instance for the ML API
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "ml_rg" {
  name     = var.resource_group_name
  location = var.location
}

resource "azurerm_container_group" "ml_app" {
  name                = "ml-model-api"
  location            = azurerm_resource_group.ml_rg.location
  resource_group_name = azurerm_resource_group.ml_rg.name
  os_type             = "Linux"
  ip_address_type     = "Public"
  dns_name_label      = var.dns_label

  container {
    name   = "mlmodel"
    image  = var.container_image
    cpu    = 1.0 # CPU cores
    memory = 1.5 # memory in GB

    # The API inside the container must listen on this port
    ports {
      port     = 80
      protocol = "TCP"
    }
  }
}
```
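After `apply` completes, Azure Container Instances exposes a container group that has a DNS label at a name of the form `<dns_name_label>.<region>.azurecontainer.io`. A minimal standard-library smoke test can then call the API’s root endpoint. This sketch assumes `location` is given in Azure’s short region form (for example, `eastus`) and that the API answers on port 80 as configured above; the helper names are hypothetical, and in practice you might instead expose the resource’s `fqdn` attribute through outputs.tf.

```python
import json
import urllib.request


def aci_base_url(dns_label, region):
    """Public URL for an ACI container group with a DNS label (HTTP on port 80)."""
    return f"http://{dns_label}.{region}.azurecontainer.io"


def is_model_running(payload):
    """Check the JSON body returned by the API's root endpoint."""
    return payload.get("message") == "Model running"


def smoke_test(dns_label, region, timeout=10):
    """GET the root endpoint of the deployed API and verify its health message."""
    with urllib.request.urlopen(aci_base_url(dns_label, region) + "/", timeout=timeout) as resp:
        return resp.status == 200 and is_model_running(json.loads(resp.read()))
```

Running `smoke_test` as the last pipeline step turns "the apply succeeded" into "the model is actually reachable", which catches problems such as the container listening on the wrong port.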

The complete codebase for this approach, including all the scripts and configuration files, is available on GitHub: https://github.com/yraj1457/MLOps

Why This Approach Is More Efficient

The automation scripts tie the individual steps together, minimizing manual intervention and stopping with a clear error message as soon as a step fails. Running the tools inside Docker containers also minimizes local dependencies and keeps environments consistent. The architecture combines best practices from infrastructure automation, DevOps, and MLOps.
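The scripts above report failures with `print()` followed by `sys.exit(1)`. One way to make that error handling shared and more informative is a single helper built on the standard `logging` module, which adds timestamps and severity levels to CI logs. The helper name `run_step` and the logger name are hypothetical, not part of the repository.

```python
import logging
import subprocess
import sys

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(levelname)s %(message)s")
log = logging.getLogger("ml_deploy")


def run_step(name, cmd, cwd=None):
    """Run one pipeline step; log success, or log the error and exit non-zero."""
    log.info("Starting step: %s", name)
    try:
        subprocess.run(cmd, check=True, cwd=cwd)
    except (subprocess.CalledProcessError, FileNotFoundError) as exc:
        log.error("Step %s failed: %s", name, exc)
        # A non-zero exit code makes CI/CD tools mark the job as failed
        sys.exit(1)
    log.info("Finished step: %s", name)
```

With this helper, each script reduces to a sequence of calls such as `run_step("build image", ["docker", "build", "-t", image, "."], cwd=src_dir)`.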

Conclusion

This article showed how to go from machine learning model training to deployment with minimal tooling, reduced dependencies, and maximum automation, which can save data scientists and MLOps engineers hours of repetitive work. Using Python automation scripts, along with Docker to encapsulate both the model and Terraform, we set up an environment that is reusable, automated, and extensible.

This approach is portable and can be plugged into CI/CD tools such as GitHub Actions or Azure DevOps. It provides a foundation that you can adapt to your own requirements.

Opinions expressed by DZone contributors are their own.
