A Comprehensive Guide to MLflow for Machine Learning Lifecycle Management

Master MLflow from basics to advanced with practical examples and an end-to-end project for managing ML lifecycles using this comprehensive guide.

By Harsh Daiya · Jul. 17, 24 · Tutorial

MLflow is an open-source platform for managing the end-to-end machine learning lifecycle. This guide moves from the basics to more advanced features, with Python examples throughout. By the end, you should be able to track experiments, package code, manage models, and deploy them with MLflow.

Introduction to MLflow

MLflow covers the main stages of the machine learning lifecycle: experimentation, reproducibility, and deployment. Its main components are:

  • MLflow Tracking: Logs and queries experiments (parameters, metrics, and artifacts).
  • MLflow Projects: Packages ML code so that it is reusable and reproducible.
  • MLflow Models: Packages models in a standard format for deployment.
  • MLflow Model Registry: A central repository for managing model versions.

Setting up MLflow

Installation

Install MLflow with pip (in a Jupyter notebook, prefix the command with !):

Shell
 
pip install mlflow


Setting up the Tracking Server

The following command starts an MLflow tracking server that uses a SQLite database file (./mlflow.db) as the backend store and the ./artifacts directory for artifacts:

Shell
 
mlflow server --backend-store-uri sqlite:///mlflow.db --default-artifact-root ./artifacts
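
Client code still has to be pointed at the tracking server. The following is a minimal sketch, assuming the server runs on the same machine at http://127.0.0.1:5000 (the default address unless you pass --host or --port). MLflow reads the MLFLOW_TRACKING_URI environment variable, so setting it before MLflow is used is one way to connect. The helper name use_tracking_server is hypothetical.

```python
import os

# Assumed address: adjust if the server was started with --host or --port.
DEFAULT_TRACKING_URI = "http://127.0.0.1:5000"


def use_tracking_server(uri: str = DEFAULT_TRACKING_URI) -> str:
    """Point MLflow clients in this process at a tracking server.

    Sets MLFLOW_TRACKING_URI, which MLflow reads from the environment.
    Call this before any MLflow logging code runs.
    """
    os.environ["MLFLOW_TRACKING_URI"] = uri
    return uri


use_tracking_server()
print(os.environ["MLFLOW_TRACKING_URI"])
```

Calling mlflow.set_tracking_uri(uri) in your script is an alternative to the environment variable.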


MLflow Tracking

MLflow Tracking lets you log and query experiments. To log parameters, metrics, and artifacts, wrap the logging calls in a run:

Python
 
import mlflow

with mlflow.start_run():  # Start a run (context manager); the run ends when the block exits
    mlflow.log_param("param1", 5)  # Log a parameter
    mlflow.log_metric("metric1", 0.85)  # Log a metric
    mlflow.log_artifact("path/to/artifact")  # Log a local file as an artifact


Querying Experiments

To query logged runs, call mlflow.search_runs(), which returns the runs as a pandas DataFrame:

Python
 
runs = mlflow.search_runs()
print(runs)


MLflow Projects

MLflow Projects are a way to organize and package your code. A project is simply a directory with an MLproject file.

Creating an MLproject File

Here’s an example of an MLproject file:

YAML
 
name: MyProject

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      param1: { type: int, default: 5 }
    command: "python train.py --param1 {param1}"
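
The relationship between the parameters section and the command template can be sketched in a few lines of Python. This is a conceptual illustration only, not MLflow's implementation (which does more, such as resolving path and uri parameters); render_command is a hypothetical helper.

```python
def render_command(template, defaults, overrides=None):
    """Fill a command template with declared defaults, then any overrides.

    `defaults` mirrors the `default` values declared in the MLproject file;
    `overrides` mirrors values passed on the command line.
    """
    values = {**defaults, **(overrides or {})}
    return template.format(**values)


template = "python train.py --param1 {param1}"
defaults = {"param1": 5}

print(render_command(template, defaults))                  # python train.py --param1 5
print(render_command(template, defaults, {"param1": 10}))  # python train.py --param1 10
```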


Running Projects

To run a project, use the mlflow run command:

Shell
 
mlflow run . -P param1=10
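
On the receiving side, train.py has to accept the --param1 flag that the command template passes. The guide does not show this script for MyProject, so the following is a hypothetical minimal sketch using argparse; the training logic is left out.

```python
import argparse


def parse_args(argv=None):
    """Parse the flag that the MLproject command template passes in."""
    parser = argparse.ArgumentParser(description="Hypothetical train.py entry point")
    parser.add_argument("--param1", type=int, default=5)
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Training and MLflow logging would go here.
    print(f"training with param1={args.param1}")
    return args.param1


# Simulates the arguments produced by the -P param1=10 override.
# In a real train.py, call main() under an `if __name__ == "__main__":` guard.
main(["--param1", "10"])
```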


MLflow Models

MLflow Models are a standard format for packaging machine learning models. A model is saved in one or more “flavors” (for example, scikit-learn or the generic python_function flavor), so the same saved model can be loaded by different downstream tools.

Saving a Model

Here’s how you save a model in Python:

Python
 
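import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier

# X_train and y_train are assumed to be defined already
model = RandomForestClassifier()
model.fit(X_train, y_train)

# Log the fitted model under the artifact path "model" of a run
with mlflow.start_run():
    mlflow.sklearn.log_model(model, "model")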


Loading a Model

Here’s how you can load a saved model:

Python
 
model = mlflow.sklearn.load_model("runs:/<run_id>/model")  # Replace <run_id> with the ID of the run that logged the model
predictions = model.predict(X_test)
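
Model URIs in this guide take two forms: a run-relative form (runs:/ followed by the run ID and artifact path) and a registry form (models:/ followed by the registered name and version). A small sketch that builds both, assuming the artifact path "model" used throughout; the helper names and the run ID "abc123" are hypothetical.

```python
def run_model_uri(run_id: str, artifact_path: str = "model") -> str:
    """URI of a model logged under a specific run."""
    return f"runs:/{run_id}/{artifact_path}"


def registry_model_uri(name: str, version: int) -> str:
    """URI of a specific version of a registered model."""
    return f"models:/{name}/{version}"


print(run_model_uri("abc123"))           # runs:/abc123/model
print(registry_model_uri("MyModel", 1))  # models:/MyModel/1
```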


MLflow Model Registry

The MLflow Model Registry is a central repository for managing your models.

Registering a Model

In order to register a model, you need to first log it, and then you can register it:

Python
 
result = mlflow.register_model("runs:/<run_id>/model", "MyModel")  # Replace <run_id> with the ID of the run that logged the model


Managing Model Versions

You can then manage the different versions of the model by transitioning them between different stages, such as Staging and Production:

Python
 
from mlflow.tracking import MlflowClient

client = MlflowClient()

client.transition_model_version_stage(
    name="MyModel",
    version=1,
    stage="Production"
)


Advanced Features and Integrations

Integrating With GenAI

MLflow supports GenAI integrations, including OpenAI, Transformers, and LangChain. Here’s an example of how you would log an OpenAI model:

Python
 
import mlflow
import mlflow.openai
import openai

with mlflow.start_run():
    # Log the model name, task, and prompt template rather than an API response.
    # openai.Completion is the legacy (pre-1.0) OpenAI SDK task used in this example.
    mlflow.openai.log_model(
        model="text-davinci-003",
        task=openai.Completion,
        artifact_path="openai-model",
        prompt="Translate the following English text to French: '{text}'",
        max_tokens=60,
    )


Prompt Engineering UI

MLflow’s Prompt Engineering UI allows you to develop and evaluate prompts interactively.

Deployment

Deploying models is straightforward with MLflow. For example, you can serve a logged model locally as a REST API (replace <run_id> with the ID of the run that logged the model):

Shell
 
mlflow models serve -m runs:/<run_id>/model --port 1234


Sample Use Cases for MLflow

Use Case 1: Experiment Tracking for Hyperparameter Tuning

When you tune hyperparameters, it is important to track the parameters and results of each run so that you can identify the best model configuration. If this is your first time using MLflow, install it first as described in the Installation section above.

Suppose we have a random forest classifier with two hyperparameters to tune:

Python
 
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Loading the data
data = load_iris()
X_train, X_test, y_train, y_test = train_test_split(data.data, data.target, test_size=0.2)

# Hyperparameter values we would like to test
n_estimators = [10, 50, 100]
max_depth = [5, 10, 20]

# Starting the MLflow experiment
mlflow.set_experiment("RandomForest_Hyperparameter_Tuning")
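
The two lists above define a 3 x 3 grid. One way to enumerate it, sketched here with only the standard library, is itertools.product. The run_grid helper and its evaluate callback are hypothetical: evaluate stands for a function that trains a model for one configuration inside its own mlflow.start_run() block, logs the parameters and accuracy, and returns the accuracy.

```python
from itertools import product

n_estimators = [10, 50, 100]
max_depth = [5, 10, 20]

# One dict per (n_estimators, max_depth) combination.
configs = [
    {"n_estimators": n, "max_depth": d}
    for n, d in product(n_estimators, max_depth)
]


def run_grid(evaluate):
    """Call evaluate(config) once per combination; return the best (config, score)."""
    results = [(cfg, evaluate(cfg)) for cfg in configs]
    return max(results, key=lambda pair: pair[1])


print(len(configs))  # 9
print(configs[0])    # {'n_estimators': 10, 'max_depth': 5}
```

Keeping one run per configuration means each combination appears as its own row when you query the experiment afterwards.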


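End-to-End Project

The following project trains a random forest classifier on the wine quality dataset. An MLproject file declares its entry point and parameters:

YAML
 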
conda_env: conda.yaml

entry_points:
  train:
    parameters:
      n_estimators: { type: int, default: 100 }
      max_depth: { type: int, default: 6 }
    command: "python train.py {n_estimators} {max_depth}"


Step 1: Create the Conda Environment

Create a file named conda.yaml and add the following content:

YAML
 
name: wine_quality
dependencies:
  - python=3.7
  - pip
  - scikit-learn
  - pandas
  - mlflow


Then, run the following command to create the conda environment:

Shell
 
conda env create -f conda.yaml


Step 2: Implement the Training Script

Create a file named train.py and add the following script to implement the training logic:

Python
 
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

def load_data():
    # Placeholder: load and preprocess the wine quality data here,
    # then return X_train, X_test, y_train, y_test
    raise NotImplementedError("Load the wine quality data here")

X_train, X_test, y_train, y_test = load_data()

n_estimators = [100, 200, 300]
max_depth = [6, 8, 10]

# Train and log one run per hyperparameter combination
for n in n_estimators:
    for depth in max_depth:
        with mlflow.start_run():
            # Train model
            model = RandomForestClassifier(n_estimators=n, max_depth=depth)
            model.fit(X_train, y_train)

            # Log parameters and metrics
            mlflow.log_param("n_estimators", n)
            mlflow.log_param("max_depth", depth)
            predictions = model.predict(X_test)
            accuracy = accuracy_score(y_test, predictions)
            mlflow.log_metric("accuracy", accuracy)

            # Log model
            mlflow.sklearn.log_model(model, "model")


In this script, we train multiple random forest classifiers with different hyperparameters and log the results. Replace the load_data() function with the code to load and preprocess the actual wine quality data.

YAML
 
conda_env: conda.yaml

entry_points:
  main:
    parameters:
      n_estimators: { type: int, default: 100 }
      max_depth: { type: int, default: 10 }
    command: "python train.py --n_estimators {n_estimators} --max_depth {max_depth}"


You have created the train.py script to train a model and log the results. To run it as an MLflow project, complete the following:

  • Create the conda.yaml file to specify the conda environment.
  • Define an entry point in the MLproject file.
  • Modify the existing load_data.py and winequality_dataset.py files to correct a mistake in the path specification.

Step 3: Define the Conda Environment

Create the conda.yaml file to specify the environment dependencies:

YAML
 
name: wine_quality_env
channels:
  - defaults
dependencies:
  - python=3.8
  - scikit-learn
  - pandas
  - mlflow


Step 4: Write the Training Script

Create the train.py script to train the model and log the results:

Python
 
import argparse
import pandas as pd
import mlflow
import mlflow.sklearn
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

def main(n_estimators, max_depth):
    # Load data
    data = pd.read_csv("data/winequality-red.csv", sep=';')
    X = data.drop("quality", axis=1)
    y = data["quality"]
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

    # Train model
    model = RandomForestClassifier(n_estimators=n_estimators, max_depth=max_depth)
    model.fit(X_train, y_train)

    # Log parameters and metrics
    with mlflow.start_run():
        mlflow.log_param("n_estimators", n_estimators)
        mlflow.log_param("max_depth", max_depth)
        predictions = model.predict(X_test)
        accuracy = accuracy_score(y_test, predictions)
        mlflow.log_metric("accuracy", accuracy)

        # Log model
        mlflow.sklearn.log_model(model, "model")

if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument("--n_estimators", type=int, default=100)
    parser.add_argument("--max_depth", type=int, default=10)
    args = parser.parse_args()
    main(args.n_estimators, args.max_depth)


Step 5: Running the Project

Run the project using the mlflow run command:

Shell
 
mlflow run . -P n_estimators=200 -P max_depth=15


Step 6: Registering and Deploying the Model

After running the project, you can register the model and deploy it:

Python
 
from mlflow.tracking import MlflowClient

client = MlflowClient()
run_id = "<run_id>"  # Replace with the ID of the run that logged the model
model_uri = f"runs:/{run_id}/model"
model_details = client.create_registered_model("WineQualityModel")

# Register model
client.create_model_version(
    name="WineQualityModel",
    source=model_uri,
    run_id=run_id
)


Serve the registered model with the following command (port 5001 matches the request URL used in the next step):

Shell
 
mlflow models serve -m models:/WineQualityModel/1 --port 5001


Step 7: Making Predictions

You can make predictions by sending an HTTP request. Here’s how you can use the requests library for this purpose:

Python
 
import requests

url = "http://127.0.0.1:5001/invocations"

# MLflow 2.x scoring servers expect split-oriented input under "dataframe_split"
payload = {
    "dataframe_split": {
        "columns": [
            "fixed acidity", "volatile acidity", "citric acid", "residual sugar",
            "chlorides", "free sulfur dioxide", "total sulfur dioxide", "density",
            "pH", "sulphates", "alcohol"
        ],
        "data": [[7.4, 0.7, 0.0, 1.9, 0.076, 11.0, 34.0, 0.9978, 3.51, 0.56, 9.4]]
    }
}

# json= serializes the body and sets Content-Type: application/json
response = requests.post(url, json=payload)

print(response.json())
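
If your samples live in dictionaries rather than parallel lists, the request body can be assembled programmatically. This is a minimal sketch, assuming an MLflow 2.x scoring server that accepts split-oriented input under a dataframe_split key; build_payload is a hypothetical helper, and the sample row reuses the values from the request above.

```python
import json


def build_payload(rows):
    """Convert a list of {feature: value} dicts into a split-oriented payload.

    Column order is taken from the first row; every row must contain
    the same feature names.
    """
    if not rows:
        raise ValueError("need at least one row")
    columns = list(rows[0])
    data = [[row[col] for col in columns] for row in rows]
    return {"dataframe_split": {"columns": columns, "data": data}}


sample = {
    "fixed acidity": 7.4, "volatile acidity": 0.7, "citric acid": 0.0,
    "residual sugar": 1.9, "chlorides": 0.076, "free sulfur dioxide": 11.0,
    "total sulfur dioxide": 34.0, "density": 0.9978, "pH": 3.51,
    "sulphates": 0.56, "alcohol": 9.4,
}

payload = build_payload([sample])
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be passed to requests.post(url, json=payload).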


Conclusion

In this guide, I demonstrated MLflow through a series of examples and an end-to-end project, covering its key concepts with Python code. Feel free to use the provided project as a base for your own projects and ideas. Note that the material here is a condensed version of the official documentation; for more comprehensive information, refer to the official MLflow guide. I hope you found the examples helpful in learning more about MLflow.
