How To Train, Evaluate, and Deploy a Hugging Face Model

Learn the end-to-end deployment of Hugging Face models.

Banjo Obayomi

Jan. 05, 23 · Tutorial

Building Natural Language Processing (NLP) solutions doesn't have to be hard. With Hugging Face, you can leverage a streamlined developer experience to train, evaluate, and deploy NLP models. In this article, I will walk through an end-to-end example: building a classifier that predicts whether a product review is helpful from a labeled dataset, then serving it from an inference endpoint on AWS.

Prerequisites

For this walk-through, we will use Python 3.7+, Git, and the following pip packages.

   
  
 
transformers
huggingface_hub
datasets
evaluate
torch
pandas
requests
  

Collecting Data

In order to build a supervised model, we need data. The AWS Open Data Registry has over 300 datasets ranging from satellite images to climate data. We are going to build a model that can predict whether a review on Amazon.com is helpful by using the Helpful Reviews dataset.

Data Preparation

The data is provided in a nonstandard JSON format: each line of the file is its own JSON object rather than the file being a single JSON document. Before we can do any training, we have to convert the data to a pandas DataFrame.
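
To make the format concrete, here is a minimal sketch of parsing line-delimited JSON with the standard library. The two sample records are invented for illustration; only the sentence and helpful field names come from the code in this article.

```python
import json

# Hypothetical sample: one JSON object per line, not a single JSON array.
raw_text = (
    '{"sentence": "The battery lasts about ten hours.", "helpful": 1.7}\n'
    '{"sentence": "Bought it as a gift.", "helpful": 0.2}\n'
)

# json.loads(raw_text) would fail here, so parse each non-empty line on its own.
records = [json.loads(line) for line in raw_text.splitlines() if line.strip()]

for record in records:
    print(record["sentence"], record["helpful"])
```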

The following code simply gets the raw JSON file and turns the data into an array for processing.

     Python 
   
 
 
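import requests


def get_data(url: str):
    """
    Purpose:
        gets the json data
    Args:
        url - location of data
    Returns:
        data_array: array of the data
    """

    # Fail early on HTTP errors instead of parsing an error page
    response = requests.get(url, timeout=30)
    response.raise_for_status()

    # The file holds one JSON object per line
    data_array = response.text.split("\n")

    return data_array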
  

This code processes each element of the array by extracting the raw text and then assigning a label based on the score: not helpful (0) when the score is below 1, and helpful (1) otherwise. A pandas DataFrame is returned with all the elements.

     Python 
   
 
 
import json

import pandas as pd


def get_helpful_label_num(helpful_score):
    """
    Purpose:
        gets a label based on score

    Args:
        helpful_score: score of how helpful
    Returns:
        helpful_label: label of how helpful
    """

    if helpful_score < 1:
        return 0
    else:
        return 1


def data_to_df(data_array):
    """
    Purpose:
        turn data to a df
    Args:
        data_array - array of data
    Returns:
        df: df of the data
    """

    df_map = {}
    df_map["text"] = []
    df_map["label"] = []

    for item in data_array:
        try:
            item_json = json.loads(item)
        except json.JSONDecodeError:
            # Skip blank or malformed lines
            continue

        df_map["text"].append(item_json["sentence"])

        # Get helpful label from score
        helpful_score = item_json["helpful"]
        helpful_label = get_helpful_label_num(helpful_score)
        df_map["label"].append(helpful_label)

    # Create dataframe
    df = pd.DataFrame.from_dict(df_map)

    return df
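
As a quick sanity check on the labeling rule, the sketch below applies the same threshold to a few made-up scores and counts the resulting classes. The scores are illustrative, not taken from the dataset, and label_from_score is a stand-in for get_helpful_label_num.

```python
from collections import Counter


def label_from_score(helpful_score: float) -> int:
    """Return 1 (helpful) when the score is at least 1, else 0."""
    return 0 if helpful_score < 1 else 1


# Made-up scores for illustration only.
sample_scores = [0.0, 0.4, 0.99, 1.0, 1.3, 2.0]
labels = [label_from_score(score) for score in sample_scores]

# A heavily skewed count would suggest looking beyond plain accuracy.
label_counts = Counter(labels)
print(labels)
print(label_counts)
```

Running the same count on the real training labels is a cheap way to see how balanced the two classes are before training.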
  

The following code connects these functions to build a DataFrame from each raw JSON file and save it as a CSV file.

     Python 
   
 
 
import json
import requests
import pandas as pd

...

# Code from above

def main():
    """
    Purpose:
        Run the main function
    Args:
        N/A
    Returns:
        N/A
    """

    train_url = "https://helpful-sentences-from-reviews.s3.amazonaws.com/train.json"
    test_url = "https://helpful-sentences-from-reviews.s3.amazonaws.com/test.json"

    train_data = get_data(train_url)
    test_data = get_data(test_url)

    train_df = data_to_df(train_data)
    test_df = data_to_df(test_data)

    train_df.to_csv("helpful_sentences_train.csv", index=False)
    test_df.to_csv("helpful_sentences_test.csv", index=False)


if __name__ == "__main__":
    main()
  

Train Model

Now that we have our data in a standard format, we can begin to train a model. With Hugging Face, we can fine-tune state-of-the-art models without having to train one from scratch. For this example, we will use the DistilBERT base model for our text classification task.

The following code creates a tokenizer that truncates long inputs and a data collator that pads each batch. Next, the CSV files are loaded into pandas DataFrames and converted into Hugging Face datasets. Finally, the datasets map method applies the preprocessing function over each entire dataset.

     Python 
   
 
 
import pandas as pd
from datasets import Dataset
from transformers import TrainingArguments, Trainer
from transformers import AutoModelForSequenceClassification
from transformers import AutoTokenizer
from transformers import DataCollatorWithPadding

# Create tokenizer
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
data_collator = DataCollatorWithPadding(tokenizer=tokenizer)

# Load Base Model
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2
)

def preprocess_function(examples):
    return tokenizer(examples["text"], truncation=True)

# Load Data
test_data = pd.read_csv("helpful_sentences_test.csv")
train_data = pd.read_csv("helpful_sentences_train.csv")

# Create Dataset
test_dataset = Dataset.from_pandas(test_data)
train_dataset = Dataset.from_pandas(train_data)
tokenized_test = test_dataset.map(preprocess_function, batched=True)
tokenized_train = train_dataset.map(preprocess_function, batched=True)
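
The tokenizer call truncates long inputs, while DataCollatorWithPadding pads each batch at training time. As a rough illustration of those two ideas only, here is a dependency-free sketch on lists of token IDs. The helper names, the max_length of 6, and the pad ID of 0 are assumptions for the example; this is not how the library is implemented, and real tokenizers also keep special tokens in place when truncating.

```python
from typing import Dict, List


def truncate(token_ids: List[int], max_length: int) -> List[int]:
    """Keep at most max_length tokens."""
    return token_ids[:max_length]


def pad_batch(batch: List[List[int]], pad_id: int = 0) -> Dict[str, List[List[int]]]:
    """Pad every sequence to the longest one in this batch (dynamic padding)."""
    longest = max(len(ids) for ids in batch)
    input_ids = [ids + [pad_id] * (longest - len(ids)) for ids in batch]
    # The attention mask marks real tokens with 1 and padding with 0.
    attention_mask = [[1] * len(ids) + [0] * (longest - len(ids)) for ids in batch]
    return {"input_ids": input_ids, "attention_mask": attention_mask}


# Toy token IDs standing in for tokenizer output.
sequences = [[101, 7, 8, 9, 10, 11, 12, 102], [101, 5, 102]]
truncated = [truncate(ids, max_length=6) for ids in sequences]
padded = pad_batch(truncated)
print(padded["input_ids"])
print(padded["attention_mask"])
```

Padding per batch rather than to one global length keeps short batches small, which is the usual motivation for pairing truncation in the tokenizer with a padding collator.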
  

With the datasets prepared, we can finally train the model! The following code loads the accuracy metric, sets up the training arguments, and then begins training. Training can take a couple of hours, depending on your hardware. This is where Amazon SageMaker can come in handy. This notebook provides an end-to-end example of how you would train on AWS.

     Python 
   
 
 
import evaluate
import numpy as np

# Metrics (load_metric was removed from recent datasets releases; use evaluate)
metric = evaluate.load("accuracy")

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    # Pick the highest-scoring class for each example
    predictions = np.argmax(logits, axis=-1)
    return metric.compute(predictions=predictions, references=labels)

# Training args
training_args = TrainingArguments(
    output_dir="./results",
    learning_rate=2e-5,
    per_device_train_batch_size=16,
    per_device_eval_batch_size=16,
    num_train_epochs=5,
    weight_decay=0.01,
    evaluation_strategy="epoch",  # renamed to eval_strategy in newer transformers releases
)

# Set up Training job
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=tokenized_train,
    eval_dataset=tokenized_test,
    tokenizer=tokenizer,
    data_collator=data_collator,
    compute_metrics=compute_metrics,
)

# Train the model
trainer.train()
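
For intuition, compute_metrics reduces each row of logits to the index of its largest value and compares that with the true label. Below is a dependency-free sketch of the same calculation, using invented logits and labels; the helper names are made up for this example.

```python
from typing import List, Sequence


def argmax(row: Sequence[float]) -> int:
    """Index of the largest value in a row of logits."""
    return max(range(len(row)), key=lambda i: row[i])


def accuracy_from_logits(logits: List[Sequence[float]], labels: List[int]) -> float:
    """Fraction of rows whose highest-scoring class matches the label."""
    predictions = [argmax(row) for row in logits]
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Invented values: two columns because the model has num_labels=2.
example_logits = [[2.1, -1.3], [0.2, 0.9], [-0.5, 1.7], [1.0, 0.4]]
example_labels = [0, 1, 0, 0]
example_accuracy = accuracy_from_logits(example_logits, example_labels)
print(example_accuracy)
```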
  

Test Model Locally

Once the model has finished training, you can create a Hugging Face pipeline to classify text.

     Python 
   
 
 
from transformers import (
    AutoTokenizer,
    AutoModelForSequenceClassification,
    TextClassificationPipeline,
)

tokenizer = AutoTokenizer.from_pretrained("results/checkpoint-6000/")
model = AutoModelForSequenceClassification.from_pretrained(
    "results/checkpoint-6000/"
)

pipe = TextClassificationPipeline(model=model, tokenizer=tokenizer)
result = pipe("This was a Christmas gift for my grandson.")
print(result)
# [{'label': 'LABEL_0', 'score': 0.998775064945221}]
# This is NOT A HELPFUL comment
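
The checkpoint-6000 path depends on how many training steps your run took, so the number will likely differ on your machine. Here is a small sketch for picking the highest-numbered checkpoint directory; find_latest_checkpoint is a hypothetical helper written for this example, not part of transformers, and the demo runs against a throwaway directory.

```python
import re
import tempfile
from pathlib import Path
from typing import Optional

CHECKPOINT_PATTERN = re.compile(r"^checkpoint-(\d+)$")


def find_latest_checkpoint(output_dir: str) -> Optional[Path]:
    """Return the checkpoint-<step> directory with the largest step, or None."""
    root = Path(output_dir)
    if not root.is_dir():
        return None
    candidates = []
    for child in root.iterdir():
        match = CHECKPOINT_PATTERN.match(child.name)
        if child.is_dir() and match:
            candidates.append((int(match.group(1)), child))
    if not candidates:
        return None
    # Compare by step number so checkpoint-6000 sorts after checkpoint-500.
    return max(candidates, key=lambda pair: pair[0])[1]


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("checkpoint-500", "checkpoint-1000", "checkpoint-6000"):
        (Path(tmp) / name).mkdir()
    latest_name = find_latest_checkpoint(tmp).name

print(latest_name)
```

In the article's setup you would call find_latest_checkpoint("results") and pass the returned path to from_pretrained.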
  

Evaluation

Using the Hugging Face evaluate module, we can easily benchmark the model on a variety of predefined metrics such as accuracy or F1 score.

     Python 
   
 
 
import pandas as pd
import evaluate
from datasets import Dataset
from evaluate import evaluator
from transformers import pipeline

# Define model
pipe = pipeline(
    "text-classification", model="results/checkpoint-6000/"
)

# Define dataset
test_data = pd.read_csv("helpful_sentences_test.csv")
test_dataset = Dataset.from_pandas(test_data)

# Define evaluator (named task_evaluator to avoid shadowing the built-in eval)
task_evaluator = evaluator("text-classification")

# Evaluate accuracy
accuracy = evaluate.load("accuracy")

result = task_evaluator.compute(
    model_or_pipeline=pipe,
    data=test_dataset,
    metric=accuracy,
    label_mapping={"LABEL_0": 0, "LABEL_1": 1},
    strategy="bootstrap",
    n_resamples=200,
)

print(result)
# {'accuracy': {'confidence_interval': (0.7991824857067859, 0.837), 'standard_error': 0.009116387310110274, 'score': 0.823}}

# Evaluate F1 score
f1_metric = evaluate.load("f1")

result = task_evaluator.compute(
    model_or_pipeline=pipe,
    data=test_dataset,
    metric=f1_metric,
    label_mapping={"LABEL_0": 0, "LABEL_1": 1},
    strategy="bootstrap",
    n_resamples=200,
)

print(result)
# {'f1': {'confidence_interval': (0.8619904693250219, 0.8852541552022207), 'standard_error': 0.006116815390939261, 'score': 0.8729361091170136}}
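
The strategy="bootstrap" option reports a confidence interval by resampling the test predictions. The sketch below shows the basic idea with a simple percentile bootstrap on invented predictions. It is not the evaluator's implementation, which may use a different interval method, and bootstrap_accuracy is a name made up for this example.

```python
import random
from typing import Dict, List


def bootstrap_accuracy(
    predictions: List[int],
    references: List[int],
    n_resamples: int = 200,
    confidence_level: float = 0.95,
    seed: int = 0,
) -> Dict[str, object]:
    """Percentile-bootstrap confidence interval for accuracy."""
    rng = random.Random(seed)
    n = len(references)
    hits = [int(p == r) for p, r in zip(predictions, references)]
    score = sum(hits) / n

    resampled_scores = []
    for _ in range(n_resamples):
        # Draw n examples with replacement and rescore them.
        sample = [hits[rng.randrange(n)] for _ in range(n)]
        resampled_scores.append(sum(sample) / n)
    resampled_scores.sort()

    # Take the empirical quantiles at both tails of the resampled scores.
    tail = (1.0 - confidence_level) / 2.0
    lower = resampled_scores[int(tail * (n_resamples - 1))]
    upper = resampled_scores[int((1.0 - tail) * (n_resamples - 1))]
    return {"score": score, "confidence_interval": (lower, upper)}


# Invented data: 100 examples, 82 of them predicted correctly.
references = [1] * 60 + [0] * 40
predictions = [1] * 50 + [0] * 10 + [0] * 32 + [1] * 8
bootstrap_result = bootstrap_accuracy(predictions, references)
print(bootstrap_result)
```

A wider interval signals that the point score rests on few examples, which is worth keeping in mind when comparing two checkpoints.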
  

Upload Model to the Hugging Face Hub

Now we can finally upload our model to the Hugging Face Hub. The new model page lets you create a Git-based model repository.

Once the repo is created, you can clone it and push the model artifacts from the results folder. The following commands show how I pushed my model.

   
  
 
git lfs install
git clone https://huggingface.co/banjtheman/distilbert-base-uncased-helpful-amazon
cp -r PATH_TO_MODEL/results/checkpoint-6000/* distilbert-base-uncased-helpful-amazon/
cd distilbert-base-uncased-helpful-amazon/
git add .
git commit -am "initial commit"
git push
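
Note that a Trainer checkpoint folder usually also holds training-state files, such as the optimizer state, which are not needed to load the model for inference. As an optional alternative to the cp -r step, the sketch below copies everything except a set of such files. The file names in TRAINING_ONLY_FILES are an assumption based on typical Trainer checkpoints, so check them against your own folder; copy_inference_files is a hypothetical helper, and the demo uses empty placeholder files.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

# Assumed names of files that only matter when resuming training.
TRAINING_ONLY_FILES = {
    "optimizer.pt",
    "scheduler.pt",
    "rng_state.pth",
    "trainer_state.json",
    "training_args.bin",
}


def copy_inference_files(checkpoint_dir: str, repo_dir: str) -> List[str]:
    """Copy top-level checkpoint files into repo_dir, skipping training state."""
    destination = Path(repo_dir)
    destination.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(Path(checkpoint_dir).iterdir()):
        if path.is_file() and path.name not in TRAINING_ONLY_FILES:
            shutil.copy2(path, destination / path.name)
            copied.append(path.name)
    return copied


# Demo with empty placeholder files in a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    checkpoint = Path(tmp) / "checkpoint-6000"
    checkpoint.mkdir()
    for name in ("config.json", "pytorch_model.bin", "vocab.txt", "optimizer.pt"):
        (checkpoint / name).touch()
    copied_files = copy_inference_files(str(checkpoint), str(Path(tmp) / "repo"))

print(copied_files)
```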
  

Here is the model I uploaded. Once in the cloud, it can be used by the community.

Deploy to Serverless Endpoint

Finally, we can deploy the model as a SageMaker endpoint so it can be used as an API. Using Amazon SageMaker Studio, we can deploy Hugging Face Hub models from a notebook. The following code configures the endpoint.

     Python 
   
 
 
import sagemaker
from sagemaker.huggingface.model import HuggingFaceModel
from sagemaker.serverless import ServerlessInferenceConfig

# Specify Model Image_uri
# NOTE: this tag pins PyTorch 1.9.1 / transformers 4.12.3 and the us-east-1 registry;
# adjust the region and versions for your own account
image_uri = '763104351884.dkr.ecr.us-east-1.amazonaws.com/huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04'

# Hub Model configuration. <https://huggingface.co/models>
# NOTE: replace with your model
hub = {
    'HF_MODEL_ID': 'banjtheman/distilbert-base-uncased-helpful-amazon',
    'HF_TASK': 'text-classification'
}

# create Hugging Face Model Class
# NOTE: the version arguments below do not match the image tag above;
# confirm which image you intend to use
huggingface_model = HuggingFaceModel(
    env=hub,                              # configuration for loading model from Hub
    role=sagemaker.get_execution_role(),  # iam role with permissions
    transformers_version="4.17",          # transformers version used
    pytorch_version="1.10",               # pytorch version used
    py_version='py38',                    # python version used
    image_uri=image_uri,                  # image uri
)

# Specify MemorySizeInMB and MaxConcurrency
serverless_config = ServerlessInferenceConfig(
    memory_size_in_mb=4096, max_concurrency=10,
)

# deploy the serverless endpoint
predictor = huggingface_model.deploy(
    serverless_inference_config=serverless_config
)
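
The image tag encodes the framework versions the container ships with, so it is worth checking that it matches what you intend to run. Here is a small sketch that pulls the versions out of a tag shaped like the one above. The regular expression is an assumption based on that single example, and parse_inference_image_tag is a hypothetical helper, not part of the SageMaker SDK.

```python
import re
from typing import Dict

TAG_PATTERN = re.compile(
    r"^(?P<pytorch>\d+(?:\.\d+)*)-transformers(?P<transformers>\d+(?:\.\d+)*)"
    r"-(?P<device>cpu|gpu)-(?P<python>py\d+)"
)


def parse_inference_image_tag(image_uri: str) -> Dict[str, str]:
    """Extract framework versions from the tag portion of an image URI."""
    tag = image_uri.rsplit(":", 1)[-1]
    match = TAG_PATTERN.match(tag)
    if match is None:
        raise ValueError(f"Unrecognized image tag: {tag}")
    return match.groupdict()


image_uri = (
    "763104351884.dkr.ecr.us-east-1.amazonaws.com/"
    "huggingface-pytorch-inference:1.9.1-transformers4.12.3-cpu-py38-ubuntu20.04"
)
versions = parse_inference_image_tag(image_uri)
print(versions)

# Compare against the version passed to HuggingFaceModel in the article.
requested_transformers = "4.17"
versions_agree = versions["transformers"].startswith(requested_transformers)
if not versions_agree:
    print("Image tag and transformers_version disagree; confirm which one you want.")
```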
  

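This code shows how to make predictions by sending data to the endpoint with boto3. The same call can be made from a Lambda function; there, supply the endpoint name directly, because the predictor object only exists in the notebook session.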

     Python 
   
 
 
import boto3
import json

data = {
    "inputs": "This was a Christmas gift for my grandson.",
}

user_encode_data = json.dumps(data, indent=2).encode('utf-8')
client = boto3.client('sagemaker-runtime')
endpoint_name = predictor.endpoint_name
content_type = "application/json"

response = client.invoke_endpoint(
    EndpointName=endpoint_name,
    ContentType=content_type,
    Body=user_encode_data,
)

result = json.loads(response['Body'].read().decode())
print(result)
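
When moving this call into a Lambda function, it can help to wrap it in a function that receives the client and the endpoint name. A minimal sketch follows; classify_review, the LABEL_NAMES mapping, the endpoint name, and the stub client used for the demo are assumptions for illustration. With real AWS credentials you would pass boto3.client("sagemaker-runtime") instead of the stub.

```python
import io
import json
from typing import Any, Dict

# Assumed mapping, following the article's reading of LABEL_0 as not helpful.
LABEL_NAMES = {"LABEL_0": "not helpful", "LABEL_1": "helpful"}


def classify_review(text: str, client: Any, endpoint_name: str) -> Dict[str, Any]:
    """Send one sentence to the endpoint and return a readable prediction."""
    response = client.invoke_endpoint(
        EndpointName=endpoint_name,
        ContentType="application/json",
        Body=json.dumps({"inputs": text}).encode("utf-8"),
    )
    # The text-classification task returns a list with one dict per input.
    prediction = json.loads(response["Body"].read().decode("utf-8"))[0]
    return {
        "label": LABEL_NAMES.get(prediction["label"], prediction["label"]),
        "score": prediction["score"],
    }


class StubRuntimeClient:
    """Offline stand-in for the sagemaker-runtime client, for this demo only."""

    def invoke_endpoint(self, EndpointName: str, ContentType: str, Body: bytes) -> Dict[str, Any]:
        canned = [{"label": "LABEL_0", "score": 0.99}]
        return {"Body": io.BytesIO(json.dumps(canned).encode("utf-8"))}


demo = classify_review(
    "This was a Christmas gift for my grandson.",
    client=StubRuntimeClient(),
    endpoint_name="my-endpoint",  # hypothetical endpoint name
)
print(demo)
```

Passing the client in as an argument also makes the function easy to test without calling AWS.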
  

Conclusion

In this article, we covered how to:

Extract, transform, and load datasets from the AWS Open Data Registry
Train a Hugging Face model
Evaluate the model
Upload the model to the Hugging Face Hub
Create a SageMaker endpoint for the model
Create an API for inference

The flexibility of Hugging Face allows this workflow to be run locally or in the cloud. With this workflow, you can easily add customizations based on your workload needs, such as changing the modeling task or hyperparameter tuning. The combination of Hugging Face to develop models and AWS for deployment and compute provides a developer-first experience for building NLP solutions.

To watch a presentation on building Hugging Face models, check out my talk at re:Invent 2022.
