DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Classification With XGBoost Algorithm in a Database
  • The Quantum Computing Mirage: What Three Years of Broken Promises Have Taught Me
  • Discover Hidden Patterns with Intelligent K-Means Clustering
  • Predicting Diabetes Types: A Deep Learning Approach

Trending

  • Ujorm3: A New Lightweight ORM for JavaBeans and Records
  • How to Test a PATCH API Request With REST-Assured Java
  • Smart Deployment Strategies for Modern Applications
  • From APIs to Actions: Rethinking Back-End Design for Agents
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Predicting Ad Viewability With XGBoost Regressor Algorithm

Predicting Ad Viewability With XGBoost Regressor Algorithm

This article explores how data-driven decision-making enhances digital advertising, focusing on predicting ad viewability rates using machine learning.

By 
MELTEM TOSUN user avatar
MELTEM TOSUN
·
Apr. 01, 25 · Tutorial
Likes (2)
Comment
Save
Tweet
Share
1.9K Views

Join the DZone community and get the full member experience.

Join For Free

Extracting insights from data and leveraging them in decision-making processes is becoming increasingly prevalent. Today, this approach enables impactful actions across various domains, including digital advertising. In the world of digital advertising, interactions between users exposed to ads and the platforms displaying those ads hold critical importance.

Matches between advertisers and publishers are facilitated by auctions conducted within milliseconds on demand-side platforms (DSPs) and supply-side platforms (SSPs). Accurate matching is vital for both user satisfaction and revenue management. One of the most important KPIs in this regard is the ad viewability rate.

Delivering the right ads to the right audience plays a significant role in maximizing advertising budget efficiency and improving user satisfaction. In this article, we’ll develop a machine learning model to predict ad viewability rates. We’ll use the highly effective XGBoost Regressor algorithm for this task.

Modeling Process

1. Importing Libraries

The first step is to import the Python libraries we’ll use in our project. If you encounter any missing libraries, you’ll need to install them using the pip install command.

Import Python libraries

2. Preparing the Data

Next, we import the dataset we’ll use in our model.

Import the dataset

After importing the data, several preprocessing steps are required. For example, the creative_adsize variable, which represents ad sizes, is initially a categorical variable (e.g., "640x480"). However, since ad size significantly influences the target variable, we transform this feature into a numerical format to enhance the model's understanding of ad dimensions. 

Additionally, we identify and remove outliers within variable values and handle missing data by imputing the mean or applying other appropriate methods based on the variable's characteristics.

Remove outliers within variable values and handle missing data

We also derive month and day variables from the date feature, as seasonal effects and factors like the beginning or end of the month may influence ad viewability rates.


To include categorical variables in our model, we perform encoding:

  • One-hot encoding. Applied to variables with a limited number of categories, such as creative_type, SSP, and device_type.
  • Hash encoding. Used for variables with a larger number of categories, such as browser and ad_unit.

Perform encoding

The final set of features in our model includes:

Final set of features

The column’s descriptions are:

Variables descriptions

Creative_adsize

Advertisement’s size.

Creative_type

Advertisement type, for example, image or video.

Ad_unit

Advertisement’s placement ID in the website.

Ssp

Supply side platform.

Browser

User’s website browser.

Device_type

User’s device type, for example, phone, personal computer, tablet.

Rate_viewability

The target variable in this model shows the advertisement’s rate of viewability.

Month

The month of the user’s visit.

Day

The day of the user’s visit.


3. Training the Model

After completing the encoding and feature engineering process, we split the data into training and testing sets. Also, the XGBoost algorithm includes numerous parameters designed to improve the model's ability to generalize better. 

To determine which parameter values are most suitable for your specific dataset and task, you can use hyperparameter optimization methods such as Grid Search, Random Search, or Bayesian Optimization. These methods help systematically explore different parameter combinations to find the optimal settings that balance model performance and prevent overfitting. 

Split the data into training and testing sets

4. Evaluating Model Performance

To evaluate the model's performance, we use metrics like Mean Squared Error (MSE) and R².

Evaluate the model's performance

When evaluating R² and Mean Square Error scores, we expect the R² value to be high and the Mean Square Error to be low. However, an excessively high R² value is not always a good result, as the model might be overfitting. This aspect should also be considered, and we will discuss how to assess the risks of overfitting and underfitting in this text.

An R² value of 0.74 currently seems sufficient for our model. Our MSE value is also 0.03, which is not bad. To improve the results, we can increase the dataset size, add new features, or fine-tuning model parameters for better performance.

To ensure the model performs consistently across different samples, we use K-Fold Cross Validation. We determine the optimal value of K using the Elbow Method, setting K to 4.

Elbow Method

K-Fold Cross Validation

For all 4 folds, we observe a balanced distribution of R² scores. They are 0.75, 0.74, 0.75, and 0.74. 

If our test results using k-cross validation had been unbalanced, the clusters with values significantly different from the model's R² value would have needed to be examined separately.

5. Feature Importance Analysis

To understand which variables have the most significant impact on the model, we use the Permutation Importance method. This analysis helps visualize how independent variables affect the dependent variable.

Permutation Importance method

Using the Permutation Importance method, we find that the most impactful features are ad_unit, SSP, browser, creative_adsize.

Meanwhile, the least impactful features include month, device_type, day, and creative_type.

To analyze which feature has a greater impact on model performance, there are methods other than the Permutation method. Gain, Weight, and SHAP are some of them. If a detailed analysis is needed, SHAP values can be used.

6. Overfitting and Underfitting Checks

To check if the model suffers from overfitting or underfitting, we use Learning Curve Analysis. The visualizations show no signs of overfitting or underfitting. Moreover, the test model performance improves as the data size increases. 

Although there is overfitting at the beginning, as the amount of data increases, we see that the overfitting decreases and the model generalizes better.

Learning Curve Analysis

Learning Curve Analysis 2

Conclusion

In this article, we developed a model to predict ad viewability rates using the XGBoost Regressor algorithm. We hope this study proves helpful for you as well. Feel free to share your thoughts and questions! 

Machine learning XGBoost Algorithm

Opinions expressed by DZone contributors are their own.

Related

  • Classification With XGBoost Algorithm in a Database
  • The Quantum Computing Mirage: What Three Years of Broken Promises Have Taught Me
  • Discover Hidden Patterns with Intelligent K-Means Clustering
  • Predicting Diabetes Types: A Deep Learning Approach

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook