DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Over 2 million developers have joined DZone. Join Today! Thanks for visiting DZone today,
Edit Profile Manage Email Subscriptions Moderation Admin Console How to Post to DZone Article Submission Guidelines
View Profile
Sign Out
Refcards
Trend Reports
Events
View Events Video Library
Zones
Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

Integrating PostgreSQL Databases with ANF: Join this workshop to learn how to create a PostgreSQL server using Instaclustr’s managed service

Mobile Database Essentials: Assess data needs, storage requirements, and more when leveraging databases for cloud and edge applications.

Monitoring and Observability for LLMs: Datadog and Google Cloud discuss how to achieve optimal AI model performance.

Automated Testing: The latest on architecture, TDD, and the benefits of AI and low-code tools.

Related

  • Improving Inventory Management Using Machine Learning and Artificial Intelligence
  • A Comprehensive Guide to Hyperparameter Tuning: Exploring Advanced Methods
  • Advantages of Python as an AI and ML Development Language
  • Revolutionizing Inventory Management With Artificial Intelligence: A Comprehensive Guide

Trending

  • Docker and Kubernetes Transforming Modern Deployment
  • Best Practices for Writing Clean Java Code
  • Revolutionizing Software Testing
  • REST vs. Message Brokers: Choosing the Right Communication
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Time Series Analysis: VARMAX-As-A-Service

Time Series Analysis: VARMAX-As-A-Service

VARMAX-As-A-Service is an MLOps approach for the unification and reuse of statistical models and machine learning models deployment pipelines.

Daniela Kolarova user avatar by
Daniela Kolarova
CORE ·
Aug. 29, 23 · Tutorial
Like (3)
Save
Tweet
Share
3.37K Views

Join the DZone community and get the full member experience.

Join For Free

VARMAX-As-A-Service is an MLOps approach for the unification and reuse of statistical models and machine learning models deployment pipelines. It is the first of a series of articles that will be built on top of that project, representing experiments with various statistical and machine learning models, data pipelines implemented using existing DAG tools, and storage services, both cloud-based and alternative on-premises solutions. 

But what is VARMAX and statistical models in general and how are they different from machine learning models?

Statistical Models

Statistical models are mathematical models, and so are machine learning models. A statistical model is usually specified as a mathematical relationship between one or more random variables and other non-random variables. As such, a statistical model is "a formal representation of a theory" and belongs to the field of statistical inference. Some statistical models can be used to make predictions using predefined mathematical formulas and estimating coefficients based on historical data. Statistical models explicitly specify a probabilistic model for the data and identify variables that are usually interpretable and of special interest, such as the effects of predictor variables. In addition to identifying relationships between variables, statistical models establish both the scale and significance of the relationship.

Machine Learning Models

Machine learning models can be interpreted as mathematical models, too. ML models can also build predictions based on historical data without explicit programming. However, those models are more empirical. ML usually does not impose relationships between predictors and outcomes nor isolate the effect of any single variable. The relationships between variables might not be understandable, but what we are provided after are predictions.

Now, let us focus on predictions or forecasts applied to time series data. Time series data can be defined as  (successive measurements made from the same source over a fixed time interval) using statistical modeling and machine learning. Both are represented by an underlying mathematical model; they need data to be trained/parameterized, and they produce new time series representing foretasted values.  Having those similarities, we will apply the approach used to expose machine learning models as services to a statistical model used for time series forecasting called VARMAX.

VARMAX

VARMAX is a statistical model or, generally speaking, a procedure that estimates the model parameters and generates forecasts. Often, economic or financial variables are not only contemporaneously correlated with each other but also correlated with each other’s past values. VARMAX procedure can be used to model these types of time relationships.  

This article is based on an application called VARMAX-As-A-Service that can be found on a dedicated GitHub repository. It is comprised of two main components:

  • Runtime component — a dockerized deployable REST service
  • Preprocessing component — a set of Python functions responsible for data loading, model optimization, model instantiation, and model serialization, enabling its future reuse

The architecture of the runtime component comprising the application is depicted in the following picture:

VARMAX-AS-A-Service

The user sends in a request via a browser to an Apache Web Server hosting the model. Behind the scenes, this is a Python Flask application that is calling a previously configured and serialized pickle file model. Note Pickle is a serialization module in Python's standard library. 

Flask is a lightweight WSGI web application framework. It is designed to make getting started quick and easy, with the ability to scale up to complex applications. Flask is a good choice as a framework for the implementation of a web service exposing the statistical model API, and it also provides a web server for testing. However, for deployment in production, we need a web server and gateway interface. The Docker image created in that project is deploying the Flask application using Apache httpd and WSGI (Web Server Gateway Interface) on a Linux-based system. Apache is a powerful and widely-used web server, while WSGI is a standard interface between web servers and Python applications. Apache httpd is a fast, production-level HTTP server. Serving as a “reverse proxy,” it can handle incoming requests, TLS, and other security and performance concerns better than a WSGI server. 

The Docker image, as well as the model code, can be found on a dedicated GitHub repository.

REST services can be easily integrated into existing web applications as part of an algorithm or as a step in a DAG of a prediction data pipeline (see Apache Airflow, Apache Beam,  AWS Step Functions, Azure ML Pipeline). The integration as a part of the pipeline will be the focus of an upcoming article while this one exposes the service as Swagger documented endpoint and Swagger UI for testing and experimenting with various input datasets. 

After deploying the project, the Swagger API is accessible via <host>:<port>/apidocs (e.g., 127.0.0.1:80/apidocs). There are two endpoints implemented, using the user's input parameters and sending an input file:

SwaggerAPI

Internally, the service uses the deserialized model pickle file:

model pickle file

 Requests are sent to the initialized model as follows:

requests sent

Prior to the implementation of the REST service and its deployment, the actual model needs to be prepared. In the picture below, the step needed to prepare the model for deployment is called a preprocessing step. It should not be confused with the term data processing from data analysts. In the example project, the data set used to optimize the model parameters is called United States Macroeconomic data and is provided by the Python library statsmodels without the need to apply additional data processing. The preprocessing algorithm is comprised of the following steps:

  • Load data
  • Divide data into train and test data set 
  • Prepare exogenous variables
  • Find optimal model parameters (p, q) 
  • Instantiate the model with the optimal parameters identified
  • Serialize the instantiated model to a pickle file 

And finally, the steps needed to run the application are:

  1. Execute model preparation algorithm:  python varmax_model.py
  2. Deploy application: docker-compose up -d
  3. Test model: http://127.0.0.1:80/apidocs

The presented project is a simplified workflow that can be extended step by step with additional functionalities like:

  • Store model files in a remote repository (e.g., Relational Database, MinIO Service, S3 Storage)
  • Explore standard serialization formats and replace the pickle with an alternative solution
  • Integrate time series data visualization tools like Kibana or Apache Superset
  • Store time series data in a time series database like Prometheus, TimescaleDB, InfluxDB
  • Extend the pipeline with data loading and data preprocessing steps
  • Track model versions
  • Incorporate metric reports as part of the 
  • Implement pipelines using specific tools like Apache Airflow or AWS Step Functions or more standard tools like Gitlab or GitHub
  • Compare statistical models' performance and accuracy with machine learning models
  • Implement end-to-end cloud-integrated solutions, including Infrastructure-As-Code
  • Expose other statistical and ML models as services

Some of these future improvements will be the focus of the next articles and projects. The goal of this article is to build the basic project structure and a simple processing workflow that can be extended and improved over time. However, it represents an end-to-end infrastructural solution that can be deployed on production and improved as part of a CI/CD process over time.

Machine learning Time series Web server Python (language)

Opinions expressed by DZone contributors are their own.

Related

  • Improving Inventory Management Using Machine Learning and Artificial Intelligence
  • A Comprehensive Guide to Hyperparameter Tuning: Exploring Advanced Methods
  • Advantages of Python as an AI and ML Development Language
  • Revolutionizing Inventory Management With Artificial Intelligence: A Comprehensive Guide

Comments

Partner Resources

X

ABOUT US

  • About DZone
  • Send feedback
  • Careers
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends: