DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Related

  • Designing Self-Healing AI Infrastructure: The Role of Autonomous Recovery
  • How to Use AI for Anomaly Detection
  • Inventory Predictions With Databricks
  • Machine Learning and AI in IIoT Monitoring: Predictive Maintenance and Anomaly Detection

Trending

  • From AI Chaos to Control: Building Enterprise-Grade LLM Gateways With MuleSoft Anypoint
  • One Query, Four GPUs: Tracing a Distributed Training Stall Across Nodes
  • Catching Data Perimeter Drift Before It Reaches Production
  • The Hidden Cost of Overprivileged Tokens: Designing Messaging Platforms That Assume Compromise
  1. DZone
  2. Data Engineering
  3. AI/ML
  4. Leveraging Snowflake’s AI/ML Capabilities for Anomaly Detection

Leveraging Snowflake’s AI/ML Capabilities for Anomaly Detection

Explore the AI/ML capabilities of Snowflake, focusing on leveraging the SNOWFLAKE.ML.ANOMALY_DETECTION function to detect anomalies in superstore sales.

By 
Kapil Kumar Sharma user avatar
Kapil Kumar Sharma
·
Aug. 08, 24 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
5.6K Views

Join the DZone community and get the full member experience.

Join For Free

Anomaly detection is the process of identifying the data deviation from the expected results in a time-series data. This deviation can have a huge impact on forecasting models if not identified before the model creation. Snowflake Cortex AL/ML suite helps you train the models to spot and correct these outliers in order to help improve the quality of your results. Detecting outliers also helps in identifying the source of the deviations in processes.

Anomaly detection works with both single and multi-series data. Multi-series data represents multiple independent threads of events. For example, if you have sales data for multiple stores, each store’s sales can be checked separately by a single model based on the store identifier. These outliers can be detected in time-series data using the Snowflake built-in class SNOWFLAKE.ML.ANOMALY_DETECTION.

Please follow the steps below to implement the anomaly detection in a time-series dataset.

  • Create an anomaly detection object by passing the training data. This object fits a model to the training data that you provide.
  • Using this anomaly detection model object, call the DETECT_ANOMALIES function to identify anomalies by passing the data to analyze.

In this article, I will be focusing on leveraging the SNOWFLAKE.ML.ANOMALY_DETECTION function to detect anomalies in superstore sales.

Data Setup and Exploration

In this article, we will be using the historical Technology sales data for a superstore. The following code can be used to explore the technology sales. 

SQL
 
select * from superstore.superstore_ml_functions.superstore_sales where category = 'Technology';


select * from superstore.superstore_ml_functions.superstore_sales where category = 'Technology'; results

Having explored the historical sales, let’s create a table to store the last year of sales. This data will be used as training data.

SQL
 
CREATE OR REPLACE TABLE superstore_tech_sales_last_year AS (
    SELECT
     to_timestamp_ntz(Order_Date) AS timestamp,
        Segment,
        Category,
        Sub_Category,
        Sales
    FROM
        superstore_sales
    WHERE
        Order_Date > (SELECT max(Order_Date) - interval '1 year' FROM superstore_sales where category = 'Technology')
        and category = 'Technology'
    GROUP BY
        all
);

CREATE OR REPLACE TABLE superstore_tech_sales_historical AS (
    SELECT
        to_timestamp_ntz(Order_Date) AS timestamp,
        Segment,
        Category,
        Sub_Category,
        Sales
    FROM
        superstore_sales
    WHERE
        Order_Date <= (SELECT max(Order_Date) - interval '1 year' FROM superstore_sales where category = 'Technology')
        and category = 'Technology'
    GROUP BY
        all
);


Identifying Anomalies

In this section, we will focus on creating training datasets, analysis datasets, and models to detect the anomalies in a time series dataset.

The following code can be used to create training datasets for 6 months of historical sales to enrich the forecast models to detect the anomalies.

SQL
 
CREATE OR REPLACE VIEW superstore_tech_sales_historical_training
  AS SELECT timestamp,sum(sales) as sales  FROM superstore_tech_sales_historical where timestamp <= '2022-06-30' and timestamp >= '2022-01-01' group by timestamp;


After creating the training dataset, let's create the model to detect the anomalies using SNOWFLAKE.ML.ANOMALY_DETECTION class, key parameters to this function are as follows.

  • Model name: anomaly_basic_model
  • Training dataset: superstore_tech_sales_historical_training

Along with these two key attributes, we also need to specify the timestamp column and key metric column in the dataset. In our use, SALES is the key metric where we want to identify the outliers. This call might take a few minutes to build the models.

SQL
 
  CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION anomaly_basic_model(
  INPUT_DATA => TABLE(superstore_tech_sales_historical_training),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES',
  LABEL_COLNAME => '');


Instance ANOMALY_BASIC_MODEL successfully created

Now that we have the basic model ready, we will create the data to analyze using this model. You can use the code below to create a view. This view will be the source for the anomaly detection.

SQL
 
CREATE OR REPLACE VIEW superstore_tech_sales_for_analysis
  AS SELECT timestamp,sum(sales) as sales  FROM superstore_tech_sales_last_year where timestamp <= '2023-01-07' group by timestamp;


After creating the training data, models, and analysis datasets, the final stage in this process is to identify the anomalies. Please use the code below to look for the outliers in our analysis dataset.

SQL
 
CALL anomaly_basic_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME =>'TIMESTAMP',
  TARGET_COLNAME => 'SALES'
);


The above code will forecast the actuals, lower band, and upper band, and also call out if there are any anomalies in the data.

Results forecasting actuals, lower and upper bands, and call out of anomalies in data

While detecting the anomalies, you can also provide the labeled data to the model. For example, if you want to identify a few abnormal sales and want the forecast models to consider them as outliers while forecasting the sales, you can use labeled data as part of the LABEL_COLNAME parameter. This is called supervised anomalies detection.

The following code block will create a new training model with an additional attribute called LABEL. This will be a boolean type to identify the outliers. Any sale of $1,000 or more is being labeled here. 

SQL
 
CREATE OR REPLACE VIEW superstore_tech_sales_historical_training_with_label
  AS SELECT DATE_TRUNC('day',timestamp) as timestamp,sum(sales) as sales, case when sum(sales) > 1000 then true else false end as label  FROM superstore_tech_sales_historical where timestamp <= '2022-01-30' and timestamp >= '2022-01-01' group by DATE_TRUNC('day',timestamp);

CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION anomaly_labeled_model(
  INPUT_DATA => TABLE(superstore_tech_sales_historical_training_with_label),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES',
  LABEL_COLNAME => 'LABEL');

CALL anomaly_labeled_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME =>'TIMESTAMP',
  TARGET_COLNAME => 'SALES'
); 


The following is the output for a supervised forecast model.

Output for a supervised forecast model

Conclusion

In this article, we have explored the Snowflake AI/ML capabilities to uncover the anomalies by creating forecast models. As a next step, I would recommend continued learning the Snowflake Cortex framework. You can explore designing anomaly visualizations, and create an automated anomaly detection pipeline for recurring training and execution.

AI Anomaly detection Time series Data (computing) Label

Opinions expressed by DZone contributors are their own.

Related

  • Designing Self-Healing AI Infrastructure: The Role of Autonomous Recovery
  • How to Use AI for Anomaly Detection
  • Inventory Predictions With Databricks
  • Machine Learning and AI in IIoT Monitoring: Predictive Maintenance and Anomaly Detection

Partner Resources

×

Comments

The likes didn't load as expected. Please refresh the page and try again.

  • RSS
  • X
  • Facebook

ABOUT US

  • About DZone
  • Support and feedback
  • Community research

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 215
  • Nashville, TN 37211
  • [email protected]

Let's be friends:

  • RSS
  • X
  • Facebook