DZone
Thanks for visiting DZone today,
Edit Profile
  • Manage Email Subscriptions
  • How to Post to DZone
  • Article Submission Guidelines
Sign Out View Profile
  • Post an Article
  • Manage My Drafts
Over 2 million developers have joined DZone.
Log In / Join
Please enter at least three characters to search
Refcards Trend Reports
Events Video Library
Refcards
Trend Reports

Events

View Events Video Library

Zones

Culture and Methodologies Agile Career Development Methodologies Team Management
Data Engineering AI/ML Big Data Data Databases IoT
Software Design and Architecture Cloud Architecture Containers Integration Microservices Performance Security
Coding Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks
Culture and Methodologies
Agile Career Development Methodologies Team Management
Data Engineering
AI/ML Big Data Data Databases IoT
Software Design and Architecture
Cloud Architecture Containers Integration Microservices Performance Security
Coding
Frameworks Java JavaScript Languages Tools
Testing, Deployment, and Maintenance
Deployment DevOps and CI/CD Maintenance Monitoring and Observability Testing, Tools, and Frameworks

The software you build is only as secure as the code that powers it. Learn how malicious code creeps into your software supply chain.

Apache Cassandra combines the benefits of major NoSQL databases to support data management needs not covered by traditional RDBMS vendors.

Generative AI has transformed nearly every industry. How can you leverage GenAI to improve your productivity and efficiency?

Modernize your data layer. Learn how to design cloud-native database architectures to meet the evolving demands of AI and GenAI workloads.

Related

  • Optimizing Performance in Azure Cosmos DB: Best Practices and Tips
  • .NET 9 and C# 13: New Features and Improvements
  • How to Create and Publish Azure Functions in Java
  • Rust, WASM, and Edge: Next-Level Performance

Trending

  • Event-Driven Microservices: How Kafka and RabbitMQ Power Scalable Systems
  • Distributed Consensus: Paxos vs. Raft and Modern Implementations
  • After 9 Years, Microsoft Fulfills This Windows Feature Request
  • Introduction to Retrieval Augmented Generation (RAG)
  1. DZone
  2. Software Design and Architecture
  3. Performance
  4. Indexed View for Aggregating Metrics

Indexed View for Aggregating Metrics

In this article, we will learn how to utilize the Indexed View capability of Azure SQL Server for tuning query performance and reducing latency.

By 
Ankur Peshin user avatar
Ankur Peshin
·
Feb. 10, 25 · Tutorial
Likes (1)
Comment
Save
Tweet
Share
3.6K Views

Join the DZone community and get the full member experience.

Join For Free

Microsoft Azure SQL is a robust, fully managed database platform designed for high-performance querying, relational data storage, and analytics. For a typical web application with a backend, it is a good choice when we want to consider a managed database that can scale both vertically and horizontally.

An application software generates user metrics on a daily basis, which can be used for reports or analytics. Azure SQL is a great choice to consider for storing and querying this data under certain conditions:

  1. The analytical queries require joins with other tables (applying filters on UI)
  2. You want to combine historical and transactional data
  3. The data volume is not extremely large, and query performance can be managed by tuning

Let's consider an example of a hotel booking site running Azure SQL in the backend. We want to see a UI dashboard for tracking user activity, such as clicks on the site, visits to the hotel description page, bookings made, etc.

Let's assume all this telemetry data is dumped for each user on a daily basis in unstructured storage, and we are pulling this data into our database using background jobs, such as Apache Airflow.

Below is the schema for users table and a table to store daily metrics.

MS SQL
 
CREATE TABLE users (
    id INT AUTO_INCREMENT PRIMARY KEY,
    first_name VARCHAR(100) NOT NULL,       
    last_name VARCHAR(100) NOT NULL,        
    email VARCHAR(255) UNIQUE NOT NULL,     
    password_hash VARCHAR(255) NOT NULL, 
  
    age INT ,               
    city VARCHAR(100),                      
    country VARCHAR(100),                  
    currency VARCHAR(10),   
    last_login VARCHAR,
    hotel_preference VARCHAR(100)
);

CREATE TABLE daily_user_metrics (
    id BIGINT IDENTITY PRIMARY KEY,       -- Unique identifier for the record
    user_id BIGINT NOT NULL,              -- Foreign key to the users table
    
    clicks INT DEFAULT 0,                 -- Number of total site clicks
    visits INT DEFAULT 0,                 -- Number of visits to the hotel page
    bookings INT DEFAULT 0,          	  -- Number of bookings
    reviews INT DEFAULT 0,      	  -- Number of reviews
    cancellations INT DEFAULT 0,          -- Number of cancellations

    date_created DATE, 			  -- Daily metrics for each user
);


You can draw many insights from the above tables.

Let's consider one particular example. We need to aggregate daily activity metrics grouped by date in descending order for customers between the ages of 30 and 40 located in New York City. Below is the query:

MS SQL
 
SELECT
    date_created,
    SUM(clicks) AS total_clicks,
    SUM(visits) AS total_visits,
    SUM(bookings) AS total_bookings,
    SUM(reviews) AS total_reviews,
    SUM(cancellations) AS total_cancellations,
FROM
    daily_user_metrics m
INNER JOIN users u on m.user_id = u.id
WHERE
   u.age BETWEEN 30 and 40
   and u.city ='New York'   
   and m.date_created BETWEEN :startDate and :endDate
GROUP BY
	date_created 
ORDER BY
    date_created DESC


Now, we can analyze the daily trends from this group of users who are in New York and between the ages of 30 and 40. The table is mostly performant, and we are able to easily perform range queries spread across multiple months. Eventually, our requirements grow. We now want to track user behavior in a weekly or monthly range. But our table stores the data on a daily basis. We now have two options:

  • Query the table and group the date_created weekly or monthly, depending on the ask.
  •  Create a couple of views that aggregate the data on a weekly or monthly basis per user. See the query below:
MS SQL
 
CREATE VIEW weekly_user_metrics AS
SELECT
    DATEADD(DAY, -(DATEPART(WEEKDAY, date) - 1), date) AS week_start, -- Start of the week (Sunday)
    SUM(clicks) AS total_clicks,
    SUM(visits) AS total_visits,
    SUM(bookings) AS total_bookings,
    SUM(reviews) AS total_reviews,
    SUM(cancellations) AS total_cancellations,
FROM
    daily_user_metrics m
INNER JOIN users u on m.user_id = u.id
WHERE
   u.age BETWEEN 30 and 40
   and u.city ='New York'   
   and m.date_created BETWEEN :startDate and :endDate    
GROUP BY
    DATEADD(DAY, -(DATEPART(WEEKDAY, date) - 1), date) -- Group by week start
ORDER BY
	DATEADD(DAY, -(DATEPART(WEEKDAY, date) - 1), date) DESC -- Sort by latest week


However, one important thing to consider is that views just provide an abstraction to the underlying query which simply queries the underlying table. Materialized Views are the next thought that comes to mind. However, they need to be refreshed manually or on a schedule, due to which real-time data is not available.

To address these issues, Azure SQL Server offers a great feature known as Indexed View. An Indexed View is a physical representation of a view stored in the database with a unique clustered index. Changes to the underlying tables automatically update the indexed view to keep it in sync. It uses a clustered index that organizes the data in the view based on the order of the index keys.

The indexed view is ideal for scenarios where we need real-time data, and our query involves complex multi-table joins. It is also suitable for our use case where existing data is rarely updated but queried often, and we have range-based queries and want to do ordered retrieval.

There are some things to consider before deciding whether you want to go for indexed views. Indexed views cannot have non-deterministic functions. A non-deterministic function is a function that does not always return the same result for the same input, even when executed with identical arguments and under the same database conditions. Also, an indexed view is an actual structure that requires storage, similar to a Materialized View.

The syntax for creating an Indexed View is similar to the View creation query above. However, we cannot have non-deterministic functions while creating an indexed view. The line DATEADD(DAY, -(DATEPART(WEEKDAY, date) - 1), date) AS week_start in the view query above depends on the session-specific SET DATEFIRST setting, which determines the first day of the week. This is considered non-deterministic as it will produce different results for different conditions.

Keeping the above things in mind, we can proceed to eliminate the non-deterministic computation by making the column deterministic. We add a week_start column to the underlying table and precompute and fill the week_start value in the table for daily data pulls. So, the rows with dates D1 through D7 belong to W1, D8 to D14 belong to W2, and so on. 

Now, we can proceed to create an indexed view with the SQL below.

MS SQL
 
ALTER TABLE daily_user_metrics ADD week_start DATE;-- Populate this column with first day of the week going forward

CREATE VIEW dbo.weekly_user_metric_aggregations_view
WITH SCHEMABINDING
AS
SELECT
    user_id,
    week_start,
    SUM(clicks) AS total_clicks,
    SUM(visits) AS total_visits,
    SUM(bookings) AS total_bookings,
    SUM(reviews) AS total_reviews,
    SUM(cancellations) AS total_cancellations,
    COUNT_BIG(*) AS row_count --SQL Server requires COUNT_BIG(*) in indexed views to handle scenarios where the count exceeds the range of an INT data type.
FROM
    dbo.daily_user_metrics
GROUP BY
    user_id,
    week_start;
    
    
CREATE UNIQUE CLUSTERED INDEX IX_weekly_user_metric_aggregations_view
ON dbo.weekly_user_metric_aggregations_view (user_id, week_start);    


After this indexed view is created, we can query it as follows:

MS SQL
 
SELECT
    week_start,
    SUM(total_clicks) AS total_clicks,
    SUM(total_visits) AS total_visits,
    SUM(total_bookings) AS total_bookings,
    SUM(total_reviews) AS total_reviews,
    SUM(total_cancellations) AS total_cancellations,
FROM
    weekly_user_metric_aggregations_view mv
INNER JOIN users u on mv.user_id = u.id
WHERE
   u.age BETWEEN 30 and 40
   and u.city ='New York'   
   and m.date_created BETWEEN :startDate and :endDate
GROUP BY
	week_created 
ORDER BY
    week_created DESC


Conclusion

An indexed view will have a significantly faster query time than a basic view or querying the table directly for large datasets where data has grown up to a million rows or more. The end user will have a low latency experience, and queries to the database will be optimized. Since we aggregated a week's worth of data in 1 row, we have cut the aggregation time taken to 1/7th.

Apache Airflow azure Data Types Performance

Opinions expressed by DZone contributors are their own.

Related

  • Optimizing Performance in Azure Cosmos DB: Best Practices and Tips
  • .NET 9 and C# 13: New Features and Improvements
  • How to Create and Publish Azure Functions in Java
  • Rust, WASM, and Edge: Next-Level Performance

Partner Resources

×

Comments
Oops! Something Went Wrong

The likes didn't load as expected. Please refresh the page and try again.

ABOUT US

  • About DZone
  • Support and feedback
  • Community research
  • Sitemap

ADVERTISE

  • Advertise with DZone

CONTRIBUTE ON DZONE

  • Article Submission Guidelines
  • Become a Contributor
  • Core Program
  • Visit the Writers' Zone

LEGAL

  • Terms of Service
  • Privacy Policy

CONTACT US

  • 3343 Perimeter Hill Drive
  • Suite 100
  • Nashville, TN 37211
  • support@dzone.com

Let's be friends:

Likes
There are no likes...yet! 👀
Be the first to like this post!
It looks like you're not logged in.
Sign in to see who liked this post!