Simplifying InfluxDB: Retention Policy Best Practices

This article takes a look at what a retention policy is as well as some general guidelines on creating the best retention policy for your use case with InfluxDB.

By Margo Schaedel · Jun. 22, 18 · Opinion

Retention policies can often be tricky even at the best of times, but when you’re dealing with time series data, setting up the appropriate retention policy to automatically expire (delete) unnecessary data can save you loads of time in the long run. This post will walk through some general guidelines on creating the best retention policy for your use case with InfluxDB.

Wait…What’s a Retention Policy?


[Image: InfluxDB retention policies] Data doesn't remain useful forever.

Before we start talking about best practices around retention policies, it's important to understand just what they are. Although the name is somewhat self-explanatory, the InfluxDB documentation defines a retention policy as:

"The part of InfluxDB’s data structure that describes for how long InfluxDB keeps data (duration), how many copies of those data are stored in the cluster (replication factor), and the time range covered by shard groups (shard group duration). RPs are unique per database and along with the measurement and tag set define a series.

When you create a database, InfluxDB automatically creates a retention policy called autogen with an infinite duration, a replication factor set to one, and a shard group duration set to seven days."

So, in a nutshell, a retention policy dictates how long data will be kept and stored and, if you're using InfluxDB Enterprise, how many copies of that data to store. Because time series data tends to pile up really quickly, you're definitely going to want to discard or downsample data from InfluxDB once it's no longer as useful. If you need further convincing, just check out these blog posts:

  • Optimizing Data Queries for Time Series Applications
  • Simplifying InfluxDB: Shards and Retention Policies
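To see the default described in the documentation quote above, here is a minimal InfluxQL sketch you could run in the influx CLI; the database name "mydb" is just a placeholder, not something from this article:

  -- Create a database; InfluxDB automatically attaches the default "autogen" policy
  CREATE DATABASE "mydb"

  -- List its retention policies: expect "autogen" with an infinite duration,
  -- a replication factor of 1, and a 168h0m0s (seven-day) shard group duration
  SHOW RETENTION POLICIES ON "mydb"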

General Guidelines

There are a few key things to consider when you're setting up your database's retention policy. First and foremost, you'll need to consider how long your use case requires you to retain the data. Do you need it for a week? A month? A year? This decision directly determines the duration you set on your retention policy and isn't really negotiable.
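For example, if your use case only calls for 30 days of raw data, the retention policy might look like the following InfluxQL sketch (the policy and database names are placeholders):

  -- Keep data for 30 days, store a single copy, and make this the database's default policy
  CREATE RETENTION POLICY "thirty_days" ON "mydb" DURATION 30d REPLICATION 1 DEFAULT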

But wait — you're not done yet. Another integral part of setting up a retention policy involves designating the shard group duration for all data governed by that policy. This is where things get tricky. Since shards are the core physical storage unit of the database, tuning the shard group duration to just the right setting can significantly improve performance, so it's important to get it right.

Setting the duration on the higher side results in larger collections of data within each shard, which can cause problems when querying the database. For example, if you query a shorter time window than the shard group's time span, the database may need to decode longer blocks of data just to read a subset of the shard's time range, and that process takes more effort and time.

On the other hand, if you set the shard group duration on the shorter side, the result is a greater number of shard groups. Because of Time Series Indexing, each shard carries extra overhead in the form of its index and metadata, so having thousands of shards with little data in each is by no means efficient.
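If the default shard group duration InfluxDB picks for your retention policy doesn't fit, you can override it explicitly and then keep an eye on how many shards accumulate. A sketch, again with placeholder names:

  -- Override the shard group duration on an existing retention policy
  ALTER RETENTION POLICY "thirty_days" ON "mydb" SHARD DURATION 2d

  -- See how many shards the database has accumulated and the time range each covers
  SHOW SHARDS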

[Image: retention policies] It can sometimes be difficult to determine the right setting for your shard group duration.

My recommendation is to be like Goldilocks and try them all out until you hit the perfect spot!

Okay, all joking aside — we at InfluxData recommend setting the shard group duration as follows (a worked example follows the list):

  • The shard group duration should be twice your longest typical query’s time range — yep, that means you’ll need to think about what kinds of queries you’ll be running on InfluxDB.
  • The shard group duration should be set so that each shard group ends up with at least 100,000 points per group — you want more data per shard, but not too much data.
  • The shard group duration should be set so that each shard group has at least 1,000 points per series.
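To make those rules concrete, suppose your dashboards typically query the last 24 hours and each series receives a point every 10 seconds; these numbers are purely illustrative, not from the article. Rule one suggests a two-day shard group duration, and at roughly 8,640 points per series per day, each two-day shard easily clears the 1,000-points-per-series mark, while a handful of series or more also clears 100,000 points per shard group. In InfluxQL, that might look like:

  -- 14-day retention with two-day shard groups (placeholder names and illustrative numbers)
  CREATE RETENTION POLICY "two_weeks" ON "mydb" DURATION 14d REPLICATION 1 SHARD DURATION 2d DEFAULT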

Summary

If you're new to using InfluxDB, setting up your database schema and retention policies can sometimes feel like a daunting task, especially in more unusual cases such as very large clusters (InfluxDB Enterprise) or very long or short retention periods. You'll definitely want to spend some time tweaking the retention duration and shard group duration until you find the right fit. After all, it took Goldilocks three tries, right?


Published at DZone with permission of Margo Schaedel, DZone MVB. See the original article here.

