Simplifying InfluxDB: Retention Policy Best Practices

This article takes a look at what a retention policy is as well as some general guidelines on creating the best retention policy for your use case with InfluxDB.

by Margo Schaedel · Jun. 22, 18 · Opinion

Retention policies can often be tricky even at the best of times, but when you’re dealing with time series data, setting up the appropriate retention policy to automatically expire (delete) unnecessary data can save you loads of time in the long run. This post will walk through some general guidelines on creating the best retention policy for your use case with InfluxDB.

Wait…What’s a Retention Policy?


Data doesn’t remain useful forever.

Before we start talking about best practices around retention policies, it’s important to understand just what they are. Although the name is fairly self-explanatory, the documentation defines an InfluxDB retention policy as:

"The part of InfluxDB’s data structure that describes for how long InfluxDB keeps data (duration), how many copies of those data are stored in the cluster (replication factor), and the time range covered by shard groups (shard group duration). RPs are unique per database and along with the measurement and tag set define a series.

When you create a database, InfluxDB automatically creates a retention policy called autogen with an infinite duration, a replication factor set to one, and a shard group duration set to seven days."
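
To make the three settings in that definition concrete, here is a minimal Python sketch that assembles an InfluxQL `CREATE RETENTION POLICY` statement. The helper name, the policy name `one_month`, and the database name `telemetry` are hypothetical and chosen only for illustration. The function just builds a string; it does not connect to a server or validate the duration literals.

```python
from typing import Optional


def create_retention_policy_statement(
    name: str,
    database: str,
    duration: str,
    replication: int = 1,
    shard_duration: Optional[str] = None,
    default: bool = False,
) -> str:
    """Build an InfluxQL CREATE RETENTION POLICY statement as a string.

    Illustrative helper only: it does not escape quotes inside names.
    """
    parts = [
        f'CREATE RETENTION POLICY "{name}" ON "{database}"',
        f"DURATION {duration}",          # how long data is kept
        f"REPLICATION {replication}",    # copies stored in the cluster
    ]
    if shard_duration is not None:
        # Time range covered by each shard group
        parts.append(f"SHARD DURATION {shard_duration}")
    if default:
        parts.append("DEFAULT")
    return " ".join(parts)


# Hypothetical example: keep data for 30 days in one-day shard groups.
print(create_retention_policy_statement(
    "one_month", "telemetry", "30d", shard_duration="1d"
))
# CREATE RETENTION POLICY "one_month" ON "telemetry" DURATION 30d REPLICATION 1 SHARD DURATION 1d
```

Leaving out `shard_duration` omits the `SHARD DURATION` clause, in which case InfluxDB chooses a shard group duration itself.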

So, in a nutshell, a retention policy dictates how long data will be kept and, if you’re using InfluxDB Enterprise, how many copies of that data to store. Because time series data tends to pile up really quickly, you’re definitely going to want to discard or downsample data from InfluxDB once it’s no longer as useful. If you need further convincing, just check out these blog posts:

  • Optimizing Data Queries for Time Series Applications
  • Simplifying InfluxDB: Shards and Retention Policies

General Guidelines

There are a few key things to consider when you’re setting up your database’s retention policy. First and foremost, you’ll need to consider how long your use case requires that you retain the data. Do you need it for a week? A month? A year? This requirement determines the duration you set on your retention policy and isn’t really negotiable.

But wait, you’re not done yet. Another integral part of setting up a retention policy is designating the shard group duration for all data that will follow this retention policy. This is where things get tricky. Because shards are the core physical unit of storage in the database, tuning the shard group duration to the right setting can noticeably improve performance, so it’s important to get it right.

Setting the duration on the higher side results in larger collections of data within each shard, which can cause problems when querying the database. For example, if you query a time window shorter than the shard group time span, the database may need to decode longer blocks of data just to read a subset of the shard’s time range, and that takes more effort and time.

On the other hand, if you set the shard group duration on the shorter side, the result is a greater number of shard groups. Due to Time Series Indexing, each shard will have some extra overhead in the form of this index and metadata, so having thousands of shards with little data on each is by no means efficient.
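
As a rough illustration of that trade-off, the following Python sketch estimates how many shard groups a retention window is split into for a given shard group duration. The function name and the one-year example are assumptions made for illustration, and the result is only an approximation of what InfluxDB keeps on disk at any moment.

```python
import math
from datetime import timedelta


def shard_group_count(retention: timedelta, shard_group_duration: timedelta) -> int:
    """Approximate number of shard groups needed to cover the retention window."""
    if shard_group_duration <= timedelta(0):
        raise ValueError("shard_group_duration must be positive")
    # Dividing two timedeltas gives a float ratio; round up to whole groups.
    return math.ceil(retention / shard_group_duration)


one_year = timedelta(days=365)

# Short shard groups: many groups, each carrying its own index and metadata.
print(shard_group_count(one_year, timedelta(hours=1)))  # 8760

# Long shard groups: few groups, but each holds far more data to decode.
print(shard_group_count(one_year, timedelta(days=7)))   # 53
```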

It can sometimes be difficult to determine the right setting for your shard group duration.

My recommendation is to be like Goldilocks and try them all out until you hit the perfect spot!

Okay, all joking aside — we at InfluxData recommend setting the shard group duration as follows:

  • The shard group duration should be twice your longest typical query’s time range, which means you’ll need to think about what kinds of queries you’ll be running on InfluxDB.
  • The shard group duration should be set so that each shard group ends up with at least 100,000 points, so each shard holds enough data to justify its overhead without growing too large to query efficiently.
  • The shard group duration should be set so that each shard group has at least 1,000 points per series.
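
The three guidelines above can be combined into a starting estimate. The Python sketch below returns the shortest duration that satisfies all of them. It assumes points arrive at a steady rate and are spread evenly across series, which real workloads rarely do, and the function name and example numbers are hypothetical. Treat the result as a first guess to tune, not a definitive setting.

```python
from datetime import timedelta

MIN_POINTS_PER_GROUP = 100_000   # guideline: points per shard group
MIN_POINTS_PER_SERIES = 1_000    # guideline: points per series per shard group


def suggest_shard_group_duration(
    longest_query: timedelta,
    points_per_second: float,
    series_count: int,
) -> timedelta:
    """Return the shortest duration meeting all three guidelines.

    Assumes a steady write rate spread evenly across all series.
    """
    if points_per_second <= 0 or series_count <= 0:
        raise ValueError("points_per_second and series_count must be positive")

    # Guideline 1: twice the longest typical query's time range.
    by_query = 2 * longest_query

    # Guideline 2: time for all series together to write 100,000 points.
    by_group = timedelta(seconds=MIN_POINTS_PER_GROUP / points_per_second)

    # Guideline 3: time for each series to write 1,000 points.
    by_series = timedelta(
        seconds=MIN_POINTS_PER_SERIES * series_count / points_per_second
    )

    # The longest of the three satisfies every guideline at once.
    return max(by_query, by_group, by_series)


# Hypothetical workload: 6-hour dashboards, 50 points/s across 500 series.
print(suggest_shard_group_duration(timedelta(hours=6), 50, 500))  # 12:00:00
```

In this example the query-based guideline dominates: 100,000 points arrive in about 33 minutes and each series reaches 1,000 points in under 3 hours, so doubling the 6-hour query range sets the result.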

Summary

If you’re new to using InfluxDB, setting up your database schema and retention policies can sometimes feel like a daunting task, especially in more exceptional cases such as working with very large clusters (InfluxDB Enterprise) or with very long or very short retention periods. You’ll definitely want to spend some time tweaking retention duration and shard group duration until you find the right fit. After all, it took Goldilocks three tries, right?

Published at DZone with permission of Margo Schaedel, DZone MVB. See the original article here.

Opinions expressed by DZone contributors are their own.
