
Time Series Compression Algorithms and Their Applications

In this article, learn more about time series compression algorithms along with their role in real-world applications in different sectors.

By Rosana de Oliveira Gomes · Sep. 26, 22 · Analysis

This is an article from DZone's 2022 Database Systems Trend Report.


Time series data is present in our daily lives across many sectors, such as finance, healthcare, and energy management. Some of these domains require high data volumes so that insights can be obtained from analyzing or forecasting the behavior of target variables. Transferring and processing data at high rates and volumes across platforms with several users requires storage and computing power. Compression techniques are a powerful way to avoid overwhelming such systems. In what follows, time series compression algorithms are discussed along with their role in real-world applications in different sectors.

What Is Time Series?

A time series is a sequence of values of a quantity obtained at successive times, often at equally spaced intervals. We encounter timestamped data when we monitor our exercise with a fitness app or track a pizza delivery traveling through the city to our doorstep. Time series are relevant whenever the evolution of a variable over time must be understood, for example when characterizing the time profile of a variable or forecasting its values.

Time series most commonly appear as timestamped numerical data in a tabular format. Audio data is itself a time series, since it consists of amplitude samples recorded at successive times. Although time series are a data type in their own right, they can also be combined with other data types to produce more complex entities with an embedded temporal aspect, such as:

  • A sequence of images over time defines a video.
  • A time series of coordinate pairs from geospatial data defines a tracking path. 

Time series data requires specific techniques to obtain insights not only from the patterns in the data, but also from how those patterns change over time. In a data science pipeline, these techniques are employed during the preprocessing, analysis, and modeling steps. The most common time series techniques are illustrated in Figure 1 below:


Figure 1: Time series methods

Data Compression

Data compression is the process of transforming data to reduce the number of bits necessary to represent it. This is done by encoding the data or rearranging its bit structure. Compression is valuable where storage, processing, and transmission resources are critical. Two extremes of such scenarios are:

  • Limited resources scenario, where the available storage and processing are limited by costs
  • Big data scenario, where a high-frequency data influx requires efficient data management 

The process of data compression involves encoding the data into a smaller format. In order to perform the reverse process, a decoder is needed to decompress the data, as illustrated in Figure 2: 


Figure 2: Data compression scheme

Compression expresses the information present in the data in a smaller format with fewer bits. Since it searches for patterns in the data that can be encoded, it can be a computationally expensive procedure that demands time and memory. Compression techniques fall into two categories:

  • Lossless compression identifies and removes data redundancy in a way that no information is lost in the process. The decoded data is restored exactly to its original state.
    • Common uses: databases, emails, spreadsheets, documents, and source code
  • Lossy compression identifies and permanently removes data that is redundant or considered less important, so the original data cannot be recovered exactly after decompression.
    • Common uses: audio, images, graphics, videos, and scanned files
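
The difference between the two categories can be sketched in a few lines of Python. This is a minimal illustration, not a production codec: it assumes the series is a short list of floats, uses the standard-library `zlib` (DEFLATE) as the lossless coder, and uses uniform quantization with a hypothetical `step` parameter as the lossy stage. The function names are made up for this example.

```python
import struct
import zlib

def pack_floats(values):
    """Serialize a list of floats as little-endian 64-bit doubles."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack_floats(blob):
    """Inverse of pack_floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))

def lossless_roundtrip(values):
    """Compress with zlib and decode again; the result equals the input."""
    compressed = zlib.compress(pack_floats(values))
    restored = unpack_floats(zlib.decompress(compressed))
    return compressed, restored

def lossy_roundtrip(values, step=0.5):
    """Quantize to multiples of `step` (an assumed parameter) before
    compressing; any detail finer than `step` is permanently discarded."""
    codes = [round(v / step) for v in values]
    compressed = zlib.compress(struct.pack(f"<{len(codes)}i", *codes))
    decoded = struct.unpack(f"<{len(codes)}i", zlib.decompress(compressed))
    restored = [c * step for c in decoded]
    return compressed, restored

readings = [20.0, 20.1, 20.3, 20.2, 20.4, 21.0, 21.1, 20.9]
_, exact = lossless_roundtrip(readings)
_, approx = lossy_roundtrip(readings, step=0.5)
```

Here `exact` is identical to `readings`, while `approx` only matches it to within half a quantization step. On such a tiny input the compressed output is not actually smaller than the input, so the sketch shows the round-trip behavior only, not a realistic size reduction.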

The trade-off between accuracy and compression, which arises when trying to preserve the data while still meeting storage limits, is a common challenge in data compression. Lossless compression can only shrink data to a certain extent, with the Shannon entropy of the data as the lower bound. For high-frequency data, lossy compression is often needed to achieve an effective reduction in size.
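
To make the Shannon limit concrete, the sketch below estimates the empirical entropy of a symbol sequence in bits per symbol. It rests on a simplifying assumption that is not part of the original text: symbols are treated as independent draws, so correlations between consecutive samples are ignored. The function name is hypothetical.

```python
import math
from collections import Counter

def shannon_entropy_bits(symbols):
    """Empirical entropy in bits per symbol, assuming independent symbols."""
    counts = Counter(symbols)
    total = len(symbols)
    # Sum of p * log2(1/p) over the observed symbol frequencies.
    return sum((n / total) * math.log2(total / n) for n in counts.values())

constant = [7] * 8              # a single repeated value
two_level = [0, 1] * 4          # two equally likely values
four_level = [0, 1, 2, 3] * 2   # four equally likely values
```

Under this model, `constant` needs 0 bits per symbol, `two_level` 1 bit, and `four_level` 2 bits. Real time series usually have strong temporal correlation, so a coder that models the dependency between samples can do better than this symbol-by-symbol estimate suggests.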

Table 1:

ADVANTAGES AND DISADVANTAGES OF DATA COMPRESSION

| Advantages of Compression | Disadvantages of Compression |
|---|---|
| Reduction of file size and storage usage costs | Time-consuming for large data volume |
| Increase data reading/writing speed due to the reduction of memory gaps during disk storage | Algorithms need intensive processing from the system, which becomes costly for large data volume |
| Faster file transfer via the internet, requiring less computational resources | Quality of decompressed data may depend on level of compression |
| Algorithms can be used to approximate and/or predict the data, as well as identify noise | Requires a decoder program in order to decompress the files |

Compression Methods for Time Series

The rise of big data and the use of smart devices reveal a demand for powerful compression techniques able to fulfill the processing needs of industries that rely on time series data. At high sampling frequencies (around 10 kHz), even databases that specialize in time series data can get overloaded. Compression algorithms are widely explored due to their high-value returns. The quality of a compression technique is measured by its compression ratio (the ratio between the sizes of the compressed and original files), its speed (measured in cycles per byte), and the accuracy of the restored data.
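
Two of these quality measures can be computed directly, as sketched below. The example makes several assumptions: the lossy scheme is a deliberately simple one (each fixed-size block of samples is replaced by its mean), the compression ratio is taken as original size divided by compressed size and counted in stored values rather than bytes, and accuracy is measured as root-mean-square error. Speed is left out. All function names are hypothetical.

```python
import math

def block_means(values, block_size):
    """Lossy encoder: keep one mean per block of `block_size` samples."""
    means = []
    for i in range(0, len(values), block_size):
        block = values[i:i + block_size]
        means.append(sum(block) / len(block))
    return means

def expand(means, block_size, length):
    """Decoder: repeat each stored mean to rebuild `length` samples."""
    restored = []
    for m in means:
        restored.extend([m] * block_size)
    return restored[:length]

def compression_ratio(original_count, compressed_count):
    """Assumed convention: original size / compressed size."""
    return original_count / compressed_count

def rmse(original, restored):
    """Root-mean-square error between original and restored series."""
    squared = sum((a - b) ** 2 for a, b in zip(original, restored))
    return math.sqrt(squared / len(original))

series = [1.0, 1.0, 3.0, 3.0, 5.0, 7.0, 9.0, 9.0]
encoded = block_means(series, block_size=2)
decoded = expand(encoded, block_size=2, length=len(series))
ratio = compression_ratio(len(series), len(encoded))
error = rmse(series, decoded)
```

For this series, four means replace eight samples, so `ratio` is 2.0, and only the block `[5.0, 7.0]` is distorted, giving an `error` of 0.5. Increasing `block_size` raises the ratio and, in general, the error, which is the accuracy versus compression trade-off in miniature.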

Time series compression algorithms take advantage of specific characteristics of time series produced by sensors, such as the fact that some segments often repeat themselves in the same or in related time series (redundancy), or the possibility of recovering a time series by approximating it with functions or predicting it with neural network models. The state-of-the-art methods are listed below:

Table 2:

TIME SERIES COMPRESSION ALGORITHMS

Dictionary-based
  • Description
    • Represents time series through a series of common segments, using a dictionary to translate the segments into content.
    • The segments’ size determines accuracy and compression.
  • Common methods
    • TRISTAN is an algorithm divided into a learning and a compression phase, with a dictionary that may contain typical patterns or that learns them from a training set.
    • CORAD is an extension of the latter that considers autocorrelations to improve compression and accuracy.
  • Performance
    • Effective for datasets with high redundancy.
    • Can be lossy and lossless.

Function approximation
  • Description
    • Divides the time series into segments and applies a function to approximate each of them.
    • Each method follows a different family of functions.
  • Common methods
    • Piecewise polynomial approximation (PPA) and Chebyshev polynomial transform (CPT) are two lossy techniques that split a time series into several segments and fit polynomial functions to them.
    • The Discrete Wavelet Transform (DWT) method approximates time series to wavelet functions.
  • Performance
    • Suitable for smooth time series, low compression ratios, and high accuracy.

Sequential algorithms
  • Description
    • Sequential combination of several compression techniques.
    • The most common are Huffman coding, delta encoding, run-length encoding, and Fibonacci binary coding.
  • Common methods
    • Delta encoding, run-length, and Huffman (DRH) is a method that requires low computational power.
    • Spritz is designed for high decompression speed and low energy consumption.
    • Run-Length Binary Encoding (RLBE) is developed for low memory and computational resources devices.
    • RAKE is an algorithm with a preprocessing and a compression phase that utilizes sparsity to compress the data.
  • Performance
    • Majority of methods are lossless and computationally efficient.
    • Suitable for Internet of Things (IoT) devices that have limited computational resources.

Autoencoders
  • Description
    • Neural network architectures composed of a symmetric pair of encoder and decoder.
    • Trained to generate an output that reproduces the input passed to it.
  • Common methods
    • Recurrent Neural Network Autoencoder (RNNA) methods consider a time-dependent neural network that has a lossy compression and a loss threshold parameter.
  • Performance
    • Accuracy and compression ratio depend strongly on the ability of the RNN of finding patterns in the training set.
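
As an illustration of the sequential family in Table 2, the sketch below chains two of the listed building blocks, delta encoding and run-length encoding, on integer timestamps. It is a simplified example under stated assumptions, not an implementation of DRH or any other named method: the Huffman stage is omitted, the input is assumed to be a list of integers, and the function names are hypothetical.

```python
def delta_encode(values):
    """Keep the first value, then store differences between neighbors."""
    if not values:
        return []
    return [values[0]] + [b - a for a, b in zip(values, values[1:])]

def delta_decode(deltas):
    """Rebuild the original values with a running sum."""
    values, total = [], 0
    for d in deltas:
        total += d
        values.append(total)
    return values

def rle_encode(values):
    """Collapse runs of equal values into [value, run_length] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

def rle_decode(runs):
    """Expand [value, run_length] pairs back into a flat list."""
    out = []
    for v, n in runs:
        out.extend([v] * n)
    return out

# Mostly regular sampling interval with one irregular gap.
timestamps = [1000, 1010, 1020, 1030, 1040, 1050, 1065, 1075]
deltas = delta_encode(timestamps)   # [1000, 10, 10, 10, 10, 10, 15, 10]
runs = rle_encode(deltas)           # [[1000, 1], [10, 5], [15, 1], [10, 1]]
restored = delta_decode(rle_decode(runs))
```

Delta encoding turns the regular sampling interval into a run of identical small numbers, which run-length encoding then collapses. Both stages are lossless, so `restored` equals `timestamps`, and both need only a few integer operations per sample, which is why this family suits resource-constrained devices.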

Time Series Applications

The compression algorithm to choose for a given problem depends on the application domain and the data in question. Applications of compression can be found in many sectors, the most popular being multimedia, through the compression of image, video, and audio data. Time series compression in particular is used in crucial industries. Table 3 shows time series use cases in different sectors and highlights the value added by compression in each. In all use cases, the advantages presented in Table 1 also apply.

Table 3:

TIME SERIES USE CASES

| | Medicine | Maintenance | Energy | Economics |
|---|---|---|---|---|
| Use case | Monitoring of multiple life signals of patients integrated into a warning system to guarantee full time assistance. | Monitoring of industrial equipment and further automated report of equipment status, ensuring safety and efficient production. | Short-term forecast of energy consumption by smart meters. | Data collected at high frequencies reports the status of stock market statistics in real time. |
| Compression added value | Faster data processing for performing calculations, such as triggering warnings. | Easier and cheaper storage of large data volumes, making it affordable for manufacturing companies to adopt data-driven solutions. | Encoding algorithms help gather insights from the data, like noise and behavior, making more accurate forecasts. | Faster transmission of information through a large network, permitting users to make decisions in real-time. |

Conclusion

We live in the era of big data, where over 250 exabytes (250 × 10^18 bytes) of data are produced every day, a large portion of which takes the form of time series across a broad range of industries. Time series compression techniques are a powerful approach to efficiently collect, store, manipulate, and transfer data, which is crucial for database maintenance and the implementation of robust data management pipelines.

The wide adoption of smart devices has also increased the need for compression techniques suited to scenarios with low computational resources, as in IoT. This article presented both lossy and lossless techniques that are suitable for multiple time series profiles and application scenarios, and discussed the limitations and strengths of such methods.
