Adaptive Sampling in an IoT World

Reduce power usage and bandwidth utilization with Adaptive Sampling techniques for IoT applications.

By Ankur Agarwal · Jan. 02, 23 · Analysis
The Internet of Things (IoT) is now an omnipresent network of connected devices that communicate and exchange data over the internet. These devices range from industrial machinery monitors, weather and air quality monitoring systems, and security cameras to smart thermostats, refrigerators, and wearable fitness trackers. As the number of IoT devices increases, so does the volume of data they generate. This data is typically used to improve the performance and efficiency of the systems being monitored and to gain insights into their users' behavior and preferences.

However, the sheer volume makes this data challenging to collect and analyze. Furthermore, a large volume of data can overwhelm both the communication channels and the limited power and processing capacity of edge devices. This is where adaptive sampling techniques come into play. These techniques can reduce workload, make better use of limited resources, and improve the accuracy and reliability of the data.

Adaptive Sampling

Adaptive sampling techniques "adapt" their sampling or transmission frequency based on the specific needs of the device or changes in the system of interest. They are particularly relevant for devices on a limited data plan, a low-power battery, or a compute-restricted platform.

Examples: 

  • A temperature monitoring sensor may collect data more frequently when there are rapid changes in temperature and less frequently when the temperature remains stable.
  • A security camera may capture images at a faster frame rate or higher resolution when there is activity in its field of view.
  • An air particulate meter may increase its sampling rate when it notices the air quality deteriorating.
  • A self-driving car constantly senses the environment but may send only unusual scenarios back to a central server for edge case discovery.

What and Where to Sample

The resource you most want to conserve guides what to sample and where. There are two sites at which to implement sampling: at measurement or at transmission.

Figure: How information moves from the system of interest to the cloud via an IoT sensor.

  • Sampling at measurement:

    • The edge device only measures (or updates its measurement frequency) when an algorithm, running either on the edge device or on a server, deems fit.
    • Reduces power and compute usage.
    • Can also reduce network bandwidth utilization, since fewer measurements are produced.
  • Sampling at transmission:

    • The edge device measures continuously and processes the measurements with an algorithm running locally. If a sample is high entropy, it is uploaded to the cloud or server.
    • Power and compute usage at measurement are unaffected.
    • Reduces network bandwidth utilization.
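
The two sampling sites can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the function names, the fixed change threshold, and the halve/double interval policy are hypothetical choices, not something the techniques prescribe.

```python
def next_measurement_interval(prev_value, new_value, interval_s,
                              change_threshold=0.5,
                              min_interval_s=1.0, max_interval_s=60.0):
    """Sampling at measurement (hypothetical policy).

    Measure more often after a large change and less often when the
    signal is stable, staying within the configured interval bounds.
    """
    if abs(new_value - prev_value) > change_threshold:
        return max(min_interval_s, interval_s / 2)
    return min(max_interval_s, interval_s * 2)


def should_transmit(last_sent_value, new_value, change_threshold=0.5):
    """Sampling at transmission (hypothetical policy).

    The device measures continuously but uploads a value only when it
    differs enough from the last value that was uploaded.
    """
    if last_sent_value is None:
        return True  # nothing has been uploaded yet
    return abs(new_value - last_sent_value) > change_threshold


# Usage: gate a stream of temperature readings at transmission time.
readings = [20.0, 20.1, 20.1, 23.0, 23.1]
sent = []
last_sent = None
for reading in readings:
    if should_transmit(last_sent, reading):
        sent.append(reading)
        last_sent = reading
```

In this sketch the first function saves power by changing how often the sensor wakes up, while the second only saves bandwidth, which mirrors the trade-off listed above.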

Identifying Important and Useful Data 

We have often heard the term "data, data, data." But is all data equal? Not really. Data is most useful when it brings information. This is true even for Big Data applications that are admittedly data-hungry. For example, machine learning and statistical systems all need "high-quality" data, not just large quantities.

So how do we find high-quality data? Entropy!

Entropy

Entropy is the measurement of uncertainty in the system. More intuitively, entropy is the measure of "information" in a system. For example, consider a system with a constant value or a constant rate of change (say, temperature). Under normal working conditions, there is no new information: you get the expected measurement every time you sample. This is low entropy.

On the other hand, if the temperature changes "noisily" or "unexpectedly," the entropy in the system is high; there is new and interesting information. The more unexpected the change, the larger the entropy and the more important that measurement.

$$H(X) = -\sum_{x} p(x) \log p(x)$$

Entropy in Information Theory.

For a single measurement, the information content is -log p(x): when the probability of occurrence p(x) is low, the information (and its contribution to entropy) is high, and vice versa. A measurement with probability 1 (something we fully expect to happen) yields 0 entropy, and rightly so.
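
To make these quantities concrete, here is a small Python sketch that computes the information content of one outcome and the Shannon entropy of a batch of readings. The function names are hypothetical, and the sketch assumes the readings have already been discretized (for example, rounded into bins), since entropy estimated from raw floating-point values would treat nearly every reading as unique.

```python
import math
from collections import Counter


def surprisal_bits(probability):
    """Information content, in bits, of an outcome with the given probability."""
    return math.log2(1.0 / probability)


def shannon_entropy_bits(samples):
    """Shannon entropy, in bits, of the empirical distribution of the samples."""
    counts = Counter(samples)
    total = len(samples)
    return sum((count / total) * math.log2(total / count)
               for count in counts.values())


# A fully expected outcome carries no information.
certain = surprisal_bits(1.0)

# A constant signal has zero entropy; a signal spread evenly over
# four distinct values has two bits of entropy.
stable = shannon_entropy_bits([20, 20, 20, 20])
varied = shannon_entropy_bits([18, 20, 22, 24])
```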

This principle of "informational value" is central to adaptive sampling.

Some State of the Art Techniques

The basic logic flow in all adaptive techniques is:

Figure: Logic flow of adaptive sampling techniques, using "Model Predictions" to understand the information contained in the new measurements (sampled data).

These "Model Prediction" algorithms analyze past data and identify patterns that help to predict if a high entropy event is likely to occur, allowing the system to focus its data collection efforts. The magic lies in how well we can model our predictions.
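
The predict-then-compare flow can be sketched in Python as follows. This is only an illustration under stated assumptions: the exponentially weighted moving average stands in for whatever model is actually used, and the names `EwmaPredictor`, `select_informative`, `alpha`, and `threshold` are hypothetical.

```python
class EwmaPredictor:
    """Predicts the next measurement as an exponentially weighted moving average."""

    def __init__(self, alpha=0.3):
        self.alpha = alpha      # weight given to the newest measurement
        self.estimate = None    # no prediction until the first update

    def predict(self):
        return self.estimate

    def update(self, measurement):
        if self.estimate is None:
            self.estimate = measurement
        else:
            self.estimate = (self.alpha * measurement
                             + (1 - self.alpha) * self.estimate)


def select_informative(measurements, predictor, threshold):
    """Keep only measurements that deviate from the model's prediction.

    A large prediction error is treated as a proxy for high information
    content; well-predicted measurements are dropped.
    """
    kept = []
    for index, value in enumerate(measurements):
        predicted = predictor.predict()
        if predicted is None or abs(value - predicted) > threshold:
            kept.append((index, value))
        predictor.update(value)  # the model always learns from the new value
    return kept


# Usage: a stable signal followed by a sudden step change.
signal = [20.0, 20.0, 20.0, 25.0, 25.0, 25.0]
kept = select_informative(signal, EwmaPredictor(alpha=0.5), threshold=1.0)
```

Here the stable readings after the first one are dropped, while the readings around the step change are kept until the predictor catches up with the new level.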

  • Adaptive Filtering methods:

    • These methods apply filtering techniques on measurements to estimate measurements in the next time steps.
    • These could be FIR (Finite Impulse Response) or IIR (Infinite Impulse Response) techniques like:
      • Weighted moving average (can be made more expressive with probabilistic or exponential treatment)
      • Sliding window-based methods
    • They are relatively low in complexity but may have a non-trivial memory footprint to buffer past measurements.
    • Need small amounts of data for configuring.
  • Kalman Filter methods:

    • Kalman Filters are efficient and have small memory footprints.
    • They can be relatively complex and hard to configure but work well when tuned correctly.
    • Need small amounts of data for configuring.
  • Machine Learning methods:

    • Using past collected data, we can build machine learning models to predict the next state of the system under observation.
    • These are the most complex but also generalize well.
    • Depending on the task and complexity, large amounts of data may be needed for training.
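
As one illustration of the Kalman Filter option, here is a minimal scalar (one-dimensional) sketch in Python. It assumes a random-walk model for the measured quantity; the class name, the variance defaults, and the gating threshold of 9.0 are hypothetical values that would need tuning for a real sensor.

```python
class ScalarKalmanFilter:
    """One-dimensional Kalman filter assuming a random-walk state model."""

    def __init__(self, initial_estimate, initial_variance=1.0,
                 process_variance=1e-3, measurement_variance=0.25):
        self.x = initial_estimate     # current state estimate
        self.p = initial_variance     # variance of the estimate
        self.q = process_variance     # how quickly the true value drifts
        self.r = measurement_variance # sensor noise variance

    def step(self, measurement):
        """Fold in one measurement and return its normalized surprise score."""
        # Predict: the state is assumed unchanged, but uncertainty grows.
        p_pred = self.p + self.q
        # Innovation: difference between the measurement and the prediction.
        innovation = measurement - self.x
        innovation_variance = p_pred + self.r
        # Update the estimate and its variance with the Kalman gain.
        gain = p_pred / innovation_variance
        self.x = self.x + gain * innovation
        self.p = (1 - gain) * p_pred
        # Squared innovation scaled by its expected variance.
        return innovation ** 2 / innovation_variance


# Usage: flag measurements whose score exceeds a (hypothetical) gate.
kf = ScalarKalmanFilter(initial_estimate=20.0)
scores = [kf.step(z) for z in [20.0, 20.1, 19.9, 26.0]]
flagged = [score > 9.0 for score in scores]
```

The filter only stores two numbers between steps, which reflects the small memory footprint noted above; the cost is that the process and measurement variances have to be chosen sensibly.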

Major Benefits

  • Improved efficiency: By collecting and analyzing only a subset of the available data, IoT devices can reduce workload and resource requirements. This helps improve efficiency and performance and reduces data collection, analysis, and storage costs.
  • Better accuracy: By selecting the data sources that are most likely to provide the most valuable or informative data, adaptive sampling techniques can help to improve the accuracy and reliability of the data. This can be particularly useful for making decisions or taking actions based on the data.
  • Greater flexibility: Adaptive sampling techniques allow IoT devices to adapt to changes in the data sources or the data itself. This can be particularly useful for devices deployed in dynamic or changing environments, where the data may vary over time.
  • Reduced post-processing complexity: By collecting and analyzing data from a subset of the available data sources, adaptive sampling techniques can help to reduce the complexity of the data and make it easier to understand and analyze. This can be particularly useful for devices with limited processing power or storage capacity or teams with limited data science/ engineering resources.

Potential Limitations

  • Selection bias: By selecting a subset of the data, adaptive sampling techniques may introduce selection bias into the data. This can occur if the models and systems are trained on a specific type of data, which is not representative of the overall data population, leading to inaccurate or unreliable conclusions.
  • Sampling errors: There is a risk of errors occurring in the sampling process, which can affect the accuracy and reliability of the data. These errors may be due to incorrect sampling procedures, inadequate sample size, or non-optimal configurations.
  • Resource constraints: Adaptive sampling techniques may require additional processing power, storage capacity, or bandwidth, which may not be available on all IoT devices. This can limit the use of adaptive sampling techniques on specific devices or in certain environments.
  • Runtime complexity: Adaptive sampling techniques may involve the use of machine learning algorithms or other complex processes, which can increase the complexity of the data collection and analysis process. This could be challenging for devices with limited processing power or storage capacity.

Workarounds

  • Staged deployment: Instead of deploying a sampling scheme on all devices, deploy on small but representative test groups. Then the "sampled" data from these groups can be analyzed against the more expansive datasets for biases and domain mismatches. Again, this can be done in stages and iteratively, ensuring our system is never highly biased.
  • Ensemble of sampling techniques: Different devices can be armed with slightly different sampling techniques, varying from sample sizes and windows to different algorithms. This increases the complexity of post-processing, but it helps mitigate sampling errors and selection biases.
  • Resource constraints and runtime complexity are hard to mitigate. Unfortunately, that is the cost of implementing better sampling techniques.
  • Finally, test, test, and test some more.
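
For the staged deployment check, one possible way to compare the "sampled" data against the fuller dataset is a two-sample Kolmogorov-Smirnov statistic, which measures the largest gap between the two empirical distributions. This is an assumed choice for illustration, not a method the article prescribes; the function name and the example data below are hypothetical.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the empirical CDFs of two samples (0 = identical)."""
    a = sorted(sample_a)
    b = sorted(sample_b)
    max_gap = 0.0
    # The largest gap between two step functions occurs at a data point.
    for value in set(a) | set(b):
        cdf_a = bisect_right(a, value) / len(a)
        cdf_b = bisect_right(b, value) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Usage: compare two hypothetical sampling schemes against the full data.
full = list(range(100))
evenly_sampled = full[::10]                   # every tenth reading
only_high = [v for v in full if v >= 80]      # keeps only large readings

even_gap = ks_statistic(full, evenly_sampled)
biased_gap = ks_statistic(full, only_high)
```

A small gap suggests the sampled data still resembles the full population, while a large gap is a hint of selection bias worth investigating before a wider rollout.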

Takeaways

  • Adaptive Sampling can be a useful tool for IoT if one can model the system being observed.
  • We briefly introduced a few modeling approaches with varying complexities.
  • We discussed some benefits, challenges, and solutions for deployment.