With software becoming an integral part of every device and instrument (watches, TVs, fridges, automobiles, and so on), the next step seems to be connecting all these devices via software and building a universe: the Internet of Things! Imagine being able to monitor and manage home and office devices while travelling in a car: the AC at your home is automatically switched on once your car reports that you are 10 minutes away! The possibilities could be endless, hence all the buzz around IoT!
Some broad level steps to achieve IoT can be represented as such:
But with all this digitization and automation, the amount of digital data being generated is multiplying rapidly!
For instance, in the earlier example, the car:
- Continuously generates data with respect to current position.
- Keeps calculating time required to reach home based on distance and route considerations like traffic, weather, etc.
- Tells the home AC to start when the ETA is 10 minutes.
This leads to various forms of data being generated:
- Raw data: current car position, weather, traffic, etc.
- Analyzed data: time to reach home.
- Action/Conclusion: home AC to be switched on when the time to reach home is less than 10 minutes.
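The three forms of data above can be sketched as a tiny pipeline. This is only an illustration: the field names, the `estimate_eta_minutes` and `should_start_ac` helpers, and the sample values are assumptions made up for the example, and a real car would derive its ETA from far richer route, traffic and weather inputs.

```python
from dataclasses import dataclass

# Threshold taken from the example in the text: start the AC 10 minutes out.
AC_LEAD_TIME_MIN = 10.0


@dataclass
class RawReading:
    """Raw data reported by the car (hypothetical fields)."""
    distance_to_home_km: float
    avg_speed_kmh: float  # assumed to already reflect traffic and weather


def estimate_eta_minutes(reading: RawReading) -> float:
    """Analyzed data: time to reach home, derived from the raw reading."""
    if reading.avg_speed_kmh <= 0:
        return float("inf")  # stationary car: no meaningful ETA
    return reading.distance_to_home_km * 60.0 / reading.avg_speed_kmh


def should_start_ac(eta_minutes: float) -> bool:
    """Action/conclusion: switch the AC on once the ETA is within the lead time."""
    return eta_minutes <= AC_LEAD_TIME_MIN


reading = RawReading(distance_to_home_km=6.0, avg_speed_kmh=40.0)
eta = estimate_eta_minutes(reading)
print(f"ETA: {eta:.1f} min, start AC: {should_start_ac(eta)}")
```

Each stage produces something that could be stored: the reading itself, the computed ETA, and the resulting decision.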
Now, there are various discussions around which of the above data needs to be stored. The general buzz is that since IoT generates huge amounts of data, an equally huge infrastructure is required to store it all!
There is no doubt that storage resources will come under pressure as IoT adoption increases, but many factors need to be considered when planning the expansion of storage resources. The factors are not very different from the ones considered today, such as purpose (future needs, logging, etc.) and retention period, but the data sets involved here are huge!
The safest option is to store all the data that is generated, since one cannot be sure how it might be used in the future. But this needs a huge storage infrastructure and hence comes at a cost!
Many people are of the opinion that raw data need not be stored: it can be redirected to a big data analytics platform where it is analyzed, and only the analyzed output is then stored along with the decision/conclusion.
Some believe that just the decision/conclusion actions should be logged for audit purposes!
In the car and AC example above, the fact that the home AC was switched on at 8:00 pm on 5th August 2014, as instructed by the car, can be saved for logging purposes. But what time the car sent the instruction, at what distance, and other such information could then not be referred to in the future! This information may be required for triaging or for enhancements to accuracy, performance, etc. Hence, there has to be a trade-off!
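To make the trade-off concrete, here is a minimal sketch of the three storage options discussed above applied to a single event. The `Policy` names, the `select_for_storage` helper and the raw and analyzed sample values are hypothetical; only the 8:00 pm, 5th August 2014 decision comes from the example.

```python
from enum import Enum


class Policy(Enum):
    STORE_ALL = "store_all"
    ANALYZED_AND_DECISION = "analyzed_and_decision"
    DECISION_ONLY = "decision_only"


# Which categories of data each policy persists.
KEPT_CATEGORIES = {
    Policy.STORE_ALL: {"raw", "analyzed", "decision"},
    Policy.ANALYZED_AND_DECISION: {"analyzed", "decision"},
    Policy.DECISION_ONLY: {"decision"},
}


def select_for_storage(event: dict, policy: Policy) -> dict:
    """Return only the parts of the event that the policy keeps."""
    kept = KEPT_CATEGORIES[policy]
    return {category: data for category, data in event.items() if category in kept}


event = {
    "raw": {"distance_km": 6.0, "traffic": "moderate"},  # made-up values
    "analyzed": {"eta_min": 9.0},                        # made-up value
    "decision": {"action": "home_ac_on", "at": "2014-08-05T20:00", "by": "car"},
}

for policy in Policy:
    stored = select_for_storage(event, policy)
    print(policy.value, "->", sorted(stored))
```

Under the decision-only policy, the distance and ETA that led to the decision are gone, which is exactly the information one would want later for triaging.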
The application and the use case also need to be considered here. For instance, with the advent of wearable devices, a lot of data about an individual’s health parameters will be generated, and saving all of it could well be useful in the future. But in the car and AC example, slightly lower performance and accuracy can be tolerated, so less of the data needs to be kept!
Hence, one should carefully consider all aspects before deciding which data needs to be stored. It is very important not to get swayed by the hype and start expanding and replacing storage systems!
Once the data sets are decided, the next step is to plan the storage resources: existing ones, and new ones for expansion.
Each piece of data generated here would be small in size, but the volume would be huge! Consider the data generated by a car every minute with respect to its position. Similar data could be generated by various other sensors on the car, for instance on traffic conditions, weather conditions, etc. Hence, to store this data, storage needs to scale with time.
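A rough back-of-the-envelope calculation shows how small records add up. The one-reading-per-minute rate follows the example above; the 200-byte record size and the fleet of one million cars are purely assumed figures for illustration, not measurements.

```python
RECORD_BYTES = 200            # assumed size of one position record
READINGS_PER_DAY = 24 * 60    # one reading per minute, as in the example
CARS = 1_000_000              # assumed fleet size


def daily_bytes(record_bytes: int, readings_per_day: int, devices: int) -> int:
    """Total bytes generated per day by `devices` identical sources."""
    return record_bytes * readings_per_day * devices


per_car = daily_bytes(RECORD_BYTES, READINGS_PER_DAY, 1)
fleet = daily_bytes(RECORD_BYTES, READINGS_PER_DAY, CARS)

print(f"per car per day: {per_car / 1e3:.0f} kB")
print(f"fleet per day:   {fleet / 1e9:.0f} GB")
print(f"fleet per year:  {fleet * 365 / 1e12:.1f} TB")
```

Under these assumptions a single car produces only a few hundred kilobytes a day, yet the fleet as a whole reaches roughly a hundred terabytes a year from one sensor alone.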
Object storage is being considered as a choice for storing such data, mainly because of its scalable and distributed nature. However, object storage still poses the following challenges:
- Storing a large number of small files: currently, file-system-based object storage platforms could pose a limitation here.
- Linear performance: IoT would need quick data access for real-time decisions.
Hence, the basic characteristics of a storage system for IoT would be:
- Scalable: it needs to scale out linearly, as the data set size keeps growing with time.
- Distributed: the architecture needs to be distributed to ensure performance and load distribution.
- Tiering: tiering and categorization of data become important given the amount of data being dealt with here.
- Archiving: old data is archived so that storage resources are not loaded unnecessarily.
- Old data management: old, unwanted data is deleted to ensure optimal use of storage resources.
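The tiering, archiving and old data management items above could be driven by something as simple as the age of a record. The sketch below assumes age-based thresholds of 7, 90 and 365 days and the tier names `hot`, `cold`, `archive` and `delete`; all of these are illustrative choices, and a real system would likely also look at access patterns and the kind of data.

```python
from datetime import datetime, timedelta

# Assumed thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=7)       # recent data stays on fast storage
ARCHIVE_AFTER = timedelta(days=90)   # older data moves to the archive
DELETE_AFTER = timedelta(days=365)   # unwanted old data is removed


def placement(created: datetime, now: datetime) -> str:
    """Decide where a record belongs based on its age."""
    age = now - created
    if age >= DELETE_AFTER:
        return "delete"
    if age >= ARCHIVE_AFTER:
        return "archive"
    if age >= HOT_WINDOW:
        return "cold"
    return "hot"


now = datetime(2014, 8, 5, 20, 0)
for days_old in (1, 30, 120, 400):
    created = now - timedelta(days=days_old)
    print(f"{days_old:>3} days old -> {placement(created, now)}")
```

Checking the largest threshold first keeps the function a simple chain of comparisons, and the thresholds can be tuned per data set once it is decided what needs to be kept.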
A lot of discussion is under way around the storage requirements for IoT and the data sets that need to be stored, and this will continue as IoT evolves! But the basic idea should be to store relevant data that could impact the IoT ecosystem and, in turn, the user.