You may have noticed that we are in the middle of a revolution. Many new technologies are forcing old, well-known industries to reinvent themselves. In many cases, the jokes about new names for old ideas are true (for example, the old ASP versus the new SaaS). In other cases, genuine transformations are bringing new solutions to the market.
A perfect example is the so-called Internet of Things, or IoT. Gartner estimates that this market will reach $546 billion in 2016 in endpoint spending alone, with more than 6 billion connected devices! And that's only the beginning: the market is expected to grow to $1.5 trillion by 2020, with more than 20 billion connected devices.
A Trip to the Gas Pump
One can argue that the IoT is not new. Look back and you will find plenty of examples of devices that were truly connected. Take a very prosaic example: our good old friend, the gas pump. Automated gas pumps have been intelligent devices for many years, with complex software and hardware inside that let you fill up your tank and pay for it quickly and easily. Controlling several processes simultaneously is not a simple task. These include selecting and dispensing the product (gasoline, diesel, ethanol, etc.), billing, payment, and online marketing, including the discount coupons that many private labels offer. Because of the credit card authorization system, all of this must be done completely online. Pumps have evolved considerably since they first offered online payment in the 1990s. This raises the question: how is a gas pump different from the new IoT devices that are so hyped now?
The answer is simple and goes back to the three V's of Big Data: variety, volume, and velocity.
The gas pump is a single system dealing with a clearly defined set of well-structured data, so it lacks two of the three V's: variety and volume. Despite the inherent complexity of deploying a dozen point-of-sale terminals on every major street corner, the applications and components involved have been refined over many years, and these issues have now been addressed. The main challenge of such systems today is coping with highly volatile oil prices. Even those prices tend to change no more than once a day, so velocity is low as well.
IoT brings a different category of challenge: processing unstructured data (variety) pushed by geographically distributed devices that generate huge amounts of data (volume) within milliseconds or less (velocity). Monetizing this data goes far beyond a simple Electronic Funds Transfer (EFT) transaction. It requires real-time analysis to support real-time business decisions. Today this analysis cannot be performed locally. As a result, the current generation of IoT relies heavily on cloud services that act as concentrators to capture and store this immense stream of data.
Enter the Unruly World of Unstructured Data
The key point from the technology perspective is that IoT data is essentially unstructured. It is generated by different types of devices equipped with different sensors. If you need to concentrate it somewhere to process it and extract information in real time, you immediately face an even bigger challenge: what technology should you use to store such data?
The immediate answer is, of course, a database. But traditional relational SQL databases, with their rigidly structured tables and queries, do not handle unstructured data well; they were simply not designed for it. The answer must lie in another area that is increasingly becoming part of many traditional systems: NoSQL databases.
If you are familiar with the NoSQL market, it can seem that there are more NoSQL vendors than gas pumps. There are all sorts of implementations, such as graph, key-value store, document store, and columnar, and the list goes on. 451 Research publishes a map that shows how complex this landscape is today. Its non-relational group alone has close to one hundred alternatives (disclosure: the map includes FairCom c-treeACE). Can we conclude that one single type of NoSQL database is better suited than the others to tackle the IoT challenge?
I say yes. In fact, the answer is one of the examples listed above: key-value store. Let’s examine my reasoning.
Key-Value Store Is the Technology RDBMSs Are Built on
First, key-value stores have long been the main technology beneath many other databases, including the more common relational database management systems (RDBMS). Many people don't realize that in the not-so-distant past, there were many different types of non-SQL, or non-relational, databases. Good examples are hierarchical databases such as IBM's IMS, which is still very popular among high-end mainframe customers. IMS is designed to take full advantage of high-speed indexing, and much of its power comes from its combined use with VSAM, a widely used file access method on the mainframe. To oversimplify, VSAM is essentially a key-value store engine that provides high-speed access for the IMS database. Most technologies solve the challenge of fast indexing with the B-tree approach, which is, in essence, a key-value store engine. This design is used in most, if not all, traditional SQL databases, such as Oracle, DB2, and Microsoft SQL Server.
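To make the "index as key-value store" idea concrete, here is a minimal sketch in Python of an ordered key-value map used as a primary-key index. It maps each key to a row's position in a table. The sketch assumes a sorted list searched with the standard `bisect` module as a stand-in for a B-tree. A real engine uses a balanced, page-oriented tree on disk, but it exposes the same two operations shown here: point lookups and ordered range scans. The class name `OrderedIndex` and the sample rows are hypothetical.

```python
import bisect
from typing import Any, Iterator, List, Tuple


class OrderedIndex:
    """Ordered key-value map; a sorted list stands in for a B-tree here."""

    def __init__(self) -> None:
        self._keys: List[Any] = []
        self._values: List[Any] = []

    def put(self, key: Any, value: Any) -> None:
        # Binary search for the insertion point keeps keys in sorted order.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            self._values[i] = value  # overwrite an existing key
        else:
            self._keys.insert(i, key)
            self._values.insert(i, value)

    def get(self, key: Any) -> Any:
        # Point lookup: binary search, then check for an exact match.
        i = bisect.bisect_left(self._keys, key)
        if i < len(self._keys) and self._keys[i] == key:
            return self._values[i]
        raise KeyError(key)

    def scan(self, lo: Any, hi: Any) -> Iterator[Tuple[Any, Any]]:
        # Range scan: yield all pairs with lo <= key < hi, in key order.
        start = bisect.bisect_left(self._keys, lo)
        end = bisect.bisect_left(self._keys, hi)
        for i in range(start, end):
            yield self._keys[i], self._values[i]


# Hypothetical table rows kept in insertion order; the index maps primary key -> row position.
rows = [("Silva", "SP"), ("Jones", "NY"), ("Chen", "CA")]
pk_index = OrderedIndex()
for row_id, customer_id in enumerate([300, 100, 200]):
    pk_index.put(customer_id, row_id)

print(rows[pk_index.get(200)])        # ('Chen', 'CA')
print(list(pk_index.scan(100, 250)))  # [(100, 1), (200, 2)]
```

Because the keys stay sorted, a range scan is a binary search for the starting key followed by a sequential walk. That is the access pattern a B-tree is built to make fast on disk.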
Simultaneous Persistence and Indexing
Second, key-value stores have proven to be the best way to handle persistence and indexing simultaneously, at ultra-high speed. They are a very popular technology for solutions that require both of these properties, such as time series, electronic funds transfer, complex event processing, and fund management. Combining this technology with high-speed disks is the preferred architecture of most market leaders in these areas and is a proven way to address their typical challenges.
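One common way to get persistence and indexing at the same time is a log-structured design. Every write is appended to a file on disk, and an in-memory index maps each key to the offset of its latest record. The sketch below assumes that design, which is similar in spirit to Bitcask. It is an illustration only, not how any particular product works. The class name `LogKV`, the record layout, and the sensor keys are hypothetical.

```python
import os
import struct
import tempfile
from typing import Dict, Optional

# Hypothetical record layout: 4-byte key length, 4-byte value length, key bytes, value bytes.
HEADER = struct.Struct("<II")


class LogKV:
    """Append-only log on disk plus an in-memory index of key -> record offset."""

    def __init__(self, path: str) -> None:
        self._index: Dict[bytes, int] = {}
        # "a+b": writes always go to the end of the file; reads can seek anywhere.
        self._f = open(path, "a+b")
        self._rebuild_index()

    def _rebuild_index(self) -> None:
        # On startup, scan the whole log; later records for a key override earlier ones.
        self._f.seek(0)
        offset = 0
        while True:
            header = self._f.read(HEADER.size)
            if len(header) < HEADER.size:
                break
            klen, vlen = HEADER.unpack(header)
            key = self._f.read(klen)
            value = self._f.read(vlen)
            if len(key) < klen or len(value) < vlen:
                break  # a torn record at the tail, e.g. after a crash
            self._index[key] = offset
            offset += HEADER.size + klen + vlen
        # Drop any torn tail so future appends start on a record boundary.
        self._f.truncate(offset)

    def put(self, key: bytes, value: bytes) -> None:
        # Persist first (append, flush, fsync), then publish the new offset in the index.
        self._f.seek(0, os.SEEK_END)
        offset = self._f.tell()
        self._f.write(HEADER.pack(len(key), len(value)) + key + value)
        self._f.flush()
        os.fsync(self._f.fileno())
        self._index[key] = offset

    def get(self, key: bytes) -> Optional[bytes]:
        offset = self._index.get(key)
        if offset is None:
            return None
        # One seek straight to the record; skip the header and key, then read the value.
        self._f.seek(offset)
        klen, vlen = HEADER.unpack(self._f.read(HEADER.size))
        self._f.seek(klen, os.SEEK_CUR)
        return self._f.read(vlen)

    def close(self) -> None:
        self._f.close()


path = os.path.join(tempfile.mkdtemp(), "readings.log")
db = LogKV(path)
db.put(b"sensor-7:1000", b"21.5")
db.put(b"sensor-7:1001", b"21.6")
db.put(b"sensor-7:1000", b"21.4")  # a newer value for an existing key
db.close()

# Reopen: the index is rebuilt from the log, so the data survives a restart.
db = LogKV(path)
print(db.get(b"sensor-7:1000"))  # b'21.4'
db.close()
```

Each `put` costs one sequential append plus an `fsync`, and each `get` costs a single seek. This is why the pattern suits write-heavy workloads such as time series. A production engine would also compact superseded records and bound the size of the index.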
Unstructured Data Poses No Problem
Third, key-value stores handle unstructured data remarkably well. Associating complex indices (keys) with a data buffer (value) lets the application interpret the same data in different schema formats, depending on its needs. Other databases, such as document stores, offer a flexible schema as well. However, handling sensor data as documents is overkill, because in most cases such information is a stream of small chunks of data that must be quickly indexed and stored. A document store requires the data to be organized in a self-describing format, typically JSON or XML. This is inefficient both in the space needed to store the data and in the cost of processing it later. Key-value stores avoid all that, giving developers the freedom to design the most efficient systems to process such data.
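To illustrate the difference, here is a minimal sketch that stores one sensor reading as a composite binary key plus a packed binary value. It then compares that size with the same reading encoded as a JSON document, a common document encoding. The layouts are assumptions chosen for illustration. The key packs a 4-byte device id and an 8-byte millisecond timestamp in big-endian order, so byte order matches (device, time) order in an ordered store. The value packs two 32-bit floats. Only the application decides how to decode the value, which is the schema flexibility described above. The function names and sample values are hypothetical.

```python
import json
import struct
from typing import Tuple

# Hypothetical key layout: big-endian unsigned ints, so comparing the raw bytes
# orders keys by device id first, then by timestamp.
KEY = struct.Struct(">IQ")
# Hypothetical value layout for one reading: temperature and humidity as float32.
VALUE = struct.Struct("<ff")


def encode_key(device_id: int, ts_ms: int) -> bytes:
    return KEY.pack(device_id, ts_ms)


def decode_key(key: bytes) -> Tuple[int, int]:
    return KEY.unpack(key)


def encode_value(temp_c: float, humidity: float) -> bytes:
    return VALUE.pack(temp_c, humidity)


def decode_value(buf: bytes) -> Tuple[float, float]:
    # The application, not the store, knows this buffer holds two float32 fields.
    return VALUE.unpack(buf)


key = encode_key(42, 1462104000000)
value = encode_value(21.5, 40.25)
doc = json.dumps({"device_id": 42, "ts_ms": 1462104000000,
                  "temp_c": 21.5, "humidity": 40.25})

print(len(key) + len(value), len(doc))  # 20 76
```

For this record, the binary key-value pair takes 20 bytes, while the JSON document repeats its field names on every reading and takes 76. The binary form also decodes with a single `unpack` call rather than a text parse.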
The final point is simplicity. Currently, most of the intelligence for processing IoT data is centralized in the cloud. In the near future, hardware evolution will allow that processing to shift to a combination of local and remote processing. This means devices will add value to the information they collect, not just push it over to a remote location. To do this, they will need to store and process data locally, in real time. A key-value store is by far the simplest engine on the market and can be reduced to a footprint of mere kilobytes. You can't beat that in terms of simplicity. Local and remote storage engines that use similar data structures will be an important competitive advantage for such systems.
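As a rough illustration of local processing, the sketch below keeps a running per-minute aggregate (count, sum, min, max) for each sensor in an embedded key-value store on the device. With this approach, only compact summaries would need to be pushed to the cloud. It uses Python's standard-library `dbm.dumb` module purely as a stand-in for a small embedded engine. It is not a production store and is far from a kilobyte-sized firmware engine. The key format, bucket size, and function names are hypothetical.

```python
import dbm.dumb
import os
import struct
import tempfile

# Hypothetical per-bucket aggregate: count (uint32), then sum, min, max (float64).
AGG = struct.Struct("<Iddd")
BUCKET_MS = 60_000  # assumption: aggregate readings into one-minute buckets


def record_reading(db, sensor: str, ts_ms: int, value: float) -> None:
    """Fold one reading into its bucket's running aggregate, stored locally."""
    key = f"{sensor}:{ts_ms // BUCKET_MS}".encode()
    if key in db:
        count, total, lo, hi = AGG.unpack(db[key])
        db[key] = AGG.pack(count + 1, total + value, min(lo, value), max(hi, value))
    else:
        db[key] = AGG.pack(1, value, value, value)


def summaries(db) -> dict:
    """Compact per-bucket summaries that the device could push upstream."""
    out = {}
    for key in db.keys():
        count, total, lo, hi = AGG.unpack(db[key])
        out[key.decode()] = {"count": count, "mean": total / count, "min": lo, "max": hi}
    return out


path = os.path.join(tempfile.mkdtemp(), "device_store")
db = dbm.dumb.open(path, "c")
try:
    for ts_ms, temp_c in [(0, 21.0), (30_000, 23.0), (61_000, 22.0)]:
        record_reading(db, "t1", ts_ms, temp_c)
    result = summaries(db)
finally:
    db.close()
print(result)
```

The device keeps one small fixed-size value per sensor per minute rather than every raw reading. This reflects the "store and process locally, send only what matters" pattern described above.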
In conclusion, the IoT still has a long way to go before it can handle such a huge amount of data, especially in terms of integration and infrastructure. But from the database point of view, the future of IoT is NoSQL; more specifically, it is the key-value store.