Is Translytics Just Rebranded OLTP?
Well, is it? Or is there some actual substance behind this idea? Read on for some insight.
Forrester Research has recently started talking about “translytics” and translytical databases: databases that can “deliver analytics at the speed of transactions.” Is this just more marketing hype, or is there something to it?
A traditional OLTP database handles lots of rapid-fire transactions but doesn’t really do much more than that. An OLTP database application generally involves:
- Shared finite resources: Something of value is being measured, allocated, or used. Sometimes, multiple transactions will try to use the same shared finite resource. In a legacy RDBMS, this is handled with row-level locking.
- Large numbers of similar ACID transactions.
- Low latency, generally in the single-digit millisecond range.
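To make the "shared finite resource" point concrete, here is a minimal Python sketch of many buyers competing for a fixed number of seats. The `Inventory` class, the `flight-42` key, and the per-key `threading.Lock` are all invented for illustration; the lock is only a stand-in for a row-level lock, not how any particular RDBMS implements one.

```python
import threading


class Inventory:
    """Toy in-memory table: one row per resource, each guarded by its own lock."""

    def __init__(self, stock):
        self._rows = dict(stock)
        # One lock per row, standing in for a row-level lock (an assumption).
        self._locks = {key: threading.Lock() for key in stock}

    def reserve(self, key, qty=1):
        """Take qty units if available. Returns True on success, False otherwise."""
        with self._locks[key]:
            if self._rows[key] >= qty:
                self._rows[key] -= qty
                return True
            return False

    def remaining(self, key):
        with self._locks[key]:
            return self._rows[key]


def run_demo(seats=100, buyers=250):
    """Start more buyers than seats and report (seats sold, seats left)."""
    inv = Inventory({"flight-42": seats})
    results = []
    results_lock = threading.Lock()

    def buy():
        ok = inv.reserve("flight-42")
        with results_lock:
            results.append(ok)

    threads = [threading.Thread(target=buy) for _ in range(buyers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(results), inv.remaining("flight-42")
```

Because the check and the decrement happen under the same lock, the resource is never oversold, however many buyers show up at once.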
The term “translytics” refers to OLTP with additional features, which makes OLTP a subset of translytics. Translytics adds:
- Counting: A translytical database will need to accurately keep running totals that are continually being changed by arbitrary groups of transactions. “How many cars are in this zip code right now?” is an example.
- Aggregating: Taking a stream of incoming transactions and generating new data that consolidates many pieces of data across multiple transactions. Tracking how much bandwidth your home DSL connection uses each hour is an example, as we turn an unpredictable number of input records into 24 output records per day per subscriber. In some use cases we can get away with approximations; in others we may have a legal requirement for 100% accuracy.
- Telling: Sending data to downstream systems. At some point, the rest of the world needs to know about the wonderful OLTP stuff you are doing, but how does the outside world know what to check? It’s common to see enormous amounts of time and energy spent polling an OLTP system to see what has changed.
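The three additions can be sketched in a few lines of Python. This is only an in-memory illustration under stated assumptions: the class and method names are hypothetical, and each method stands in for what would be a single ACID transaction in a real translytical database. Counting is a running total per zip code, aggregating rolls usage records into hourly buckets, and telling pushes an event to subscribers so that nobody has to poll.

```python
from collections import defaultdict


class TranslyticalSketch:
    """Hypothetical toy model of counting, aggregating, and telling."""

    def __init__(self):
        self.cars_by_zip = defaultdict(int)    # counting: zip -> cars present
        self.bytes_by_hour = defaultdict(int)  # aggregating: (subscriber, hour) -> bytes
        self._listeners = []                   # telling: downstream callbacks

    def subscribe(self, callback):
        """Register a downstream system that wants to be told about changes."""
        self._listeners.append(callback)

    def _tell(self, event):
        for callback in self._listeners:
            callback(event)

    def move_car(self, car_id, from_zip, to_zip):
        """Update running totals as part of the same 'transaction' as the move."""
        if from_zip is not None:
            self.cars_by_zip[from_zip] -= 1
        if to_zip is not None:
            self.cars_by_zip[to_zip] += 1
        self._tell(("car_moved", car_id, from_zip, to_zip))

    def record_usage(self, subscriber, epoch_seconds, nbytes):
        """Fold one usage record into its hourly bucket (hours since the epoch)."""
        hour = epoch_seconds // 3600
        self.bytes_by_hour[(subscriber, hour)] += nbytes
        self._tell(("usage", subscriber, hour, self.bytes_by_hour[(subscriber, hour)]))
```

A downstream system would call `subscribe` once and then receive every change as it happens, for example `db.subscribe(print)`. The hard part this sketch skips is doing all three atomically, at scale, across many concurrent transactions.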
Surely all of this is easy? At a small scale of a few transactions per second, it is. But as workloads grow, it turns into a significant computational problem.
Legacy RDBMS Products Can’t Handle Significant Translytical Workloads
When you issue a query in a legacy RDBMS, it attempts to give an answer based on what the data looked like at the moment you executed the query. For fast-running queries, this isn’t a factor. But the longer a query runs, the harder it is for the server to remember what every row looked like when the query started, because those rows may have been changed repeatedly since then. Doing this for one query is manageable, but when hundreds of such queries are issued every second for counting and aggregating, the overhead becomes unmanageable.
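One common way a database "remembers" what rows looked like is to keep old versions around until no running query needs them. The following Python sketch is a deliberately simplified, hypothetical multi-version store, not the implementation of any specific product. It shows why a single long-running query forces the server to retain every intermediate version written after the query started.

```python
class VersionedStore:
    """Toy multi-version store: each key keeps a list of (commit_ts, value)."""

    def __init__(self):
        self._versions = {}        # key -> [(commit_ts, value), ...], oldest first
        self._clock = 0            # logical commit timestamp
        self._open_snapshots = []  # snapshot timestamps of running queries

    def write(self, key, value):
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        self._gc(key)

    def begin_query(self):
        """Pin a snapshot: the query sees data as of this timestamp."""
        ts = self._clock
        self._open_snapshots.append(ts)
        return ts

    def end_query(self, ts):
        self._open_snapshots.remove(ts)
        for key in self._versions:
            self._gc(key)

    def read(self, key, snapshot_ts):
        """Return the newest value committed at or before snapshot_ts."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None

    def _gc(self, key):
        """Drop versions that are older than anything an open query can see."""
        versions = self._versions[key]
        oldest = min(self._open_snapshots, default=self._clock)
        keep_from = 0
        for i, (commit_ts, _) in enumerate(versions):
            if commit_ts <= oldest:
                keep_from = i  # newest version visible to the oldest snapshot
        del versions[:keep_from]

    def version_count(self):
        return sum(len(v) for v in self._versions.values())
```

While a snapshot is open, every write to a row adds a version that cannot be discarded. Multiply that by hundreds of concurrent counting and aggregating queries and the bookkeeping cost becomes the dominant workload.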
NoSQL Products Can’t Do Accurate Aggregation and Counting in Real-Time at Scale
NoSQL products find “counting” and “aggregating” even harder, as their ACID guarantees are generally limited to a single record or key/value pair. This makes it really hard to count or aggregate across more than one “thing” with any degree of accuracy. “Telling” can also be extremely difficult, as some products (such as Cassandra) are poorly suited to the kind of range queries you need to issue to get blocks of records to feed to downstream systems.
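A short Python sketch shows why single-key atomicity is not enough for accurate counting. The `SingleKeyStore` class below is hypothetical: it models a store in which each individual key update is atomic but there is no transaction spanning two keys. The `observe` hook simulates a concurrent reader that happens to run between the two halves of a move.

```python
import threading


class SingleKeyStore:
    """Toy key/value store where only single-key operations are atomic."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def incr(self, key, delta):
        with self._lock:  # atomic for this one key only
            self._data[key] = self._data.get(key, 0) + delta

    def get(self, key):
        with self._lock:
            return self._data.get(key, 0)


def total_cars(store, zips):
    """Sum the per-zip counters with one independent read per key."""
    return sum(store.get(z) for z in zips)


def move_car(store, from_zip, to_zip, observe=None):
    """Move one car between zip codes using two separate single-key updates."""
    store.incr(from_zip, -1)
    if observe is not None:
        observe()  # simulated concurrent reader between the two updates
    store.incr(to_zip, +1)
```

If a reader sums both counters between the two updates, it sees a car that has left one zip code and not yet arrived in the other, so the total is off by one even though each update was individually atomic.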
Published at DZone with permission of David Rolfe, DZone MVB.