Stream Views: The Untold Story
VoltDB recently introduced stream views, which gives you a better way to see your stream tables and are updated with each new INSERT.
Join the DZone community and get the full member experience.Join For Free
Stream views are a recent addition to VoltDB. With stream views, we get the utility of the aggregations — real-time analytics — without the memory footprint of their source tables.
A stream view is a materialized view of data that’s inserted into your stream tables. With VoltDB stream views, the aggregations and other columns in the materialized view are updated with each SQL INSERT into the source tables, same as with any materialized view. But with streams, there’s no underlying, persistent set of rows, as there is with traditional views.
First, let’s explain streams with materialized views. While there are other uses of streams as connections to external data consumers, in this use case, inserting rows into the stream, as mentioned above, drives the aggregations and columns in the materialized view without any of the memory overhead of the source data. Streams are defined like any other table – each stream has a schema with column names and data types – and data is inserted by normal SQL statements or stored procedures. Insert a row into a VoltDB stream table and when the transaction is committed, the materialized view has been updated.
And consistent with VoltDB semantics, the stream manager keeps things reliable: if the consumer disappears due to error or link failure, the stream manager will resume with the first uncommitted – that is, undelivered – row when the consumer returns. If VoltDB stops for some reason, similarly, the stream manager will reconnect to the consumer and resume streaming.
Combine that stream with a materialized view and we have a stream view. A materialized view is typically an aggregation of data from one or more tables. Aggregate functions include minimum, maximum, counts, sums, averages – held in other tables. VoltDB provides the convenience of updating the aggregations as rows of data as the underlying sources are inserted, updated or deleted. Stream views maintain the same aggregations, but the underlying data source is ephemeral. So as noted above, we get the utility of the aggregations – real-time analytics – without the memory footprint of the source.
Putting the stream and the view together is quite simple. See the “auction” project in VoltDB documentation for a working example. We won’t dissect that entire project here but will look at the stream and its view. Here is the underlying stream table.
CREATE STREAM BID_STREAM PARTITION ON COLUMN ITEMID EXPORT TO TARGET newfile ( BIDID INTEGER DEFAULT '0' NOT NULL, ITEMID INTEGER DEFAULT '0' NOT NULL, BIDDERID INTEGER DEFAULT '0' NOT NULL, BIDTIME TIMESTAMP DEFAULT CURRENT_TIMESTAMP NOT NULL, BIDPRICE FLOAT DEFAULT NULL, );
We specify the destination for the rows inserted into BID_STREAM by “TARGET”, in this case, “newfile”. That connection is completed in the VoltDB deployment file “export” section.
<export> <configuration target="newfile" enabled="true" type="file"> <property name="type">csv</property> <property name="nonce">Bids</property> <property name="period">3</property> <property name="skipinternals">true</property> </configuration> </export>
In this case, the data goes to a file in “csv” format. It’s important to understand that the data doesn’t have to “go” anywhere. If there’s not a “target” in the stream definition, or no export section in the deployment file, the stream materialized view still maintains its aggregate view of the data that’s been inserted into its source table or tables, if the populating SQL “joins” tables.
The final piece in the stream view is the definition of the materialized view itself:
CREATE VIEW V_BID_MIN_MAX ( ITEMID, BIDS, MAX_PRICE, MIN_PRICE ) AS SELECT ITEMID, COUNT(*), MAX(BIDPRICE), MIN(BIDPRICE) FROM BID_STREAM GROUP BY ITEMID;
The stream view maintains the auction item identifier, the number of bids for it and highest and lowest prices.
The stream view has one row that aggregates the “action” for each auction item, so we’re retaining the salient data for each item without saving all the bids in its source table. So a simple select shows the current status of all the auctions:
2> select * from v_bid_min_max; ITEMID BIDS MAX_PRICE MIN_PRICE ------- ----- ---------- ---------- 2 905 2730.53 2000.0 4 525 418.83 10.0 1 876 711.64 0.5 3 645 527.17 10.0 (Returned 4 rows in 0.01s)
This stream view is pretty minimal. All the power of SQL is available so if we want the item name rather than its number, we create a join with the item table:
5> SELECT itemname, bids, max_price, min_price FROM item i, v_bid_min_max iv WHERE i.itemid = iv.itemid; ITEMNAME BIDS MAX_PRICE MIN_PRICE --------------------- ----- ---------- ---------- 2003 Camry LE 905 2730.53 2000.0 Futurama Season 3 525 418.83 10.0 Dune 876 711.64 0.5 MacBook Power Supply 645 527.17 10.0 (Returned 4 rows in 0.00s)
To recap, VoltDB stream views allow you to maintain real-time aggregations on data that has been streamed out of VoltDB. But with stream views, there’s no underlying, persistent set of rows, as there is with traditional views. This is a great way to maintain “historical” aggregations without the overhead of traditional materialized views and without having to reference an external system for that information.
Published at DZone with permission of Peter Shaw, DZone MVB. See the original article here.
Opinions expressed by DZone contributors are their own.