Data Modelling: Counter Table
Data Modelling: Counter Table
Sometimes, you need to avoid failure scenarios where in-memory systems aren't suitable. In those cases, I recommend using an aggregate counter table.
Join the DZone community and get the full member experience.Join For Free
Built by the engineers behind Netezza and the technology behind Amazon Redshift, AnzoGraph™ is a native, Massively Parallel Processing (MPP) distributed Graph OLAP (GOLAP) database that executes queries more than 100x faster than other vendors.
A counter table is a table that keeps counts of particular items or of certain keys. This can range from page count on your blog to keep track of the limit the user is allowed to have in terms of a particular item or service.
Usually, a counter table would be better kept in something like Memcached or Redis, as frequent increment updates are better-suited to those in-memory systems.
MySQL and InnoDB, in particular, have many stringent systems to make sure that your data has been reliably written to disk. Just going through those systems alone can make having a counter table unsuitable, not even considering the speed it takes to update the actual table.
However, sometimes, there is a need for certain assurances from failure scenarios where in-memory systems may not be suitable — as when they crash, the data kept in memory is cleared out.
In those cases, I recommend that you use what I consider an aggregate counter table. The idea here is to replace doing lots of increment updates with simply counting the original base table you are interested in having counts for.
In short, instead of:
INSERT INTO base_table; UPDATE counter_table set value=value+1 where key=key1;
You would do:
INSERT INTO base_table; On interval (like 1 to 5 seconds): - INSERT INTO counter_table - SELECT key1, count(1), max(primarykey) FROM base_table - WHERE last_count_position - GROUP BY key1 - ON DUPLICATE KEY UPDATE value=value+recent_count
In order to be able to aggregate the
base_table correctly, you need to keep some sort of record of the last time or position you read for the base table. I recommend that you consider the primary key, assuming it's an integer, as well as having a
last_updated timestamp column.
Below is an example of a counter table that keeps the last ID of the primary key that it counted from the base table:
CREATE TABLE counter_table ( key_id int(10) unsigned NOT NULL, counts int(10) unsigned DEFAULT '0', lastprimary_id int(10) unsigned DEFAULT '0', PRIMARY KEY (key_id), KEY idx_camp (lastprimary_id) ) ENGINE=InnoDB)
In order to run your
refresh query, you first need to query the
counter_table like this:
SELECT max(lastprimary_id) from counter_table;
Then, populate the counter table by including this in your
INSERT INTO SELECT statement:
WHERE base_table.primarykey > lastprimary_id
This should be very fast and will prevent the many database-attacking update queries that can become serious bottlenecks to your performance in the long run.
As for the downsides, this method doesn't factor in whether the rows in the base table were
DELETE-ed. It just counts the row number. If this is a requirement, you can revert to using
UPDATE statements for:
UPDATE counter_table SET value=value-1
...with the understanding that this will happen infrequently.
You also now need to maintain a procedure and monitor that it is running at the set intervals that you need. Fortunately, MySQL has scheduled Events to help with that.
Published at DZone with permission of Jonathan Levin . See the original article here.
Opinions expressed by DZone contributors are their own.