SQL Tips and Tricks: Counting Rows
Getting total row counts of data in tables across various dimensions (per-table, per-schema, and in a given database) is a useful technique to have in one’s tool belt of SQL tricks. While there are a number of use cases for this, my scenario was to get the per-table row counts of all tables in both PostgreSQL and YugabyteDB as a first sanity check after migrating an application and its pre-existing data from PostgreSQL to YugabyteDB.
This blog post outlines how to get the following row counts of tables in a database:
- Row counts broken down per table in the schema
- Aggregate row counts per schema of the database
- Aggregate row count across all tables in the database

We will create an example database, import two popular SQL datasets, Northwind and SportsDB, and run through the above scenarios on these example databases.
The examples in this blog post, which are essentially dynamic SQL queries on the system catalog tables, must be done with superuser privileges. Also, note that the programmatic generation of SQL queries using catalog tables needs to handle exotic names properly. An instance of a table and a column with an exotic name is shown below.
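The original example is not reproduced here; as a hypothetical illustration (the table and column names below are invented for this sketch), an "exotic" name is one that forces identifier quoting:

```sql
-- Hypothetical example of exotic identifiers: mixed case, spaces,
-- and embedded double quotes all require double-quoting.
CREATE TABLE "my Table" (
    "column ""with"" quotes" int
);
```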
While this post does not explicitly discuss the challenges posed by the example above, the SQL functions below handle these cases correctly and do so by incorporating some of the important and well-known techniques necessary to prevent SQL injection attacks.
In order to create a test setup, I simply installed YugabyteDB on my laptop, created a database named example, and loaded the Northwind dataset, all of which took only a few minutes. For simplicity, we’re going to use the default yugabyte user for the operations below. However, the recommended best practice is to create a dedicated user with the appropriate privileges for each of these datasets.
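A minimal sketch of this setup, run from ysqlsh (the script file names are illustrative; substitute the actual Northwind scripts you downloaded):

```sql
CREATE DATABASE example;
\c example
-- Load the Northwind schema and data (script names are illustrative):
\i northwind_ddl.sql
\i northwind_data.sql
```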
You can verify that the tables have been created by running the following command.
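In ysqlsh (or psql), the \dt meta-command lists the tables visible in the current database:

```sql
\dt
```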
Next, let’s import the SportsDB dataset into a new schema named sportsdb as shown below.
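A sketch of the import, again with illustrative script names for the SportsDB download:

```sql
CREATE SCHEMA sportsdb;
SET search_path TO sportsdb;
-- Load the SportsDB schema and data (script names are illustrative):
\i sportsdb_ddl.sql
\i sportsdb_data.sql
```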
Recall that YugabyteDB re-uses the native PostgreSQL codebase for its query layer (or the SQL processing layer) of the database. This means that the high-level approach to solving this problem is identical in the case of both PostgreSQL and YugabyteDB.
We’ll solve this problem by first creating a user-defined function (UDF), count_rows_of_table, which counts the number of rows in a single table. Note that this function must be owned by a suitably privileged user; in our example, that is the yugabyte user. The function can subsequently be used in various types of queries to print the desired row counts in each of the scenarios above. The function definition is shown below.
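A minimal sketch of such a function, assuming it takes the schema and table names as text arguments (the exact body in the original article may differ). The key safety technique is format() with the %I specifier, which double-quotes each identifier and so handles both exotic names and SQL injection attempts:

```sql
CREATE OR REPLACE FUNCTION count_rows_of_table(
    schema_name text,
    table_name  text)
RETURNS bigint
LANGUAGE plpgsql
AS $body$
DECLARE
    result bigint;
BEGIN
    -- %I safely quotes each identifier before the dynamic query runs.
    EXECUTE format('SELECT count(*) FROM %I.%I', schema_name, table_name)
        INTO result;
    RETURN result;
END;
$body$;
```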
You can test the above function by passing in a table (for example, the orders table loaded from the Northwind dataset) as shown below.
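Assuming Northwind was loaded into the default public schema, the call looks like this:

```sql
SELECT count_rows_of_table('public', 'orders');
```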
Per-Table Row Counts in a Given Database
The information_schema.tables view in the system catalog lists all tables and the schemas they belong to. Because we are mainly interested in user tables, we filter out tables that belong to system schemas such as information_schema and pg_catalog. We then call the function defined in the previous section to get the row count of each table.
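A sketch of this query, assuming count_rows_of_table takes the schema and table name as arguments:

```sql
SELECT table_schema,
       table_name,
       count_rows_of_table(table_schema, table_name) AS row_count
FROM information_schema.tables
WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
  AND table_type = 'BASE TABLE'
ORDER BY table_schema, row_count DESC;
```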
The query above outputs a table containing the row counts of all tables across the various schemas, sorted first by the table_schema column and, within each schema, by the tables with the largest number of rows. If we run the above query on our test database, we should see the following output.
Next, let us say we want to get the total row count across all tables broken down per schema. This can be achieved by using the following query.
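A sketch of the per-schema aggregation, again assuming the two-argument form of count_rows_of_table:

```sql
SELECT table_schema,
       sum(row_count) AS total_rows
FROM (
    SELECT table_schema,
           count_rows_of_table(table_schema, table_name) AS row_count
    FROM information_schema.tables
    WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
      AND table_type = 'BASE TABLE'
) per_schema_counts
GROUP BY table_schema
ORDER BY total_rows DESC;
```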
The above uses a subquery to first compute the row count of each table, and then performs a GROUP BY operation to get the total number of rows in each schema of the current database. The resulting output is sorted so that the schema with the most rows comes first.
The query below simply sums the row counts of the individual tables from the previous step to get a total row count across all the tables. This is done by running the per-table row count as a subquery, named per_table_count_subquery, and performing a SUM over the row counts that subquery outputs.
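A sketch of the database-wide total, using the subquery name from the text and the assumed two-argument function signature:

```sql
SELECT sum(row_count) AS total_rows
FROM (
    SELECT count_rows_of_table(table_schema, table_name) AS row_count
    FROM information_schema.tables
    WHERE table_schema NOT IN ('information_schema', 'pg_catalog')
      AND table_type = 'BASE TABLE'
) per_table_count_subquery;
```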
Running this on the Northwind example dataset produces the following output.
This post showed some useful SQL techniques for computing row counts in PostgreSQL-compatible databases such as YugabyteDB. Remember that programs that generate and execute dynamic SQL must be written with caution, rigorous testing, and thorough peer review. Errors risk producing wrong results; far worse, they can expose your code, and therefore your entire database, to SQL injection attacks.
The code in the examples above works exactly the same way across PostgreSQL and YugabyteDB. This is by design, due to the reuse of the PostgreSQL query layer in YugabyteDB. Try out your favorite PostgreSQL feature on YugabyteDB, and let us know how it goes on our Community Slack. If you run into any issues, just file an issue on GitHub.
Published at DZone with permission of Karthik Ranganathan . See the original article here.