Over a million developers have joined DZone.

Why Analyze Raw MySQL Query Logs?

DZone's Guide to

Why Analyze Raw MySQL Query Logs?

These are just some examples of what you can find out by querying raw slow query logs. They contain a ton of information about query execution.

· Big Data Zone ·
Free Resource

Hortonworks Sandbox for HDP and HDF is your chance to get started on learning, developing, testing and trying out new features. Each download comes preconfigured with interactive tutorials, sample data and developments from the Apache community.

In this post, I’ll examine when looking at raw MySQL query logs can be more useful than working with tools that only have summary data.

In my previous post, I wrote about analyzing MySQL Slow Query Logs with ClickHouse and ClickTail. One of the follow-up questions I got is when do you want to do that compared to just using tools like Percona Monitoring and Management or VividCortex, which provide a beautiful interface for detailed analyses (rather than spartan SQL interface).

MySQL Logs

A lot of folks are confused about what query logs MySQL has, and what you can use them for. First, MySQL has a General Query Log. As the name implies, this is a general-purpose query log. You would think this is the first log you should use, but it is, in fact, pretty useless:

2018-03-31T15:38:44.521650Z      2356 Query SELECT c FROM sbtest1 WHERE id=164802
2018-03-31T15:38:44.521790Z      2356 Query SELECT c FROM sbtest1 WHERE id BETWEEN 95241 AND 95340
2018-03-31T15:38:44.522168Z      2356 Query SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 1 AND 100
2018-03-31T15:38:44.522500Z      2356 Query SELECT c FROM sbtest1 WHERE id BETWEEN 304556 AND 304655 ORDER BY c
2018-03-31T15:38:44.522941Z      2356 Query SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 924 AND 1023 ORDER BY c
2018-03-31T15:38:44.523525Z      2356 Query UPDATE sbtest1 SET k=k+1 WHERE id=514

As you can see, it only has very limited information about queries: no query execution time or which user is running the query. This type of log is helpful if you want to see very clean, basic information on what queries your application is really running. It can also help debug MySQL crashes because, unlike other log formats, the query is written to this log file before MySQL attempts to execute the query.

The MySQL Slow Log is, in my opinion, much more useful (especially with Percona Server Slow Query Log Extensions). Again as the name implies, you would think it is only used for slow queries (and by default, it is). However, you can set long_query_time to 0 (with a few other options) to get all queries here with lots of rich information about query execution:

# Time: 2018-03-31T15:48:55.795145Z
# User@Host: sbtest[sbtest] @ localhost []  Id: 2332
# Schema: sbtest  Last_errno: 0 Killed: 0
# Query_time: 0.000143  Lock_time: 0.000047 Rows_sent: 1  Rows_examined: 1 Rows_affected: 0
# Bytes_sent: 188  Tmp_tables: 0 Tmp_disk_tables: 0  Tmp_table_sizes: 0
# QC_Hit: No  Full_scan: No Full_join: No  Tmp_table: No Tmp_table_on_disk: No
# Filesort: No  Filesort_on_disk: No  Merge_passes: 0
#   InnoDB_IO_r_ops: 0  InnoDB_IO_r_bytes: 0  InnoDB_IO_r_wait: 0.000000
#   InnoDB_rec_lock_wait: 0.000000  InnoDB_queue_wait: 0.000000
#   InnoDB_pages_distinct: 0
# Log_slow_rate_type: query  Log_slow_rate_limit: 10
SET timestamp=1522511335;
SELECT c FROM sbtest1 WHERE id=2428336;

Finally, there is the MySQL Audit Log, which is part of the MySQL Enterprise offering and format-compatible Percona Server for MySQL Audit Log Plugin. This is designed for auditing access to the server, and as such it has matched details in the log. Unlike the first two log formats, it is designed first and foremost to be machine-readable and supports JSON, XML, and CVS output formats:

{"audit_record":{"name":"Query","record":"743017006_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT SUM(k) FROM sbtest1 WHERE id BETWEEN 3 AND 102","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
{"audit_record":{"name":"Query","record":"743017007_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT c FROM sbtest1 WHERE id BETWEEN 2812021 AND 2812120 ORDER BY c","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}
{"audit_record":{"name":"Query","record":"743017008_2018-03-31T01:03:12","timestamp":"2018-03-31T15:53:42 UTC","command_class":"select","connection_id":"2394","status":0,"sqltext":"SELECT DISTINCT c FROM sbtest1 WHERE id BETWEEN 1 AND 100 ORDER BY c","user":"sbtest[sbtest] @ localhost []","host":"localhost","os_user":"","ip":"","db":"sbtest"}}

As you can see, there are substantial differences in the purposes of the different MySQL log formats, along with the information they provide.

Why Analyze Raw MySQL Query Logs?

In my opinion, there are two main reasons to look directly at raw log files without aggregation (you might find others):

  1. Auditing, where the Audit Log is useful (Vadim recently blogged about it).
  2. Advanced MySQL/application debugging, where an aggregated summary might not allow you to drill down to the fullest level of detail.

When you’re debugging using MySQL logs, the Slow Query Log, set to log all queries with no sampling, is the most useful. Of course, this can cause significant additional overhead in many workloads, so it is best to do it in a development environment (if you can repeat the situation you’re looking to analyze). At the very least, don’t do it during peak time.

For Percona Server for MySQL, these options ensure it logs all queries to the query log with no sampling:


Now that we have full queries, we can easily use Linux command line tools like grep and others to look into what is going on. However, many times this isn’t always convenient. This is where loading logs into storage that you can conveniently query is a good solution.

Let’s look into some specific and interesting cases.

Were Any Queries Killed?

FROM mysql_slow_log
WHERE killed > 0
│ 2018-04-02 19:02:56 │ select benchmark(10000000000,"1+1") │  10.640794 │
1 rows in set. Elapsed: 0.242 sec. Processed 929.14 million rows, 1.86 GB (3.84                                         billion rows/s., 7.67 GB/s.)

Yes. A query got killed after running for 10 seconds.

Did Any Query Fail? With What Error Codes?

FROM mysql_slow_log
GROUP BY error_num
│         0 │ 2018-04-02 18:59:49 │ 2018-04-07 19:39:27 │ 925428375 │
│      1160 │ 2018-04-02 19:02:56 │ 2018-04-02 19:02:56 │         1 │
│      1213 │ 2018-04-02 19:00:00 │ 2018-04-07 19:18:14 │   3709520 │
│      1054 │ 2018-04-07 19:38:14 │ 2018-04-07 19:38:14 │         1 │
4 rows in set. Elapsed: 2.391 sec. Processed 929.14 million rows, 7.43 GB (388.64 million rows/s., 3.11 GB/s.)

You can resolve error codes with the perror command:

root@rocky:~# perror 1054
MySQL error code 1054 (ER_BAD_FIELD_ERROR): Unknown column '%-.192s' in '%-.192s'

This command has many uses. You can use it to hunt down application issues (like in this example of a missing column — likely due to bad or old code). It can also help you to spot SQL injection attempts that often cause queries with bad syntax, and troubleshoot deadlocks or foreign key violations.

Are There Any Nasty, Long Transactions?

   max(_time) - min(_time) AS run_time,
   count(*) AS num_queries,
   sum(rows_affected) AS rows_changed
FROM mysql_slow_log
WHERE transaction_id != ''
GROUP BY transaction_id
ORDER BY rows_changed DESC
│ 17E070082      │ 0        │      1      │ 9999         │
│ 17934C73C      │ 2        │      6      │ 4            │
│ 178B6D346      │ 0        │      6      │ 4            │
│ 17C909086      │ 2        │      6      │ 4            │
│ 17B45EFAD      │ 5        │      6      │ 4            │
│ 17ABAB840      │ 0        │      6      │ 4            │
│ 17A36AD3F      │ 3        │      6      │ 4            │
│ 178E037A5      │ 1        │      6      │ 4            │
│ 17D1549C9      │ 0        │      6      │ 4            │
│ 1799639F2      │ 1        │      6      │ 4            │
10 rows in set. Elapsed: 15.574 sec. Processed 930.58 million rows, 18.23 GB (59.75 million rows/s., 1.17 GB/s.)

Finding transactions that modify a lot of rows, like transaction 17E070082 above, can be very helpful to ensure you control MySQL replication slave lag. It is also critical if you’re looking to migrate to MySQL Group Replication or Percona XtraDB Cluster.

What Statements Were Executed in a Long Transaction?

FROM mysql_slow_log
WHERE transaction_id = '17E070082'
   _time ASC,
   _ms ASC
│ 2018-04-07 20:08:43 │ 890693 │ update sbtest1 set k=0 where id<10000 │
1 rows in set. Elapsed: 2.361 sec. Processed 931.04 million rows, 10.79 GB (394.27 million rows/s., 4.57 GB/s.)

I used transaction 17E070082 from the previous query above (which modified 9999 rows). Note that this schema improves compression by storing the seconds and microseconds parts of the timestamp in different columns.

Were Any Queries Dumping Large Numbers of Rows From the Database?

FROM mysql_slow_log
WHERE rows_sent > 10000
│ 2018-04-07 20:21:08 │ SELECT /*!40001 SQL_NO_CACHE */ * FROM `sbtest1` │  10000000 │ 1976260712 │
1 rows in set. Elapsed: 0.294 sec. Processed 932.19 million rows, 3.73 GB (3.18 billion rows/s., 12.71 GB/s.)

Did Someone Update a Record?

FROM mysql_slow_log
WHERE (rows_affected > 0) AND (query LIKE '%id=3301689%')
│ 2018-04-02 19:04:48 │ UPDATE sbtest1 SET k=k+1 WHERE id=3301689 │
1 rows in set. Elapsed: 0.046 sec. Processed 2.29 million rows, 161.60 MB (49.57 million rows/s., 3.49 GB/s.)

Note that I’m cheating here by assuming we know an update used a primary key, but it is practically helpful in a lot of cases.

These are just some examples of what you can find out by querying raw slow query logs. They contain a ton of information about query execution (especially in Percona Server for MySQL) that allows you to use them both for performance analysis and some security and auditing purposes.

Hortonworks Community Connection (HCC) is an online collaboration destination for developers, DevOps, customers and partners to get answers to questions, collaborate on technical articles and share code examples from GitHub.  Join the discussion.

big data ,data analytics ,mysql ,querying ,tutorial

Published at DZone with permission of

Opinions expressed by DZone contributors are their own.

{{ parent.title || parent.header.title}}

{{ parent.tldr }}

{{ parent.urlSource.name }}