Top Most Overlooked MySQL Performance Optimizations: Part II
Raid levels, large tables, and buffer pools are among the tips offered to optimize MySQL performance. Here is Part II of Muhammad Irfan's Q and A roundup from his recent webinar.
Join the DZone community and get the full member experience.Join For Free
Did you miss Part I? Check it out here!
On myisam settings, doesn’t the MySQL and information_schema schemas require myisam? If so, are any settings more than default needing to be changed?
performance_schema uses the PERFORMANCE_SCHEMA storage engine, so only MySQL system database tables use the MyISAM engine. The MySQL system database is not used much and usually default settings for MyISAM engine are fine.
Will functions make my app slow compare than query?
I’m not sure how you’re comparing “queries” versus “stored functions.” Functions also need to transform, similar to the query execution plan. But it might be slower compare to well-coded SQL, even with the overhead of copying the resulting data set back to the client. Typically, functions have many SQL queries. The trade-off is that this does increase the load on the database server because more of the work is done on the server side and less is done on the client (application) side.
Foreign key will make my fetches slower?
MySQL enforces referential integrity (which ensures data consistency between related tables) via foreign keys for the InnoDB storage engine. There could be some overhead of this for the INSERT/UPDATE/DELETE foreign key column, which has to check if the value exists in a related column of other tables. There could be some overhead for this, but again it’s an index lookup so the cost shouldn’t be high. However, locking overhead comes into play as well. This blog post from our CEO is informative on this topic. This especially affects writes, but I don’t think FK fetches slower for SELECT as it’s an index lookup.
Large pool size can have a negative impact on performance? About 62GB of pool size?
The InnoDB buffer pool is by far the most important option for InnoDB Performance, as it’s the main cache for data and indexes and it must be set correctly. Setting it large enough (i.e., larger than your dataset) shouldn’t cause any problems as long as you leave enough memory for OS needs and for MySQL buffers (e.g., sort buffer, join buffer, temporary tables, etc.).
62GB doesn’t necessarily mean a big InnoDB buffer pool. It depends on how much memory your MySQL server contains, and what the size of your total InnoDB dataset is. A good rule of thumb is to set the InnoDB buffer pool size as large as possible, while still leaving enough memory for MySQL buffers and for OS.
You find duplicate, redundant indexes by looking at information_schema.key_column_usage directly?
The key_column_usage view provides information about key columns constraints. It doesn’t provide information about duplicate or redundant indexes.
Can you find candidate missing indexes by looking at the slow query log?
Yes, as I mentioned you can find unused indexes by enabling log_queries_not_using_indexes. It writes to slow_query_log. You can also enable the user_statistics feature which adds several information_schema tables, and you can find un-used indexes with the help of user_statistics. pt-index-usage is yet another tool from Percona toolkit for this purpose. Also, check this blog post on this topic.
How to find the unused indexes? They also have an impact on performance.
Unused indexes can be found with the help of the pt-index-usage tool from Percona toolkit as mentioned above. If you are using Percona Server, you can also use User Statistics feature. Check this blog post from my colleague, which shows another technique to find unused indexes.
As far as I understand, MIXED will automatically use ROW for non-deterministic and STATEMENT for deterministic queries. I’ve been using it for years now without any problems. So why this recommendation of ROW?
In Mixed Mode, MySQL uses statement-based replication for most queries, switching to row-based replication only when statement-based replication would cause an inconsistency. We recommend ROW-based logging because it’s efficient and performs better as it requires less row locks. However, RBR can generate more data if a DML query affects many rows and a significant amount of data needs to be written to the binary log (and you can configure binlog_row_image parameter to control the amount of logging). Also, make sure you have good network bandwidth between master/slave(s) for RBR, as it needs to send more data to slaves. Another important thing to get best of the performance with ROW-based replication is to make sure all your database tables contain a Primary Key or Unique Key (because of this bug http://bugs.mysql.com/bug.php?id=53375).
Can you give a brief overview of sharding… The pros and cons also.
With Sharding, database data is split into multiple databases with each shard storing a subset of data. Sharding is useful to scale writes if you have a huge dataset and a single server can’t handle the amount of writes.
Performance and throughput could be better with sharding. On the other hand, it requires lots of development and administration efforts. The application needs to be aware of the shards and keep track of which data is stored in which shard. You can use MySQL Fabric framework to manage farms of MySQL Servers. Check for details in the manual.
Why not mixed replication mode instead of row-based replication ?
As I mentioned above, MIXED uses a STATEMENT-based format by default, and converts to ROW-based replication format for non-deterministic queries. But ROW-based format is recommended as there could still be cases where MySQL fails to detect non-deterministic query behavior and replicates in a STATEMENT-based format.
Can you specify a few variables which could reduce slave lag?
Because of the single-threaded nature of MySQL (until MySQL 5.6), there is always a chance that a MySQL slave can lag from the master. I would suggest considering the below parameters to avoid slave lag:
- innodb_flush_log_at_trx_commit <> 1, Either set it t or 0 however, it could cause you 1 second of data loss in case of crash.
- innodb_flush_method = O_DIRECT, for unix like operating system O_DIRECT is recommended to avoid double buffering. If your InnoDB data and log files are located on SAN then O_DIRECT is probably not good choice.
- log_bin = 0, Disable binary logging (if enabled) to minimize extra Disk IO.
- sync_binlog = 0, Disable sync_binlog.
Those above parameters would definitely help to minimize slave lag. However, along with that make sure your slave(s) hardware is as strong as the master. Make sure your read queries are fast enough. Don’t overload slave to much, and distribute read traffic evenly between slave(s). Also, you should have the same table definitions on slave(s) as the master (e.g., master server indexes must exist on slave(s) tables too). Last but not least, I wrote a blog post on how to diagnose and cure replication lag. It might be useful for further reading.
Opinions expressed by DZone contributors are their own.