MySQL is a very capable database server. It powers numerous web sites and applications and is so widely used that it was etched as the “M” in LAMP platform. One of the reasons MySQL has become so popular is that it can be extremely fast, particularly in cases where data is primarily being read such as web sites. Even with all its strengths, it is possible for a developer to right a bad query that slows down the database or for the volume of data to reach a point where a normally speedy query becomes quite slow. Optimizing your MySQL queries will help you gain the most performance from your database.
Test to Find Your Problem Spots
It is surprising how often developers will attempt to optimize without really knowing where the problem is located (or that a problem really exists). This stabbing in the dark often leads to developers agonizing over how to optimize a query that is actually performing quite well. The first step to an optimization process should be to identify how the system is performing and where the bottlenecks are located. This is a two step process involving benchmarking and profiling.
Benchmarking is the process of simulating a heavy load on your server to see how well it performs. Benchmarking usually tells you how many pages per second your web server can handle or in the case of MySQL queries per second. There are a lot of tools for doing benchmarking and different approaches. Tools like Super Smack hit the database directly and benchmark it. Tools like ab simulate web traffic and benchmark your database that way. Typically, you should benchmark using both methods.
Profiling is the process of analyzing your database and finding slow queries. MySQL provides some great built-in tools to help you profile your queries. One of these is the MySQL slow query log. This log, when turned on, logs all queries that take longer to execute than the maximum time set. You set this maximum by setting the long_query_time variable in your my.cnf configuration file. The minimum value is 1 and the default value is 10. This value is measured in seconds. To turn on the slow query log, add the following to your my.cnf configuration file:
slow_query_log = <log_filename>
Once you have identified the problem queries using the slow query log, you can further investigate using MySQL’s EXPLAIN and SHOW STATUS commands. EXPLAIN will give you the execution plan for executing a query and may help you find where the query is slowing down.
Normalize and Then Denormalize Where Appropriate
There is a lot of debate about database normalization and performance. It seems that some people feel that normalization creates performance problems. Normalization does trade off some performance for a smaller database footprint and lower memory consumption. It also helps to ensure data integrity. Generally, it is better to normalize your database. MySQL can be incredibly efficient in executing a join and normalization has a number of benefits that outweigh the small performance cost.
Having said that, there are times when it might be preferable to denormalize your data. Sometimes, the extra work of normalization really is not necessary. Take for example the following two tables used to track leads:
CREATE TABLE leads ( id int UNSIGNED NOT NULL AUTO_INCREMENT, name char(100), source char(20), PRIMARY KEY(id) ); CREATE TABLE lead_source( id int UNSIGNED NOT NULL AUTO_INCREMENT, source char(20), PRIMARY_KEY(id) );
In this application there are only 3 lead sources: web, mail, phone. Normalization here really isn’t necessary, if my code provides a way to make sure that the field gets set consistently. In this case, I am coding in Python so I set the following constant:
LEAD_SOURCES = ['web','phone','mail']
Using this constant to set my form values, I can make sure that only one of these three gets set. As a result I condense my table to:
CREATE TABLE leads ( id int UNSIGNED NOT NULL AUTO_INCREMENT, name char(100), source char(20), PRIMARY KEY(id) );
This is a good example of when you should denormalize your data. Fields that only have a couple of choices are good candidates for denormalization, if you can provide a mechanism to control what value gets set in the field in question.
Artificial Primary Keys
It is often necessary to use artificial primary keys. This is typically because the scheme of the underlying data could change and this might affect your primary keys. For example, a bank might use account numbers as the primary key for a table. What happens if that bank later merges with another bank and the format for account numbers changes? In this case, an artificial primary key like an AUTO_INCREMENT is a good idea. But consider a table that is used to define a many to many relationship between tags and photos. Instead of this:
CREATE TABLE tags_photos ( id int UNSIGNED NOT NULL AUTO_INCREMENT, tag_id int NOT NULL, photo_id int NOT NULL, PRIMARY_KEY(id) );
CREATE TABLE tags_photos ( tag_id int NOT NULL, photo_id int NOT NULL, PRIMARY_KEY(tag_id, photo_id) );
In the previous example, the number of entries would have been limited by the size of the integer data type on your system. In the second example, unlimited relationships can be defined.
Check Your Indexes
One of the most common problems with a SQL query is joining or searching on a field that does not have an index. When this occurs, MySQL has to scan all the records in a table. Always check your indexes and make sure you index any fields that you will be frequently joining on or searching by.
Know Your SQL
This might seem like an obvious suggestion but many problems with SQL queries are caused by developers that really don’t understand SQL. If you are not sure what the difference is between a left or right join or how and when to use subselects, get a good book on SQL and practice. Knowing when to use the various SQL statements will do more for optimizing your MySQL queries than any technical trick or strategy.
MySQL is an incredibly robust database platform but even it can need optimization from time to time. The first step in optimizing your MySQL queries should always be to collect good data on how the database is actually performing and where bottlenecks exist. From there it is easy to begin forming strategies to add indexes and clean up queries or table schemas to optimize the slow queries. Finally, make sure your grasp of the SQL language is up to the task. Many slow queries are actually caused by developer errors.