Poorly written queries can be and often are the main reason for slow server response time. They’re a common bottleneck in the process of delivering dynamic web pages. You probably can’t make every query super fast, but you can certainly improve many, starting with those that have the slowest performance.
I’ve been talking about databases for the last few months, specifically MySQL databases for a few weeks now. I started with some thoughts about SQL in general and then walked you through some configuration tips for MySQL. Then I presented a mini-series about using indexes. I talked about their pros and cons, the different types of indexes, and finally how to create indexes and when to use them.
Last week I picked up the series again and talked about queries. I showed you how you can identify your slowest queries and the ones most likely in need of improvement. What I didn’t talk about was how to write better queries, which I want to address today.
Tips for Writing Better Queries
The first thing to keep in mind is that the fastest query is the one you don’t have to run so make sure you need to run a query before you do.
Like any other type of code, there are multiple ways to write MySQL queries to retrieve the same information from a database, but some of those ways are quicker and some are slower.
I can’t give you specific tips for improving the specific queries you run on your site, but I can offer some general tips I’ve collected for writing better queries in general and point you to some resources for more.
Be more specific with the information you want returned. Don’t use the wildcard to return every column (SELECT * FROM TableName), unless you really need data from every column. Instead select the specific columns and only the specific columns that contain the values that you need (SELECT LastName, FirstName FROM TableName).
Instead of this:
SELECT * FROM TableName;
Opt for this
SELECT Column1, Column2 FROM TableName;
With a small database where the tables only contain a few rows, it’s probably not a big deal to use the wildcard character, but as your database grows it will become a bigger deal.
While you should generally select fewer columns, you’re better off selecting additional columns instead of using SELECT DISTINCT to ensure each row is distinct. SELECT DISTINCT can use a lot of processing. You can usually ensure each row is distinct by selecting an additional column. For example, even if you only need a customer’s last name, you might want to also include their first name or address in your SELECT statement to make sure each row returned is distinct from the others.
Wildcards (%) can be used to help find information in plaintext data (FULLTEXT) and they can be used at the beginning, end, or somewhere in the middle. The closer the wildcard character is to the front of text you want to search, the wider the database will have to search to find matches. It’s best to leave wildcards for the ends of text you want to find.
It’s also better to use either prefix or postfix wildcard than it is to use both.
SELECT * FROM TableName WHERE COLUMN LIKE ‘%texttofind’; //prefix
SELECT * FROM TableName WHERE COLUMN LIKE ‘texttofind%’; //postfix
SELECT * FROM TableName WHERE COLUMN LIKE ‘%texttofind%’; // both
LIMIT can be used to limit the number of results returned instead of all the possible results. You can use LIMIT for sample queries when testing so you aren’t using too much server resources, but, even more, there are plenty of queries where you only want the first few results. For example, any kind of search for the most recent results, probably doesn’t need to return more than five or ten rows of data.
SELECT PostsID FROM TableName ORDER BY date DESC LIMIT 10
You can also use LIMIT to select a range of values. The following queries are two ways to SELECT the next five PostIDs starting with the eleventh record.
SELECT PostsID FROM TableName ORDER BY date DESC LIMIT 11, 5
SELECT PostsID FROM TableName ORDER BY date DESC LIMIT 5 OFFSET 11
Using LIMIT M,N can slow down queries in certain circumstances. The larger the offset the more records MySQL has to find before it can return the next few.
INNER JOINs are typically more performant than using a WHERE clause in a query. Here’s a query using a WHERE clause to map data between two tables.
SELECT Customers.CustomerID, Customers.LastName, Orders.OrderDate
FROM Customers, Orders
WHERE Customers.CustomerID = Orders.CustomerID
To return the data, MySQL creates what’s called a Cartesian Join or Cross Join. It creates all possible combinations of the variables before filtering to find the records to return. If each table contained 100 rows then MySQL would generate 100 X 100 or 10,000 results, which it would then filter to find those where the IDs matched to return to the query.
Here’s how the query could be rewritten using an INNER JOIN.
SELECT Customers.CustomerID, Customers.LastName, Orders.OrderDate
INNER JOIN Orders
ON Customers.CustomerID = Orders.CustomerID
The INNER JOIN generates only the records where the IDs match, which are then returned to the query.
Some database management systems are able to automatically run WHERE joins as INNER JOINs so there’s no performance gain for writing the query with an INNER JOIN, but some is not all and if you want to ensure performance use INNER JOINs instead of WHERE joins.
For similar reasons you want to choose JOINs over subqueries or queries nested within other queries.
However, if you’re using a HAVING statement to filter the results of an INNER JOIN, you may be better off using a WHERE statement instead of using HAVING as it eliminates a step in what HAVING will have to do to and so it’s more efficient and performant to use WHERE in this case.
UNION should be used instead of OR to speed up performance. Both statements lead to the same results, but UNION can combine the result set of multiple queries and it runs faster.
SELECT * FROM table-1, table-2 WHERE table-1.column-1 = table-2.column-2 OR table-1.column-3 = table-2.column-4;
SELECT * FROM table-1, table-2 WHERE table-1.column-1 = table-2.column-2
SELECT * FROM table-1, table-2 WHERE table-1.column-3 = table-2.column-4
Finally, when you do use WHERE, GROUP BY or ORDER By clauses, use them on indexed columns, that is add an index to the columns that will be used in the clause.
The optimization tips I’ve presented here are a few of those I found. There are lots more though. Here are a few of the articles I looked through when writing this post and it isn’t hard to find much more if you care to look.
- 8 Ways to Fine-tune your SQL Queries (for production databases)
- 10 sql tips to speed up your database
- 10 essential performance tips for MySQL
- An Ultimate Guide For MySQL Query Optimization
- 101 Tips to MySQL Tuning and Optimization
- How To Speed Up MySQL: An Introduction To Optimizing
MySQL Query Cache
I said at the start that the fastest query is the one you don’t have to run in the first place. One way not to have to run queries is through caching.
One way you can cache queries is with the MySQL query cache. In the earlier post about MySQL configuration I mentioned two variables, query_cache_type and query_cache_size. The first can turn on the MySQL query cache with a value of ON and the second sets the size to use for the cache.
When the query cache is enabled, MySQL will check each query to determine if the results are already stored in cache and if so it will use the cached results instead of performing the query.
If a table rarely changes, if new rows aren’t added or removed and the table remains static, then the query cache might be something to try, but there are generally better ways to cache database results.
Another option is the use of an external cache such as memcached. I’ll talk more about memcached in a future series, but thought I’d briefly mention it here.
Let me leave you with a couple of PHP tips that relate to your SQL database.
When you initially connect to a database you can use a persistent connectionc. In some databases, making a connection is expensive in terms of performance and you would prefer connections remain open persistently.
With MySQL that’s not an issue and you’ll want to use non-persistent connections so use mysql_connect() instead of mysql_pconnect(). The latter has been deprecated in PHP 5.5 and removed in PHP 7, but MySQLi and MariaDB have similar persistent connection methods you probably want to avoid.
You can also set things in your php.ini file to prevent persistent connections.
mysql.allow_persistent = Off
On last thing. In your PHP code, avoid using queries inside loops. Loops are generally slow so why place what is probably going to be your slowest code (the query) inside a loop?
The fastest query is the one you never execute, but you probably aren’t running too many queries you don’t need. Fortunately, you can optimize a lot of them, especially if you know a few basic tips.
Be more specific with your queries. Don’t ask for more information than you need to be returned. SELECT the fewest amount of columns you need. LIMIT results when you can. Prefer INNER JOINs to WHERE clauses. Learn a few basic things for writing faster queries and develop good habits for writing them.
When possible, you want to cache query results either through the MySQL query cache or better through external cache like Memcached. I’ll be back to close out the backend side of this performance series with a last series about caching some time in the near future.
Download a free sample from my book, Design Fundamentals.