As useful as indexes are to finding information in a database, they aren’t all created equal. Sometimes you need to find specific values in a table and sometimes you need to find something inside a specific value, say a blob of text. Not all indexes are suited for every task.
Last week I started talking about indexes as part of a larger series on SQL, specifically MySQL databases. I few weeks ago I walked through some MySQL configuration tweaks and last week I talked about indexes in general along with some of their pros and cons and when indexes are and aren’t appropriate.
Today I want to talk about the different types of indexes that exist as well as when you would prefer to use one type over another. I’ll also talk a little about the different data structures that MySQL uses to build index tables.
Single and Multiple Column Indexes
Last week I talked about indexes as though they’re all exactly same, but there are different types of indexes and different ways you might create them.
The first thing to know (and the first way we can distinguish index types) is that you can add an index to a single column or to multiple columnsv. In fact, in MySQL 8 an index can contain up to 16 columns though you probably won’t ever need that many. Whether to use a single or multiple column index depends on the type of data for which you commonly search.
For example, you might have a customers table that holds columns for first_name and last_name. If you only ever search the last_name values then a single column index on the last_name would be appropriate. If you often search for both first and last names in the same query, then you might consider a multiple column index on first_name and last_name.
In general you want to use a multi-column index to speed queries that test all the columns in the index, or just the first column, or just the first and second column, or just the first, second, and third columns, and so on all the way up to testing all the columns. If your query only tests one column, then a single column query is more appropriate.
Five Types of Indexes
When you create an index or add one to an existing table, you’ll create it as one of several types of indexes.
A unique index is one in which all column values must be unique. In a single column unique index there can be no duplication of values in the column being indexed. In a multi-column unique index the values can be duplicated in a single column, but the combination of column values in each row must be unique. You use a unique index to prevent duplicate values and you often define the index after a table has been created.
A primary key is a unique index in which no value can be NULL. Every row must have a value for the column or combination of columns. You would usually define a primary key on the smallest number of columns possible because of this, and most of the time a primary key will be set on a single column. Also, once set the column values in the primary key can’t be changed.
You wouldn’t, for example, include a middle_name field in a primary key as not everyone has or uses a middle name and NULL values are to be expected. But you might include both first and last names as a primary key or more likely last_name and address, both of which will always exist, but aren’t likely to be duplicated in combination. You usually define a primary key when the table is created.
A simple, regular, or normal index is an index where the values don’t need to be unique and they can be NULL. This is the index I’ve mostly been talking about to this point. They’re added simply to help the database find things faster.
A fulltext index, as the name implies, are used for full text searches. Sometimes you want to find the blob of text that contains a certain word or group of words or maybe you want to find a certain substring within the larger block of text.
Instead of the entire value being indexed as a whole, a fulltext index will index the individual words inside each text block. This makes it faster to find the specific words and phrases inside the whole text.
A descending index (available in version 8+ of MySQL) is a regular index stored in reverse order. It’s helpful when you run queries for the most recently added data like you might to show your five most recent posts or the ten most recent comments on all your posts.
Index Data Structures
MySQL has options in the type of data structure to use when creating index tables. The indexes can be either clustered or non-clustered. In a clustered index, the rows of data are stored on disk in the same order as the index. Because of this you can only ever have a single clustered index on any table.
A non-clustered index uses pointers to the data. In other words the structure of the index is separate from the structure of the data rows in the table. Because of the separate structures, you can use as many as you want and you can rearrange the index and still have everything pointing to the right place.
Clustered indexes are generally faster to read since you can get all the information from the index and not have to also consult the original table. A non-clustered index requires you find the specific index and use it to look up the data in the table. A clustered index can take longer to write to, especially if the new data requires the existing data to be reorganized for the index.
InnoDB tables in MySQL use the primary key as the clustered index. If no primary key exists, it will look for the first unique index without any NULL values. If it can’t find either, it will create a clustered index based on a synthetic column containing row ID values. All other indexes are non-clustered.
In addition to being clustered or non-clustered, there are different data structures that can be used to create an index, two important ones being B-tree indexes and hash indexes.
Have you ever been in an airport looking for your gate, say gate 44, and you see signs that show gates 1–30 are to the left and gates 31–60 are to the right? Following the signs, you go to your right.
In tree structure terms, when you’re standing at the signs, you’re at the root node. The lower numbered gates to the left are one child node and the higher numbers gates to the right are another child node. When you’re at the root node and you turn right (gates 31–60, you haven’t found your gate (44) yet, but you’ve eliminated half the gates (1–30) from consideration.
What I’ve described is similar to a binary search tree where each node lets you search further in one of two directions. A B-Tree (or B+Tree) is similar except that there could be more than two options, so instead of signs for gates 1–30 and 31–60, you might see three signs (1–20, 21–40, and 41–60) or ten signs for gates 1–10, 11–20, and so on.
To find what you want in a B-Tree structure, you start at the root and work your way down through all the child nodes, eliminating what you don’t want until you get to what you do want.
A B-Tree structure is good for comparisons in expressions that use the =, >, >=, <, <=, or BETWEEN operators. Is also good for LIKE comparisons where the argument is a constant string that doesn’t start with a wildcard character.
A hash index works differently. Instead of ordering index records based on comparisons like a B-Tree index, a Hash index uses a hashing function to create a semi-unique hash value for each key in the index. The value is then used to determine where to place the key/value pair in the index.
Hash indexes can be used for equality comparisons that use the = or <=> operators. They’re very fast, but they aren’t good at finding ranges of values, only exact values and they can’t be used to sort results through ORDER BY.
MyISAM tables only use B-Tree indexes. InnoDB tables can take advantage of Hash indexes through the InnoDB Adaptive Hash Index. It’s enabled by the innodb_adaptive_hash_index option, or turned off by –skip-innodb_adaptive_hash_index at startup.
MySQL will observe the pattern of searches and build this index based on what it finds, hence the use of adaptive in the name. An adaptive hash index is built on top of a B-Tree index.
In a way an adaptive hash index is like a cache of some of the index, based on what’s most often searched for. If something is in that cache (the adaptive hash index) it will be found quicker than the B-Tree. If not, the B-Tree will be used after, which means the search ends up being a little slower, but still successful. There’s also an overhead to maintaining the adaptive hash index so it isn’t something you’ll automatically set for everything, but it is an option.
Not all indexes are exactly the same. You can add them to single columns and you can add them to multiple columns, depending on the particular set of data that you want returned.
There are also multiple types of indexes you can choose depending on whether the data needs to be unique or exist at all, whether you want to find specific values amongst all the rows of the table or strings within values of text.
MySQL also uses different data structures to create index tables. B-Tree indexes are quick when comparisons are used. Hash indexes are faster as long as you don’t want a range of values. MySQL even has an adaptive hash index which lets you use a hash index and fall back on a B-Tree.
With all this talk about indexes, there’s still one very important thing I haven’t shown you. I haven’t shown you how to create indexes and add them to existing tables and I haven’t offered any thoughts on what conditions lead to good situations for using indexes. I’ll talk about both next week.
Download a free sample from my book, Design Fundamentals.