Indexes in MySQL - aavin.dev

Table of Contents

Indexes in MySQL (Mysql indexes) are a great tool for optimizing SQL queries. To understand how they work, let’s look at working with data without them.

Reading data from disk

Files are stored in hard drive will be in blocks.One file usually takes several blocks. Each block knows which block comes after it. The file is divided into pieces and each piece is saved in an empty block.

When reading a file, the controller take turns going through all the blocks and collect the file from pieces. Blocks of one file can be scattered across the disk (fragmentation). Then the file reading will slow down, because you will need to jump to different parts of the disk.

The two main physical factors of disk that determine how quickly the disk subsystem will respond to requests are the average disk access time on hard drive and the speed of the disk controller. When writing data to disk, the OS writes only to the controller, so high-speed writing demands a very fast controller. When reading, data is transferred from the disk to the controller, so disk access speed is critical.

When the controller try to look for something inside the file, it has to go through all the blocks in which it is stored. If the file is very large, then the number of blocks will be significant. The need to jump from block to block, which can be in different places, will greatly slow down the data search.

Data search in MySQL

Let’s consider MySQL tables are regular files. Let’s execute a request of this type:

SELECT * FROM users WHERE age = 29;

MySQL opens a file where data from the users table is stored . And then it starts to sort through the whole file to find the necessary records.

In addition, MySQL will compare the data in each row of the table with the value in the query. Let’s say the work is carried out with a table in which there are 10 records. Then MySQL will read all 10 records, compare the age column of each of them with a value of 29, and select only the appropriate data.

There are two problems when reading data:

Slow reading of files due to the location of blocks in different parts of the disk (fragmentation).
A large number of comparison operations to find the data you need.

Sorting the data

Imagine that we sorted our 10 records in descending order. Then using the binary search algorithm , we could select the values we need for a maximum of 4 operations:

In addition to fewer comparison operations, we would save on reading unnecessary records.

An index is a sorted set of values. In MySQL, indexes are always built for a particular column. For example, we could build an index for the age column from the example.

The choice of indexes in MySQL

In the simplest case, an index needs to be created for those columns that are present in the WHERE clause.This mainly depends upon the business requirements,type of query patterns often used.

Consider the query from the example

SELECT * FROM users WHERE age = 29;

We need to create an index on the age column

CREATE INDEX age ON users(age);

After this operation, MySQL will begin to use the age index (consider index as a 🚩) to perform such queries. The index will be used for selections by the ranges of values of this column.

SELECT * FROM users WHERE age < 29;

Sorting

Let’s fire a query

SELECT * FROM users ORDER BY register_date;

the same rule applies – create an index on the column by which sorting occurs

CREATE INDEX register_date ON users(register_date);

Index storage internals

Imagine that our table looks like this

id | name | age
1  | Den | 29th
2  | Alyona | fifteen
3  | Putin | 89
4  | Petro | 12

After creating an index on the age column, MySQL will save all its values in sorted form

age index
12
fifteen
29th
89

In addition, the relationship between the value in the index and the record to which this value corresponds will also be preserved. Usually a primary key is used for this.

age index and post link
12: 4
15: 2
29: 1
89: 3

Unique Indexes

MySQL supports unique indexes. This is convenient for columns whose values should be unique throughout the table. Such indexes improve sampling performance for unique values. For instance

SELECT * FROM users WHERE email = '[email protected]';

On the email column we need to create a unique index as before.

CREATE UNIQUE INDEX email ON users(email);

Compound indices

MySQL can use only one index per query (unless MySQL is able to combine the results of the samples across multiple indexes ). Therefore, for queries that use multiple columns, you must use composite indexes.

Consider the following query

SELECT * FROM users WHERE age = 29 AND gender = 'male'

We should create a composite index on both columns

CREATE INDEX age_gender ON users(age, gender);

Compound Index Usage

To use composite indexes correctly, you need to understand the structure of their storage. Everything works exactly the same as for a regular index.

For a table with following data

id | name | age | gender
1 | Den | 29 | male
2 | Alyona | 15 | female
3 | Putin | 89 | tsar
4 | Petro | 12 | male

composite index values will be as follows

age_gender
12male
15female
29male
89tsar

This means that the order of columns in the index will play a big role. Typically, columns used in WHERE clauses should be placed at the beginning of the index. Columns from ORDER BY – at the end.

Range Search

Imagine that our query will use not a comparison, but a range search

SELECT * FROM users WHERE age <= 29 AND gender = 'male'

Then MySQL will not be able to use the full index, as gender values will differ for different age column values. In this case, the database will try to use part of the index (only age) to execute this query

age_gender 
12 male
 15 female
 29 male
 89 female

First, all data that matches the condition age <= 29 will be filtered . Then, a search for the value “male” will be performed without using an index.

Sorting in Compound index

Compound indexes can also be used if sorting is performed

SELECT * FROM users WHERE gender = 'male' ORDER BY age

In this case, we will need to create the index in a different order, because sorting (ORDER) occurs after filtering (WHERE)

CREATE INDEX gender_age ON users(gender, age);

This order of columns in the index allows you to filter by the first part of the index, and then sort the result by the second.

The columns in the index may be larger if required

SELECT * FROM users WHERE gender = 'male' AND country = 'UA' ORDER BY age, register_time

In this case, create the following index

CREATE INDEX gender_country_age_register ON users(gender, country, age, register_time);

Using EXPLAIN for index analysis

mysql> EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
+ ---- + ------------- + ------- + ------ + --------------- + ------ + --------- + ------ + ------ + ------------- +
| id | select_type | table | type | possible_keys | key   | key_len | ref | rows | Extra |
+ ---- + ------------- + ------- + ------ + --------------- + ------ + --------- + ------ + ------ + ------------- +
| 1 | SIMPLE | users | ALL | NULL | NULL | NULL | NULL |  336 | Using where |
+ ---- + ------------- + ------- + ------ + --------------- + ------ + --------- + ------ + ------ + ------------- +

The key column shows the index used. The possible_keys column shows all the indexes that can be used for this query. The rows column shows the number of records that the database had to read to fulfill this query (there are 336 entries in the table).

As you can see, the example does not use any index. After creating the index

mysql> EXPLAIN SELECT * FROM users WHERE email = '[email protected]';
+ ---- + ------------- + ------- + ------- + -------------- - + ------- + --------- + ------- + ------ + ------- +
| id | select_type | table | type | possible_keys | key    | key_len | ref | rows | Extra |
+ ---- + ------------- + ------- + ------- + -------------- - + ------- + --------- + ------- + ------ + ------- +
| 1 | SIMPLE | users | const | email | email | 386 | const |    1 | |
+ ---- + ------------- + ------- + ------- + -------------- - + ------- + --------- + ------- + ------ + ------- +

Only one entry has been read since index was used.

Checking the length of composite indexes

Explain will also help determine if the composite index is being used correctly. Let’s check the query from the example (with an index on the age and gender columns)

mysql> EXPLAIN SELECT * FROM users WHERE age = 29 AND gender = 'male';
+ ---- + ------------- + -------- + ------ + -------------- - + ------------ + --------- + ------------- + ------ + ---- --------- +
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + -------- + ------ + -------------- - + ------------ + --------- + ------------- + ------ + ---- --------- +
| 1 | SIMPLE | users | ref | age_gender | age_gender | 24       | const, const | 1 | Using where |
+ ---- + ------------- + -------- + ------ + -------------- - + ------------ + --------- + ------------- + ------ + ---- --------- +

The key_len value indicates the index length used. In our case, 24 bytes is the length of the entire index (5 bytes age + 19 bytes gender).

If we change the exact comparison to a range search, we see that MySQL uses only part of the index

mysql> EXPLAIN SELECT * FROM users WHERE age <= 29 AND gender = 'male';
+ ---- + ------------- + -------- + ------ + -------------- - + ------------ + --------- + ------ + ------ + ----------- - +
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + -------- + ------ + -------------- - + ------------ + --------- + ------ + ------ + ----------- - +
| 1 | SIMPLE | users | ref | age_gender | age_gender | 5        | | 82 | Using where |
+ ---- + ------------- + -------- + ------ + -------------- - + ------------ + --------- + ------ + ------ + ----------- - +

This is a signal that the created index is not suitable for this query. If we create the correct index

mysql> Create index gender_age on users (gender, age);
mysql> EXPLAIN SELECT * FROM users WHERE age <29 and gender = 'male';
+ ---- + ------------- + -------- + ------- + ------------- ---------- + ------------ + --------- + ------ + ------ + - ----------- +
| id | select_type | table | type | possible_keys | key | key_len | ref | rows | Extra |
+ ---- + ------------- + -------- + ------- + ------------- ---------- + ------------ + --------- + ------ + ------ + - ----------- +
| 1 | SIMPLE | users | range | age_gender, gender_age | gender_age | 24       | NULL | 47 | Using where |
+ ---- + ------------- + -------- + ------- + ------------- ---------- + ------------ + --------- + ------ + ------ + - ----------- +

In this case, MySQL uses the entire gender_age index, as the order of the columns in it allows you to make this selection.

Index selectivity

Now if we issue the following query

SELECT * FROM users WHERE age = 29 AND gender = 'male'

For the above query, you must create a composite index. But how to choose the sequence of columns in the index? There are two options:

age, gender
gender, age

Both will do. But they will work with different efficiency.

To understand this, consider the uniqueness of the values of each column and the number of corresponding entries in the table

mysql> select age, count (*) from users group by age;
+ ------ + ---------- +
| age | count (*) |
+ ------ + ---------- +
| 15 |      160 |
| 16 |      250 |
| ... |
| 76 |      210 |
| 85 |      230 |
+ ------ + ---------- +
68 rows in set (0.00 sec)

mysql> select gender, count (*) from users group by gender;
+ -------- + ---------- +
| gender | count (*) |
+ -------- + ---------- +
| female |     8740 |
| male |     4500 |
+ -------- + ---------- +
2 rows in set (0.00 sec)

From the above we get the following information :

Any age column usually contains about 200 entries.
Any gender column value is about 6,000 entries.

If the age column goes first in the index, then MySQL after the first part of the index the number of records will be 200 and has to select out of them. But If the gender column goes first, then the number of records will be around 6000 now we can imagine the order of efficiency here.

This means that the age_gender index will work better than gender_age.

The selectivity of a column is determined by the number of entries (cardinality)in the table with the same values. When records with the same value are few, selectivity is high. Such columns must be used first in composite indices.

Primary keys

Primary Key is a special type of index, which is the identifier of the entries in the table. It is necessarily unique and is indicated when creating tables

CREATE TABLE `users` (
  `id` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `email` varchar(128) NOT NULL,
  `name` varchar(128) NOT NULL,
  PRIMARY KEY (`id`),
) ENGINE=InnoDB AUTO_INCREMENT=1 DEFAULT CHARSET=utf8

Note:- When using InnoDB tables, always define primary keys . If there is no primary key, MySQL will still create a virtual private key.

Cluster Indexes

Regular indexes are non-clustered. This means that the index itself stores only links to table entries. When working with the index, only the list of records (more precisely, the list of their primary keys) matching the query is determined. After this, another request occurs – to obtain the data of each record from this list.

Cluster indexes store the entire data of records, not links to them. When working with such an index, an additional data reading operation is not required.

The primary keys of InnoDB tables are clustered. Therefore, sampling on them is very effective.

Overhead on Indexing

It is important to remember that indexing requires additional write operations to disk. Each time you update or add data to a table, data is also written and updated in the index.

New Data/Update Data–> Index + Data updated

Create only the necessary indexes so as not to waste server resources. Monitor index sizes for your tables

mysql> show table status ;
+ ------------------- + -------- + --------- + ---------- - + -------- + ---------------- + ------------- + ------- ---------- + -------------- + ----------- + ------------ ---- + --------------------- + ------------- + --------- --- + ----------------- + ---------- + ---------------- + --------- +
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time | Update_time | Check_time | Collation | Checksum | Create_options | Comment |
+ ------------------- + -------- + --------- + ---------- - + -------- + ---------------- + ------------- + ------- ---------- + -------------- + ----------- + ------------ ---- + --------------------- + ------------- + --------- --- + ----------------- + ---------- + ---------------- + --------- +
...
| users | InnoDB | 10 | Compact | 314 | 208 |       65536 | 0 |        16384 | 0 | 355 | 2014-07-11 01:12:17 | NULL | NULL | utf8_general_ci | NULL | | |
+ ------------------- + -------- + --------- + ---------- - + -------- + ---------------- + ------------- + ------- ---------- + -------------- + ----------- + ------------ ---- + --------------------- + ------------- + --------- --- + ----------------- + ---------- + ---------------- + --------- +
18 rows in set (0.06 sec)

When to create indexes?

Indexes should be created as slow queries are detected. Requests that run more than 1 second are the first candidates for optimization.
Start creating indexes with the most common queries. Based in business requirements or common query patterns used. A query that runs a second, but 1000 times a day does more damage than a 10-second query that runs fewer times a day.
Do not create indexes on tables with fewer than a few thousand records. For such sizes, the gain from using the index will be almost invisible.
Do not create indexes in advance, for example, in a development environment. Indexes should be set exclusively for the form and type of load of a working system.
Delete unused indexes.

Last but Not Least

Allow enough time to analyze and organize indexes in MySQL (and in other databases). This can take a lot more time than designing the database structure. It will be convenient to organize a test environment with a copy of real data and check there different index structures.This will save lot of time on production systems debugging.

Do not create indexes on each column that is in the query, MySQL does not work like that. If your use case is different then you may think on using a different database specific to your use-case or business scenario. Use unique indexes where necessary. Always set primary keys.