19.3. SQL Tuning Statements and Practices
MySQL provides several statements and utilities that assist with tuning SQL, and you need to be familiar with these resources. The statements and utilities are described in the following sections.
19.3.1. EXPLAIN Statement
The most important SQL tuning statement in the MySQL language is EXPLAIN. EXPLAIN exposes the execution plan that the optimizer will use to resolve a particular SQL statement. Without EXPLAIN, you are doomed to trial-and-error tuning.
EXPLAIN has a simple syntax:
EXPLAIN sql_text;
EXPLAIN returns a result set consisting of at least one row for each table referenced in the SQL. Additional rows might be returned to indicate how subqueries or derived tables are used in the query. Example 19-3 is a simple demonstration of an explain plan for a two-table join (we used the \G option to print the output with each column on a separate line).
Example 19-3. Example of EXPLAIN output
mysql> EXPLAIN SELECT customer_name -> FROM employees join customers -> ON(customers.sales_rep_id=employees.employee_id) -> WHERE employees.surname='GRIGSBY' -> AND employees.firstname='RAY' \G
*************************** 1. row *************************** id: 1 select_type: SIMPLE table: employees1 type: ref possible_keys: PRIMARY,i_employees_name3 key: i_employees_name4 key_len: 80 ref: const,const rows: 15 Extra: Using where; Using index6 *************************** 2. row *************************** id: 1 select_type: SIMPLE table: customers2 type: ref possible_keys: i_customers_sales_rep key: i_customers_sales_rep8 key_len: 9 ref: sqltune.employees.EMPLOYEE_ID7 rows: 55589 Extra: Using where 2 rows in set (0.04 sec)
|
Let's take a look at the most important pieces of information from these plans. Numbers used in the explanation below correspond to superscripts in the EXPLAIN output above.
For joins, the order of the rows output by EXPLAIN corresponds to the join order, so the presence of the employees table in the first row indicates that employees was the first table in the join. customers is the second table in the join. MySQL had a choice between the primary key index and the i_employees_name index. MySQL chose the i_employees_name index to retrieve rows from employees. This index was on (surname, firstname). MySQL has determined that it will fetch only a single row in this stage of the query (e.g., it determined that there was only one employees row with that particular surname+firstname combination). Because the columns in the i_employees_name index were the only employees columns included in the SQL, MySQL was able to satisfy this part of the query using the index aloneaccessing rows in the table itself was unnecessary. MySQL was required to find rows in the customers table that matched specific values of employees.employee_id. MySQL used the i_customers_sales_rep index to retrieve these rows (this was an index on customers.sales_rep_id). MySQL expected to retrieve about 5558 rows from customers. The value here refers to the number of rows that are expected to be processed each time this step is executedwhich, in this case, is only once.
We'll look at a variety of EXPLAIN outputs for common query scenarios in the next few chapters. For now, the main thing to recognize and accept is that if you are going to be tuning SQL statements, you will need to get familiar with the EXPLAIN statement and learn how to interpret the EXPLAIN output.
| The EXPLAIN statement is the primary tool in your SQL tuning toolbox. You should become competent in the interpretation of EXPLAIN output. |
|
19.3.2. EXPLAIN and Stored Programs
Unfortunately, there is no way to directly obtain EXPLAIN output for the SQL statements inside stored programs. EXPLAIN will generate an error if asked to explain a CALL statement or a stored program name.
We hope that this restriction will be relaxed in future releases. In the meantime, to tune the SQL in your stored programs, you need to work with the SQL outside of the stored program and only add it to the program when you are satisfied that it is optimized.
19.3.3. Details of the EXPLAIN Output
The output from the EXPLAIN statement consists of lines containing the following columns:
id
Identifies the individual SELECT statement within a SQL statement that contains multiple SELECT clauses. There will be multiple SELECT statements in SQL statements that contain subqueries, in-line views, or UNION operations. All rows in the EXPLAIN output that have the same ID will belong to the same SELECT statement.
select_type
This column identifies the type of the SELECT statement responsible for this step. Table 19-1 lists the possible values.
Table 19-1. Possible values for the select_type column of the EXPLAIN statement outputselect_type | Explanation |
---|
SIMPLE | A simple SELECT statement that does not involve either subqueries or UNIONs. | PRIMARY | If the SQL contains subqueries or UNIONs, PRIMARY indicates the outermost SQL. PRIMARY could be the SELECT statement that contains subqueries within it or the first SELECT in a UNION. | UNION | The second or subsequent SELECT statements contributing to a UNION operation. | UNION RESULT | The result set of a UNION operation. | SUBQUERY | A subquery that returns rows that are not "dependent" on the rows in the outer SELECT. In practice, this means that the subquery does not contain references to columns in other SELECT statements. | DEPENDENT SUBQUERY | A subquery whose results are dependent on the values in an outer SELECT. This is typical of EXISTS subqueries and of IN subqueries (which MySQL rewrites as EXISTS). | DEPENDENT UNION | The second or subsequent SELECT in a UNION that is dependent on rows from an outer SELECT. | DERIVED | SELECT that appears within a subquery within the FROM clause of another SQL. |
table
Indicates the name of the table involved in this step. If the table is aliased within the SQL statement, then the name of the alias rather than the name of the table will be reported.
type
Indicates the method by which rows will be selected from the table involved. Table 19-2 shows the possible values for the type column.
Table 19-2. Possible values for the type column of the EXPLAIN statement outputtype | Explanation |
---|
all | All rows in the table concerned will be read. This occurs primarily when no suitable index exists to retrieve the rows, or when MySQL determines that a full scan of the table will be less expensive than an index lookup. | const | An index is used to retrieve all values from the table matching a constant value supplied in the WHERE clause. | eq_ref | An index is used to retrieve all rows from the table that match the rows supplied by a previous SELECT. eq_ref is typically seen in conjunction with a well-optimized, indexed join. eq_ref indicates that all parts of a unique or primary key index are used. | ref | Like eq_ref except that either only part of the index can be used or the index is not unique or primary. | ref_or_null | Like ref except that the condition also includes a search for null values. | index merge | Occurs when MySQL merges multiple indexes to retrieve the results. | unique_subquery | An index lookup is used to satisfy the result of a subquery. | range | An index is used to retrieve a range of values from the table. This occurs typically when >, <, or BETWEEN operators are involved. | index | A full scan of the index is undertaken to find the necessary rows. |
possible_keys
Lists all of the keys (indexes) that MySQL considered as having potential to resolve this step. If an index is listed here, but is not used to resolve the step, you can consider using optimizer hints to force or encourage the use of the index. If the index is not listed, then in all probability MySQL cannot use it.
key
Indicates the key (index) that MySQL used to resolve the query.
key_len
Shows the length of the columns in the index used to resolve the query. If there is more than one column in the index, key_len might indicate that only part of the index is used.
ref
Shows which columns are used to select rows from the table. ref may list columns from other tables (join columns from other tables) or the word const if a constant value will be used (this constant value might have come from a WHERE clause literal, or might have been obtained earlier in the query execution).
rows
Indicates the number of rows that MySQL estimates will be processed by this step.
Extra
Contains additional information about the execution step. Possible values for Extra are shown in Table 19-3. Multiple values from this column may appear in the Extra column, separated by semicolons.
Table 19-3. Possible values for the extra column of the EXPLAIN statement outputExtra | Explanation |
---|
distinct | MySQL will stop searching for more rows after the first match is found. | not exists | Occurs in a LEFT JOIN when there is an additional WHERE clause condition that indicates that the WHERE clause condition will never be satisfied. A LEFT JOIN with an IS NULL condition will generate this output. This allows the optimizer to eliminate the table from further processing. | range checked for each record | There is no good general-purpose index to support a join. MySQL will determine on a row-by-row basis whether to use an index and/or which index to use. | Using filesort | MySQL needs to return rows in order, and no index is available to support that ordering. MySQL will need to sort the rows and may need to write to disk during that sorting. Even if there is sufficient memory to avoid a disk sort, you will still see this tag if a sort is necessary. | Using index | This step could be resolved by reading an index alone. Typically, this occurs when all of the columns required to resolve the step are present in an index. | Using index for group-by | Same as Using index, but used to support a GROUP BY operation. | Using temporary | A temporary table is created to hold intermediate results. Often seen in conjunction with using filesort. | Using where | The results returned by this step are filtered to satisfy the WHERE clause condition. | Using sort_union | Similar to using union except that the rows had to be sorted before the UNION could be performed, usually because range conditions are involved. | Using union | A form of index merge in which rows that appeared in any of the index scans are returned. Typically used to support WHERE clause conditions that include OR conditions. | Using intersect | A form of index merge in which only the rows appearing in all of the index scans are returned. Typically used to support WHERE clause conditions that include only AND conditions. |
19.3.4. Extended EXPLAIN
An undocumented feature of the EXPLAIN statement can be used to reveal the rewrites that MySQL performs on a statement prior to execution.
If you issue the statement EXPLAIN EXTENDED sql, followed by SHOW WARNINGS, MySQL will print the SQL that it actually executes, including any rewrites applied to the SQL by the optimizer. For instance, in Example 19-4, we see how MySQL rewrites an IN subquery to an EXISTS subquery.
Example 19-4. Using EXPLAIN EXTENDED
mysql> EXPLAIN EXTENDED SELECT COUNT(*) FROM ta_5000 WHERE sales_id IN (SELECT sales_ id FROM tb_5000)\G *************************** 1. row *************************** id: 1 select_type: PRIMARY table: ta_5000 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 5131 Extra: Using where *************************** 2. row *************************** id: 2 select_type: DEPENDENT SUBQUERY table: tb_5000 type: ALL possible_keys: NULL key: NULL key_len: NULL ref: NULL rows: 4985 Extra: Using where 2 rows in set, 1 warning (0.04 sec)
mysql> SHOW WARNINGS \G *************************** 1. row *************************** Level: Note Code: 1003 Message: select count(0) AS 'count(*)' from 'sqltune'.'ta_5000' where <in_optimizer> ('sqltune'.'ta_5000'.'SALES_ID',<exists>(select 1 AS 'Not_used' from 'sqltune'.'tb_5000' where (<cache>('sqltune'.'ta_5000'. 'SALES_ID') = 'sqltune'.'tb_5000'.'SALES_ID'))) 1 row in set (0.05 sec)
|
Most of the time, MySQL rewrites are not particularly significant. However, if you are completely at a loss to understand MySQL's refusal to use an index or some other execution plan decision, examining the rewrite might be useful.
19.3.5. Optimizer Hints
Optimizer hints are instructions that you can embed in your SQL that do not change the meaning of the SQL, but rather instruct or suggest to the optimizer how you would like the SQL to be executed.
Most of the time, you will not need to add hints. In fact, hints can be dangerous because they limit the choices the optimizer has available, and if data in the tables change or if new indexes are added to the table, MySQL may be unable to adapt because of your hints. However, there definitely will be situations where you will discover that the optimizer has made a less than perfect decision and you will want to give the optimizer specific instructions.
Table 19-4 lists the commonly used optimizer hints . We will see examples of each of these hints in the next two chapters.
Table 19-4. MySQL optimizer hintsHint | Where it appears | What it does |
---|
STRAIGHT_JOIN | After the SELECT clause | Forces the optimizer to join the tables in the order in which they appear in the FROM clause. Use this if you want to force tables to be joined in a particular order. | USE INDEX(index [,index...]) | After a table name in the FROM clause | Instructs MySQL to only consider using the indexes listed. MySQL may choose to use none of the indexes if it calculates that using them would not be faster than scanning the entire table. | FORCE INDEX(index [,index...]) | After a table name in the FROM clause | Instructs MySQL to use one of the indexes listed. This differs from USE INDEX in that MySQL is instructed not to perform a table scan of the data unless it is impossible to use any of the indexes listed. | IGNORE INDEX(index [,index...]) | After a table name in the FROM clause | Instructs MySQL not to consider any of the listed indexes when working out the execution plan. |
19.3.6. Measuring SQL and Stored Program Execution
When we execute a SQL statement from the MySQL command line, MySQL is kind enough to report on the elapsed time taken to execute the statement:
mysql> CALL TestProc1( ); Query OK, 0 rows affected (9.35 sec)
Elapsed time is a good first measurement of SQL or stored program performance, but there are lots of reasons why elapsed time might vary between runs that may have absolutely nothing to do with how well the SQL statement is optimized:
Other users may be running jobs on the host while we execute our SQL statements; we will be contending with them for CPU, disk I/O, and locks. The number of physical I/Os necessary to execute our statement will vary depending on the amount of data cached in the operating system, the MyISAM key cache, the InnoDB buffer pool, and/or some other storage engine-specific cache.
For these reasons, it is sometimes better to obtain additional metrics to work out whether our tuning efforts are successful. Useful execution statistics can be obtained from the SHOW STATUS statement, although the level of detail will vary depending on our storage engine, with InnoDB currently offering the most comprehensive selection of statistics.
Generally, we will want to compare before and after variables for each statistic andbecause the statistics are sometimes computed across all sessions using the MySQL serverensure that our session has exclusive use of the server while the statement runs.
In Example 19-5, we calculate the number of logical and physical reads performed while counting the number of rows on the InnoDB-based sales table. Logical reads are the number of block requests from the InnoDB buffer pool, while physical reads reflect the number of blocks that actually had to be read from disk.
Example 19-5. Examining InnoDB execution statistics before and after SQL statement execution
mysql> /* Logical reads before execution*/ SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests'; +----------------------------------+-------+ | Variable_name | Value | +----------------------------------+-------+ | Innodb_buffer_pool_read_requests | 598 | +----------------------------------+-------+ 1 row in set (0.01 sec)
mysql> /* Physical reads before execution*/ SHOW STATUS LIKE 'Innodb_data_reads'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Innodb_data_reads | 79 | +-------------------+-------+ 1 row in set (0.01 sec)
mysql> mysql> SELECT count(*) from sales; +----------+ | count(*) | +----------+ | 2500000 | +----------+ 1 row in set (27.67 sec)
mysql> mysql> /* Logical reads after execution*/ SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests'; +----------------------------------+--------+ | Variable_name | Value | +----------------------------------+--------+ | Innodb_buffer_pool_read_requests | 365177 | +----------------------------------+--------+ 1 row in set (0.46 sec)
mysql> /* Physical reads after execution*/ SHOW STATUS LIKE 'Innodb_data_reads'; +-------------------+-------+ | Variable_name | Value | +-------------------+-------+ | Innodb_data_reads | 17472 | +-------------------+-------+ 1 row in set (0.01 sec)
|
Subtracting the before values from the after values gives us a logical read count of 364,579 and a physical read count of 17,393. We also note the elapsed time of 27.67 seconds.
The next time we execute this query, we might see a lower physical read count and a lower elapsed time because the data we need is already in cache. However, we would not expect the logical read count to change unless the data in the table was changed. This makes the logical read statistics (Innodb_buffer_pool_read_requests) arguably the most useful statistics for determining if our SQL tuning efforts have been successful.
Table 19-5 shows the SHOW STATUS variables that are most useful for measuring SQL execution performance.
Table 19-5. SHOW STATUS statistics that are useful when measuring SQL performanceSHOW STATUS statistic | Explanation |
---|
Innodb_buffer_pool_read_requests | Number of requests from the InnoDB buffer pool. This statistic is sometimes called logical reads since it reflects the absolute number of data reads required to satisfy a query. This value will remain constant between runs provided that our data does not change. If we observe a reduction in this statistic, then we have almost certainly improved the performance of our query. | Innodb_data_reads | Number of blocks from disk that InnoDB had to read to execute the query. If the cache is empty, then this value will be equal to Innodb_buffer_pool_read_requests. If all of the required blocks are in the cache, then this statistic will be 0. Usually, the value will be somewhere in between. If two executions of the same SQL have different response times, we can look at this statistic to determine if the difference is because one execution required more physical I/O. | Innodb_rows_read | Number of rows read by InnoDB to satisfy the query. For some SQL statements, we may see excessive values for this statistic, which generally indicates that the SQL is inefficient (because it is accessing the same rows twice, or because it is accessing more rows than are required). | Last_query_cost | Optimizer's "cost" estimate for the last SQL executed. Unlike the other metrics, this statistic does not require us to have to calculate a delta value. Higher costs indicate that the optimizer thinks the SQL will take longer to run. | Sort_rows | Number of rows that had to be sorted. | Sort_merge_passes | Number of disk sort "merge runs" that had to be performed. The fewer merge runs, the faster the sort. Chapter 21 describes sort optimization in detail. |
19.3.7. The Slow Query Log
One way to identify SQL statements or stored programs that may need tuning is to enable the MySQL slow query log. We can do this by adding the following lines to our MySQL initialization files:
log_slow_queries long_query_time=N
This will cause MySQL to write any queries that exceed an elapsed time exceeding N seconds to a log file. The log file can be found in the MySQL data directory and is named hostname-slow.log. For each SQL statement identified, MySQL will print the SQL statement along with a few execution statistics, as shown in Example 19-6.
Example 19-6. Example of slow query log contents
Time Id Command Argument # Time: 050214 23:42:30 # User@Host: root[root] @ localhost [127.0.0.1] # Query_time: 67 Lock_time: 0 Rows_sent: 1 Rows_examined: 101199 use sqltune; select count(*) from customers where contact_surname not in (select surname from employees);
|
The slow query log execution statistics are not particularly enlightening, and there is no EXPLAIN output, so we would normally paste the SQL into our MySQL client for further analysis.
Starting with MySQL 5.1.6, the slow query log can be directed to the database table mysql.slow_log. This allows us to more easily access the information from MySQL clients and gives us the power to analyze the information using SQL statements. We enable logging to this table by specifying log_output=TABLE in our initialization file.
|
No comments:
Post a Comment