21.1. Tuning SubqueriesA subquery is a SQL statement that is embedded within the WHERE clause of another statement. For instance, Example 21-1 uses a subquery to determine the number of customers who are also employees. Example 21-1. SELECT statement with a subquery
We can identify the subquery through the DEPENDENT SUBQUERY tag in the Select type column of the EXPLAIN statement output, as shown here:
The same query can also be rewritten as an EXISTS subquery, as in Example 21-2. Example 21-2. SELECT statement with an EXISTS subquery
Note that the EXPLAIN output for the EXISTS subquery is identical to that of the IN subquery. This is because MySQL rewrites IN-based subqueries as EXISTS-based syntax before execution. The performance of subqueries will, therefore, be the same, regardless of whether you use the EXISTS or the IN operator. 21.1.1. Optimizing SubqueriesWhen MySQL executes a statement that contains a subquery in the WHERE clause, it will execute the subquery once for every row returned by the main or "outer" SQL statement. It therefore follows that the subquery had better execute very efficiently: it is potentially going to be executed many times. The most obvious way to make a subquery run fast is to ensure that it is supported by an index. Ideally, we should create a concatenated index that includes every column referenced within the subquery. For our example query in the previous example, we should create an index on all the employees columns referenced in the subquery:
We can see from the following EXPLAIN output that MySQL makes use of the index to resolve the subquery. The output also includes the Using index clause, indicating that only the index is usedthe most desirable execution plan for a subquery.
Figure 21-1 shows the relative performance of both the EXISTS and IN subqueries Figure 21-1. Subquery performance with and without an indexNot only will an indexed subquery outperform a nonindexed subquery, but the un-indexed subquery will also degrade exponentially as the number of rows in each of the tables increases. (The response time will actually be proportional to the number of rows returned by the outer query times the number of rows accessed in the subquery.) Figure 21-2 shows this exponential degradation.
21.1.2. Rewriting a Subquery as a JoinMany subqueries can be rewritten as joins. For instance, our example subquery could have been expressed as a join, as shown in Example 21-3. Figure 21-2. Exponential degradation in nonindexed subqueriesExample 21-3. Subquery rewritten as a join
Subqueries sometimes result in queries that are easier to understand, and when the subquery is indexed, the performance of both types of subqueries and the join is virtually identical, although, as described in the previous section, EXISTS has a small advantage over IN. Figure 21-3 compares the three solutions for various sizes of tables. Figure 21-3. IN, EXISTS, and JOIN solution scalability (indexed query)However, when no index exists to support the subquery or the join, then the join will outperform both IN and EXISTS subqueries. It will also degrade less rapidly as the number of rows to be processed increases. This is because of the MySQL join optimizations. Figure 21-4 shows the performance characteristics of the three solutions where no index exists. Figure 21-4. Comparison of nonindexed JOIN, IN, and EXISTS performance
21.1.3. Using Subqueries in Complex JoinsAlthough a subquery, in general, will not outperform an equivalent join, there are occasions when you can use subqueries to obtain more favorable execution plans for complex joins Let's look at an example. You have an application that from time to time is asked to report on the quantity of sales made to a particular customer by a particular sales rep. The SQL might look like Example 21-4. Example 21-4. Complex join SQL
We already have an index on the primary key columns for customers, employees, and products, so MySQL uses these indexes to join the appropriate rows from these tables to the sales table. In the process, it eliminates all of the rows except those that match the WHERE clause condition:
This turns out to be a fairly expensive query, because we have to perform a full scan of the large sales table. What we probably want to do is to retrieve the appropriate primary keys from products, customers, and employees using the WHERE clause conditions, and then look up those keys (quickly) in the sales table. To allow us to quickly find these primary keys, we would create the following indexes:
To enable a rapid sales table lookup, we would create the following index:
Once we do this, our execution plan looks like this:
Each step is now based on an index lookup, and the sales lookup is optimized through a fast concatenated index. The execution time reduces from about 25 seconds (almost half a minute) to about 0.01 second (almost instantaneous).
As we noted in the previous chapter, we can't always create all of the concatenated indexes that we might need to support all possible queries on a table. In this case, we may want to perform an "index merge" of multiple single-column indexes. However, MySQL will not normally perform an index merge when optimizing a join. In this case, to get an index merge join, we can try to rewrite the join using subqueries, as shown in Example 21-5. Example 21-5. Complex join SQL rewritten to support index merge
The EXPLAIN output shows that an index merge will now occur, as shown in Example 21-6. Example 21-6. EXPLAIN output for an index merge SQL
The performance of the index merge solution is about 0.025 secondslower than the concatenated index but still about 1,000 times faster than the initial join performance. This is an especially useful technique if you have a STAR schema (one very large table that contains the "facts," with foreign keys pointing to other, smaller "dimension" tables). Figure 21-5 compares the performance of the three approaches. Although an index merge is not quite as efficient as a concatenated index, you can often satisfy a wider range of queries using an index merge, since this way you need only create indexes on each column, not concatenated indexes on every possible combination of columns.
Figure 21-5. Optimizing a complex join with subqueries and index merge |
Tuesday, October 27, 2009
Section 21.1. Tuning Subqueries
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment