Tuesday, October 27, 2009

Section 21.1. Tuning Subqueries










21.1. Tuning Subqueries









A subquery is a SQL statement that is embedded within the WHERE clause of another statement. For instance, Example 21-1 uses a subquery to determine the number of customers who are also employees.


Example 21-1. SELECT statement with a subquery




SELECT COUNT(*)
FROM customers
WHERE (contact_surname, contact_firstname,date_of_birth)
IN (select surname,firstname,date_of_birth
FROM employees)




We can identify the subquery through the DEPENDENT SUBQUERY tag in the Select type column of the EXPLAIN statement output, as shown here:



Explain plan
------------

ID=1 Table=customers Select type=PRIMARY Access type=ALL
Rows=100459
Key= (Possible= )
Ref= Extra=Using where
ID=2 Table=employees Select type=DEPENDENT SUBQUERY Access type=ALL
Rows=1889
Key= (Possible= )
Ref= Extra=Using where



The same query can also be rewritten as an EXISTS subquery, as in Example 21-2.


Example 21-2. SELECT statement with an EXISTS subquery




SELECT count(*)
FROM customers
WHERE EXISTS (SELECT 'anything'
FROM employees
where surname=customers.contact_surname
AND firstname=customers.contact_firstname
AND date_of_birth=customers.date_of_birth)

Short Explain
-------------
1 PRIMARY select(ALL) on customers using no key
Using where
2 DEPENDENT SUBQUERY select(ALL) on employees using no key




Note that the EXPLAIN output for the EXISTS subquery is identical to that of the IN subquery. This is because MySQL rewrites IN-based subqueries as EXISTS-based syntax before execution. The performance of subqueries will, therefore, be the same, regardless of whether you use the EXISTS or the IN operator.



21.1.1. Optimizing Subqueries




When MySQL executes a statement that contains a subquery in the WHERE clause, it will execute the subquery once for every row returned by the main or "outer" SQL statement. It therefore follows that the subquery had better execute very efficiently: it is potentially going to be executed many times. The most obvious way to make a subquery run fast is to ensure that it is supported by an index. Ideally, we should create a concatenated index that includes every column referenced within the subquery.


For our example query in the previous example, we should create an index on all the employees columns referenced in the subquery:



CREATE INDEX i_customers_name ON customers
(contact_surname, contact_firstname, date_of_birth)



We can see from the following EXPLAIN output that MySQL makes use of the index to resolve the subquery. The output also includes the Using index clause, indicating that only the index is usedthe most desirable execution plan for a subquery.



Short Explain
-------------
1 PRIMARY select(ALL) on employees using no key
Using where
2 DEPENDENT SUBQUERY select(index_subquery) on customers
using i_customers_name
Using index; Using where



Figure 21-1 shows the relative performance of both the EXISTS and IN subqueries
with and without an index.



Figure 21-1. Subquery performance with and without an index



Not only will an indexed subquery outperform a nonindexed subquery, but the un-indexed subquery will also degrade exponentially as the number of rows in each of the tables increases. (The response time will actually be proportional to the number of rows returned by the outer query times the number of rows accessed in the subquery.) Figure 21-2 shows this exponential degradation.


Subqueries should be optimized by creating an index on all of the columns referenced in the subquery. SQL statements containing subqueries that are not supported by an index can show exponential degradation as table row counts increase.





21.1.2. Rewriting a Subquery as a Join






Many subqueries can be rewritten as joins. For instance, our example subquery could have been expressed as a join, as shown in Example 21-3.



Figure 21-2. Exponential degradation in nonindexed subqueries



Example 21-3. Subquery rewritten as a join




SELECT count(*)
FROM customers JOIN employees
ON (employees.surname=customers.contact_surname
AND employees.firstname=customers.contact_firstname
AND employees.date_of_birth=customers.date_of_birth)




Subqueries sometimes result in queries that are easier to understand, and when the subquery is indexed, the performance of both types of subqueries and the join is virtually identical, although, as described in the previous section, EXISTS has a small advantage over IN. Figure 21-3 compares the three solutions for various sizes of tables.



Figure 21-3. IN, EXISTS, and JOIN solution scalability (indexed query)



However, when no index exists to support the subquery or the join, then the join will outperform both IN and EXISTS subqueries. It will also degrade less rapidly as the number of rows to be processed increases. This is because of the MySQL join optimizations. Figure 21-4 shows the performance characteristics of the three solutions where no index exists.



Figure 21-4. Comparison of nonindexed JOIN, IN, and EXISTS performance



A join will usually outperform an equivalent SQL with a subqueryand will show superior scalabilityif there is no index to support either the join or the subquery. If there are supporting indexes, the performance differences among the three solutions are negligible.





21.1.3. Using Subqueries in Complex Joins




Although a subquery, in general, will not outperform an equivalent join, there are occasions when you can use subqueries to obtain more favorable execution plans for complex joins
especially when index merge operations are concerned.


Let's look at an example. You have an application that from time to time is asked to report on the quantity of sales made to a particular customer by a particular sales rep. The SQL might look like Example 21-4.


Example 21-4. Complex join SQL




SELECT COUNT(*), SUM(sales.quantity), SUM(sales.sale_value)
FROM sales
JOIN customers ON (sales.customer_id=customers.customer_id)
JOIN employees ON (sales.sales_rep_id=employees.employee_id)
JOIN products ON (sales.product_id=products.product_id)
WHERE customers.customer_name='INVITRO INTERNATIONAL'
AND employees.surname='GRIGSBY'
AND employees.firstname='RAY'
AND products.product_description='SLX';




We already have an index on the primary key columns for customers, employees, and products, so MySQL uses these indexes to join the appropriate rows from these tables to the sales table. In the process, it eliminates all of the rows except those that match the WHERE clause condition:



Short Explain
-------------
1 SIMPLE select(ALL) on sales using no key

1 SIMPLE select(eq_ref) on employees using PRIMARY
Using where
1 SIMPLE select(eq_ref) on customers using PRIMARY
Using where
1 SIMPLE select(eq_ref) on products using PRIMARY
Using where



This turns out to be a fairly expensive query, because we have to perform a full scan of the large sales table. What we probably want to do is to retrieve the appropriate primary keys from products, customers, and employees using the WHERE clause conditions, and then look up those keys (quickly) in the sales table. To allow us to quickly find these primary keys, we would create the following indexes:



CREATE INDEX i_customer_name ON customers(customer_name);
CREATE INDEX i_product_description ON products(product_description);
CREATE INDEX i_employee_name ON employees(surname, firstname);



To enable a rapid sales table lookup, we would create the following index:



CREATE INDEX i_sales_cust_prod_rep ON sales(customer_id,product_id,sales_rep_id);



Once we do this, our execution plan looks like this:



Short Explain
-------------
1 SIMPLE select(ref) on customers using i_customer_name
Using where; Using index
1 SIMPLE select(ref) on employees using i_employee_name
Using where; Using index
1 SIMPLE select(ref) on products using i_product_description
Using where; Using index
1 SIMPLE select(ref) on sales using i_sales_cust_prod_rep
Using where



Each step is now based on an index lookup, and the sales lookup is optimized through a fast concatenated index. The execution time reduces from about 25 seconds (almost half a minute) to about 0.01 second (almost instantaneous).


To optimize a join, create indexes to support all of the conditions in the WHERE clause and create concatenated indexes to support all of the join conditions.



As we noted in the previous chapter, we can't always create all of the concatenated indexes that we might need to support all possible queries on a table. In this case, we may want to perform an "index merge" of multiple single-column indexes. However, MySQL will not normally perform an index merge when optimizing a join.


In this case, to get an index merge join, we can try to rewrite the join using subqueries, as shown in Example 21-5.


Example 21-5. Complex join SQL rewritten to support index merge




SELECT COUNT(*), SUM(sales.quantity), SUM(sales.sale_value)
FROM sales
WHERE product_id= (SELECT product_id
FROM products
WHERE product_description='SLX')
AND sales_rep_id=(SELECT employee_id
FROM employees
WHERE surname='GRIGSBY'
AND firstname='RAY')
AND customer_id= (SELECT customer_id
FROM customers
WHERE customer_name='INVITRO INTERNATIONAL');




The EXPLAIN output shows that an index merge will now occur, as shown in Example 21-6.


Example 21-6. EXPLAIN output for an index merge SQL




Short Explain
-------------
1 PRIMARY select(index_merge) on sales using i_sales_rep,i_sales_cust
Using intersect(i_sales_rep,i_sales_cust); Using where
4 SUBQUERY select(ref) on customers using i_customer_name

3 SUBQUERY select(ref) on employees using i_employee_name

2 SUBQUERY select(ref) on products using i_product_description




The performance of the index merge solution is about 0.025 secondslower than the concatenated index but still about 1,000 times faster than the initial join performance. This is an especially useful technique if you have a STAR schema (one very large table that contains the "facts," with foreign keys pointing to other, smaller "dimension" tables).


Figure 21-5 compares the performance of the three approaches. Although an index merge is not quite as efficient as a concatenated index, you can often satisfy a wider range of queries using an index merge, since this way you need only create indexes on each column, not concatenated indexes on every possible combination of columns.


Rewriting a join with subqueries can improve join performance, especially if you need to perform an index merge joinconsider this technique for STAR joins.




Figure 21-5. Optimizing a complex join with subqueries and index merge














No comments:

Post a Comment