20.2. Tuning JoinsSo far we have looked at tuning SQL queries against a single table only. Let's move on to tuning SQL queries that join rows from two or more tables. 20.2.1. How MySQL Joins TablesMySQL currently joins 20.2.2. Joins Without IndexesThe basic join algorithm is not very well suited to joining multiple tables unless there are indexes to support the join.[*] Performance might be adequate for very small tables, but as table sizes increase, the join overhead
Figure 20-8 shows how response time increases for nonindexed joins as the size of each table increases. This semi-exponential degradation is extremely undesirable: if we extrapolate the response time curve for larger tables, we predict that it would take 20 minutes to join two tables of 100,000 rows, 20 hours to join two tables with 1 million rows each, and 81 days to join two tables of 10 million rows each! This is definitely not the way you want your applications to perform as your database grows in size. Figure 20-8. Table size versus elapsed time for nonindexed joins20.2.3. Joins with IndexesTo get predictable and acceptable performance for our join, we need to create indexes to support the join. Generally, we will want to create concatenated indexes based on any columns in a table that might be used to join that table to another table. However, we don't need an index on the first (or "driving") table's columns; that is, if we are joining customers to sales, in that order, then our index needs to be on saleswe don't need an index on both tables. Creating an index on the join column not only reduces execution time, but also prevents an exponential increase in response time as the tables grow in size. Figure 20-9 shows how the response time increases as the number of rows increases when there is an index to support the join. Not only is performance much better (about 0.1 second compared to more than 25 seconds for two tables of 20,000 rows), but the increase in response time is far more predictable. Extrapolating the response time for the indexed join, we can predict that joining two tables of 10 million rows each could be achieved in only 40 secondscompared to 81 days for the nonindexed join.[*]
Figure 20-9. Response time versus table size for an indexed join20.2.4. Join OrderBy far, the most important factor in the optimization of MySQL joins is to ensure that each successive join is supported by an index. Beyond that, we should:
Generally, the MySQL optimizer can be relied upon to pick a good join order. However, if we need to change the join order, we can use the STRAIGHT_JOIN hint to ensure that the tables are joined in the order in which they appear in the FROM clause. For instance, the following use of STRAIGHT_JOIN ensures that the join order is from the smallest table (ta_1000) to the largest (ta_5000):
Figure 20-10 shows the difference in elapsed time when joining tables in either ascending or descending order of table size. Joining from smallest to largest is about twice as fast as joining from largest to smallest.
Figure 20-10. Table size and join order20.2.5. A Simple Join ExampleBased on our discussions so far, here is a summary of the most important rules for optimizing MySQL joins:
Let's apply these rules to a simple example. Consider the case in which we are listing all sales for a particular customer. The query looks like this:
With just the primary key indexes in place, the EXPLAIN output looks like this:
This execution plan satisfies our first rule: an index (the primary key customer_id of customers) is used to join sales to customers. However, our second ruleeliminating rows as early as possible in the join sequenceis violated: all of the sales rows are read first, even though only some of those sales (those for a particular customer) are needed. Furthermore, we are joining the larger table sales (2.5 million rows) to the smaller table customers (100,000 rows). So, what we need to achieve is an efficient join from customers to sales. This means indexing the sales.customer_id column so that we can find sales for a particular customer. The following index should do the trick:
The execution plan now looks like this:
This is better, but we could improve matters further if we did not have to do the full scan on customers. Adding the following index will let us obtain the desired customer more efficiently:
Once this is done, the execution plan looks like this:
This is the optimal execution plan for this query. The desired customer is found quickly by the index, and then matching sales for that customer are found using the i_sales_customer index. Figure 20-11 shows the performance improvements gained by our optimizations. Figure 20-11. Optimization of a simple join |
Sunday, October 25, 2009
Section 20.2. Tuning Joins
Subscribe to:
Post Comments (Atom)
No comments:
Post a Comment