12.5 Finding Rows in One Table That Match Rows in Another
12.5.1 Problem
You want to use rows in one table to locate rows in another table.
12.5.2 Solution
Use a join with an appropriate WHERE clause to match up records from different tables.
12.5.3 Discussion
The records in the shirt, tie, and pants tables from Recipe 12.2 have no special relationship to each other, so no combination of rows is more meaningful than any other. That's okay, because the purpose of the examples that use those tables is to illustrate how to perform a join, not why you'd do so.
The "why" is that joins allow you to combine information from multiple tables when each table contains only part of the information in which you're interested. Output rows from a join are more complete than rows from either table by itself. This kind of operation often is based on matching rows in one table to rows in another, which requires that each table have one or more columns of common information that can be used to link them together logically.
To illustrate, suppose you're starting an art collection, using the following two tables to record your acquisitions. artist lists those painters whose works you want to collect, and painting lists each painting that you've purchased:
CREATE TABLE artist ( a_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # artist ID name VARCHAR(30) NOT NULL, # artist name PRIMARY KEY (a_id), UNIQUE (name) );
CREATE TABLE painting ( a_id INT UNSIGNED NOT NULL, # artist ID p_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # painting ID title VARCHAR(100) NOT NULL, # title of painting state VARCHAR(2) NOT NULL, # state where purchased price INT UNSIGNED, # purchase price (dollars) INDEX (a_id), PRIMARY KEY (p_id) );
You've just begun the collection, so the tables contain only the following records:
mysql> SELECT * FROM artist ORDER BY a_id; +------+----------+ | a_id | name | +------+----------+ | 1 | Da Vinci | | 2 | Monet | | 3 | Van Gogh | | 4 | Picasso | | 5 | Renoir | +------+----------+ mysql> SELECT * FROM painting ORDER BY a_id, p_id; +------+------+-------------------+-------+-------+ | a_id | p_id | title | state | price | +------+------+-------------------+-------+-------+ | 1 | 1 | The Last Supper | IN | 34 | | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | 3 | Starry Night | KY | 48 | | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | 5 | The Rocks | IA | 33 | | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+------+-------------------+-------+-------+
The low values in the price column of the painting table betray the fact that your collection actually contains only cheap facsimiles, not the originals. Well, that's all right�who can afford the originals?
Each table contains partial information about your collection. For example, the artist table doesn't tell you which paintings each artist produced, and the painting table lists artist IDs but not their names. To answer certain kinds of questions, you must combine the two tables, and do so in a way that matches up records properly. The "matching up" part is a matter of writing an appropriate WHERE clause. In Recipe 12.2, I mentioned that performing a full join generally is a bad idea because of the amount of output produced. Another reason not to perform a full join is that the result may be meaningless. The following full join between the artist and painting tables makes this clear. It includes no WHERE clause, and thus produces output that conveys no useful information:
mysql> SELECT * FROM artist, painting; +------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 2 | Monet | 1 | 1 | The Last Supper | IN | 34 | | 3 | Van Gogh | 1 | 1 | The Last Supper | IN | 34 | | 4 | Picasso | 1 | 1 | The Last Supper | IN | 34 | | 5 | Renoir | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 2 | Monet | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 1 | 2 | The Mona Lisa | MI | 87 | | 4 | Picasso | 1 | 2 | The Mona Lisa | MI | 87 | | 5 | Renoir | 1 | 2 | The Mona Lisa | MI | 87 | | 1 | Da Vinci | 3 | 3 | Starry Night | KY | 48 | | 2 | Monet | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 4 | Picasso | 3 | 3 | Starry Night | KY | 48 | | 5 | Renoir | 3 | 3 | Starry Night | KY | 48 | | 1 | Da Vinci | 3 | 4 | The Potato Eaters | KY | 67 | | 2 | Monet | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 4 | Picasso | 3 | 4 | The Potato Eaters | KY | 67 | | 5 | Renoir | 3 | 4 | The Potato Eaters | KY | 67 | | 1 | Da Vinci | 3 | 5 | The Rocks | IA | 33 | | 2 | Monet | 3 | 5 | The Rocks | IA | 33 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 4 | Picasso | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 3 | 5 | The Rocks | IA | 33 | | 1 | Da Vinci | 5 | 6 | Les Deux Soeurs | NE | 64 | | 2 | Monet | 5 | 6 | Les Deux Soeurs | NE | 64 | | 3 | Van Gogh | 5 | 6 | Les Deux Soeurs | NE | 64 | | 4 | Picasso | 5 | 6 | Les Deux Soeurs | NE | 64 | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
Clearly, you're not maintaining these tables to match up each artist with each painting, which is what the preceding query does. An unrestricted join in this case produces nothing more than a lot of output with no value, so a WHERE clause is essential to give the query meaning. For example, to produce a list of paintings together with the artist names, you can associate records from the two tables using a simple WHERE clause that matches up values in the artist ID column that is common to both tables and that serves as the link between them:
mysql> SELECT * FROM artist, painting -> WHERE artist.a_id = painting.a_id; +------+----------+------+------+-------------------+-------+-------+ | a_id | name | a_id | p_id | title | state | price | +------+----------+------+------+-------------------+-------+-------+ | 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 | | 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 | | 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 | | 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 | | 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 | | 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 | +------+----------+------+------+-------------------+-------+-------+
The column names in the WHERE clause include table qualifiers to make it clear which a_id values to compare. The output indicates who painted each painting, and, conversely, which paintings by each artist are in your collection. However, the output is perhaps overly verbose. (It includes two identical a_id columns, for example; one comes from the artist table, the other from the painting table.) You may want to see the a_id values only once. Or you may not want to see any ID columns at all. To exclude them, provide a column output list that names specifically only those columns in which you're interested:
mysql> SELECT artist.name, painting.title, painting.state, painting.price -> FROM artist, painting -> WHERE artist.a_id = painting.a_id; +----------+-------------------+-------+-------+ | name | title | state | price | +----------+-------------------+-------+-------+ | Da Vinci | The Last Supper | IN | 34 | | Da Vinci | The Mona Lisa | MI | 87 | | Van Gogh | Starry Night | KY | 48 | | Van Gogh | The Potato Eaters | KY | 67 | | Van Gogh | The Rocks | IA | 33 | | Renoir | Les Deux Soeurs | NE | 64 | +----------+-------------------+-------+-------+
By adding other conditions to the WHERE clause, you can use row-matching queries to answer more specific questions, such as the following:
Which paintings did Van Gogh paint? To answer this question, identify the record from the artist table that corresponds to the artist name, use its a_id value to find matching records in the painting table, and select the title from those records:
mysql> SELECT painting.title -> FROM artist, painting -> WHERE artist.name = 'Van Gogh' AND artist.a_id = painting.a_id; +-------------------+ | title | +-------------------+ | Starry Night | | The Potato Eaters | | The Rocks | +-------------------+
Who painted "The Mona Lisa"? To find out, go in the other direction, using information in the painting table to find information in the artist table:
mysql> SELECT artist.name -> FROM artist, painting -> WHERE painting.title = 'The Mona Lisa' AND painting.a_id = artist.a_id; +----------+ | name | +----------+ | Da Vinci | +----------+
Which artists' paintings did you purchase in Kentucky or Indiana? This is somewhat similar to the last query, but tests a different column in the painting table to find the initial set of records to be joined with the artist table:
mysql> SELECT DISTINCT artist.name -> FROM artist, painting -> WHERE painting.state IN ('KY','IN') AND artist.a_id = painting.a_id; +----------+ | name | +----------+ | Da Vinci | | Van Gogh | +----------+ The query also uses DISTINCT to display each artist name just once. Try it without DISTINCT and you'll see that Van Gogh is listed twice�that's because you obtained two Van Goghs in Kentucky.
Joins can also be used with aggregate functions to produce summaries. For example, to find out how many paintings you have per artist, use this query:
mysql> SELECT artist.name, COUNT(*) AS 'number of paintings' -> FROM artist, painting -> WHERE artist.a_id = painting.a_id -> GROUP BY artist.name; +----------+---------------------+ | name | number of paintings | +----------+---------------------+ | Da Vinci | 2 | | Renoir | 1 | | Van Gogh | 3 | +----------+---------------------+ A more elaborate query might also show how much you paid for each artist's paintings, in total and on average:
mysql> SELECT artist.name, -> COUNT(*) AS 'number of paintings', -> SUM(painting.price) AS 'total price', -> AVG(painting.price) AS 'average price' -> FROM artist, painting WHERE artist.a_id = painting.a_id -> GROUP BY artist.name; +----------+---------------------+-------------+---------------+ | name | number of paintings | total price | average price | +----------+---------------------+-------------+---------------+ | Da Vinci | 2 | 121 | 60.5000 | | Renoir | 1 | 64 | 64.0000 | | Van Gogh | 3 | 148 | 49.3333 | +----------+---------------------+-------------+---------------+
Note that the summary queries produce output only for those artists in the artist table for whom you actually have acquired paintings. (For example, Monet is listed in the artist table but is not present in the summary because you don't have any of his paintings yet.) If you want the summary to include all artists, even if you have none of their paintings yet, you must use a different kind of join�specifically, a LEFT JOIN. See Recipe 12.6 and Recipe 12.9.
Because a join can easily cause MySQL to process large numbers of row combinations, it's a good idea to make sure that the columns you're comparing are indexed. Otherwise, performance can drop off quickly as table sizes increase. For the artist and painting tables, joins are made based on the values in the a_id column of each table. If you look back at the CREATE TABLE statements that were shown for these tables in Recipe 12.5, you'll see that a_id is indexed in each table.
|
|
No comments:
Post a Comment