Friday, November 13, 2009

12.5 Finding Rows in One Table That Match Rows in Another




I l@ve RuBoard










12.5 Finding Rows in One Table That Match Rows in Another




12.5.1 Problem



You want to use rows in one
table to locate rows in another table.





12.5.2 Solution



Use a join with an appropriate
WHERE clause to match up records from
different tables.





12.5.3 Discussion



The records in the shirt, tie,
and pants tables from Recipe 12.2
have no special relationship to each other, so no combination of rows
is more meaningful than any other. That's okay,
because the purpose of the examples that use those tables is to
illustrate how to perform a join, not why you'd do
so.



The "why" is that joins allow you
to combine information from multiple tables when each table contains
only part of the information in which you're
interested. Output rows from a join are more complete than rows from
either table by itself. This kind of operation often is based on
matching rows in one table to rows in another, which requires that
each table have one or more columns of common information that can be
used to link them together logically.



To illustrate, suppose you're starting an art
collection, using the following two tables to record your
acquisitions. artist lists those painters whose
works you want to collect, and painting lists each
painting that you've purchased:



CREATE TABLE artist
(
a_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # artist ID
name VARCHAR(30) NOT NULL, # artist name
PRIMARY KEY (a_id),
UNIQUE (name)
);

CREATE TABLE painting
(
a_id INT UNSIGNED NOT NULL, # artist ID
p_id INT UNSIGNED NOT NULL AUTO_INCREMENT, # painting ID
title VARCHAR(100) NOT NULL, # title of painting
state VARCHAR(2) NOT NULL, # state where purchased
price INT UNSIGNED, # purchase price (dollars)
INDEX (a_id),
PRIMARY KEY (p_id)
);


You've just begun the collection, so the tables
contain only the following records:



mysql> SELECT * FROM artist ORDER BY a_id;
+------+----------+
| a_id | name |
+------+----------+
| 1 | Da Vinci |
| 2 | Monet |
| 3 | Van Gogh |
| 4 | Picasso |
| 5 | Renoir |
+------+----------+
mysql> SELECT * FROM painting ORDER BY a_id, p_id;
+------+------+-------------------+-------+-------+
| a_id | p_id | title | state | price |
+------+------+-------------------+-------+-------+
| 1 | 1 | The Last Supper | IN | 34 |
| 1 | 2 | The Mona Lisa | MI | 87 |
| 3 | 3 | Starry Night | KY | 48 |
| 3 | 4 | The Potato Eaters | KY | 67 |
| 3 | 5 | The Rocks | IA | 33 |
| 5 | 6 | Les Deux Soeurs | NE | 64 |
+------+------+-------------------+-------+-------+


The low values in the price column of the
painting table betray the fact that your
collection actually contains only cheap facsimiles, not the
originals. Well, that's all right�who can
afford the originals?



Each table contains partial information about your collection. For
example, the artist table doesn't
tell you which paintings each artist produced, and the
painting table lists artist IDs but not their
names. To answer certain kinds of questions, you must combine the two
tables, and do so in a way that matches up records properly. The
"matching up" part is a matter of
writing an appropriate WHERE clause. In Recipe 12.2, I mentioned that performing a full join
generally is a bad idea because of the amount of output produced.
Another reason not to perform a full join is that the result may be
meaningless. The following full join between the
artist and painting tables
makes this clear. It includes no WHERE clause, and
thus produces output that conveys no useful information:



mysql> SELECT * FROM artist, painting;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name | a_id | p_id | title | state | price |
+------+----------+------+------+-------------------+-------+-------+
| 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 |
| 2 | Monet | 1 | 1 | The Last Supper | IN | 34 |
| 3 | Van Gogh | 1 | 1 | The Last Supper | IN | 34 |
| 4 | Picasso | 1 | 1 | The Last Supper | IN | 34 |
| 5 | Renoir | 1 | 1 | The Last Supper | IN | 34 |
| 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 |
| 2 | Monet | 1 | 2 | The Mona Lisa | MI | 87 |
| 3 | Van Gogh | 1 | 2 | The Mona Lisa | MI | 87 |
| 4 | Picasso | 1 | 2 | The Mona Lisa | MI | 87 |
| 5 | Renoir | 1 | 2 | The Mona Lisa | MI | 87 |
| 1 | Da Vinci | 3 | 3 | Starry Night | KY | 48 |
| 2 | Monet | 3 | 3 | Starry Night | KY | 48 |
| 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 |
| 4 | Picasso | 3 | 3 | Starry Night | KY | 48 |
| 5 | Renoir | 3 | 3 | Starry Night | KY | 48 |
| 1 | Da Vinci | 3 | 4 | The Potato Eaters | KY | 67 |
| 2 | Monet | 3 | 4 | The Potato Eaters | KY | 67 |
| 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 |
| 4 | Picasso | 3 | 4 | The Potato Eaters | KY | 67 |
| 5 | Renoir | 3 | 4 | The Potato Eaters | KY | 67 |
| 1 | Da Vinci | 3 | 5 | The Rocks | IA | 33 |
| 2 | Monet | 3 | 5 | The Rocks | IA | 33 |
| 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 |
| 4 | Picasso | 3 | 5 | The Rocks | IA | 33 |
| 5 | Renoir | 3 | 5 | The Rocks | IA | 33 |
| 1 | Da Vinci | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 2 | Monet | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 3 | Van Gogh | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 4 | Picasso | 5 | 6 | Les Deux Soeurs | NE | 64 |
| 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 |
+------+----------+------+------+-------------------+-------+-------+


Clearly, you're not maintaining these tables to
match up each artist with each painting, which is what the preceding
query does. An unrestricted join in this case produces nothing more
than a lot of output with no value, so a WHERE
clause is essential to give the query meaning. For example, to
produce a list of paintings together with the artist names, you can
associate records from the two tables using a simple
WHERE clause that matches up values in the artist
ID column that is common to both tables and that serves as the link
between them:



mysql> SELECT * FROM artist, painting
-> WHERE artist.a_id = painting.a_id;
+------+----------+------+------+-------------------+-------+-------+
| a_id | name | a_id | p_id | title | state | price |
+------+----------+------+------+-------------------+-------+-------+
| 1 | Da Vinci | 1 | 1 | The Last Supper | IN | 34 |
| 1 | Da Vinci | 1 | 2 | The Mona Lisa | MI | 87 |
| 3 | Van Gogh | 3 | 3 | Starry Night | KY | 48 |
| 3 | Van Gogh | 3 | 4 | The Potato Eaters | KY | 67 |
| 3 | Van Gogh | 3 | 5 | The Rocks | IA | 33 |
| 5 | Renoir | 5 | 6 | Les Deux Soeurs | NE | 64 |
+------+----------+------+------+-------------------+-------+-------+


The column names in the WHERE clause include table
qualifiers to make it clear which a_id values to
compare. The output indicates who painted each painting, and,
conversely, which paintings by each artist are in your collection.
However, the output is perhaps overly verbose. (It includes two
identical a_id columns, for example; one comes
from the artist table, the other from the
painting table.) You may want to see the
a_id values only once. Or you may not want to see
any ID columns at all. To exclude them, provide a column output list
that names specifically only those columns in which
you're interested:



mysql> SELECT artist.name, painting.title, painting.state, painting.price
-> FROM artist, painting
-> WHERE artist.a_id = painting.a_id;
+----------+-------------------+-------+-------+
| name | title | state | price |
+----------+-------------------+-------+-------+
| Da Vinci | The Last Supper | IN | 34 |
| Da Vinci | The Mona Lisa | MI | 87 |
| Van Gogh | Starry Night | KY | 48 |
| Van Gogh | The Potato Eaters | KY | 67 |
| Van Gogh | The Rocks | IA | 33 |
| Renoir | Les Deux Soeurs | NE | 64 |
+----------+-------------------+-------+-------+


By adding other conditions to the WHERE clause,
you can use row-matching queries to answer more specific questions,
such as the following:




  • Which paintings did Van Gogh paint? To answer this question, identify
    the record from the artist table that corresponds
    to the artist name, use its a_id value to find
    matching records in the painting table, and select
    the title from those records:

    mysql> SELECT painting.title
    -> FROM artist, painting
    -> WHERE artist.name = 'Van Gogh' AND artist.a_id = painting.a_id;
    +-------------------+
    | title |
    +-------------------+
    | Starry Night |
    | The Potato Eaters |
    | The Rocks |
    +-------------------+

  • Who painted "The Mona Lisa"? To
    find out, go in the other direction, using information in the
    painting table to find information in the
    artist table:

    mysql> SELECT artist.name
    -> FROM artist, painting
    -> WHERE painting.title = 'The Mona Lisa' AND painting.a_id = artist.a_id;
    +----------+
    | name |
    +----------+
    | Da Vinci |
    +----------+

  • Which artists' paintings did you purchase in
    Kentucky or Indiana? This is somewhat similar to the last query, but
    tests a different column in the painting table to
    find the initial set of records to be joined with the
    artist table:

    mysql> SELECT DISTINCT artist.name
    -> FROM artist, painting
    -> WHERE painting.state IN ('KY','IN') AND artist.a_id = painting.a_id;
    +----------+
    | name |
    +----------+
    | Da Vinci |
    | Van Gogh |
    +----------+

    The query also uses DISTINCT to display each
    artist name just once. Try it without DISTINCT and
    you'll see that Van Gogh is listed
    twice�that's because you obtained two Van
    Goghs in Kentucky.



  • Joins can also be used with aggregate
    functions to produce
    summaries. For example, to find out how
    many paintings you have per artist, use this query:

    mysql> SELECT artist.name, COUNT(*) AS 'number of paintings'
    -> FROM artist, painting
    -> WHERE artist.a_id = painting.a_id
    -> GROUP BY artist.name;
    +----------+---------------------+
    | name | number of paintings |
    +----------+---------------------+
    | Da Vinci | 2 |
    | Renoir | 1 |
    | Van Gogh | 3 |
    +----------+---------------------+

    A more elaborate query might also show how much you paid for each
    artist's paintings, in total and on average:


    mysql> SELECT artist.name,
    -> COUNT(*) AS 'number of paintings',
    -> SUM(painting.price) AS 'total price',
    -> AVG(painting.price) AS 'average price'
    -> FROM artist, painting WHERE artist.a_id = painting.a_id
    -> GROUP BY artist.name;
    +----------+---------------------+-------------+---------------+
    | name | number of paintings | total price | average price |
    +----------+---------------------+-------------+---------------+
    | Da Vinci | 2 | 121 | 60.5000 |
    | Renoir | 1 | 64 | 64.0000 |
    | Van Gogh | 3 | 148 | 49.3333 |
    +----------+---------------------+-------------+---------------+


Note that the summary queries produce output only for those artists
in the artist table for whom you actually have
acquired paintings. (For example, Monet is listed in the
artist table but is not present in the summary
because you don't have any of his paintings yet.) If
you want the summary to include all artists, even if you have none of
their paintings yet, you must use a different kind of
join�specifically, a LEFT
JOIN. See Recipe 12.6 and
Recipe 12.9.




Joins and Indexes



Because a join can easily cause MySQL to process
large numbers of row combinations, it's a good idea
to make sure that the columns you're comparing are
indexed. Otherwise, performance can drop off quickly as table sizes
increase. For the artist and
painting tables, joins are made based on the
values in the a_id column of each table. If you
look back at the CREATE TABLE
statements that were shown for these tables in Recipe 12.5, you'll see that
a_id is indexed in each table.











    I l@ve RuBoard



    No comments:

    Post a Comment