8.1 Writing Inner Joins
A join that identifies combinations of rows from two tables is called an inner join. Inner joins may be written using two different syntaxes. One syntax lists the tables to be joined separated by a comma. The other uses the INNER JOIN keywords.
8.1.1 Writing Inner Joins with the Comma Operator
A simple question you might ask about the information in the world database is, "What languages are spoken in each country?" That question has a trivial answer if you don't mind listing countries by code. Just select the information from the CountryLanguage table. Two of its columns list three-letter country codes and language names:
mysql> SELECT CountryCode, Language FROM CountryLanguage;
+-------------+------------------+
| CountryCode | Language |
+-------------+------------------+
| ABW | Dutch |
| ABW | English |
| ABW | Papiamento |
| ABW | Spanish |
| AFG | Balochi |
| AFG | Dari |
| AFG | Pashto |
| AFG | Turkmenian |
| AFG | Uzbek |
| AGO | Ambo |
| AGO | Chokwe |
| AGO | Kongo |
| AGO | Luchazi |
| AGO | Luimbe-nganguela |
| AGO | Luvale |
...
That result would be more meaningful and easier to understand if it displayed countries identified by full name. However, that cannot be done using just the CountryLanguage table, which contains country codes and not names. Country names are available in the world database, but they're stored in the Country table that contains both the three-letter codes and the names:
mysql> SELECT Code, Name FROM Country;
+------+----------------------+
| Code | Name |
+------+----------------------+
| AFG | Afghanistan |
| NLD | Netherlands |
| ANT | Netherlands Antilles |
| ALB | Albania |
| DZA | Algeria |
| ASM | American Samoa |
| AND | Andorra |
| AGO | Angola |
| AIA | Anguilla |
| ATG | Antigua and Barbuda |
| ARE | United Arab Emirates |
| ARG | Argentina |
| ARM | Armenia |
| ABW | Aruba |
| AUS | Australia |
...
A query to display languages and full country names can be written as a join that matches the country codes in the CountryLanguage table with those in the Country table. To do that, modify the CountryLanguage query in the following ways:
Change the FROM clause to name both the CountryLanguage and Country tables. This tells MySQL that it must consult multiple tables to process the query. Add a WHERE clause that indicates how to match records in the two tables. A join has the potential to generate all combinations of rows from the two tables; the WHERE clause tells MySQL which of these combinations you're interested in. To choose the proper matches, use the country code values that are common to the two tables. That is, match CountryCode values in the CountryLanguage table with Code values in the Country table. Change the output column list to display the Name column from the Country table rather than the CountryCode column from the CountryLanguage table.
The query that results from these changes is as follows:
mysql> SELECT Name, Language FROM CountryLanguage, Country
-> WHERE CountryCode = Code;
+----------------------+------------+
| Name | Language |
+----------------------+------------+
| Afghanistan | Balochi |
| Afghanistan | Dari |
| Afghanistan | Pashto |
| Afghanistan | Turkmenian |
| Afghanistan | Uzbek |
| Netherlands | Arabic |
| Netherlands | Dutch |
| Netherlands | Fries |
| Netherlands | Turkish |
| Netherlands Antilles | Dutch |
| Netherlands Antilles | English |
| Netherlands Antilles | Papiamento |
| Albania | Albaniana |
| Albania | Greek |
| Albania | Macedonian |
...
Essentially what this query does is treat Country as a lookup table. For any given country code in the CountryLanguage table, the query uses that code to find the corresponding row in the Country table and retrieves the country name from that row.
Note several things about this query and the result that it produces:
For an inner join, the order in which the FROM clause names the tables doesn't matter. Either of these FROM clauses would work:
FROM CountryLanguage, Country
FROM Country, CountryLanguage
The output column list of the join displays one column from each table: Name from Country and Language from CountryLanguage. The list can name any columns that are appropriate for your purposes, from any of the joined tables. Suppose that you want to show both country code and name, as well as the continent in which each country is located. The following query does that by adding two columns to the output column list:
mysql> SELECT Code, Name, Continent, Language
-> FROM CountryLanguage, Country
-> WHERE CountryCode = Code;
+------+----------------------+---------------+------------+
| Code | Name | Continent | Language |
+------+----------------------+---------------+------------+
| AFG | Afghanistan | Asia | Balochi |
| AFG | Afghanistan | Asia | Dari |
| AFG | Afghanistan | Asia | Pashto |
| AFG | Afghanistan | Asia | Turkmenian |
| AFG | Afghanistan | Asia | Uzbek |
| NLD | Netherlands | Europe | Arabic |
| NLD | Netherlands | Europe | Dutch |
| NLD | Netherlands | Europe | Fries |
| NLD | Netherlands | Europe | Turkish |
| ANT | Netherlands Antilles | North America | Dutch |
| ANT | Netherlands Antilles | North America | English |
| ANT | Netherlands Antilles | North America | Papiamento |
| ALB | Albania | Europe | Albaniana |
| ALB | Albania | Europe | Greek |
| ALB | Albania | Europe | Macedonian |
...
Or suppose that you want to display each language together with the percentage of people who speak it. Select the Percentage column from the CountryLanguage table:
mysql> SELECT Name, Language, Percentage FROM CountryLanguage, Country
-> WHERE CountryCode = Code;
+----------------------+----------------+------------+
| Name | Language | Percentage |
+----------------------+----------------+------------+
| Afghanistan | Pashto | 52.4 |
| Netherlands | Dutch | 95.6 |
| Netherlands Antilles | Papiamento | 86.2 |
| Albania | Albaniana | 97.9 |
| Algeria | Arabic | 86.0 |
| American Samoa | Samoan | 90.6 |
| Andorra | Spanish | 44.6 |
| Angola | Ovimbundu | 37.2 |
| Anguilla | English | 0.0 |
| Antigua and Barbuda | Creole English | 95.7 |
| United Arab Emirates | Arabic | 42.0 |
| Argentina | Spanish | 96.8 |
| Armenia | Armenian | 93.4 |
| Aruba | Papiamento | 76.7 |
| Australia | English | 81.2 |
...
As with any other SELECT, the output from a join does not appear in any particular order by default. To sort the results, add an ORDER BY clause. Output from the preceding query would be more easily understood with the rows sorted by country name and language percentage. Then for a country in which multiple languages are spoken, you can see which languages are most prevalent:
mysql> SELECT Name, Language, Percentage FROM CountryLanguage, Country
-> WHERE CountryCode = Code ORDER BY Name, Percentage;
+----------------+------------+------------+
| Name | Language | Percentage |
+----------------+------------+------------+
| Afghanistan | Balochi | 0.9 |
| Afghanistan | Turkmenian | 1.9 |
| Afghanistan | Uzbek | 8.8 |
| Afghanistan | Dari | 32.1 |
| Afghanistan | Pashto | 52.4 |
| Albania | Macedonian | 0.1 |
| Albania | Greek | 1.8 |
| Albania | Albaniana | 97.9 |
| Algeria | Berberi | 14.0 |
| Algeria | Arabic | 86.0 |
| American Samoa | English | 3.1 |
| American Samoa | Tongan | 3.1 |
| American Samoa | Samoan | 90.6 |
| Andorra | French | 6.2 |
| Andorra | Portuguese | 10.8 |
...
Syntactically, the WHERE clause in a join is optional. However, it's usually necessary in practice to include a WHERE clause to keep the join from producing output far in excess of what you really want to see and to make sure that the output contains only information that's meaningful for the question you're asking.
A join can produce every combination of rows from the two tables, which is in fact what you'll get from an unrestricted join that includes no WHERE clause. This is called a Cartesian product, and the number of rows in the result is the product of the number of rows in the individual tables. For example, the Country and CountryLanguage tables contain approximately 240 and 1,000 rows, respectively, so a Cartesian product between them produces about 240,000 rows. But much of such output is irrelevant because most of the combinations aren't meaningful.
The following query shows what happens if you join records in the CountryLanguage and Country tables without a WHERE clause. The query displays the code from both tables to show that even nonmatching combinations are produced by an unrestricted join:
mysql> SELECT Code, Name, CountryCode, Language
-> FROM CountryLanguage, Country;
+------+-------------+-------------+------------------+
| Code | Name | CountryCode | Language |
+------+-------------+-------------+------------------+
| AFG | Afghanistan | ABW | Dutch |
| AFG | Afghanistan | ABW | English |
| AFG | Afghanistan | ABW | Papiamento |
| AFG | Afghanistan | ABW | Spanish |
| AFG | Afghanistan | AFG | Balochi |
| AFG | Afghanistan | AFG | Dari |
| AFG | Afghanistan | AFG | Pashto |
| AFG | Afghanistan | AFG | Turkmenian |
| AFG | Afghanistan | AFG | Uzbek |
| AFG | Afghanistan | AGO | Ambo |
| AFG | Afghanistan | AGO | Chokwe |
| AFG | Afghanistan | AGO | Kongo |
| AFG | Afghanistan | AGO | Luchazi |
| AFG | Afghanistan | AGO | Luimbe-nganguela |
| AFG | Afghanistan | AGO | Luvale |
...
If you're using the mysql client program and want to guard against the possibility of generating huge result sets due to forgetting a WHERE clause, invoke the program with the --safe-updates option (which, despite its name, also affects output from joins). See section 3.2.8, "Using the --safe-updates Option," for more information.
The WHERE clause for a join specifies how to match records in the joined tables and eliminates noncorresponding combinations of rows from the output. The WHERE clause also can include additional conditions to further restrict the output and answer more specific questions. Here are some examples:
In which countries is the Swedish language spoken? To answer this, include a condition that identifies the language you want to know about:
mysql> SELECT Name, Language FROM CountryLanguage, Country
-> WHERE CountryCode = Code AND Language = 'Swedish';
+---------+----------+
| Name | Language |
+---------+----------+
| Norway | Swedish |
| Sweden | Swedish |
| Finland | Swedish |
| Denmark | Swedish |
+---------+----------+
What languages are spoken in the country of Sweden? This question is the complement of the previous one, and can be answered by using a condition that identifies the country of interest rather than the language:
mysql> SELECT Name, Language FROM CountryLanguage, Country
-> WHERE CountryCode = Code AND Name = 'Sweden';
+--------+---------------------------+
| Name | Language |
+--------+---------------------------+
| Sweden | Arabic |
| Sweden | Finnish |
| Sweden | Norwegian |
| Sweden | Southern Slavic Languages |
| Sweden | Spanish |
| Sweden | Swedish |
+--------+---------------------------+
Joins can use any of the constructs allowed for single-table SELECT statements. The following join uses the COUNT() function and a GROUP BY clause to summarize the number of languages spoken per country, and a HAVING clause to restrict the output to include only those countries where more than 10 languages are spoken:
mysql> SELECT COUNT(*), Name
-> FROM CountryLanguage, Country
-> WHERE CountryCode = Code
-> GROUP BY Name
-> HAVING COUNT(*) > 10;
+----------+--------------------+
| COUNT(*) | Name |
+----------+--------------------+
| 12 | Canada |
| 12 | China |
| 12 | India |
| 12 | Russian Federation |
| 11 | South Africa |
| 11 | Tanzania |
| 12 | United States |
+----------+--------------------+
8.1.2 Writing Inner Joins with INNER JOIN
As described in the previous section, one form of inner join syntax uses the comma operator to name the joined tables. Another inner join syntax uses the INNER JOIN keywords. With this syntax, those keywords replace the comma operator between table names in the FROM clause. Also, with INNER JOIN, the conditions that indicate how to perform record matching for the tables move from the WHERE clause to become part of the FROM clause.
There are two syntaxes for specifying matching conditions with INNER JOIN queries:
Add ON and an expression that states the required relationship between tables. Suppose that a join performs a country code match between the CountryLanguage and CountryLanguage tables. With the comma operator, you write the join as follows:
SELECT Name, Language
FROM CountryLanguage, Country WHERE CountryCode = Code;
With INNER JOIN and ON, write the query like this instead:
SELECT Name, Language
FROM CountryLanguage INNER JOIN Country ON CountryCode = Code;
If the name of the joined column is the same in both tables, you can add USING() rather than ON after the table names, and list the name within the parentheses. For example, if the country code column happened to be named Code in both tables, you could write the query like this:
SELECT Name, Language
FROM CountryLanguage INNER JOIN Country USING(Code);
If you're joining the tables using more than one pair of like-named columns, list the column names within the parentheses of the USING() clause separated by commas.
|
No comments:
Post a Comment