Friday, October 30, 2009

5.6 Using DISTINCT to Eliminate Duplicates











 < Day Day Up > 





5.6 Using DISTINCT to Eliminate Duplicates



If a query returns a result that contains duplicate rows, you can remove duplicates and produce a result set in which every row is unique. To do this, include the keyword DISTINCT after SELECT and before the output column list.



Suppose that a query returns a result set that contains duplicated rows:










mysql> SELECT last_name FROM t;

+-----------+

| last_name |

+-----------+

| Brown |

| Larsson |

| Brown |

| Larsson |

+-----------+




Adding DISTINCT removes the duplicates and returns only unique rows:










mysql> SELECT DISTINCT last_name FROM t;

+-----------+

| last_name |

+-----------+

| Brown |

| Larsson |

+-----------+




Duplicate elimination for string values happens differently for binary and nonbinary strings. The strings 'ABC', 'Abc', and 'abc' are considered distinct if they're binary strings, but the same if they are nonbinary. (Stated another way, strings that vary only in lettercase are considered the same unless they're binary.)



Distinctiveness of rows is assessed taking NULL values into account. Suppose that a table t contains the following rows:










mysql> SELECT i, j FROM t;

+------+------+

| i | j |

+------+------+

| 1 | 2 |

| 1 | NULL |

| 1 | NULL |

+------+------+




For purposes of DISTINCT, the NULL values in the second column are considered the same, so the second and third rows are identical. Adding DISTINCT to the query eliminates one of them as a duplicate:










mysql> SELECT DISTINCT i, j FROM t;

+------+------+

| i | j |

+------+------+

| 1 | 2 |

| 1 | NULL |

+------+------+




Using DISTINCT is equivalent to using GROUP BY on all selected columns with no aggregate function. For such a query, GROUP BY just produces a list of distinct grouping values. If you display and group by a single column, the query produces the distinct values in that column. If you display and group by multiple columns, the query produces the distinct combinations of values in the column. For example, the following two queries are equivalent:










SELECT DISTINCT id FROM t;

SELECT id FROM t GROUP BY id;




As are these:










SELECT DISTINCT id, name FROM t;

SELECT id, name FROM t GROUP BY id, name;




Another correspondence between the behavior of DISTINCT and GROUP BY is that for purposes of assessing distinctness, DISTINCT considers all NULL values the same. This is analogous to the way that GROUP BY groups together NULL values.



A difference between DISTINCT and GROUP BY is that DISTINCT doesn't cause row sorting the way that GROUP BY does.



DISTINCT can be used with the COUNT() function to count how many distinct values a column contains. However, in this case, NULL values are ignored:










mysql> SELECT j FROM t;

+------+

| j |

+------+

| 2 |

| NULL |

| NULL |

+------+

mysql> SELECT COUNT(DISTINCT j) FROM t;

+-------------------+

| COUNT(DISTINCT j) |

+-------------------+

| 1 |

+-------------------+




COUNT(DISTINCT) is discussed further in section 5.5, "Aggregate Functions, GROUP BY, and HAVING."













     < Day Day Up > 



    No comments:

    Post a Comment