Saturday, October 24, 2009

10.28 Validation Using Table Metadata




I l@ve RuBoard










10.28 Validation Using Table Metadata




10.28.1 Problem



You need to check
input values against the legal members of an ENUM
or SET column.





10.28.2 Solution



Get the column definition, extract the list of members from it, and
check data values against the list.





10.28.3 Discussion



Some forms of validation involve checking input values against
information stored in a database. This includes values to be stored
in an ENUM or SET column, which
can be checked against the valid members stored in the column
definition. Database-backed validation also applies when you have
values that must match those listed in a lookup table to be
considered legal. For example, input records that contain customer
IDs can be required to match a record in a
customers table, or state abbreviations in
addresses can be verified against a table that lists each state. This
section describes ENUM- and
SET-based validation, and Recipe 10.29 discusses how to use lookup tables.



One way to check input values that correspond to the legal values of
ENUM or SET columns is to get
the list of legal column values into an array using the information
returned by SHOW COLUMNS,
then perform an array membership test. For example, the
favorite-color column color from the
profile table is an ENUM that
is defined as follows:



mysql> SHOW COLUMNS FROM profile LIKE 'color'\G
*************************** 1. row ***************************
Field: color
Type: enum('blue','red','green','brown','black','white')
Null: YES
Key:
Default: NULL
Extra:


If you extract the list of enumeration members from the
Type value and store them in an array
@members, you can perform the membership test like
this:



$valid = grep (/^$val$/i, @members);


The pattern constructor begins and ends with
^ and $ to require
$val to match an entire enumeration member (rather
than just a substring). It also is followed by an
i to specify a case-insensitive comparison,
because ENUM columns are not case sensitive.



In Recipe 9.7, we wrote a function
get_enumorset_info( ) that returns
ENUM or SET column metadata.
This includes the list of members, so it's easy to
use that function to write another utility routine,
check_enum_value(
)
, that gets the legal enumeration values and
performs the membership test. The routine takes four arguments: a
database handle, the table name and column name for the
ENUM column, and the value to check. It returns
true or false to indicate whether or not the value is legal:



sub check_enum_value
{
my ($dbh, $tbl_name, $col_name, $val) = @_;

my $valid = 0;
my $info = get_enumorset_info ($dbh, $tbl_name, $col_name);
if ($info && $info->{type} eq "enum")
{
# use case-insensitive comparison; ENUM
# columns are not case sensitive
$valid = grep (/^$val$/i, @{$info->{values}});
}
return ($valid);
}


For single-value testing, such as to validate a value submitted in a
web form, that kind of test works well. However, if
you're going to be testing a lot of values (like an
entire column in a datafile), it's better to read
the enumeration values into memory once, then use them repeatedly to
check each of the data values. Furthermore, it's a
lot more efficient to perform hash lookups
than array lookups (in Perl at least). To do so, retrieve the legal
enumeration values and store them as keys of a hash. Then test each
input value by checking whether or not it exists as a hash key.
It's a little more work to construct the hash, which
is why check_enum_value( )
doesn't do so. But for bulk validation, the improved
lookup speed more than makes up for the hash construction
overhead.[4]


[4] If you want to check for yourself the
relative efficiency of array membership tests versus hash lookups,
try the lookup_time.pl script in the
transfer directory of the
recipes distribution.



Begin by getting the
metadata for the column, then convert the list of legal enumeration
members to a hash:



my $ref = get_enumorset_info ($dbh, $tbl_name, $col_name);
my %members;
foreach my $member (@{$ref->{values}})
{
# convert hash key to consistent case; ENUM isn't case sensitive
$members{lc ($member)} = 1;
}


The loop makes each enumeration member exist as the key of a hash
element. The hash
key is
what's important here; the value associated with it
is irrelevant. (The example shown sets the value to
1, but you could use undef,
0, or any other value.) Note that the code
converts the hash keys to lowercase before storing them. This is
done because hash key lookups in Perl are
case sensitive. That's fine if the values that
you're checking also are case sensitive, but
ENUM columns are not. By converting the
enumeration values to a given lettercase before storing them in the
hash, then converting the values you want to check similarly, you
perform in effect a case insensitive key existence test:



$valid = exists ($members{lc ($val)});


The preceding example converts enumeration values and input values to
lowercase. You could just as well use uppercase�as long as you
do so for all values consistently.



Note that the existence test may fail if the input value is the empty
string. You'll have to decide how to handle that
case on a column-by-column basis. For example, if the column allows
NULL values, you might interpret the empty string
as equivalent to NULL and thus as being a legal
value.



The validation procedure for
SET values is
similar to that for ENUM values, except that an
input value might consist of any number of SET
members, separated by commas. For the value to be legal, each element
in it must be legal. In addition, because "any
number of members" includes
"none," the empty string is a legal
value for any SET column.



For one-shot testing of individual input values, you can use a
utility routine check_set_value(
)
that is similar to check_enum_value(
)
:



sub check_set_value
{
my ($dbh, $tbl_name, $col_name, $val) = @_;

my $valid = 0;
my $info = get_enumorset_info ($dbh, $tbl_name, $col_name);
if ($info && $info->{type} eq "set")
{
#return 1 if $val eq ""; # empty string is legal element
# use case-insensitive comparison; SET
# columns are not case sensitive
$valid = 1; # assume valid until we find out otherwise
foreach my $v (split (/,/, $val))
{
if (!grep (/^$v$/i, @{$info->{values}}))
{
$valid = 0; # value contains an invalid element
last;
}
}
}
return ($valid);
}


For bulk testing, construct a hash from the legal
SET members. The
procedure is the same as for producing a hash from
ENUM elements:



my $ref = get_enumorset_info ($dbh, $tbl_name, $col_name);
my %members;
foreach my $member (@{$ref->{values}})
{
# convert hash key to consistent case; SET isn't case sensitive
$members{lc ($member)} = 1;
}


To validate a given input value against the SET
member hash, convert it to the same lettercase as the hash keys,
split it at commas to get a list of the individual elements of the
value, then check each one. If any of the elements are invalid, the
entire value is invalid:



$valid = 1;         # assume valid until we find out otherwise
foreach my $elt (split (/,/, lc ($val)))
{
if (!exists ($members{$elt}))
{
$valid = 0; # value contains an invalid element
last;
}
}


After the loop terminates, $valid is true if the
value is legal for the SET column, and false
otherwise. Empty strings are always legal SET
values, but this code doesn't perform any
special-case test for an empty string. No such test is necessary,
because in that case the split( ) operation
returns an empty list, the loop never executes, and
$valid remains true.










    I l@ve RuBoard



    No comments:

    Post a Comment