Thursday, October 22, 2009

18.9 Processing File Uploads








18.9.1 Problem



You want to allow files to be uploaded to
your web server and stored in your database.





18.9.2 Solution



Present the user with a web form that includes a file field.
When the user submits the form, extract the file and
store it in MySQL.





18.9.3 Discussion



One special kind of web input is an uploaded file. A file is sent as
part of a POST request, but it's
handled differently than other POST parameters,
because a file is represented by several pieces of information such
as its contents, its MIME type, its original filename on the client,
and its name in temporary storage on the web server host.



To handle file uploads, you must send a special kind of form to the
user; this is true no matter what API you use to create the form.
However, when the user submits the form, the operations that check
for and process an uploaded file are API-specific.



To create a form that allows files to be uploaded, the opening
<form> tag should specify the
POST method and must also include an
enctype (encoding type) attribute with a value of
multipart/form-data:



<form method="POST" enctype="multipart/form-data" action="script_name">


If you don't specify this kind of encoding, the form
will be submitted using the default encoding type
(application/x-www-form-urlencoded) and file
uploads will not work properly.
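
To make the difference between the two encodings concrete, the following Python sketch builds a multipart/form-data request body by hand. It is only an illustration: the function name build_multipart_body and the boundary string are assumptions made for this example, and in practice the browser chooses the boundary and assembles the body itself. Note how the file part carries its own filename and Content-Type header in addition to its contents, which the default URL-encoded format has no place for.

```python
def build_multipart_body(fields, files, boundary):
    """Return (content_type, body) for a multipart/form-data request.

    fields: dict mapping field name -> text value
    files:  dict mapping field name -> (filename, mime_type, content_bytes)
    boundary: separator string (hypothetical; a browser picks its own)
    """
    boundary_line = b"--" + boundary.encode("ascii")
    lines = []
    # Ordinary text fields: a disposition header, a blank line, the value.
    for name, value in fields.items():
        lines += [
            boundary_line,
            ('Content-Disposition: form-data; name="%s"' % name).encode("utf-8"),
            b"",
            value.encode("utf-8"),
        ]
    # File fields additionally carry the client filename and a MIME type.
    for name, (filename, mime_type, content) in files.items():
        lines += [
            boundary_line,
            ('Content-Disposition: form-data; name="%s"; filename="%s"'
             % (name, filename)).encode("utf-8"),
            ("Content-Type: %s" % mime_type).encode("ascii"),
            b"",
            content,
        ]
    # The closing boundary has two extra trailing hyphens.
    lines += [boundary_line + b"--", b""]
    return "multipart/form-data; boundary=" + boundary, b"\r\n".join(lines)


content_type, body = build_multipart_body(
    {"image_name": "logo"},
    {"upload_file": ("logo.gif", "image/gif", b"GIF89a")},
    "XyZ",
)
print(content_type)
print(body.decode("ascii"))
```

The value returned as content_type is what the request's Content-Type header would contain, so that the receiver knows which boundary string separates the parts.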



To include a file upload field in the form, use an
<input> element of type
file. For example, to present a 60-character file
field named upload_file, the element looks like
this:



<input type="file" name="upload_file" size="60" />


The browser displays this field as a text input box into which the
user can enter the name manually. It also presents a Browse button
for selecting the file via the standard file-browsing system dialog.
When the user chooses a file and submits the form, the browser
encodes the file contents for inclusion into the resulting
POST request. At that point, the web server
receives the request and invokes your script to process it. The
specifics vary for particular APIs, but file uploads generally work like this:




  • The file will already have been uploaded and stored in a temporary
    directory by the time your upload-handling script begins executing.
    All your script has to do is read it. The temporary file will be
    available to your script either as an open file descriptor or the
    temporary filename, or perhaps both. The size of the file can be
    obtained through the file descriptor. The API may also make available
    other information about the file, such as its MIME type. (But note
    that some browsers may not send a MIME value.)


  • Uploaded files are deleted automatically by the web server when your
    script terminates. If you want a file's contents to
    persist beyond the end of your script's execution,
    you'll have to save it to a more permanent location
    (for example, in a database or somewhere else in the filesystem). If
    you save it in the filesystem, the directory where you store it must
    be accessible to the web server.


  • The API may allow you to control the location of the temporary file
    directory or the maximum size of uploaded files. Changing the
    directory to one that is accessible only to your web server may
    improve security a bit against local exploits by other users with
    login accounts on the server host.
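
The second point above, copying the temporary file somewhere permanent before the script ends, can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not part of the recipes distribution: save_upload and demo are hypothetical names, and the temporary file is simulated with the tempfile module rather than created by a web server.

```python
import os
import shutil
import tempfile


def save_upload(tmp_path, dest_dir, client_name):
    """Copy an uploaded temporary file into dest_dir; return (path, size)."""
    # Keep only the last path component so that a client-supplied name
    # such as "../../logo.gif" cannot place the file outside dest_dir.
    safe_name = os.path.basename(client_name)
    if safe_name in ("", ".", ".."):
        raise ValueError("client supplied no usable filename")
    dest_path = os.path.join(dest_dir, safe_name)
    shutil.copyfile(tmp_path, dest_path)
    return dest_path, os.path.getsize(dest_path)


def demo():
    """Simulate the server's temporary file, then save a permanent copy."""
    work_dir = tempfile.mkdtemp()
    fd, tmp_path = tempfile.mkstemp(dir=work_dir)
    with os.fdopen(fd, "wb") as tmp_file:
        tmp_file.write(b"GIF89a")
    dest_dir = os.path.join(work_dir, "images")
    os.mkdir(dest_dir)
    try:
        return save_upload(tmp_path, dest_dir, "../../logo.gif")
    finally:
        os.remove(tmp_path)  # the web server would normally do this


if __name__ == "__main__":
    print(demo())
```

Stripping the directory part of the client's filename is a design choice made here because the original name comes from the client and should not be trusted as a path.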



This section discusses how to create forms that include a file upload
field. It also demonstrates how to handle uploads using a Perl
script, post_image.pl. The script is somewhat
similar to the store_image.pl script for loading
images from the command line (Recipe 17.7).
post_image.pl differs in that it allows you to
store images over the Web by uploading them, and it stores images
only in MySQL, whereas store_image.pl stores
them in both MySQL and the filesystem.



This section also discusses how to obtain file upload information
using PHP and Python. It does not repeat the entire image-posting
scenario shown for Perl, but the recipes
distribution contains equivalent implementations of
post_image.pl for PHP and Python.





18.9.4 Perl



You can specify multipart encoding for a
form several ways using the CGI.pm module. The following statements
are all equivalent:



print start_form (-action => url ( ), -enctype => "multipart/form-data");
print start_form (-action => url ( ), -enctype => MULTIPART ( ));
print start_multipart_form (-action => url ( ));


The first statement specifies the encoding type literally. The second
uses the CGI.pm MULTIPART( ) function, which is
easier than trying to remember the literal encoding value. The third
statement is easiest of all, because start_multipart_form( )
supplies the enctype
parameter automatically. (Like start_form( ),
start_multipart_form( ) uses a default request
method of POST, so you need not include a
method argument.)



Here's a simple form that includes a text field for
assigning a name to an image, a file field for selecting the image
file, and a submit button:



print start_multipart_form (-action => url ( )),
      "Image name:", br ( ),
      textfield (-name =>"image_name", -size => 60),
      br ( ), "Image file:", br ( ),
      filefield (-name =>"upload_file", -size => 60),
      br ( ), br ( ),
      submit (-name => "choice", -value => "Submit"),
      end_form ( );


When the user submits an uploaded file, begin processing it by
extracting the parameter value for the file field:



$file = param ("upload_file");


The value for a file upload parameter is special in
CGI.pm because you can use it two
ways. You can treat it as an open file handle to read the
file's contents, or pass it to uploadInfo( )
to obtain a reference to a hash that provides information
about the file such as its MIME type. The following listing shows how
post_image.pl
presents the form and processes a submitted form. When first invoked,
post_image.pl generates a form with an upload
field. For the initial invocation, no file will have been uploaded,
so the script does nothing else. If the user submitted an image file,
the script gets the image name, reads the file contents, determines
its MIME type, and stores a new record in the
image table. For illustrative purposes,
post_image.pl also displays all the information
that the uploadInfo( ) function makes available
about the uploaded file.



#! /usr/bin/perl -w
# post_image.pl - allow user to upload image files via POST requests

use strict;
use lib qw(/usr/local/apache/lib/perl);
use CGI qw(:standard escapeHTML);
use Cookbook;

print header ( ), start_html (-title => "Post Image", -bgcolor => "white");

# Use multipart encoding because the form contains a file upload field

print start_multipart_form (-action => url ( )),
      "Image name:", br ( ),
      textfield (-name =>"image_name", -size => 60),
      br ( ), "Image file:", br ( ),
      filefield (-name =>"upload_file", -size => 60),
      br ( ), br ( ),
      submit (-name => "choice", -value => "Submit"),
      end_form ( );

# Get a handle to the image file and the name to assign to the image

my $image_file = param ("upload_file");
my $image_name = param ("image_name");

# Must have either no parameters (in which case the script was just
# invoked for the first time) or both parameters (in which case the form
# was filled in). If only one was filled in, the user did not fill in the
# form completely.

my $param_count = 0;
++$param_count if defined ($image_file) && $image_file ne "";
++$param_count if defined ($image_name) && $image_name ne "";

if ($param_count == 0) # initial invocation
{
  print p ("No file was uploaded.");
}
elsif ($param_count == 1) # incomplete form
{
  print p ("Please fill in BOTH fields and resubmit the form.");
}
else # a file was uploaded
{
  my ($size, $data);

  # If an image file was uploaded, print some information about it,
  # then save it in the database.

  # Get reference to hash containing information about file
  # and display the information in "key=x, value=y" format
  my $info_ref = uploadInfo ($image_file);
  print p ("Information about uploaded file:");
  foreach my $key (sort (keys (%{$info_ref})))
  {
    # Use print rather than printf: the values may contain "%" characters
    print p ("key="
             . escapeHTML ($key)
             . ", value="
             . escapeHTML ($info_ref->{$key}));
  }
  $size = (stat ($image_file))[7]; # get file size from file handle
  print p ("File size: " . $size);

  binmode ($image_file); # helpful for binary data
  if (sysread ($image_file, $data, $size) != $size)
  {
    print p ("File contents could not be read.");
  }
  else
  {
    print p ("File contents were read without error.");

    # Get MIME type, use generic default if not present

    my $mime_type = $info_ref->{'Content-Type'};
    $mime_type = "application/octet-stream" unless defined ($mime_type);

    # Save image in database table. (Use REPLACE to kick out any
    # old image with same name.)

    my $dbh = Cookbook::connect ( );
    $dbh->do ("REPLACE INTO image (name,type,data) VALUES(?,?,?)",
              undef,
              $image_name, $mime_type, $data);
    $dbh->disconnect ( );
  }
}

print end_html ( );

exit (0);
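
The script relies on REPLACE so that uploading a second image under an existing name overwrites the first rather than adding a duplicate. The Python sketch below illustrates that behavior. To keep it runnable without a server, it uses the standard sqlite3 module as a stand-in for MySQL (SQLite also accepts REPLACE INTO); the table definition, with name as the primary key, and the store_image function are assumptions made for this example.

```python
import sqlite3


def store_image(conn, name, mime_type, data):
    """Insert an image row, replacing any existing row with the same name."""
    # Fall back to a generic MIME type when the browser sent none.
    # (Unlike the Perl script, this also treats an empty string as missing.)
    if not mime_type:
        mime_type = "application/octet-stream"
    conn.execute(
        "REPLACE INTO image (name, type, data) VALUES (?, ?, ?)",
        (name, mime_type, data),
    )
    conn.commit()


# In-memory database with an assumed table layout; REPLACE only overwrites
# when the new row collides with a primary key or unique index.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE image ("
    " name TEXT PRIMARY KEY,"
    " type TEXT NOT NULL,"
    " data BLOB NOT NULL)"
)

store_image(conn, "logo", "image/gif", b"GIF89a-old")
store_image(conn, "logo", None, b"GIF89a-new")  # same name: replaces the row

rows = conn.execute("SELECT name, type, data FROM image").fetchall()
print(rows)
```

After both calls the table holds a single row for "logo", containing the second upload and the default MIME type.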




18.9.5 PHP



To write an upload form in PHP, include
a file field. If you wish, you may also include a hidden field
preceding the file field that has a name of
MAX_FILE_SIZE and a value of the largest file size
you're willing to accept:



<form method="POST" enctype="multipart/form-data"
action="<?php print (get_self_path ( )); ?>">
<input type="hidden" name="MAX_FILE_SIZE" value="4000000" />
Image name:<br />
<input type="text" name="image_name" size="60" />
<br />
Image file:<br />
<input type="file" name="upload_file" size="60" />
<br /><br />
<input type="submit" name="choice" value="Submit" />
</form>


Be aware that MAX_FILE_SIZE is advisory only,
because it can be subverted easily. To specify a value that cannot be
exceeded, use the upload_max_filesize
configuration setting in the PHP initialization file. There is also a
file_uploads setting that controls whether or not
file uploads are allowed at all.
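
The reason a server-side limit is needed can be shown with a short Python sketch. It is not how PHP implements upload_max_filesize; it only illustrates the idea of counting the bytes actually received instead of trusting a size hint supplied by the client. The names UploadTooLarge and read_limited are hypothetical.

```python
import io


class UploadTooLarge(Exception):
    """Raised when an upload exceeds the configured hard limit."""


def read_limited(stream, max_bytes, chunk_size=8192):
    """Read all of stream, refusing to accept more than max_bytes."""
    chunks = []
    total = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        total += len(chunk)
        # Stop as soon as the limit is exceeded rather than buffering
        # the whole oversized upload first.
        if total > max_bytes:
            raise UploadTooLarge("upload exceeds %d bytes" % max_bytes)
        chunks.append(chunk)
    return b"".join(chunks)


# A 10-byte "upload" passes a 10-byte limit but fails a 9-byte limit.
data = read_limited(io.BytesIO(b"x" * 10), max_bytes=10)
try:
    read_limited(io.BytesIO(b"x" * 10), max_bytes=9)
    rejected = False
except UploadTooLarge:
    rejected = True
print(len(data), rejected)
```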



When the user submits the form, file upload information may be
obtained as follows:




  • As of PHP 4.1, file upload information from POST
    requests is placed in a separate array, $_FILES,
    which has one entry for each uploaded file. Each entry is itself an
    array with four elements. For example, if a form has a file field
    named upload_file and the user submits a file,
    information about it is available in the following variables:

    $_FILES["upload_file"]["name"]      original filename on client host
    $_FILES["upload_file"]["tmp_name"]  temporary filename on server host
    $_FILES["upload_file"]["size"]      file size, in bytes
    $_FILES["upload_file"]["type"]      file MIME type

    Be careful here, because there may be an entry for an upload field
    even if the user submitted no file. In this case, the
    tmp_name value will be the empty string or the
    string none.



  • Earlier PHP 4 releases have file upload information in a separate
    array, $HTTP_POST_FILES, which has entries that
    are structured like those in $_FILES. For a file
    field named upload_file, information about it is
    available in the following variables:

    $HTTP_POST_FILES["upload_file"]["name"]      original filename on client host
    $HTTP_POST_FILES["upload_file"]["tmp_name"]  temporary filename on server host
    $HTTP_POST_FILES["upload_file"]["size"]      file size, in bytes
    $HTTP_POST_FILES["upload_file"]["type"]      file MIME type

  • Prior to PHP 4, file upload information for a field named
    upload_file is available in a set of four
    $HTTP_POST_VARS variables:

    $HTTP_POST_VARS["upload_file_name"]  original filename on client host
    $HTTP_POST_VARS["upload_file"]       temporary filename on server host
    $HTTP_POST_VARS["upload_file_size"]  file size, in bytes
    $HTTP_POST_VARS["upload_file_type"]  file MIME type


$_FILES is a superglobal array (global in any
scope). $HTTP_POST_FILES and
$HTTP_POST_VARS must be declared with the
global keyword if used in a non-global scope, such
as within a function.



To avoid having to fool around figuring out which array contains file
upload information, it makes sense to write a utility routine that
does all the work. The following function, get_upload_info( ),
takes an argument corresponding to the
name of a file upload field. Then it examines the
$_FILES, $HTTP_POST_FILES, and
$HTTP_POST_VARS arrays as necessary and returns an
associative array of information about the file, or an unset value if
the information is not available. For a successful call, the array
element keys are "tmp_name",
"name", "size", and
"type" (that is, the keys are the same as those in
the entries within the $_FILES or
$HTTP_POST_FILES arrays).



function get_upload_info ($name)
{
global $HTTP_POST_FILES, $HTTP_POST_VARS;

  unset ($unset);
  # Look for information in PHP 4.1 $_FILES array first.
  # Check the tmp_name member to make sure there is a file. (The entry
  # in $_FILES might be present even if no file was uploaded.)
  if (isset ($_FILES))
  {
    if (isset ($_FILES[$name])
        && $_FILES[$name]["tmp_name"] != ""
        && $_FILES[$name]["tmp_name"] != "none")
      return ($_FILES[$name]);
    return (@$unset);
  }
  # Look for information in PHP 4 $HTTP_POST_FILES array next.
  if (isset ($HTTP_POST_FILES))
  {
    if (isset ($HTTP_POST_FILES[$name])
        && $HTTP_POST_FILES[$name]["tmp_name"] != ""
        && $HTTP_POST_FILES[$name]["tmp_name"] != "none")
      return ($HTTP_POST_FILES[$name]);
    return (@$unset);
  }
  # Look for PHP 3 style upload variables.
  # Check the _name member, because $HTTP_POST_VARS[$name] might not
  # actually be a file field.
  if (isset ($HTTP_POST_VARS[$name])
      && isset ($HTTP_POST_VARS[$name . "_name"]))
  {
    # Map PHP 3 elements to PHP 4-style element names
    $info = array ( );
    $info["name"] = $HTTP_POST_VARS[$name . "_name"];
    $info["tmp_name"] = $HTTP_POST_VARS[$name];
    $info["size"] = $HTTP_POST_VARS[$name . "_size"];
    $info["type"] = $HTTP_POST_VARS[$name . "_type"];
    return ($info);
  }
  return (@$unset);
}


See the post_image.php script for details about
how to use this function to get image information and store it in
MySQL.



The upload_tmp_dir
PHP configuration setting controls
where uploaded files are saved. This is /tmp by
default on many systems, but you may want to
reconfigure PHP to use a different directory that's
owned by the web server user ID and thus more private.





18.9.6 Python



A simple upload form in Python can be
written like this:



print "<form method=\"POST\" enctype=\"multipart/form-data\" action=\"%s\">" \
        % (os.environ["SCRIPT_NAME"])
print "Image name:<br />"
print "<input type=\"text\" name=\"image_name\" size=\"60\" />"
print "<br />"
print "Image file:<br />"
print "<input type=\"file\" name=\"upload_file\" size=\"60\" />"
print "<br /><br />"
print "<input type=\"submit\" name=\"choice\" value=\"Submit\" />"
print "</form>"


When the user submits the form, its contents can be obtained using
the FieldStorage( ) method of the
cgi module. (See Recipe 18.6.)
The resulting object contains an element for each input parameter.
For a file upload field, you get this information as follows:



form = cgi.FieldStorage ( )
if form.has_key ("upload_file") and form["upload_file"].filename != "":
    image_file = form["upload_file"]
else:
    image_file = None


According to most of the documentation that I have read, the
file attribute of an object that corresponds to a
file field should be true if a file has been uploaded. Unfortunately,
the file attribute seems to be true even when the
user submits the form but leaves the file field blank. It may even be
the case that the type attribute is set when no
file actually was uploaded (for example, to
application/octet-stream). In my experience, a
more reliable way to determine whether a file really was uploaded is
to test the filename attribute:



form = cgi.FieldStorage ( )
if form.has_key ("upload_file") and form["upload_file"].filename:
    print "<p>A file was uploaded</p>"
else:
    print "<p>A file was not uploaded</p>"


Assuming that a file was uploaded, access the
parameter's value attribute to
read the file and obtain its contents:



data = form["upload_file"].value


See the post_image.py script for details about
how to use these attributes to get image information and store it in
MySQL.
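
The cgi module was deprecated in Python 3.11 and removed in Python 3.13, so the listings above apply to older interpreters. As a rough sketch of the same idea on Python 3, the following code applies the filename test to a hand-written request body using only the standard email package. The function parse_upload and the SAMPLE_BODY data are assumptions made for this example; a real application would more likely rely on its web framework's upload handling, and this approach has not been checked against arbitrary binary content.

```python
from email.parser import BytesParser


def parse_upload(content_type, body, field_name):
    """Return (filename, mime_type, data) for field_name, or None if no file."""
    # The email parser expects headers before the body, so prepend the
    # request's Content-Type header (which names the boundary) to the body.
    header = (b"MIME-Version: 1.0\r\nContent-Type: "
              + content_type.encode("ascii") + b"\r\n\r\n")
    message = BytesParser().parsebytes(header + body)
    if not message.is_multipart():
        return None
    for part in message.get_payload():
        name = part.get_param("name", header="content-disposition")
        if name != field_name:
            continue
        # As in the text above, an empty filename means the user
        # submitted the form but left the file field blank.
        filename = part.get_filename()
        if not filename:
            return None
        return filename, part.get_content_type(), part.get_payload(decode=True)
    return None


# Hypothetical request body: one text field and one uploaded file.
SAMPLE_BODY = (
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="image_name"\r\n'
    b"\r\n"
    b"logo\r\n"
    b"--XyZ\r\n"
    b'Content-Disposition: form-data; name="upload_file"; filename="logo.gif"\r\n'
    b"Content-Type: image/gif\r\n"
    b"\r\n"
    b"GIF89a\r\n"
    b"--XyZ--\r\n"
)

result = parse_upload("multipart/form-data; boundary=XyZ",
                      SAMPLE_BODY, "upload_file")
if result:
    print("<p>A file was uploaded</p>")
else:
    print("<p>A file was not uploaded</p>")
```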









