Wednesday, November 4, 2009

Item 48: Batch SQL work to avoid round-trips











 < Day Day Up > 





Item 48: Batch SQL work to avoid round-trips



Given the cost of moving across the network, we need to minimize the number of times we travel across the network connection to another machine. Unfortunately, the default behavior of the JDBC driver is to work with a one-statement, one-round-trip model: each time execute (or one of its variations, executeUpdate or executeQuery) runs, the driver marshals up the request and sends it to the database where it is parsed, executed, and returned. This tedious process consumes many CPU cycles in pure overhead on each trip. By batching statements together, we can send several SQL statements in a single round-trip to the database, thus avoiding some of that overhead.



Bear in mind that while this is predominantly a state management issue, trying to improve performance by reducing the number of times we have to hit the network, it also applies to transactional processing. We're likely to be in a situation where we're holding open transactional locks, and therefore we want to minimize the amount of time those locks are held open (see Item 29).



There's another element to this, however; sometimes multiple statements need to be executed in a group in order to carry out a logical request. Normally, you would want to do all this under a single transaction (thus requiring that you setAutoCommit(false) on the Connection you're using), but that doesn't imply that the driver will do it all as part of a single round-trip. It's entirely plausible that the driver will send each statement over individually, all while holding the transactional lock open. Therefore, since it's all happening under a single transaction, it's far better to execute them as a group. The transaction itself is an all-or-nothing situation anyway.










Connection conn = getConnectionFromSomeplace();

conn.setAutoCommit(false);

Statement stmt = conn.createStatement();



// Step 1: insert the person

stmt.addBatch("INSERT INTO person (id, f_name, l_name) " +

"VALUES (" + id + ", '" + firstName + "', '" +

lastName + "')");

// Step 2: insert the person's account

stmt.addBatch("INSERT INTO account (personID, type) " +

"VALUES (" + id + ", 'SAVINGS')");



// Execute the batch

int[] results = stmt.executeBatch();

// Check the batched results

boolean completeSuccess = true;

for (int i=0; i<results.length; i++)

{

if (results[i] >= 0 || results[i] ==

Statement.SUCCESS_NO_INFO)

/* Do nothing�statement was successful */ ;

else

{

/* Something went wrong; alert user? */

completeSuccess = false;

}

}



if (completeSuccess)

conn.commit();

else

conn.rollback();




In this code, we create a Statement and use it to execute two non-SELECT SQL statements against the database. (Batching doesn't work with SELECT statements.) In this case, we're adding a person to the database and adding information about the person's new savings account. Note that the key to executing in batch here is to use the executeBatch method on Statement; this tells the driver to send the statements over to the database en masse.



The executeBatch method returns an array of integers as a way to indicate the success or failure of each of the batched methods. Each element in the array corresponds to one batched statement; a zero or positive number indicates the "update count" result (the number of rows affected by the call), and a value of SUCCESS_NO_INFO indicates the call succeeded but didn't have an update count.



In the event one of the statements fails, JDBC allows the driver to take one of several options. It can throw an exception and halt execution of statements at that point, or it can continue processing statements, in which case the integer in the results array will be set to EXECUTE_FAILED.



Note that executing in batch doesn't imply transactional boundaries�the two statements have executed, but because AutoCommit is turned off, they have not yet committed to the database. Therefore, we need to either commit or roll back the work done; in this case, we commit only if all the statements executed successfully, a reasonable policy. If you call commit before calling executeBatch, the batched statements will be sent over as if you'd called executeBatch just prior to commit. If you try to batch with AutoCommit set to true, the results are undefined by the JDBC Specification�which is shorthand for "quite likely to yield some kind of exception."



Using JDBC's executeBatch method isn't the only way to batch execution, however. Because many databases support some kind of logical line termination character, thus creating the ability to execute multiple SQL statements as part of one logical line of input, it's possible to write JDBC calls like this:










Connection conn = ...; // Obtain from someplace

Statement stmt = conn.createStatement();

stmt.execute("SELECT first_name, last_name FROM Person " +

"WHERE primary_key = 1234; " +

"SELECT ci.content, ci.type " +

"FROM ContactInfo ci, PerConLookup pc " +

"WHERE ci.primary_key = pc.contactInfo_key " +

"AND pc.person_key = 1234");

ResultSet personRS = stmt.getResultSet();

// The first SELECT



if (stmt.getMoreResults())

{

// The other SELECT

ResultSet contactInfoRS = stmt.getResultSet();

}




Although more awkward to work with,[5] batching statements this way carries the advantage of working for both SELECT statements as well as INSERT, UPDATE, and DELETE statements, all while furthering the basic goal, that of trying to amortize the cost of several SQL queries by running them in a single request-response cycle against the database.

[5] In production code you should check the Boolean return value from execute to make sure the first SELECT statement produced results, rather than ignore it as I have here for clarity.













     < Day Day Up > 



    No comments:

    Post a Comment