Friday, October 23, 2009

Item 67: Aggressively release resources



















If anything, a J2EE application is all about accessing resources outside of the JVM: the relational database (JDBC), messaging systems (JMS), other objects running in separate processes (RMI), distributed transaction coordinators (JTA), enterprise integration systems (JCA), and so on. The Java Virtual Machine is great at managing resources within the VM, most notably memory, but unfortunately the JVM's automatic memory management scheme isn't so great at managing resources outside of the JVM (like database connections, result set cursors, and so on). As a result, J2EE programmers need to get into the habit of being as explicit about resource deallocation as we are about allocation. Loosely translated, that means aggressively shutting down resource objects the moment we're finished with them.



This may seem unnecessary at first; after all, Java provides the finalizer mechanism to do per-object cleanup during the garbage collection process. So, assuming all of those resource objects (connections, result sets, and so on) have finalizers to shut down whatever resources are held outside of the JVM, we can just let garbage collection take care of it all, right?



Tragically, no. As Item 74 explains, finalizers have some serious issues; in this case, finalizers are entirely nondeterministic and fire "sometime after" the object has been orphaned and left available for collection. The garbage collector doesn't realize that these resource objects are holding resources more precious than just N bytes of memory, however, so the collector won't get around to collecting them until later. As a matter of fact, because these resource objects tend to survive several generational garbage collection passes (see Item 72), they migrate to older generations in a generational collector, which in turn means they won't even be considered for collection until after the entire young generation of objects has been collected and there's still not room left to do an allocation. In a long-running system with the young generation tuned correctly (see Item 68), this means it could be minutes, even hours or days, before those objects get finalized. A system that allocates ResultSet objects at a rate of one per second and never releases them will quickly exhaust the database on the other side, regardless of the database server hardware underneath it.



This in turn means that unless you as a programmer step in to correct this state of affairs, you're going to be in for some nasty high-contention and out-of-resource scenarios. For example, when obtaining a ResultSet from a Statement, depending on the transaction isolation level (see Item 35), the database typically has to hold locks against the data in the table the ResultSet is iterating over, because if the ResultSet is scrollable, it has no idea which data will be accessed next until it's officially closed. (A forward-only "firehose"-style ResultSet at least knows that once data has been read, it can no longer be accessed, which is why many JDBC performance guides suggest using them as often as possible.) As a result, too many open ResultSet objects against a single table can create a situation where other clients seeking to access data in that table are held at bay until some of the open locks are released. That's contention, that hurts scalability, and that's exactly what you need to avoid.
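
If you know you only need a single forward pass over the data, you can say so explicitly when creating the statement. The sketch below is illustrative only: it reuses the person/first_name query that appears later in this Item, and most drivers already default to a forward-only, read-only cursor, but stating it makes the intent (and the cheaper locking behavior) explicit.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

public class FirehoseQueryExample
{
    // Ask the driver for a forward-only, read-only ("firehose") cursor so it
    // can stream rows and release its hold on the data as early as possible.
    public static PreparedStatement prepareFirehoseQuery(Connection con)
        throws SQLException
    {
        return con.prepareStatement(
            "SELECT first_name FROM person WHERE last_name = ?",
            ResultSet.TYPE_FORWARD_ONLY,   // no scrolling backward
            ResultSet.CONCUR_READ_ONLY);   // no in-place updates
    }
}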



As a result, you need to be as aggressive as you can about releasing any resource objects allocated by the system, in particular JDBC Connection, Statement (with one notable exception), and ResultSet objects. In an ideal world, all of these objects would be held via WeakReference instances, so that as soon as the last strong reference to them was dropped, the object itself could be cleaned up (per Item 74); that doesn't appear to be happening anytime soon, however, so you need to ensure you call close on any JDBC resource as soon as you're finished with it. Note that, although it wasn't spelled out until the JDBC 3.0 Specification, calling close on a Connection typically closes all Statement objects created by that Connection, and calling close on a Statement typically closes all ResultSet objects created by that Statement.



Be very careful when writing the code to do the closing, however; you need to ensure that all possible code paths are covered. For example, consider the following snippet:










public String[] getFirstNames(String lastName)
{
    ArrayList results = new ArrayList();
    Connection con = null;
    PreparedStatement stmt = null;
    ResultSet rs = null;

    try
    {
        con = getConnectionFromSomeplace();
        String prepSQL =
            "SELECT first_name FROM person " +
            "WHERE last_name = ?";
        stmt = con.prepareStatement(prepSQL);
        // See Item 61 for why we use PreparedStatement
        stmt.setString(1, lastName);
        rs = stmt.executeQuery();

        while (rs.next())
            results.add(rs.getString(1));
    }
    catch (SQLException sqlEx)
    {
        Logger l = getLoggerFromSomeplace();
        l.fatal("SQL statement failed: " + sqlEx);
        // By the way, don't forget to do something
        // more proactive to handle the error; see Item 7
    }

    // close can itself throw the checked SQLException,
    // so each call gets its own try/catch here
    try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
    try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
    try { if (con != null) con.close(); } catch (SQLException ignored) { }
    // Could also just call con.close if we know that the JDBC driver
    // does cascading closure (which most do), but we'd have to know
    // our JDBC driver (see Item 49) to feel safe doing that

    return (String[])results.toArray(new String[0]);
}




As soon as we're done with the ResultSet (which in turn implies we're done with the Statement and the Connection that prepared it), we call close on them all and we're ready to move on to the next item on our to-do list. Right?



Wrong; there's a hideous bug in here just waiting to bite us later. What happens if an exception other than SQLException, particularly an unchecked exception, gets thrown from somewhere in the method? Because we're not catching anything other than SQLException instances in the try/catch block, we'll never execute the close calls on the Connection, Statement, or ResultSet, and we'll effectively "leak" those resources until the garbage collector gets around to finalizing them.



As a result, we need to change the code above just slightly, to make use of a finally block to do the closures, rather than resting on the good graces of fate to make sure those close calls get made:










public String[] getFirstNames(String lastName)
{
    ArrayList results = new ArrayList();
    Connection con = null;
    PreparedStatement stmt = null;
    ResultSet rs = null;

    try
    {
        con = getConnectionFromSomeplace();
        stmt = con.prepareStatement(
            "SELECT first_name FROM person " +
            "WHERE last_name = ?");
        // See Item 61 for why we use PreparedStatement
        // even though we'll lose the "preparation"
        // part of it when the Connection closes

        stmt.setString(1, lastName);
        rs = stmt.executeQuery();

        while (rs.next())
            results.add(rs.getString(1));

        return (String[])results.toArray(new String[0]);
    }
    catch (SQLException sqlEx)
    {
        Logger l = getLoggerFromSomeplace();
        // See Item 12, as well as Item 7
        l.fatal("SQL statement failed: " + sqlEx);

        return new String[0];
        // See Effective Java [Bloch, Item 27]
    }
    finally
    {
        // close can itself throw the checked SQLException,
        // so each call gets its own try/catch here
        try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
        try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
        try { if (con != null) con.close(); } catch (SQLException ignored) { }
        // Could also just call con.close if we know that the JDBC driver
        // does cascading closure (which most do), but we'd have to know
        // our JDBC driver (see Item 49) to feel safe doing that
    }
}




Now, regardless of how execution leaves the try block (simple completion, a return statement, or an exception being thrown within it), the finally block will always get called, thus ensuring that the JDBC resource objects are always released aggressively.



At first blush, it seems counterintuitive to do this. After all, establishing the connection isn't a free operation in the first place, since the credentials passed in (username, password) have to be authenticated against the database, so why release it just as soon as we've established it? This would seem to be an obvious situation where we would want to cache the connection off someplace, either in the HttpSession or in a stateless session bean.



The problem is that caching off the connection improves performance at the expense of scalability. Most often, a client program using a database connection, or any other form of external resource for that matter, doesn't use 100% of its capacity. Think about the typical enterprise Web- or UI-based application: we present some data to the user, who spends an eternity (to the CPU, anyway) thinking about it and possibly making changes, and then submits those changes, which may or may not go directly into the database. In the meantime, we're holding open a connection against the database that no other client can use. If you think about it, a client probably makes use of the database connection it obtains about 5% of the time, which brings us back to the whole point of moving to a three- or n-tier system in the first place: sharing resources. If we can multiplex that connection across 20 clients, that connection gets used to its full 100% capacity, yet only one physical connection is necessary; theoretically, then, we can now support 20 times more concurrent clients against the same hardware resources.



But we're still left with a fundamental problem, that of connection management: we're still facing the overhead of establishing and closing those database connections. In an ideal world, if all clients are somehow routing their database requests through the same JVM, we can pool connections, making it look like the middleware layer has established 20 connections to the database when in fact it's merely multiplexing 20 clients over the same physical connection. Assuming, then, that each client uses the same credentials to connect to the database, the cost of obtaining a connection is amortized across all 20 clients, making it a much more palatable situation.
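
In a J2EE container, that multiplexing is normally something you get by looking up a container-managed DataSource rather than by going through DriverManager yourself; calling close on the Connection you get back typically just returns it to the pool. Here's a minimal sketch of what a getConnectionFromSomeplace helper might look like, assuming a pooled DataSource has been bound under the hypothetical JNDI name java:comp/env/jdbc/OrdersDB.

import java.sql.Connection;
import java.sql.SQLException;
import javax.naming.InitialContext;
import javax.naming.NamingException;
import javax.sql.DataSource;

public class ConnectionSource
{
    // Look up the container-managed, pooled DataSource and hand back a
    // Connection from it; closing that Connection returns it to the pool
    // rather than tearing down the physical database connection.
    public static Connection getConnectionFromSomeplace()
        throws NamingException, SQLException
    {
        InitialContext ctx = new InitialContext();
        DataSource ds = (DataSource)ctx.lookup("java:comp/env/jdbc/OrdersDB");
        return ds.getConnection();
    }
}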



Even when connection pooling isn't possible, though, we're still generally going to prefer to acquire, use, and release resources aggressively. Granted, it's a performance hit, but most enterprise systems written today are more interested in scalability than in performance. This may seem like an improbable statement at first, but bear in mind that many of the IT projects developed after 1995 are built to sit facing the public via the Internet, and it's a truly embarrassing PR blunder to have your system crumple under a burst load like that produced by the Slashdot effect. Most users won't notice if ordering books online takes a few extra seconds (time that can often be made up by reducing the number of fluffy graphic elements on your Web page; see Item 52), but they'll definitely notice when the service goes down under load.



This acquire-use-release approach is called just-in-time activation (JITA), by the way, and is more or less directly modeled by parts of the J2EE Specification, most notably the stateless session bean. Remember, a stateless session bean cannot hold state across method calls, meaning that any resources the bean needs must be acquired at the start of the method and released at the end of the method. It's a great enforcement of the JITA policy, but while it works well for situations where a single stateless session bean method call models a single user interaction session (HTTP request, most often), it creates a lot of "resource churn" when a single user request requires multiple stateless session bean calls; you get an "acquire-use-release-acquire-use-release-acquire-use-release" effect. This is predominantly why, in the case of EJB stateless session beans, multiple authors have suggested that your stateless session bean methods should match (more or less) one-to-one with your system use cases, since that way the resources can be acquired and released once.
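
Put in code, a minimal sketch of a stateless session bean following that advice might look like the class below: one use-case-level method acquires everything it needs at the top and releases it in a finally block before returning. The bean name, JNDI name, and SQL are hypothetical, and the home/remote interfaces and deployment descriptor are omitted.

import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;
import javax.ejb.EJBException;
import javax.ejb.SessionBean;
import javax.ejb.SessionContext;
import javax.naming.InitialContext;
import javax.sql.DataSource;

public class OrderManagerBean implements SessionBean
{
    private SessionContext ctx;

    // One use case == one method, so the connection is acquired and
    // released exactly once per user request (the JITA pattern).
    public void cancelOrder(int orderId)
    {
        Connection con = null;
        PreparedStatement stmt = null;
        try
        {
            InitialContext jndi = new InitialContext();
            DataSource ds = (DataSource)jndi.lookup("java:comp/env/jdbc/OrdersDB");
            con = ds.getConnection();
            stmt = con.prepareStatement("DELETE FROM orders WHERE order_id = ?");
            stmt.setInt(1, orderId);
            stmt.executeUpdate();
        }
        catch (Exception ex)
        {
            throw new EJBException(ex);
        }
        finally
        {
            try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
            try { if (con != null) con.close(); } catch (SQLException ignored) { }
        }
    }

    // Standard stateless session bean lifecycle plumbing
    public void ejbCreate() { }
    public void ejbRemove() { }
    public void ejbActivate() { }
    public void ejbPassivate() { }
    public void setSessionContext(SessionContext ctx) { this.ctx = ctx; }
}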



What about those situations when we need to keep the data retrieved out of a ResultSet around for longer than the scenario above? For example, in a search-results page, we want to hold the search results across page invocations: we don't want to display all the search results at once, obviously, but we don't want to have to go back to the database over and over again to conduct the same search on each successive "Next" request from the search results page, either.



Instead of holding the data in a connected ResultSet, put the data into a disconnected RowSet, close the ResultSet, and use the RowSet (which inherits from ResultSet, remember, so the API is identical) to display the search page results rather than using the original ResultSet. Unfortunately, doing so means taking up more memory in the client process, since all of the result set is now being held in memory inside the RowSet instance, where before it was being pulled over in an on-demand fashion via the ResultSet APIs. This is a case where you have to make a conscious decision to support scalability at the JVM level (memory being the finite resource in contention) as opposed to scalability at the database (transactional lock) level; for most projects, it's easier to put more memory into the client than to change the transactional policies within the database, but this is a decision you have to make.
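
A minimal sketch of that hand-off is below; it assumes a CachedRowSet implementation is available (here Sun's com.sun.rowset.CachedRowSetImpl, which ships outside the formal specification; substitute whatever implementation your environment provides).

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import javax.sql.rowset.CachedRowSet;
import com.sun.rowset.CachedRowSetImpl;   // substitute your vendor's implementation

public class SearchResultsExample
{
    // Run the search, copy the rows into a disconnected CachedRowSet, and
    // release the JDBC objects immediately; the returned RowSet can then be
    // held across page requests without pinning database locks.
    public static CachedRowSet runSearch(Connection con, String sql)
        throws SQLException
    {
        Statement stmt = null;
        ResultSet rs = null;
        try
        {
            stmt = con.createStatement();
            rs = stmt.executeQuery(sql);

            CachedRowSet cached = new CachedRowSetImpl();
            cached.populate(rs);    // pulls all rows into memory
            return cached;
        }
        finally
        {
            try { if (rs != null) rs.close(); } catch (SQLException ignored) { }
            try { if (stmt != null) stmt.close(); } catch (SQLException ignored) { }
            try { if (con != null) con.close(); } catch (SQLException ignored) { }
        }
    }
}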



Keep an eye, by the way, on JSR-114, the new RowSet implementation JSR that will define five different kinds of RowSet objects for standard use throughout J2EE applications. (The RowSet interface has been a part of J2EE for some time now, but no implementations have been standardized; Sun released several, the CachedRowSet and WebRowSet being the two most popular, but they remained outside of the formal J2EE Specification.) In addition to supporting the disconnected storage of relational data that we like about the RowSet, several new features are being added, such as optimistic concurrency semantics (see javax.sql.rowset.spi), XML input/output (see javax.sql.rowset.spi.XMLReader and javax.sql.rowset.spi.XMLWriter), and the ability to maintain relational relationships across multiple RowSet objects (see javax.sql.rowset.JoinRowSet). The RowSet should be your first choice for offline storage; either use what has already been provided as part of JSR-114, or roll your own RowSet implementation to provide the additional semantics you need.



Regardless of whether you store the data into a RowSet or a List of Map instances, pull what data you need across as quickly as you can in order to release the JDBC resource objects aggressively, thereby easing the strain on the JDBC plumbing and on database lock management. Note that, although it's not as well known, this advice also applies to other "outside" resources providing connected data feeds, such as distributed transactions or Connector resource objects (like Record objects).
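
If no RowSet implementation is handy, the same idea works with plain collections; here's a minimal sketch that copies every row into a Map keyed by column name, so the ResultSet can be closed immediately afterward.

import java.sql.ResultSet;
import java.sql.ResultSetMetaData;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ResultSetCopier
{
    // Copy every row of the ResultSet into a List of Maps (one Map per row,
    // keyed by column name) so the ResultSet can be released right away.
    public static List copyRows(ResultSet rs) throws SQLException
    {
        ResultSetMetaData meta = rs.getMetaData();
        int columns = meta.getColumnCount();

        List rows = new ArrayList();
        while (rs.next())
        {
            Map row = new HashMap();
            for (int i = 1; i <= columns; i++)          // JDBC columns are 1-based
                row.put(meta.getColumnName(i), rs.getObject(i));
            rows.add(row);
        }
        return rows;
    }
}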















