Saturday, October 24, 2009

Item 70: Recognize ClassLoader boundaries



















The ClassLoader architecture owns responsibility for the most fundamental task of the JVM, that of loading code into the JVM and making it available for execution. Along the way, it also defines isolation boundaries so that classes of similar package/class naming won't necessarily clash with one another. In doing so, ClassLoaders introduce phenomenal flexibility into a Java server-based environment and, as a result, phenomenal complexity. Dealing with ClassLoader relationships drives developers nuts: just when we think we've got it all figured out, something else creeps in and throws everything up in the air again. Despite our desire to bury our collective head in the sand and pretend that ClassLoaders will go away if we wish hard enough, the brutal fact remains that enterprise Java developers must understand the ClassLoader relationships within their server product of choice.



Consider an all-too-common scenario: a servlet Web application stores Beans (not the EJB kind) in user session space for use during later processing as part of the Web application, the classic MVC/Model 2 architecture. At first, the Beans were deployed in a separate .jar file to the CLASSPATH of the server product itself; it seemed simpler that way. Unfortunately, doing so meant that the Beans accidentally clashed with other Beans of the same name (how many LoginBean classes can we count?) in a different Web application, so the Beans had to move into the Web application's WEB-INF/lib directory, where they're supposed to have been in the first place.



Now, however, whenever a servlet in the Web application changes and the container does its auto-reloading magic, weird ClassCastException errors start to creep in. Say you have code like this:










LoginBean lb = (LoginBean)session.getAttribute("loginBean");
if (lb == null)
{
  . . .
}




The servlet container keeps complaining that the object returned from the session.getAttribute call isn't, in fact, a LoginBean. You, meanwhile, are pulling your hair out in large clumps because you know for a fact that it is a LoginBean; you just put it in there on the previous page. Worse, when you try to verify that it is a LoginBean, by calling getClass().getName() on the object returned, it shows up as LoginBean. Better yet, if you restart the server, the problem appears to go away completely, until the next time you change the servlet. You quietly contemplate retirement.



The problem here is one of ClassLoaders, not code.



To be particular, ClassLoaders are used within the Java environment not only as a loading mechanism but also as a mechanism for establishing isolation boundaries between disparate parts of the code. In plain English, that means that my Web application shouldn't conflict in any way with your Web application, despite the fact that we're running in the same servlet container. I can name my servlets and beans by names that exactly match those in your Web application, and the two applications should run side by side without any problems.



To understand why ClassLoaders are used to provide this isolation behavior, we have to establish some fundamental rules about ClassLoaders.





  1. ClassLoaders form hierarchies.

    When a ClassLoader is created, it always defaults to a "parent" ClassLoader. By default, the JVM starts with three: a bootstrap loader written in native code to load the runtime library (rt.jar), a URLClassLoader pointing to the extensions directory (usually jre/lib/ext in your JRE directory) called the extensions loader, and another URLClassLoader pointing to the elements dictated by the java.class.path system property, which is set via the CLASSPATH environment variable. Containers like EJB or servlet containers will augment this by putting their own ClassLoaders into the tree, usually toward the bottom or leaf nodes. The hierarchy is used to delegate loading of code to the parent before trying to load code from the child ClassLoader, thus giving the bootstrap loader first chance at loading code. If a parent has already loaded a class, no further attempt at loading the class is made.



  2. Classes are loaded lazily.

    The Java Virtual Machine, like most managed environments, wants to minimize the work it needs to do at startup and won't load a class until it becomes absolutely necessary. This means that at any given point, a class may suddenly come across a method it hasn't invoked before, which in turn references a class that hasn't been loaded yet. This triggers the JVM to load that class, which brings up the next rule of ClassLoaders. By the way, this is why old-style non-JNDI JDBC code needed to "bootstrap" the driver into the JVM. Without that, the actual driver would never be loaded since your JDBC code traditionally doesn't directly reference the driver-specific classes, nor does JDBC itself.



  3. Classes are loaded by the ClassLoader that loaded the requesting class.

    In other words, if a servlet uses the class PersonBean, then when PersonBean needs to be loaded the JVM will go back to the ClassLoader that loaded the servlet. Certainly, if you have a reference to a ClassLoader, you can explicitly use that ClassLoader instance to load a class, but this is the exception, not the rule.



  4. Classes are uniquely identified within the JVM by a combination of class name, package name, and ClassLoader instance that loaded them.

    This rule means that a given class can be loaded twice into the VM, as long as the class is loaded through two different ClassLoaders. This also implies that when the JVM checks a cast (the checkcast instruction, as in the LoginBean cast earlier), it compares the object's actual class with the target class, and the ClassLoader that defined each one is part of that comparison. If the object's class is neither the target class nor a subtype of it, a ClassCastException is thrown, even when the two class names are identical. This also implies that since the two classes are considered to be unique, each has its own copy of static data.
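Rule 4 is easy to reproduce outside a container. The following is a minimal sketch, not how any particular servlet container is implemented: WebAppLoader is a hypothetical loader, invented for this illustration, that has only the bootstrap loader as its parent and therefore defines its own copy of LoginBean from the same class file. It assumes the compiled classes are readable as resources on the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class ClassIdentityDemo {
    /** Stand-in for the session bean discussed in the text. */
    public static class LoginBean { }

    /**
     * Hypothetical loader playing the role of a per-web-application loader.
     * Its only parent is the bootstrap loader, so it defines application
     * classes itself instead of reusing the copies already loaded.
     */
    static class WebAppLoader extends ClassLoader {
        private final ClassLoader source = ClassIdentityDemo.class.getClassLoader();

        WebAppLoader() {
            super(null); // a null parent means "bootstrap loader only"
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Creates a LoginBean whose class was defined by a fresh WebAppLoader. */
    static Object newIsolatedBean() {
        try {
            ClassLoader webApp = new WebAppLoader();
            Class<?> type = webApp.loadClass(LoginBean.class.getName());
            return type.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Returns true if casting to this class's own view of LoginBean fails. */
    static boolean castFails(Object candidate) {
        try {
            LoginBean ignored = (LoginBean) candidate;
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        Object foreign = newIsolatedBean();
        // Same name, different Class object, so the cast is rejected.
        System.out.println("same name:  "
            + foreign.getClass().getName().equals(LoginBean.class.getName()));
        System.out.println("same class: " + (foreign.getClass() == LoginBean.class));
        System.out.println("cast fails: " + castFails(foreign));
    }
}
```

The names match, yet the Class objects differ and the cast is refused, which is the symptom described at the start of this item.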



Having established these rules, let's take a look at what this means, practically, to enterprise Java developers.



Isolation



In order to support the notion of isolation between Web applications, the servlet container creates a ClassLoader instance around each Web application, thereby effectively preventing the "leakage" of classes from one Web application to the other. Many servlet containers provide a "common" directory in which to put .jar files that can be seen by all Web applications as well. Therefore, most servlet containers have a ClassLoader hierarchy that looks, at a minimum, like the one shown in Figure 8.1.



Figure 8.1. ClassLoader isolation





Notice that this implies that a LoginBean class, deployed as part of WebAppA.war and also as part of WebAppB.war, is loaded twice into the servlet container: once through WebApp A and once through WebApp B. What happens if LoginBean has static data members?



The answer is simple, rooted in rule 4 mentioned earlier: each LoginBean is uniquely identified by its class name and package name (LoginBean) and the ClassLoader instance that loaded it. Each Web application is loaded by a separate ClassLoader instance. Therefore, these are two entirely orthogonal classes that maintain entirely separate static data.



This has profound implications for servlet developers. Consider, for example, the ubiquitous hand-rolled ConnectionPool class. Typically this class is written to maintain a static data member that holds the Connection instances the pool wants to hand out. If we amend Figure 8.1, putting in ConnectionPool in place of LoginBean, the unsuspecting developer has three ConnectionPool instances going, not one, despite the fact that the pool itself was maintained as static data. To fix this, put the ConnectionPool class in a jar or ClassLoader higher in the ClassLoader hierarchy. Or, better yet, rely on the JDBC 3.0-compliant driver to handle Connection pooling entirely (see Item 73).



Moral: Singletons don't work unless you know where you are in the ClassLoader hierarchy.
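To make the moral concrete, here is a rough sketch in which a static counter stands in for the pool's static state. The ConnectionPool class, its open() method, and WebAppLoader are all hypothetical names made up for this illustration; the loader simply defines its own copy of the class, as each Web application's loader would for a class packaged inside the .war file.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class StaticsPerLoaderDemo {
    /** Toy "pool": the static field is meant to be shared by everyone. */
    public static class ConnectionPool {
        private static int opened = 0;

        public static int open() {
            return ++opened;
        }
    }

    /** Hypothetical per-application loader that defines its own class copies. */
    static class WebAppLoader extends ClassLoader {
        private final ClassLoader source = StaticsPerLoaderDemo.class.getClassLoader();

        WebAppLoader() {
            super(null); // bootstrap loader is the only parent
        }

        @Override
        protected Class<?> findClass(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = source.getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                    out.write(buf, 0, n);
                }
                byte[] bytes = out.toByteArray();
                return defineClass(name, bytes, 0, bytes.length);
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Calls ConnectionPool.open() on the copy of the class owned by the given loader. */
    static int openVia(ClassLoader loader) {
        try {
            Class<?> pool = loader.loadClass(ConnectionPool.class.getName());
            return (Integer) pool.getMethod("open").invoke(null);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        ClassLoader webAppA = new WebAppLoader();
        ClassLoader webAppB = new WebAppLoader();
        System.out.println("A: " + openVia(webAppA));
        System.out.println("A: " + openVia(webAppA));
        // B starts counting from scratch: it has its own static field.
        System.out.println("B: " + openVia(webAppB));
    }
}
```

Each loader sees its own counter, so the "single" pool exists once per loader that defines the class.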



Versioning



In order to support hot reloading of servlets, the typical servlet container creates a ClassLoader instance each time a Web application changes. For example, when a developer recompiles a servlet and drops it into the Web application's WEB-INF/classes directory, the servlet container notes that the change has taken place and creates an entirely new ClassLoader instance. It reloads all the Web application code through that ClassLoader instance and uses those classes to answer any new incoming requests. So now the picture looks something like Figure 8.2.



Figure 8.2. ClassLoaders running side by side to provide versioning





Let's complicate the picture somewhat: assume SampleWebApp, version 1, created a LoginBean and stored it into session space. The LoginBean was created as part of SampleWebApp-v1's ClassLoader, so the class type (the unique tuple of package name, class name, and ClassLoader instance) associated with this object is (unnamed package)/LoginBean/SampleWebApp-v1. So far, so good.



Now the developer touches a servlet (or hot deploys a new version of the Web application), which in turn forces the servlet container to reload the servlet, which in turn requires a new ClassLoader instance. You can see what's coming next. When the servlet tries to extract the LoginBean object out of session space, the class types don't match: the LoginBean instance is of a type loaded by ClassLoader 1 but is asked to cast to a class loaded by ClassLoader 2. Even though the classes are identical (LoginBean in both cases), the fact that they were loaded by two different ClassLoader instances means they are entirely different classes, and a ClassCastException is thrown.



At first, it seems entirely arbitrary that the same class loaded by different ClassLoaders must be treated as an entirely different class. As with most things Java does, however, this is for a good reason. Consider some of the implications if the classes are, in fact, different. Suppose the VM allows the cast above to take place, but the new version of LoginBean doesn't, in fact, have a method that the old version has, or doesn't implement an interface the old version does. Since we allowed the cast, what should the VM do when code calls the old method?



Some have suggested that the VM should compare the binary layouts of the two classes and allow the cast based on whether the two classes are, in fact, identical. This implies that every cast, not to mention every reference assignment, within the system would have to support this binary comparison, which would seriously hurt performance. In addition, rules would need to develop to determine when two classes were "identical": if we add a method, is that an acceptable change? How about if we add a field?



The unfortunate realization is that the pairing of class name and ClassLoader instance is the simplest way to determine class uniqueness. The goal, then, is to work with it, so that those annoying ClassCastException errors don't creep up.



One approach is complete ignorance: frequently, when faced with this problem, we try to solve it by bypassing it entirely. In the interests of getting the code to work, we put the LoginBean class somewhere high enough in the ClassLoader hierarchy that it isn't subject to reloading, usually either on the container's CLASSPATH or in the container's JVM extensions directory (see Item 69). Unfortunately, that means that if LoginBean does change, the server has to be bounced in order to reload it. This can create some severe evolution problems: WebApps A and B depend on version 1 of LoginBean, but WebApp C needs version 2, which isn't backwards-compatible. If LoginBean is deployed high in the hierarchy, WebApps A and B will suddenly "break" when WebApp C is deployed. This is a great way to see how many consecutive 24-hour debugging sessions you can handle.



Worse yet, deploying LoginBean this high in the ClassLoader hierarchy means that other Web applications might also be able to see LoginBean, even if they shouldn't. So WebApp D, which doesn't use LoginBean at all, could still see the class and potentially use it as a means of attack against WebApps A, B, or C. This is dangerous if your code is to be hosted on a server you share with others (as in the case of an ISP); other applications could now use Reflection against your LoginBean class and maybe discover a few things you'd prefer to keep secret.



Don't despair; all isn't lost. A couple of tricks are available.



Trick Number One is to define an interface, say, LoginBean, and put that high in the ClassLoader hierarchy, where it won't get loaded by the ClassLoader that loads the Web application. An implementation of that interface, LoginBeanImpl, resides in the Web application, and any code that wants to use a LoginBeanImpl references it as a LoginBean (the interface) instead. When the Web application is bounced, the "old" LoginBeanImpl is assigned to a reference of the interface type LoginBean, which wasn't reloaded, so the assignment succeeds and no exception is thrown. The drawback here is obvious: every sessionable object needs to be split into interface and implementation. (This is partly why EJB forces this very state of affairs for each bean: this way, EJB can shuffle ClassLoader instances around without worrying about ClassCastException errors. Not coincidentally, this is also why EJB instances are forbidden to have static data members, since statics don't play well with interfaces.)
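A small sketch of Trick Number One follows. ReloadableLoader is a hypothetical loader written for this illustration: it defines LoginBeanImpl itself and delegates every other class, including the LoginBean interface, to its parent. That is a simplification of what a real container does, and it assumes the compiled classes are readable as resources on the class path.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

class SharedInterfaceDemo {
    /** Lives "high" in the hierarchy: loaded once, never reloaded. */
    public interface LoginBean {
        String user();
    }

    /** Lives in the Web application: redefined on every reload. */
    public static class LoginBeanImpl implements LoginBean {
        public String user() {
            return "alice";
        }
    }

    /** Hypothetical loader: isolates LoginBeanImpl, delegates everything else. */
    static class ReloadableLoader extends ClassLoader {
        private static final String ISOLATED = LoginBeanImpl.class.getName();

        ReloadableLoader(ClassLoader parent) {
            super(parent);
        }

        @Override
        protected Class<?> loadClass(String name, boolean resolve)
                throws ClassNotFoundException {
            if (!name.equals(ISOLATED)) {
                return super.loadClass(name, resolve); // shared with the parent
            }
            synchronized (getClassLoadingLock(name)) {
                Class<?> type = findLoadedClass(name);
                if (type == null) {
                    byte[] bytes = readClassBytes(name);
                    type = defineClass(name, bytes, 0, bytes.length);
                }
                if (resolve) {
                    resolveClass(type);
                }
                return type;
            }
        }

        private byte[] readClassBytes(String name) throws ClassNotFoundException {
            String path = name.replace('.', '/') + ".class";
            try (InputStream in = getParent().getResourceAsStream(path)) {
                if (in == null) {
                    throw new ClassNotFoundException(name);
                }
                ByteArrayOutputStream out = new ByteArrayOutputStream();
                byte[] buf = new byte[4096];
                for (int n = in.read(buf); n != -1; n = in.read(buf)) {
                    out.write(buf, 0, n);
                }
                return out.toByteArray();
            } catch (IOException e) {
                throw new ClassNotFoundException(name, e);
            }
        }
    }

    /** Simulates one "version" of the Web application creating its bean. */
    static Object newImpl() {
        try {
            ClassLoader version =
                new ReloadableLoader(SharedInterfaceDemo.class.getClassLoader());
            return version.loadClass(LoginBeanImpl.class.getName())
                          .getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        Object fromOldVersion = newImpl();
        LoginBean lb = (LoginBean) fromOldVersion; // interface was never reloaded
        System.out.println("user via interface: " + lb.user());
        System.out.println("is this loader's impl: "
            + (fromOldVersion instanceof LoginBeanImpl));
    }
}
```

The cast to the interface succeeds because both "versions" resolve LoginBean through the same parent loader, while the implementation classes remain distinct.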



Trick Number Two is to store only objects from the Java base runtime library (e.g., String and Date objects) into session space, rather than custom-built objects. The Java Collections classes, Map in particular, can be quite useful here as "pseudo-classes" for holding data. Since the bootstrap ClassLoader loads the runtime library, these can never be hot-versioned and therefore won't be subject to the same problems. The danger here, however, is that the Java Collections classes will take instances of anything, meaning the temptation to just stick everything into session becomes harder to resist (see Item 39).
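As a rough illustration of Trick Number Two, the sketch below keeps login state in a Map of runtime-library objects. A plain HashMap stands in for the HttpSession, and the attribute and key names are made up for the example. The check relies on getClassLoader() returning null for classes defined by the bootstrap loader, which is how HotSpot-based JVMs represent it.

```java
import java.util.Date;
import java.util.HashMap;
import java.util.Map;

class RuntimeTypesSessionDemo {
    /** Builds the login state from runtime-library types only. */
    static Map<String, Object> loginState(String user, Date loggedInAt) {
        Map<String, Object> state = new HashMap<>();
        state.put("user", user);
        state.put("loggedInAt", loggedInAt);
        return state;
    }

    /** True if the object's class was defined by the bootstrap loader. */
    static boolean definedByBootstrap(Object o) {
        return o.getClass().getClassLoader() == null;
    }

    public static void main(String[] args) {
        // Stand-in for HttpSession.setAttribute / getAttribute.
        Map<String, Object> session = new HashMap<>();
        session.put("loginState", loginState("alice", new Date()));

        Object stored = session.get("loginState");
        System.out.println("map from bootstrap: " + definedByBootstrap(stored));
        for (Object value : ((Map<?, ?>) stored).values()) {
            System.out.println(value.getClass().getName()
                + " from bootstrap: " + definedByBootstrap(value));
        }
    }
}
```

Nothing stored here belongs to the Web application's own ClassLoader, so a reload cannot invalidate the casts on retrieval.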



Trick Number Three assumes you want or must have your custom objects but can't take the time to break them into interface and implementation pieces. In that case, mark the class as Serializable, then use Java Object Serialization to store a serialized copy of the objects into the session as a byte array. Because Java Object Serialization more or less ignores this problem, and because byte arrays are implicitly themselves Serializable (thus satisfying the Servlet 2.2 Specification requirement/suggestion that only Serializable objects be stored into session), you can store the serialized version of the object, rather than the standard object type, and instead of assigning the session object back to the LoginBean reference, deserialize it:










// Store in session (assumes java.io.* is imported and LoginBean is Serializable)
LoginBean lb = new LoginBean(...);
try
{
  ByteArrayOutputStream baos = new ByteArrayOutputStream();
  ObjectOutputStream oos = new ObjectOutputStream(baos);
  oos.writeObject(lb);
  oos.flush();  // push any buffered data into baos before copying it
  byte[] bytes = baos.toByteArray();
  session.setAttribute("loginBean", bytes);
}
catch (Exception ex)
{
  // Handle exception
}

// Somewhere else, retrieve LoginBean from session
LoginBean lb = null;
try
{
  byte[] bytes = (byte[])session.getAttribute("loginBean");
  ByteArrayInputStream bais = new ByteArrayInputStream(bytes);
  ObjectInputStream ois = new ObjectInputStream(bais);
  // readObject builds a fresh instance from the stored bytes
  lb = (LoginBean)ois.readObject();
}
catch (Exception ex)
{
  // Handle exception
}




This third approach carries a significant cost, however: it's somewhat expensive to serialize and deserialize objects, even when you follow Item 71. Fortunately, you shouldn't have to use it very often. Another problem, specifically related to servlet containers, is that byte arrays aren't JavaBean-compliant and so can't be used as targets of the standard JSP bean tags (useBean, getProperty, and setProperty).



The key here, ultimately, is to know exactly how your Java environment sets up ClassLoader relationships, and then work with them, rather than against them. If your environment doesn't tell you straight out, some judicious exploration, via calls to getClass().getClassLoader() and walking the hierarchy, is in order. Failure to do so means mysterious ClassCastException errors when you least want to see them: in production.
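The exploration suggested above can be as simple as the following sketch. The helper name loaderChain is made up for this example; it walks from a class's ClassLoader up through getParent() until it reaches null, which is how the API represents the bootstrap loader. The loader class names it reports vary by JVM and container, so none are listed here.

```java
import java.util.ArrayList;
import java.util.List;

class LoaderChainDemo {
    /** Lists the loader class names from the given class's loader up to the root. */
    static List<String> loaderChain(Class<?> type) {
        List<String> chain = new ArrayList<>();
        for (ClassLoader cl = type.getClassLoader(); cl != null; cl = cl.getParent()) {
            chain.add(cl.getClass().getName());
        }
        chain.add("bootstrap"); // the API reports the bootstrap loader as null
        return chain;
    }

    public static void main(String[] args) {
        // In a servlet, pass getClass() to see where the container put your code.
        System.out.println("String:    " + loaderChain(String.class));
        System.out.println("this demo: " + loaderChain(LoaderChainDemo.class));
    }
}
```

Running the same walk from a servlet and from a class in the container's shared directory shows which ClassLoaders the two have in common, and therefore which types are safe to pass between them.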












