Saturday, October 24, 2009

Statelessness








I don't believe I have said anything about stateless objects so far. Yet you might have heard somewhere that MTS objects are stateless. Let me put it in the clearest possible terms: this is nonsense. An object is formally defined as an entity that has behavior, identity, and state. A stateless object therefore is not an object at all. This is not a word game. The object-oriented discipline has not evolved over decades to this point just to discover that state, a fundamental object aspect, simply could be thrown out because some new technology found it difficult to accommodate. The maintainability and complexity reduction benefits of object orientation would be hampered severely, and any such technology would be doomed. Fortunately (and expectedly), MTS and COM+ are not doomed. They have absolutely no problem with state. In this section, I will try to explain what is behind the misleading claim that MTS objects are stateless and what statelessness really means in the context of transactional COM+ objects.



It is true that MTS enforces transaction isolation, as we discovered in the previous section. Object state is jettisoned at the boundary of each transaction in which the object participates. This is a very sane model and always should be the practice in enterprise object programming, independent of any enforcing technology. The result of requesting a set of changes on shared state should depend only on the input to the request, the behavior of the objects performing the changes, and the state of the shared data. The state transformation can go through a number of stages, where each successive stage is dependent on objects having held their state from previous stages. Object state is therefore an important part of the transactional model. It is just bounded by transaction scope. In particular, the result of shared state transformation through an object layer should not depend on state held by these objects from requests that were made as part of prior transactions. Violation of this rule makes the semantics of business objects unpredictable, very much like any violation of transaction isolation.



In many cases where you read about stateless MTS objects, the term stateless refers to this weak form of statelessness. (But in this case, I think that the term should not be used at all because of the confusion it has caused.) References to statelessness in the Platform SDK documentation fall into this category. Weak statelessness is a product of COM+'s disciplined transaction integration model, not its scalability model. Be aware that the weak statelessness constraint applies only to objects when they attempt to force a transaction boundary through calls to SetComplete and SetAbort. Otherwise, even transactional objects maintain their state until they are released, just like any other object does. Observe that the transaction boundary adjusts itself to the object lifetime boundary, and not the other way around. Transactions are forced to stay alive until their root objects die a natural death or explicitly request to be deactivated. Transactional MTS objects are not stateless; instead, they force a transaction boundary to occur when they naturally are willing to give up their state: after completing their work by handling a number of stateful method invocations. Transactions do not control object state; instead, object state controls transactions.
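The weak form of statelessness described above can be made concrete with a small simulation. The following is a minimal sketch in Python, not the COM+ API: ToyContext, ToyRuntime, and Counter are hypothetical names invented for illustration, and the only behavior modeled is that an object keeps its state across calls until it calls SetComplete or SetAbort, at which point the instance is dropped and the transaction boundary occurs.

```python
class ToyContext:
    """Toy stand-in for an object context; NOT the real COM+ interface."""

    def __init__(self):
        self.done = False        # should the object be deactivated on return?
        self.consistent = True   # is the object willing to commit?

    def set_complete(self):
        self.done, self.consistent = True, True

    def set_abort(self):
        self.done, self.consistent = True, False

    def enable_commit(self):
        self.done, self.consistent = False, True

    def disable_commit(self):
        self.done, self.consistent = False, False


class Counter:
    """Hypothetical transactional object that holds per-instance state."""

    def __init__(self, ctx):
        self.ctx = ctx
        self.total = 0

    def add(self, n):
        self.total += n
        self.ctx.enable_commit()   # keep state; no boundary is forced

    def finish(self):
        result = self.total
        self.ctx.set_complete()    # the object itself gives up its state
        return result


class ToyRuntime:
    """Hypothetical interceptor: activates on demand, drops the instance
    after any call that returns with the done flag set."""

    def __init__(self, factory):
        self.factory = factory
        self.ctx = ToyContext()
        self.instance = None
        self.transactions_ended = 0

    def call(self, method, *args):
        if self.instance is None:
            self.ctx = ToyContext()
            self.instance = self.factory(self.ctx)
        result = getattr(self.instance, method)(*args)
        if self.ctx.done:
            self.instance = None           # state is jettisoned here
            self.transactions_ended += 1
        return result


rt = ToyRuntime(Counter)
rt.call("add", 2)
rt.call("add", 3)
first = rt.call("finish")    # state survived the two earlier calls
rt.call("add", 1)
second = rt.call("finish")   # new instance: nothing carried over
```

In this toy model the boundary happens only because Counter.finish asks for it, which mirrors the point that object state controls transactions rather than the reverse.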



It is also true that MTS objects should not be shared among clients that execute as the result of separate, top-level requests, possibly on separate threads. (See Figure 13-11.) This is a demand of the scalability model, not the transaction model. It is accurate enough: if objects have no state, there is no motivation for sharing them among clients. Therefore, the true, strong form of statelessness naturally enforces the scalability model. Some have suggested applying statelessness throughout to achieve scalability. Such recommendations generally take the form of advising to call either SetComplete or SetAbort before returning from any method. This indeed amounts to giving up state in your objects. Per-instance member data (or module data, as Visual Basic calls it) is useless in such objects. All variable declarations could (and should) occur at method scope. A crucial discovery in object orientation is that grouping functions together with the external data they modify in a package called an object tremendously enhances design clarity, code simplicity, and maintainability. The removal of all per-instance data members is tantamount to disallowing this packaging and splitting functions from data again. Classes that do this cannot be used to leverage many of the benefits of object orientation.



Modeling tends to be difficult with such classes. And all data necessary to accomplish a task performed by objects of the class must be passed at once. This makes function signatures long, hard to read, and unwieldy for callers. Optional parameters become a nightmare, especially for clients implemented in languages that do not support the named parameter feature. State that normally could be built up conveniently inside an object or even inside the RDBMS through multiple method invocations and separate properties now must be passed all at once. Often this causes the loss of type safety when safe arrays are used to ship data aggregates. In other cases, new data structures need to be invented to communicate aggregate data to the object, even though RDBMS tables already could hold this data, if such tables could be used to store this data piece by piece, one row at a time, with multiple method invocations building it up in the scope of a single transaction. (This state of affairs of course means that the object cannot be deactivated across method invocations.) In other cases, data structures might be used only within an object, for gathering all information necessary to perform a business function, where this information gathering occurs repeatedly across method invocations. But these data structures now have to be made available externally so that clients can populate them and pass them to the object all at once. This is a clear violation of encapsulation and leads to reduced maintainability. Of course, it is no surprise to find that the cost of true statelessness is tremendously high.



Consider our CUserTracker example. Say we wanted to allow logging clients to attach some contextual information to each user tracking log entry. Callers might want to record such information as request parameters passed to them, system load conditions, the state of the data the callee is supposed to work on, and so on. Because contextual information for each log entry might consume multiple rows, we arrange the information in the RDBMS in a separate table with a foreign key relationship to the user tracking (primary) table. Columns in the secondary table include date and time information, as well as some string and numeric types because callers commonly record such data. Now how do we offer an interface to this call context facility in CUserTracker? If CUserTracker needs to be stateless, we are in trouble. We might have to force callers to pass the entire kit and caboodle in a gigantic safe array of strings. This is not type safe and is very inconvenient for C++ callers. But if CUserTracker can maintain state across method invocations, we can equip it with properties that correspond to the columns of the secondary table, the one that holds the request context information. CUserTracker then would have a method AddRequestContextInfoSet. Every time the method is called, the content of all the properties would be added to an internal data structure or directly to the secondary table. The invocation returns having called DisableCommit, and CUserTracker will call SetComplete only after the caller signals that the data set is complete and all information for the primary table entry is present. This is a significantly easier model, it is type safe, and the internals of CUserTracker remain encapsulated.
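The stateful CUserTracker design can be sketched roughly as follows. This is an illustrative Python model under stated assumptions, not the book's implementation and not the COM+ API: ToyContext stands in for the object context, plain lists stand in for the primary and secondary RDBMS tables, and the property names (recorded_at, text, number) and the completing method log_user are hypothetical. Only AddRequestContextInfoSet, DisableCommit, and SetComplete come from the text.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional


class ToyContext:
    """Toy stand-in for an object context; NOT the real COM+ interface."""

    def __init__(self):
        self.done = False
        self.consistent = True

    def disable_commit(self):
        self.done, self.consistent = False, False

    def set_complete(self):
        self.done, self.consistent = True, True


@dataclass
class ContextRow:
    """One typed row of request context (hypothetical column set)."""
    recorded_at: Optional[datetime]
    text: str
    number: float


class UserTracker:
    def __init__(self, ctx, primary_table, secondary_table):
        self.ctx = ctx
        self.primary = primary_table       # list standing in for a table
        self.secondary = secondary_table   # list standing in for a table
        # Properties that correspond to the secondary table's columns.
        self.recorded_at: Optional[datetime] = None
        self.text = ""
        self.number = 0.0
        self._pending: List[ContextRow] = []

    def add_request_context_info_set(self):
        # Snapshot the current property values as one row.
        self._pending.append(
            ContextRow(self.recorded_at, self.text, self.number))
        # Work is incomplete: keep state and refuse to commit for now.
        self.ctx.disable_commit()

    def log_user(self, user):
        # Hypothetical completion call: write the primary row, then the
        # context rows that reference it, then give up state.
        entry_id = len(self.primary) + 1
        self.primary.append((entry_id, user))
        for row in self._pending:
            self.secondary.append((entry_id, row))
        self._pending.clear()
        self.ctx.set_complete()
        return entry_id


ctx = ToyContext()
primary, secondary = [], []
tracker = UserTracker(ctx, primary, secondary)

tracker.recorded_at = datetime(2009, 10, 24, 12, 0)
tracker.text = "request: page=7"
tracker.number = 7.0
tracker.add_request_context_info_set()
state_after_add = (ctx.done, ctx.consistent)

tracker.text = "load average"
tracker.number = 0.42
tracker.add_request_context_info_set()

entry_id = tracker.log_user("alice")
```

The caller sets typed properties one at a time and never sees the internal list of pending rows, which is the encapsulation benefit the paragraph above argues for.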



The more input an object needs to do its job, the messier things get if the object does not hold state across method invocations. In general, object statelessness has no performance benefit. The work of assembling the data has to be done somewhere. If it is not assembled in the object, calls to do this work just bunch up in the object's clients. This work involves building up arrays and user-defined types of various sorts. But this should be object work, not client work! In addition, the object often can perform this work more efficiently because it can send it off to the RDBMS. Clients do not have this option. Because of the general difficulty of a stateless model and the encapsulation dangers inherent in it, as a general rule, I strongly advise against forbidding objects from holding state. In my opinion, it is like throwing the baby (a sane programming model, readability, maintainability, and encapsulation) out with the bathwater (object sharing and lack of scalability). In most scalable application projects, doing so is completely unnecessary because the bathwater can be dumped out all by itself.



Finally, there is some confusion regarding which technologies encourage which programming models. It is indeed true that for scalable distributed applications network round-trips should be minimized. This practice is mandated by the first rule of scalability, which says that computing resources must be engaged efficiently. For DCOM applications, efficiency might very well mean calling a single method instead of calling three. Statelessness contributes to a better network utilization profile. But network utilization has absolutely nothing to do with COM+ (other than DCOM), MTS, the single concurrent client scalability model, or transactions. Network separation indeed can justify cumbersome method signatures and other elaborate techniques such as marshal-by-value (discussed in Chapter 8), though I am not sure about total statelessness even in that case. A call model that saves on network round-trips often is required with remote components, but complete statelessness rarely furthers the cause of efficiency. Analyze impact on network traffic before you design the interface of an object that often will be called remotely. Then decide which optimizations should be applied. Don't optimize where optimization will serve only to compromise maintainability. Always remember: "Premature optimization is the root of all evil" (Donald E. Knuth).



In many scalable projects, the business objects are not separated physically from their clients. This is especially true for Web projects, in which the client is the Web server. In this case, business objects generally are invoked in process, so there is absolutely no justification for making these business objects stateless. Things might be a bit different in your project. Fat clients might be separated from business objects by a network boundary, but they might need to exercise business object interfaces frequently. If this is the case, have a look at Wade Baron's four-tier application architecture in Chapter 11. Wade shows us an appropriate architecture for such conditions that does not burden clients with stateless business objects and does not compromise scalability.



Saying that remote objects should be stateless (in the strict, original sense of the term) goes a little far in my opinion, but it is motivated by a desire to cut down on network traffic. Saying the same thing about transactional COM+ objects simply makes no sense. The COM+ programming model offers the EnableCommit and DisableCommit methods for a reason. Even COM+'s new auto-done property, which changes a method's default status from EnableCommit to SetComplete (or SetAbort if you return an error), applies only at the method level, not the component or even the interface level. The COM+ programming model, including the scalability and transaction integration models, is rich, expressive, and powerful. Don't sell it short by buying into the myth that making your "objects" stateless somehow will improve scalability or help your project in any way.
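The per-method nature of auto-done can be illustrated with a decorator. This is a hedged Python analogy only: in COM+ the setting is declarative configuration rather than code, and ToyContext, auto_done, and Account are hypothetical names. The sketch models just the behavior stated above, namely that a marked method ends in the SetComplete state on normal return and in the SetAbort state on error, while unmarked methods on the same object keep the EnableCommit default.

```python
import functools


class ToyContext:
    """Toy stand-in for an object context; NOT the real COM+ interface."""

    def __init__(self):
        # Default status modeled on EnableCommit: not done, willing to commit.
        self.done = False
        self.consistent = True

    def set_complete(self):
        self.done, self.consistent = True, True

    def set_abort(self):
        self.done, self.consistent = True, False


def auto_done(method):
    """Mark one method so that its default outcome is SetComplete,
    or SetAbort if it raises (hypothetical analogy to auto-done)."""

    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        try:
            result = method(self, *args, **kwargs)
        except Exception:
            self.ctx.set_abort()
            raise
        self.ctx.set_complete()
        return result

    return wrapper


class Account:
    """Hypothetical object mixing stateful and auto-done methods."""

    def __init__(self, ctx, balance=0):
        self.ctx = ctx
        self.balance = balance

    def deposit(self, amount):
        # Not marked: the object keeps its state after this call.
        self.balance += amount

    @auto_done
    def close(self):
        if self.balance < 0:
            raise ValueError("negative balance")
        return self.balance


ctx = ToyContext()
acct = Account(ctx)
acct.deposit(10)
state_after_deposit = (ctx.done, ctx.consistent)
closing_balance = acct.close()
state_after_close = (ctx.done, ctx.consistent)

bad_ctx = ToyContext()
bad = Account(bad_ctx, balance=-1)
try:
    bad.close()
    raised = False
except ValueError:
    raised = True
state_after_failure = (bad_ctx.done, bad_ctx.consistent)
```

Because the marker is applied method by method, an object can accumulate state through several unmarked calls and give it up only in the one method designed to finish the work.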


