Saturday, October 24, 2009

Chapter 1. Introduction to Game AI



In the broadest sense, most games incorporate some form of artificial intelligence (AI). For instance, developers have used AI for years to give seemingly intelligent life to countless game characters, from the ghosts in the classic arcade game Pac-Man to the bots in the first-person shooter Unreal, and many others in between. The huge variety of game genres and game characters necessitates a rather broad interpretation of what is considered game AI. Indeed, this is true of AI in more traditional scientific applications as well.



Some developers consider tasks such as pathfinding to be part of game AI. Steven Woodcock reported in his "2003 Game Developer's Conference AI Roundtable Moderator's Report" that some developers even consider collision detection to be part of game AI.[*] Clearly, some wide-ranging interpretations of game AI exist.



We're going to stick with a broad interpretation of game AI, which includes everything from simple chasing and evading, to pattern movement, to neural networks and genetic algorithms. Game AI probably best fits within the scope of weak AI (see the sidebar "Defining AI"). However, in a sense you can think of game AI in even broader terms.
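To give a flavor of the simplest technique on that list, here is a minimal chase-and-evade sketch (written in Java for illustration; the class, its fields, and the speed constant are inventions of this example, not code from the book):

// A predator that steps toward (or away from) the player each frame.
class Chaser {
    double x, y;                      // predator position
    static final double SPEED = 1.0;  // distance moved per frame

    void step(double playerX, double playerY, boolean evade) {
        double dx = playerX - x, dy = playerY - y;
        double dist = Math.hypot(dx, dy);
        if (dist == 0) return;             // already at the player's position
        double sign = evade ? -1.0 : 1.0;  // -1 flips chasing into evading
        x += sign * SPEED * dx / dist;     // move along the normalized
        y += sign * SPEED * dy / dist;     // direction toward the player
    }
}

Calling step once per game update with the player's current coordinates yields simple straight-line pursuit; flipping the evade flag turns the chaser into an evader.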



In games, we aren't always interested in giving nonplayer characters human-level intellect. Perhaps we are writing code to control nonhuman creatures such as dragons, robots, or even rodents. Further, who says we always have to make nonplayer characters smart? Making some nonplayer characters dumb adds to the variety and richness of game content. Although it is true that game AI is often called upon to solve fairly complex problems, we can also employ AI to give nonplayer characters the appearance of having different personalities, or of portraying emotions or various dispositions (scared or agitated, for example).



Defining AI



The question "what is artificial intelligence?" is not easy to answer. If you look up artificial intelligence in a dictionary, you'll probably find a definition that reads something like this: "The ability of a computer or other machine to perform those activities that are normally thought to require intelligence." This definition comes from The American Heritage Dictionary of the English Language, Fourth Edition (Houghton Mifflin Company). Still other sources define artificial intelligence as the process or science of creating intelligent machines.



From another perspective it's appropriate to think of AI as the intelligent behavior exhibited by the machine that has been created, or perhaps the artificial brains behind that intelligent behavior. But even this interpretation is not complete. To some folks, the study of AI is not necessarily for the purpose of creating intelligent machines, but for the purpose of gaining better insight into the nature of human intelligence. Still others study AI methods to create machines that exhibit some limited form of intelligence.



This begs the question: "what is intelligence?" To some, the litmus test for AI is how close it is to human intelligence. Others argue that additional requirements must be met for a machine to be considered intelligent. Some people say intelligence requires consciousness and that emotions are integrally tied to intelligence, while others say that the ability to solve a problem that would require intelligence of a human is not enough; AI must also be able to learn and adapt to be considered intelligent.



AI that satisfies all these requirements is considered strong AI. Unlike strong AI, weak AI involves a broader range of purposes and technologies to give machines specialized intelligent qualities. Game AI falls into the category of weak AI.




The bottom line is that the definition of game AI is rather broad and flexible. Anything that gives the illusion of intelligence to an appropriate level, thus making the game more immersive, challenging, and, most importantly, fun, can be considered game AI. Just like the use of real physics in games, good AI adds to the immersiveness of the game, drawing players in and suspending their reality for a time.




[*] Steven Woodcock maintains an excellent Web site devoted to game AI at http://www.gameai.com.















    Chapter 12. Working with Java Streams



    Terms in This Chapter



    • Abstract class


    • ASCII


    • autoexec.bat file


    • Buffering


    • Caching


    • Callable object


    • Canonical path


    • Chaining


    • Character/byte stream


    • Class hierarchy


    • Concrete class


    • Current directory


    • Debug utility


    • Design pattern


    • File descriptor


    • File path (relative/absolute)


    • Flag


    • Helper class


    • Interface


    • Java networking APIs


    • Parameter


    • Parent directory


    • Polymorphism


    • Path separator


    • Prefetching


    • Separator


    • Single inheritance


    • Slice


    • Stream (binary/text)


    • Source code


    • Source file


    • Token


    • Unicode



    Streams are the Java programming language's way of supporting I/O. A stream can represent a file, a network connection, or access to a Web site. Learning to deal with Java streams is essential for understanding Java's networking APIs.


    Most of the time, conversion to and from the Java type system is transparent. When it isn't, this chapter demonstrates how to do low-level type conversion in a straightforward way.
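
    As a minimal sketch of what that looks like (plain Java rather than Jython, with an arbitrary file name), the Data streams below convert primitives to and from raw bytes, and wrapping them around Buffered streams illustrates the chaining and buffering entries from the terms list:

    import java.io.*;

    public class StreamDemo {
        public static void main(String[] args) throws IOException {
            // Chain: DataOutputStream -> BufferedOutputStream -> FileOutputStream
            try (DataOutputStream out = new DataOutputStream(
                    new BufferedOutputStream(new FileOutputStream("demo.bin")))) {
                out.writeInt(42);        // low-level conversion: int -> 4 bytes
                out.writeDouble(3.14);   // double -> 8 bytes
            }
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream("demo.bin")))) {
                System.out.println(in.readInt());    // 42
                System.out.println(in.readDouble()); // 3.14
            }
        }
    }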




    As If One Way Weren't Bad Enough


    The joke is that there are always two ways of doing things in Jython: the Python way and the Java way. For example, if you use Python to prototype for Java applications, you need to know how Java does it. You also need Java streams for various forms of Java I/O, including the networking APIs.









      16.12 C- Versus C++-Style I/O



      Both C- and C++-style I/O have their own features and quirks. In this section we'll discuss some of the differences between these two systems.




      16.12.1 Simplicity



      Let's say we want to write a simple checkbook program that prints an account statement. We need some code to print each line of the account statement (date, check number, payee, and amount).



      In C the print statement looks like:



      std::printf("%2d/%2d/%02d %4d: %-40s %6.2f\n",
      check.date.month, check.date.day, check.date.year,
      check.number, check.payee, check.amount);


      In C++ the print statement is:



      std::cout << std::setw(2) << check.date.month << '/' <<
      std::setw(2) << check.date.day << '/' <<
      std::setw(2) << std::setfill('0') << check.date.year << ' ' <<
      std::setw(4) << check.number << ':' <<
      std::setw(40) << std::setiosflags(std::ios::left) <<
      check.payee <<
      std::resetiosflags(std::ios::left) << ' ' <<
      std::setw(6) << std::setprecision(2) <<
      std::setiosflags(std::ios::fixed) <<
      check.amount <<
      std::setw(0) << '\n';


      From this example we can clearly see that C-style I/O is more compact. It is not clear that compact is better, however. This author prefers the compact style of the C std::printf function, while many others prefer the verbosity of the C++ I/O system. Besides, if you're a C++ programmer, you probably should program in C++ and not bring legacy I/O systems into the mix.



      Although it looks like C is more compact, things are not as obvious as they look. A well-designed date class would have its own output operator. Thus we can simplify our C++ code down to:



      std::cout << check.date <<
      std::setw(4) << check.number << ':' <<
      std::setw(40) << std::setiosflags(std::ios::left) <<
      check.payee <<
      std::resetiosflags(std::ios::left) << ' ' <<
      std::setw(6) << std::setprecision(2) <<
      std::setiosflags(std::ios::fixed) <<
      check.amount <<
      std::setw(0) << '\n';


      But this assumes that only the date has an output
      operator. If we designed our check class correctly, it should have
      one as well. This means that our code now has been simplified down
      to:



          std::cout << check << '\n';


      Now this doesn't mean that complexity has gone away.
      It's merely been moved from outside the class to
      inside it.



      This example serves to illustrate one of the key differences between
      C and C++. In C-style I/O, the information on how to manipulate the
      data (in this case, how to print it) is contained outside the data
      itself. In C++ it's possible to put the manipulation
      code and the data into a single class.



      If we are writing out our checkbook information in only one place,
      the C version may be simpler and easier to work with. So for simple
      programs, you may want to consider using C-style I/O. But suppose
      that we wanted to print out the data to a number of places. If we
      used C-style I/O, we would have to replicate our format code all over
      the place or create a small function to do the printing. With C++'s
      classes, we can keep the printing information in one logical place.
      (As a person who's just had to rewrite all the
      C-style format statements in a rather large piece of code, I can tell
      you that putting the formatting information in one place, the object,
      has some advantages.)





      16.12.2 Reliability



      When you use C++-style I/O, the system automatically detects the type of the variable and performs the appropriate conversion. It's impossible to get the types wrong.



      With C-style I/O, it's easy to get the arguments to
      a std::printf mixed up, resulting in very strange
      results when you run the program. What's worse is
      that most compilers do not check std::printf calls
      and warn you if you make a mistake.



      One special C I/O function you should be aware of is std::gets. This function gets a line from standard input with no bounds-checking. So:



      std::gets(line);


      is exactly like:



      std::fgets(line, INFINITY, stdin);


      If there are too many characters in an input line, the
      std::gets function will cause a buffer overflow
      and trash memory. This single function and its lack of
      bounds-checking has to be responsible for more crashes and security
      holes than any other single C function.[2] You should never
      use it. You can get in enough trouble with the more reliable C
      functions without having to play Russian roulette with this one.


      [2] As I am
      writing this, Microsoft has just released a security patch to Windows
      XP to fix a buffer overflow bug.





      16.12.3 Speed



      I've done some benchmarks on C and C++ I/O for
      binary files. In general I've found the C I/O to be
      much faster. That's because the C I/O system is less
      flexible and has to deal with less overhead than the C++ system.










      I'm not talking about formatted I/O, just raw binary I/O. If you do formatted I/O in either system, you can expect your speed to go down tremendously; it's the single slowest part of the entire C and C++ library.






      16.12.4 Which Should You Use?



      Which I/O system is best? That depends on a large number of factors.
      First of all, any system you know is always going to be easier to use
      and more reliable than a system you don't know.



      However, if you know both systems, C-style I/O is good for the simple stuff. If you're not doing anything fancy with classes and just want to write simple formatted reports, the C I/O system will do the job. However, for larger jobs, the C++ I/O system, with its object orientation, handles complexity and organizes complex information much better than C-style I/O.



      But if you're learning I/O for the first time, I
      suggest that you stick with one I/O system, the C++ one. Learn
      C-style I/O only if you're forced to. (Say, for
      instance, you have to maintain some legacy code that uses the old
      C-style system.)










        11.22 Email Notification with Triggers and Alerts


        Database triggers can interface with database pipes and alerts. Figure 11-16 illustrates a model in which a database trigger posts a notification that a professor's salary has changed. The trigger fires only on an update of a row, and it posts the alert. When the transaction commits, the signal is received by a second Oracle database connection. This connection is represented in Figure 11-16 as PROCESS_ALERTS, and it has one purpose: to deliver email.


        Figure 11-16. Trigger Email Notification.


        Because PROCESS_ALERTS runs asynchronously, it has no impact on other database activity. Update transactions to the PROFESSORS table do not wait for an email to be sent.


        We start by developing an interface that will service email requests. This interface will be used by PROCESS_ALERTS. It will accept standard email parameters: sender, receiver, subject, and text. This interface is a package with the following specification:





        CREATE OR REPLACE PACKAGE email_pkg IS
        PROCEDURE send
        (p_sender IN VARCHAR2, p_recipient IN VARCHAR2,
        p_message IN VARCHAR2, p_subject IN VARCHAR2);
        END email_pkg;

        The next step is to develop an engine that will dedicate itself to servicing signals. This can be a stand-alone procedure, and Figure 11-16 shows PROCESS_ALERTS as one. At this point, though, we should reconsider using a single procedure; a package could be a better choice.


        We need a procedure to receive alerts and a procedure to send alerts. The sending occurs in the trigger. It seems reasonable to define a package specification to support the send and receive functions. That package specification is shown next.





        CREATE OR REPLACE PACKAGE alerts_pkg IS
        PROCEDURE process_alerts;
        PROCEDURE send_alert(message IN VARCHAR2);
        END alerts_pkg;

        Figure 11-16 is redrawn to show the modified architecture. In Figure 11-17, the trigger calls the SEND_ALERT procedure to post the alert. The code used to receive the alert is in the same package.


        Figure 11-17. Revised Trigger Email Notification.


        The assumption is that the trigger will use the professor's name to construct an email address and pass that address to the procedure in the ALERTS_PKG package. Ideally, the PROFESSORS table would have a column that contains email addresses.


        The database trigger is set up to send an email only when there is a difference in the old and new salary.





        CREATE OR REPLACE TRIGGER professors_aur
        AFTER UPDATE ON professors
        FOR EACH ROW
        WHEN (OLD.SALARY <> NEW.SALARY)
        BEGIN
        alerts_pkg.send_alert(:new.prof_name||'@domain.com');
        END;

        For this model, all interfaces have been shown. We can start looking at the bodies of the individual packages. The email package body is shown here. This body includes global declarations for the SMTP server IP address and port number. This is the mechanism by which the package ALERTS_PKG will deliver email for each alert received. The body of ALERTS_PKG will include a call to the email SEND procedure.





        CREATE OR REPLACE PACKAGE BODY email_pkg IS
        g_smtp_server CONSTANT VARCHAR2(20) := '00.00.00.00';
        g_smtp_server_port CONSTANT PLS_INTEGER := 25;

        PROCEDURE send
        (p_sender IN VARCHAR2, p_recipient IN VARCHAR2,
        p_message IN VARCHAR2, p_subject IN VARCHAR2)
        IS
        mail_conn utl_smtp.connection;
        BEGIN
        mail_conn := utl_smtp.open_connection
        (g_smtp_server, g_smtp_server_port);

        utl_smtp.helo (mail_conn, g_smtp_server);
        utl_smtp.mail (mail_conn, p_sender);
        utl_smtp.rcpt (mail_conn, p_recipient);
        utl_smtp.open_data(mail_conn);

        utl_smtp.write_data
        (mail_conn,'From: "'||p_sender
        ||'" <'||p_sender||'>'||utl_tcp.CRLF);
        utl_smtp.write_data
        (mail_conn,'To: "'||p_recipient
        ||'" <'||p_recipient||'>'||utl_tcp.CRLF);
        utl_smtp.write_data
        (mail_conn, 'Subject: '
        ||p_subject||utl_tcp.CRLF);
        utl_smtp.write_data
        (mail_conn, utl_tcp.CRLF||p_message);
        utl_smtp.close_data(mail_conn);
        utl_smtp.quit (mail_conn);
        END send;
        END email_pkg;

        The package body for the sending and receiving of alerts is shown next. The subprogram to receive alerts is coded to wait for three alerts; each wait uses a 10-second timeout. The loop also terminates when it receives an alert message of "END." The alert name is "email_notification."


        For an asynchronous application, a separate process that runs in the background will invoke PROCESS_ALERTS. Locally, PROCESS_ALERTS can be run from SQL*Plus. As coded here, it will deliver the first three emails that result from updates to the PROFESSORS table.





        CREATE OR REPLACE PACKAGE BODY alerts_pkg IS
        PROCEDURE process_alerts
        IS
        professor_email VARCHAR2(100);
        status INTEGER;
        BEGIN
        dbms_alert.register('email_notification');

        FOR I IN 1..3 LOOP
        dbms_alert.waitone
        (name => 'email_notification',
        message => professor_email,
        status => status,
        timeout => 10);

        IF status = 0 THEN
        EXIT WHEN professor_email = 'END';

        email_pkg.send
        (p_sender=>'admin@school.com',
        p_recipient=>professor_email,
        p_subject=>'Salary',
        p_message=>'Salary has changed');
        END IF;
        END LOOP;
        END process_alerts;

        PROCEDURE send_alert(message IN VARCHAR2) IS
        BEGIN
        dbms_alert.signal('email_notification', message);
        END send_alert;
        END alerts_pkg;




          Item 70: Recognize ClassLoader boundaries



          The ClassLoader architecture owns responsibility for the most fundamental task of the JVM, that of loading code into the JVM and making it available for execution. Along the way, it also defines isolation boundaries so that classes of similar package/class naming won't necessarily clash with one another. In doing so, ClassLoaders introduce phenomenal flexibility into a Java server-based environment and, as a result, phenomenal complexity. Dealing with ClassLoader relationships drives developers nuts: just when we think we've got it all figured out, something else creeps in and throws everything up in the air again. Despite our desire to bury our collective head in the sand and pretend that ClassLoaders will go away if we wish hard enough, the brutal fact remains that enterprise Java developers must understand the ClassLoader relationships within their server product of choice.



          Consider an all-too-common scenario: a servlet Web application stores Beans (not the EJB kind) in user session space for use during later processing as part of the Web application: classic MVC/Model 2 architecture. At first, the Beans were deployed in a separate .jar file to the CLASSPATH of the server product itself; it seemed simpler that way. Unfortunately, doing so meant that the Beans accidentally clashed with other Beans of the same name (how many LoginBean classes can we count?) in a different Web application, so the Beans had to move into the Web application's WEB-INF/lib directory, where they're supposed to have been in the first place.



          Now, however, whenever a servlet in the Web application changes and the container does its auto-reloading magic, weird ClassCastException errors start to creep in. Say you have code like this:










          LoginBean lb = (LoginBean)session.getAttribute("loginBean");

          if (lb == null)

          {

          . . .

          }




          The servlet container keeps complaining that the object returned from the session.getAttribute call isn't, in fact, a LoginBean. You, meanwhile, are pulling your hair out in large clumps because you know for a fact that it is a LoginBean; you just put it in there on the previous page. Worse, when you try to verify that it is a LoginBean by calling getClass().getName() on the object returned, it shows up as LoginBean. Better yet, if you restart the server, the problem appears to go away completely, until the next time you change the servlet. You quietly contemplate retirement.



          The problem here is one of ClassLoaders, not code.



          To be particular, ClassLoaders are used within the Java environment not only as a loading mechanism but also as a mechanism for establishing isolation boundaries between disparate parts of the code. In English, that means that my Web application shouldn't conflict in any way with your Web application, despite the fact that we're running in the same servlet container. I can name my servlets and beans by names that exactly match those in your Web application, and the two applications should run side by side without any problems.



          To understand why ClassLoaders are used to provide this isolation behavior, we have to establish some fundamental rules about ClassLoaders.





          1. ClassLoaders form hierarchies.

            When a ClassLoader is created, it always defaults to a "parent" ClassLoader. By default, the JVM starts with three: a bootstrap loader written in native code to load the runtime library (rt.jar), a URLClassLoader pointing to the extensions directory (usually jre/lib/ext in your JRE directory) called the extensions loader, and another URLClassLoader pointing to the elements dictated by the java.class.path system property, which is set via the CLASSPATH environment variable. Containers like EJB or servlet containers will augment this by putting their own ClassLoaders into the tree, usually toward the bottom or leaf nodes. The hierarchy is used to delegate loading of code to the parent before trying to load code from the child ClassLoader, thus giving the bootstrap loader first chance at loading code. If a parent has already loaded a class, no further attempt at loading the class is made.



          2. Classes are loaded lazily.

            The Java Virtual Machine, like most managed environments, wants to minimize the work it needs to do at startup and won't load a class until it becomes absolutely necessary. This means that at any given point, a class may suddenly come across a method it hasn't invoked before, which in turn references a class that hasn't been loaded yet. This triggers the JVM to load that class, which brings up the next rule of ClassLoaders. By the way, this is why old-style non-JNDI JDBC code needed to "bootstrap" the driver into the JVM with a call such as Class.forName. Without that, the actual driver would never be loaded, since your JDBC code traditionally doesn't directly reference the driver-specific classes, nor does JDBC itself.



          3. Classes are loaded by the ClassLoader that loaded the requesting class.

            In other words, if a servlet uses the class PersonBean, then when PersonBean needs to be loaded the JVM will go back to the ClassLoader that loaded the servlet. Certainly, if you have a reference to a ClassLoader, you can explicitly use that ClassLoader instance to load a class, but this is the exception, not the rule.



          4. Classes are uniquely identified within the JVM by a combination of class name, package name, and ClassLoader instance that loaded them.

            This rule means that a given class can be loaded twice into the VM, as long as the class is loaded through two different ClassLoaders. This also implies that when the JVM checks a checkcast operation (such as the LoginBean cast earlier), it checks the two objects to see if they share any common ancestry from a ClassLoader perspective. If not, a ClassCastException is thrown. This also implies that since the two classes are considered to be unique, each has its own copy of static data. (The sketch after this list makes this rule concrete.)
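
          To make rule 4 concrete, here is a minimal sketch (the beans directory and the LoginBean class file in it are hypothetical) that loads the same class through two sibling ClassLoaders and watches the JVM treat the results as unrelated types:

          import java.io.File;
          import java.net.URL;
          import java.net.URLClassLoader;

          public class TwoLoaders {
              public static void main(String[] args) throws Exception {
                  // Hypothetical directory holding a compiled LoginBean.class
                  URL[] path = { new File("beans").toURI().toURL() };
                  // Sibling loaders whose only shared ancestor is the bootstrap loader
                  ClassLoader a = new URLClassLoader(path, null);
                  ClassLoader b = new URLClassLoader(path, null);
                  Class<?> c1 = a.loadClass("LoginBean");
                  Class<?> c2 = b.loadClass("LoginBean");
                  System.out.println(c1.getName().equals(c2.getName())); // true: same name
                  System.out.println(c1 == c2);         // false: two distinct classes
                  Object o = c1.newInstance();
                  System.out.println(c2.isInstance(o)); // false: a cast would throw
              }
          }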



          Having established these rules, let's take a look at what this means, practically, to enterprise Java developers.



          Isolation



          In order to support the notion of isolation between Web applications, the servlet container creates a ClassLoader instance around each Web application, thereby effectively preventing the "leakage" of classes from one Web application to the other. Many servlet containers provide a "common" directory in which to put .jar files that can be seen by all Web applications as well. Therefore, most servlet containers have a ClassLoader hierarchy that looks, at a minimum, like the one shown in Figure 8.1.



          Figure 8.1. ClassLoader isolation





          Notice that this implies that a LoginBean class, deployed as part of WebAppA.war and also as part of WebAppB.war, is loaded twice into the servlet container: once through WebApp A and once through WebApp B. What happens if LoginBean has static data members?



          The answer is simple, rooted in rule 4 mentioned earlier: each LoginBean is uniquely identified by its class name and package name (LoginBean) and the ClassLoader instance that loaded it. Each Web application is loaded by a separate ClassLoader instance. Therefore, these are two entirely orthogonal classes that maintain entirely separate static data.



          This has profound implications for servlet developers. Consider, for example, the ubiquitous hand-rolled ConnectionPool class. Typically this class is written to maintain a static data member that holds the Connection instances the pool wants to hand out. If we amend Figure 8.1, putting ConnectionPool in place of LoginBean, the unsuspecting developer has three ConnectionPool instances going, not one, despite the fact that the pool itself was maintained as static data. To fix this, put the ConnectionPool class in a jar or ClassLoader higher in the ClassLoader hierarchy. Or, better yet, rely on a JDBC 3.0-compliant driver to handle Connection pooling entirely (see Item 73).



          Moral: Singletons don't work unless you know where you are in the ClassLoader hierarchy.



          Versioning



          In order to support hot reloading of servlets, the typical servlet container creates a ClassLoader instance each time a Web application changes. So, for example, when a developer recompiles a servlet and drops it into the Web application's WEB-INF/classes directory, the servlet container notes that the change has taken place and creates an entirely new ClassLoader instance. It reloads all the Web application code through that ClassLoader instance and uses those classes to answer any new incoming requests. So now the picture looks something like Figure 8.2.



          Figure 8.2. ClassLoaders running side by side to provide versioning





          Let's complicate the picture somewhat: assume SampleWebApp, version 1, created a LoginBean and stored it into session space. The LoginBean was created as part of SampleWebApp-v1's ClassLoader, so the class type (the unique tuple of package name, class name, and ClassLoader instance) associated with this object is (unnamed package)/LoginBean/SampleWebApp-v1. So far, so good.



          Now the developer touches a servlet (or hot deploys a new version of the Web application), which in turn forces the servlet container to reload the servlet, which in turn requires a new ClassLoader instance. You can see what's coming next. When the servlet tries to extract the LoginBean object out of session space, the class types don't match: the LoginBean instance is of a type loaded by ClassLoader 1 but is asked to cast to a class loaded by ClassLoader 2. Even though the classes are identical (LoginBean in both cases), the fact that they were loaded by two different ClassLoader instances means they are entirely different classes, and a ClassCastException is thrown.



          At first, it seems entirely arbitrary that the same class loaded by different ClassLoaders must be treated as an entirely different class. As with most things Java does, however, this is for a good reason. Consider some of the implications if the classes are, in fact, different. Suppose the VM allows the cast above to take place, but the new version of LoginBean doesn't, in fact, have a method that the old version has, or doesn't implement an interface the old version does. Since we allowed the cast, what should the VM do when code calls the old method?



          Some have suggested that the VM should compare the binary layouts of the two classes and allow the cast based on whether the two classes are, in fact, identical. This implies that every cast, not to mention every reference assignment, within the system would have to support this binary comparison, which would seriously hurt performance. In addition, rules would need to be developed to determine when two classes were "identical": if we add a method, is that an acceptable change? How about if we add a field?



          The unfortunate realization is that the pairing of class name and ClassLoader instance is the simplest way to determine class uniqueness. The goal, then, is to work with it, so that those annoying ClassCastException errors don't creep up.



          One approach is complete ignorance: frequently, when faced with this problem, we try to solve it by bypassing it entirely. In the interests of getting the code to work, we put the LoginBean class somewhere high enough in the ClassLoader hierarchy that it isn't subject to reloading, usually either on the container's CLASSPATH or in the container's JVM extensions directory (see Item 69). Unfortunately, that means that if LoginBean does change, the server has to be bounced in order to reload it. This can create some severe evolution problems: WebApps A and B depend on version 1 of LoginBean, but WebApp C needs version 2, which isn't backwards-compatible. If LoginBean is deployed high in the hierarchy, WebApps A and B will suddenly "break" when WebApp C is deployed. This is a great way to see how many consecutive 24-hour debugging sessions you can handle.



          Worse yet, deploying LoginBean this high in the ClassLoader hierarchy means that other Web applications might also be able to see LoginBean, even if they shouldn't. So WebApp D, which doesn't use LoginBean at all, could still see the class and potentially use it as a means of attack against WebApps A, B, or C. This is dangerous if your code is to be hosted on a server you share with others (as in the case of an ISP); other applications could now use Reflection against your LoginBean class and maybe discover a few things you'd prefer to keep secret.



          Don't despair; all isn't lost. A couple of tricks are available.



          Trick Number One is to define an interface, say, LoginBean, and put that high in the ClassLoader hierarchy, where it won't get loaded by the ClassLoader that loads the Web application. An implementation of that interface, LoginBeanImpl, resides in the Web application, and any code that wants to use a LoginBeanImpl instead references it as a LoginBean (the interface). When the Web application is bounced, the "old" LoginBeanImpl is assigned to the interface type LoginBean, which wasn't reloaded, so the assignment succeeds and no exception is thrown. The drawback here is obvious: every sessionable object needs to be split into interface and implementation. (This is partly why EJB forces this very state of affairs for each bean: this way, EJB can shuffle ClassLoader instances around without worrying about ClassCastException errors. Not coincidentally, this is also why EJB instances are forbidden to have static data members, since statics don't play well with interfaces.)
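
          A sketch of how that split looks (the names are illustrative, and each type would live in its own source file at the indicated level of the hierarchy):

          // Deployed high in the hierarchy (e.g., the container's shared lib);
          // never reloaded when the Web application bounces.
          public interface LoginBean extends java.io.Serializable {
              String getUserName();
          }

          // Deployed inside the Web application; reloaded on every bounce.
          public class LoginBeanImpl implements LoginBean {
              private final String userName;
              public LoginBeanImpl(String userName) { this.userName = userName; }
              public String getUserName() { return userName; }
          }

          // Session code refers only to the interface type:
          //   session.setAttribute("loginBean", new LoginBeanImpl("jdoe"));
          //   LoginBean lb = (LoginBean) session.getAttribute("loginBean"); // no CCE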



          Trick Number Two is to store only objects from the Java base runtime library (e.g., String and Date objects) into session space, rather than custom-built objects. The Java Collections classes, Map in particular, can be quite useful here as "pseudo-classes" for holding data. Since the bootstrap ClassLoader loads the runtime library, these can never be hot-versioned and therefore won't be subject to the same problems. The danger here, however, is that the Java Collections classes will take instances of anything, meaning the temptation to just stick everything into session becomes harder to resist (see Item 39).
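
          A sketch of Trick Number Two (the helper class and attribute names are illustrative), assuming an HttpSession is available:

          import java.util.Date;
          import java.util.HashMap;
          import java.util.Map;
          import javax.servlet.http.HttpSession;

          class SessionData {
              // Store only runtime-library types; the bootstrap loader loads them,
              // so a Web-app reload can never change their class identity.
              static void storeLogin(HttpSession session, String user) {
                  Map<String, Object> login = new HashMap<String, Object>();
                  login.put("userName", user);
                  login.put("loginTime", new Date());
                  session.setAttribute("loginData", login);
              }

              @SuppressWarnings("unchecked")
              static Map<String, Object> loadLogin(HttpSession session) {
                  // Safe across hot reloads: Map and HashMap live in rt.jar
                  return (Map<String, Object>) session.getAttribute("loginData");
              }
          }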



          Trick Number Three assumes you want or must have your custom objects but can't take the time to break them into interface and implementation pieces. In that case, mark the class as Serializable, then use Java Object Serialization to store a serialized copy of the objects into the session as a byte array. Because Java Object Serialization more or less ignores this problem, and because byte arrays are implicitly themselves Serializable (thus satisfying the Servlet 2.2 Specification requirement/suggestion that only Serializable objects be stored into session), you can store the serialized version of the object, rather than the standard object type, and instead of assigning the session object back to the LoginBean reference, deserialize it:










          // Store in session

          LoginBean lb = new LoginBean(...);

          try

          {

          ByteArrayOutputStream baos = new ByteArrayOutputStream();

          ObjectOutputStream oos = new ObjectOutputStream(baos);

          oos.writeObject(lb);

          byte[] bytes = baos.toByteArray();

          session.setAttribute("loginBean", bytes);

          }

          catch (Exception ex)

          {

          // Handle exception

          }



          // Somewhere else, retrieve LoginBean from session

          LoginBean lb = null;

          try

          {

          byte[] bytes = (byte[])session.getAttribute("loginBean");

          ByteArrayInputStream bais = new ByteArrayInputStream(bytes);

          ObjectInputStream ois = new ObjectInputStream(bais);

          lb = (LoginBean)ois.readObject();

          }

          catch (Exception ex)

          {

          // Handle exception

          }




          This third approach carries a significant cost, however: it's somewhat expensive to serialize and deserialize objects, even when you follow Item 71. Fortunately, you shouldn't have to use it very often. Another problem, specifically related to servlet containers, is that byte arrays aren't JavaBean-compliant and so can't be used as targets of the standard JSP bean tags (useBean, getProperty, and setProperty).



          The key here, ultimately, is to know exactly how your Java environment sets up ClassLoader relationships, and then work with them, rather than against them. If your environment doesn't tell you straight out, some judicious exploration, via calls to getClass().getClassLoader() and walking the hierarchy, is in order. Failure to do so means mysterious ClassCastException errors when you least want to see them: in production.
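
          A minimal helper for that kind of exploration (a sketch; drop the method into any convenient class):

          class LoaderChain {
              // Print the chain of ClassLoaders above the one that loaded obj's class.
              static void dump(Object obj) {
                  ClassLoader cl = obj.getClass().getClassLoader();
                  while (cl != null) {
                      System.out.println(cl);
                      cl = cl.getParent();
                  }
                  // getClassLoader()/getParent() return null for the bootstrap loader
                  System.out.println("<bootstrap>");
              }
          }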













            Statelessness



            I don't believe I have said anything about stateless objects so far. Yet you might have heard somewhere that MTS objects are stateless. Let me put it in the clearest possible terms: this is nonsense. An object is formally defined as an entity that has behavior, identity, and—state. A stateless object therefore is not an object at all. This is not a word game. The object-oriented discipline has not evolved over decades to this point just to discover that state—a fundamental object aspect—simply could be thrown out because some new technology found it difficult to accommodate. The maintainability and complexity reduction benefits of object orientation would be hampered severely, and any such technology would be doomed. Fortunately (and expectedly), MTS and COM+ are not doomed. They have absolutely no problem with state. In this section, I will try to explain what is behind the misleading claim that MTS objects are stateless and what statelessness really means in the context of transactional COM+ objects.



            It is true that MTS enforces transaction isolation, as we discovered in the previous section. Object state is jettisoned at the boundary of each transaction in which the object participates. This is a very sane model and always should be the practice in enterprise object programming, independent of any enforcing technology. The result of requesting a set of changes on shared state should depend only on the input to the request, the behavior of the objects performing the changes, and the state of the shared data. The state transformation can go through a number of stages, where each successive stage is dependent on objects having held their state from previous stages. Object state is therefore an important part of the transactional model. It is just bounded by transaction scope. In particular, the result of shared state transformation through an object layer should not depend on state held by these objects from requests that were made as part of prior transactions. Violation of this rule makes the semantics of business objects unpredictable—very much like any violation of transaction isolation.



            In many cases where you read about stateless MTS objects, the term stateless refers to this weak form of statelessness. (But in this case, I think that the term should not be used at all because of the confusion it has caused.) References to statelessness in the Platform SDK documentation fall into this category. Weak statelessness is a product of COM+'s disciplined transaction integration model, not its scalability model. Be aware that the weak statelessness constraint applies only to objects when they attempt to force a transaction boundary through calls to SetComplete and SetAbort. Otherwise, even transactional objects maintain their state until they are released, just like any other object does. Observe that the transaction boundary adjusts itself to the object lifetime boundary, and not the other way around. Transactions are forced to stay alive until their root objects die a natural death or explicitly request to be deactivated. Transactional MTS objects are not stateless; instead, they force a transaction boundary to occur when they naturally are willing to give up their state—after completing their work by handling a number of stateful method invocations. Transactions do not control object state; instead, object state controls transactions.



            It is also true that MTS objects should not be shared among clients that execute as the result of separate, top-level requests, possibly on separate threads. (See Figure 13-11.) This is a demand of the scalability model, not the transaction model. It is accurate enough: if objects have no state, there is no motivation for sharing them among clients. Therefore, the true, strong form of statelessness naturally enforces the scalability model. Some have suggested applying statelessness throughout to achieve scalability. Such recommendations generally take the form of advising to call either SetComplete or SetAbort before returning from any method. This indeed amounts to giving up state in your objects. Per-instance member data (or module data, as Visual Basic calls it) is useless in such objects. All variable declarations could (and should) occur at method scope. A crucial discovery in object orientation is that grouping functions together with the external data they modify in a package called an object tremendously enhances design clarity, code simplicity, and maintainability. The removal of all per-instance data members is tantamount to disallowing this packaging and splitting functions from data again. Classes that do this cannot be used to leverage many of the benefits of object orientation.



            Modeling tends to be difficult with such classes. And all data necessary to accomplish a task performed by objects of the class must be passed at once. This makes function signatures long, hard to read, and unwieldy for callers. Optional parameters become a nightmare, especially for clients implemented in languages that do not support the named parameter feature. State that normally could be built up conveniently inside an object or even inside the RDBMS through multiple method invocations and separate properties now must be passed all at once. Often this causes the loss of type safety when safe arrays are used to ship data aggregates. In other cases, new data structures need to be invented to communicate aggregate data to the object, even though RDBMS tables already could hold this data, if such tables could be used to store this data piece by piece, one row at a time, with multiple method invocations building it up in the scope of a single transaction. (This state of affairs, of course, means that the object cannot be deactivated across method invocations.) In other cases, data structures might be used only within an object, for gathering all information necessary to perform a business function, where this information gathering occurs repeatedly across method invocations. But these data structures now have to be made available externally so that clients can populate them and pass them to the object all at once. This is a clear violation of encapsulation and leads to reduced maintainability. Of course, it is no surprise to find that the cost of true statelessness is tremendously high.



            Consider our CUserTracker example. Say we wanted to allow logging clients to attach some contextual information to each user tracking log entry. Callers might want to record such information as request parameters passed to them, system load conditions, the state of the data the callee is supposed to work on, and so on. Because contextual information for each log entry might consume multiple rows, we arrange the information in the RDBMS in a separate table with a foreign key relationship to the user tracking (primary) table. Columns in the secondary table include date and time information, as well as some string and numeric types because callers commonly record such data. Now how do we offer an interface to this call context facility in CUserTracker? If CUserTracker needs to be stateless, we are in trouble. We might have to force callers to pass the entire kit and caboodle in a gigantic safe array of strings. This is not type safe and is very inconvenient for C++ callers. But if CUserTracker can maintain state across method invocations, we can equip it with properties that correspond to the columns of the secondary table, the one that holds the request context information. CUserTracker then would have a method AddRequestContextInfoSet. Every time the method is called, the content of all the properties would be added to an internal data structure or directly to the secondary table. The invocation returns having called DisableCommit, and CUserTracker will call SetComplete only after the caller signals that the data set is complete and all information for the primary table entry is present. This is a significantly easier model, it is type safe, and the internals of CUserTracker remain encapsulated.



            The more input an object needs to do its job, the messier things get if the object does not hold state across method invocations. In general, object statelessness has no performance benefit. The work of assembling the data has to be done somewhere. If it is not assembled in the object, calls to do this work just bunch up in the object's clients. This work involves building up arrays and user-defined types of various sorts. But this should be object work, not client work! In addition, the object often can perform this work more efficiently because it can send it off to the RDBMS. Clients do not have this option. Because of the general difficulty of a stateless model and the encapsulation dangers inherent in it, as a general rule, I strongly advise against forbidding objects from holding state. In my opinion, it is like throwing the baby (a sane programming model, readability, maintainability, and encapsulation) out with the bathwater (object sharing and lack of scalability). In most scalable application projects, doing so is completely unnecessary because the bathwater can be dumped out all by itself.



            Finally, there is some confusion regarding which technologies encourage which programming models. It is indeed true that for scalable distributed applications network round-trips should be minimized. This practice is mandated by the first rule of scalability, which says that computing resources must be engaged efficiently. For DCOM applications, efficiency might very well mean calling a single method instead of calling three. Statelessness contributes to a better network utilization profile. But network utilization has absolutely nothing to do with COM+ (other than DCOM), MTS, the single concurrent client scalability model, or transactions. Network separation indeed can justify cumbersome method signatures and other elaborate techniques such as marshal-by-value (discussed in Chapter 8), though I am not sure about total statelessness even in that case. A call model that saves on network round-trips often is required with remote components, but complete statelessness rarely furthers the cause of efficiency. Analyze impact on network traffic before you design the interface of an object that often will be called remotely. Then decide which optimizations should be applied. Don't optimize where optimization will serve only to compromise maintainability. Always remember: "Premature optimization is the root of all evil" (Donald E. Knuth).



            In many scalable projects, the business objects are not separated physically from their clients. This is especially true for Web projects, in which the client is the Web server. In this case, business objects generally are invoked in process, so there is absolutely no justification for making these business objects stateless. Things might be a bit different in your project. Fat clients might be separated from business objects by a network boundary, but they might need to exercise business object interfaces frequently. If this is the case, have a look at Wade Baron's four-tier application architecture in Chapter 11. Wade shows us an appropriate architecture for such conditions that does not burden clients with stateless business objects and does not compromise scalability.



            Saying that remote objects should be stateless (in the strict, original sense of the term) goes a little far in my opinion, but it is motivated by a desire to cut down on network traffic. Saying the same thing about transactional COM+ objects simply makes no sense. The COM+ programming model offers the EnableCommit and DisableCommit methods for a reason. Even COM+'s new auto-done property, which changes a method's default status from EnableCommit to SetComplete (or SetAbort if you return an error), applies only at the method level, not the component or even the interface level. The COM+ programming model, including the scalability and transaction integration models, is rich, expressive, and powerful. Don't sell it short by buying into the myth that making your "objects" stateless somehow will improve scalability or help your project in any way.



            Recipe 5.4 Mapping a JSP to Its Page Implementation Class




            Problem



            You have already precompiled a JSP and want to specify a
            mapping to the JSP page implementation class in your deployment
            descriptor.





            Solution



            Cut and paste the servlet and
            servlet-mapping elements generated automatically
            by JspC into web.xml.
            Create the proper package-related directories in the
            WEB-INF/classes directory of your web
            application, then place the precompiled JSPs into that directory.





            Discussion



            Precompiling JSPs allows you to remove the JSP page syntax files from
            your web application and just use the resulting servlet class files.
            You can then use the servlet-mapping element in
            web.xml to map a JSP-style URL (e.g.,
            default.jsp) to the compiled servlet class. Here
            is how to accomplish this task:



            1. Precompile the JSP(s) as described in Recipe 5.1 or Recipe 5.2, including the
              compilation of Java source files into class files using
              javac or another compiler tool.

            2. Cut and paste the servlet and
              servlet-mapping elements generated automatically
              by JspC into your deployment descriptor (if you
              are using Tomcat), or add those elements manually to
              web.xml (if you are using WebLogic or another
              container).

            3. Make sure the servlet-mapping's
              url-pattern element points to a JSP-style
              filename, such as default.jsp, or an extension
              mapping such as *.jsp.

            4. Place the class or classes, including the package-related
              directories, in WEB-INF/classes, or inside of a
              JAR file that is stored in WEB-INF/lib.


            When a web user requests the URL specified by the
            servlet-mapping for that JSP page implementation
            class, the web container directs the request to the mapped
            servlet class.



            Example 5-8 shows a servlet configuration for a
            precompiled JSP.




            Example 5-8. A web.xml entry for a precompiled JSP

            <servlet>
            <servlet-name>org.apache.jsp.precomp_jsp</servlet-name>
            <servlet-class>org.apache.jsp.precomp_jsp</servlet-class>
            </servlet>
            <servlet-mapping>
            <servlet-name>org.apache.jsp.precomp_jsp</servlet-name>
            <url-pattern>/precomp.jsp</url-pattern>
            </servlet-mapping>



            The directory structure for this class in your web application should
            be something like:
            /WEB-INF/classes/org/apache/jsp/precomp_jsp.class.
            If the context path for your web application is
            /home, users can request this
            JSP's implementation class (a servlet, behind the
            scenes) with a URL similar to
            http://localhost:8080/home/precomp.jsp.





            See Also



            Recipe 5.1-Recipe 5.3; Chapter JSP.11.4 of the JSP
            2.0 specification.








              Solution




              class D : public B1, public B2
              {
              string DoName() { return "D"; }
              };

              Demonstrate the best way you can find to "work around" not using multiple inheritance by writing an equivalent (or as near as possible) class D without using MI. How would you get the same effect and usability for D with as little change as possible to syntax in the calling code?


              There are a few strategies, each with its weaknesses, but here's one that gets quite close.



              class D : public B1
              {
              public:
              class D2 : public B2
              {
              public:
              void Set ( D* d ) { d_ = d; }
              private:
              string DoName();
              D* d_;
              } d2_;

              D() { d2_.Set( this ); }

              D( const D& other ) : B1( other ), d2_( other.d2_ )
              { d2_.Set( this ); }

               D& operator=( const D& other )
               {
               B1::operator=( other );
               d2_ = other.d2_;
               d2_.Set( this ); // re-point the helper at this object, not at other
               return *this;
               }

              operator B2&() { return d2_; }

              B2& AsB2() { return d2_; }

              private:
              string DoName() { return "D"; }
              };

              string D::D2::DoName(){ return d_->DoName(); }

              Before reading on, take a moment to consider the code and think about the purpose of each class or function.


              Drawbacks


              The workaround does a pretty good job of implementing MI, automates most of MI's behavior, and allows all of MI's usability, as long as you rely on coding discipline to fill in the parts that are not completely automated. In particular, here are some drawbacks of this workaround that show which parts of the MI feature are not completely automated.



              • Providing operator B2&() arguably gives references special (inconsistent) treatment over pointers.


              • Calling code has to invoke D::AsB2() explicitly to use a D as a B2 (in the test harness, this means changing "B2* pb2 = &d;" to "B2* pb2 = &d.AsB2();").


              • A dynamic_cast from D* to B2* still doesn't work (it's possible to work around this if you're willing to use the preprocessor to redefine dynamic_cast calls, but that's an extreme solution).



              Interestingly, you may have observed that the D object layout in memory is similar to what multiple inheritance would give. That's because we're trying to simulate MI, without all the syntactic sugar and convenience that built-in language support would provide.


              You may not need MI often, but when you need it, you really need it. This Item is intended to demonstrate that having the required language support for this kind of useful feature is far better than trying to roll your own, even if you can duplicate the functionality exactly through a combination of other features and coding discipline.









                A.1 Borland C++ Builder and Kylix





                Borland has several extensions to C++ to support its Rapid Application Development products: C++ Builder (for Microsoft Windows) and Kylix (for Linux). This section presents highlights of the RAD extensions.







                __closure




                In C++ Builder, a closure is like a pointer to a member function that has been bound to a specific object. Given a closure, you can call it the way you would call an ordinary function. To declare a closure type or object, use __closure as a modifier for the name of a function pointer:





                typedef int (__closure *MemFunc)(int);
                MemFunc func;
                struct demo {
                int sqr(int x) { return x * x; }
                };
                demo d;
                func = d.sqr;
                int n = func(10); // n = 100




                __declspec




                The __declspec keyword takes an attribute in parentheses and serves as a declaration specifier. Depending on the attribute, it can be used to modify a function, object, or class. For example, __declspec(noreturn) is a function specifier that tells the compiler that the function does not return, which permits additional optimization and error-checking (for example, eliminating statements that follow a call to the noreturn function):





                void __declspec(noreturn) abort();




                Other attributes include:







                thread




                A storage-class specifier that declares an object to be local to a thread; that is, each runtime thread has a separate copy of the object.





                dllexport




                A function specifier that tells the linker to export the function name from a dynamic-link library (DLL).





                uuid(string-literal)




                Modifies a class declaration. It associates a universally unique identifier (UUID) with the class, which is required for implementing COM objects in Windows. A class's UUID can be retrieved with the __uuidof operator.
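                By way of illustration, here is a hedged sketch of declarations using these three attributes; the identifiers and the UUID value are placeholders rather than examples from this appendix:

                __declspec(thread) int per_thread_count;      // one copy per runtime thread
                __declspec(dllexport) int add(int a, int b);  // name exported from the DLL
                class __declspec(uuid("12345678-1234-1234-1234-123456789ABC")) TExample;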









                __int64




                The __int64 type is a 64-bit integer type. In current releases of C++ Builder and Kylix, long is 32 bits. A 64-bit integer literal is written with a suffix of i64 (e.g., 10000000000000i64).
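                For example (a minimal sketch; the variable names are arbitrary):

                __int64 big = 10000000000000i64;    // fits, because __int64 is 64 bits wide
                // long n = 10000000000000;         // would not fit in the 32-bit long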





                __property




                A property is a class member that is used like a data member, but it can have the semantics of a member function. Properties are the foundation for the RAD features of C++ Builder and Kylix. A property is associated with a reader and writer, which can be data member names or member function names:







                class TControl {
                private:
                    int height_;
                    void set_height(int h);
                    . . .
                __published:
                    __property int height = { read=height_, write=set_height };
                };

                TControl * ctl = new TControl;
                ctl->height = 10;       // Calls ctl->set_height(10)
                int h = ctl->height;    // Gets ctl->height_




                __published




                The __published access specifier label yields the same accessibility as the public keyword, but it also directs the compiler to store additional runtime type information (RTTI) for the published declarations. The RAD features use the RTTI when the user designs an application.





                __thread




                The __thread keyword is a synonym for __declspec(thread).
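                For instance, assuming arbitrary variable names, the following two declarations have the same effect:

                __declspec(thread) int counter1;    // explicit form
                __thread int counter2;              // equivalent shorthand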





                __uuidof




                The __uuidof operator takes an expression as an operand and returns the UUID of the expression's class. The class declares its UUID with __declspec(uuid( . . . )). A class can implement the standard COM member function, QueryInterface, with __uuidof:







                class demo {
                public:
                    virtual HRESULT QueryInterface(const UUID& iid, void** obj)
                    {
                        if (iid == __uuidof(IUnknown)) {
                            *obj = reinterpret_cast<IUnknown*>(this);
                            static_cast<IUnknown*>(*obj)->AddRef( );
                            return S_OK;
                        }
                        return E_NOINTERFACE;
                    }
                };

















                   

                   








                  14.7 How Much Data Is Queued?


                  There are times when we want to see how much data is queued to be read on a socket, without reading the data. Three techniques are available:


                  1. If the goal is not to block in the kernel because we have something else to do when nothing is ready to be read, nonblocking I/O can be used. We will describe this in Chapter 16.

                  2. If we want to examine the data but still leave it on the receive queue for some other part of our process to read, we can use the MSG_PEEK flag (Figure 14.6). If we want to do this, but we are not sure that something is ready to be read, we can use this flag with a nonblocking socket or combine this flag with the MSG_DONTWAIT flag.

                    Be aware that the amount of data on the receive queue can change between two successive calls to recv for a stream socket. For example, assume we call recv for a TCP socket specifying a buffer length of 1,024 along with the MSG_PEEK flag, and the return value is 100. If we then call recv again, it is possible for more than 100 bytes to be returned (assuming we specify a buffer length greater than 100), because more data can be received by TCP between our two calls.

                    In the case of a UDP socket with a datagram on the receive queue, if we call recvfrom specifying MSG_PEEK, followed by another call without specifying MSG_PEEK, the return values from both calls (the datagram size, its contents, and the sender's address) will be the same, even if more datagrams are added to the socket receive buffer between the two calls. (We are assuming, of course, that some other process is not sharing the same descriptor and reading from this socket at the same time.)

                  3. Some implementations support the FIONREAD command of ioctl. The third argument to ioctl is a pointer to an integer, and the value returned in that integer is the current number of bytes on the socket's receive queue (p. 553 of TCPv2). This value is the total number of bytes queued, which for a UDP socket includes all queued datagrams. Also be aware that the count returned for a UDP socket by Berkeley-derived implementations includes the space required for the socket address structure containing the sender's IP address and port for each datagram (16 bytes for IPv4; 24 bytes for IPv6).
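                  To make techniques 2 and 3 concrete, here is a minimal sketch (not from the text; error handling is trimmed, and as noted above, FIONREAD is not supported by all implementations):

                  #include <sys/types.h>
                  #include <sys/socket.h>
                  #include <sys/ioctl.h>

                  /* Returns the number of bytes on sockfd's receive queue, or -1 if
                     the implementation does not support FIONREAD. */
                  int queued_bytes(int sockfd)
                  {
                      int n;
                      if (ioctl(sockfd, FIONREAD, &n) < 0)
                          return (-1);
                      return (n);
                  }

                  /* Examines up to len bytes without consuming them; MSG_DONTWAIT
                     prevents blocking when nothing is ready to be read. */
                  ssize_t peek_bytes(int sockfd, char *buf, size_t len)
                  {
                      return (recv(sockfd, buf, len, MSG_PEEK | MSG_DONTWAIT));
                  }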
















                    10.28 Validation Using Table Metadata




                    10.28.1 Problem



                    You need to check
                    input values against the legal members of an ENUM
                    or SET column.





                    10.28.2 Solution



                    Get the column definition, extract the list of members from it, and
                    check data values against the list.





                    10.28.3 Discussion



                    Some forms of validation involve checking input values against
                    information stored in a database. This includes values to be stored
                    in an ENUM or SET column, which
                    can be checked against the valid members stored in the column
                    definition. Database-backed validation also applies when you have
                    values that must match those listed in a lookup table to be
                    considered legal. For example, input records that contain customer
                    IDs can be required to match a record in a
                    customers table, or state abbreviations in
                    addresses can be verified against a table that lists each state. This
                    section describes ENUM- and
                    SET-based validation, and Recipe 10.29 discusses how to use lookup tables.



                    One way to check input values that correspond to the legal values of
                    ENUM or SET columns is to get
                    the list of legal column values into an array using the information
                    returned by SHOW COLUMNS,
                    then perform an array membership test. For example, the
                    color column from the profile
                    table, which records each person's favorite color, is an
                    ENUM defined as follows:



                    mysql> SHOW COLUMNS FROM profile LIKE 'color'\G
                    *************************** 1. row ***************************
                      Field: color
                       Type: enum('blue','red','green','brown','black','white')
                       Null: YES
                        Key:
                    Default: NULL
                      Extra:


                    If you extract the list of enumeration members from the
                    Type value and store them in an array
                    @members, you can perform the membership test like
                    this:



                    $valid = grep (/^$val$/i, @members);


                    The pattern constructor begins and ends with
                    ^ and $ to require
                    $val to match an entire enumeration member (rather
                    than just a substring). It also is followed by an
                    i to specify a case-insensitive comparison,
                    because ENUM columns are not case sensitive.



                    In Recipe 9.7, we wrote a function
                    get_enumorset_info( ) that returns
                    ENUM or SET column metadata.
                    This includes the list of members, so it's easy to
                    use that function to write another utility routine,
                    check_enum_value( ), that gets the legal enumeration values and
                    performs the membership test. The routine takes four arguments: a
                    database handle, the table name and column name for the
                    ENUM column, and the value to check. It returns
                    true or false to indicate whether or not the value is legal:



                    sub check_enum_value
                    {
                        my ($dbh, $tbl_name, $col_name, $val) = @_;

                        my $valid = 0;
                        my $info = get_enumorset_info ($dbh, $tbl_name, $col_name);
                        if ($info && $info->{type} eq "enum")
                        {
                            # use case-insensitive comparison; ENUM
                            # columns are not case sensitive
                            $valid = grep (/^$val$/i, @{$info->{values}});
                        }
                        return ($valid);
                    }


                    For single-value testing, such as to validate a value submitted in a
                    web form, that kind of test works well. However, if
                    you're going to be testing a lot of values (like an
                    entire column in a datafile), it's better to read
                    the enumeration values into memory once, then use them repeatedly to
                    check each of the data values. Furthermore, it's a
                    lot more efficient to perform hash lookups
                    than array lookups (in Perl at least). To do so, retrieve the legal
                    enumeration values and store them as keys of a hash. Then test each
                    input value by checking whether or not it exists as a hash key.
                    It's a little more work to construct the hash, which
                    is why check_enum_value( )
                    doesn't do so. But for bulk validation, the improved
                    lookup speed more than makes up for the hash construction
                    overhead.[4]


                    [4] If you want to check for yourself the
                    relative efficiency of array membership tests versus hash lookups,
                    try the lookup_time.pl script in the
                    transfer directory of the
                    recipes distribution.



                    Begin by getting the
                    metadata for the column, then convert the list of legal enumeration
                    members to a hash:



                    my $ref = get_enumorset_info ($dbh, $tbl_name, $col_name);
                    my %members;
                    foreach my $member (@{$ref->{values}})
                    {
                        # convert hash key to consistent case; ENUM isn't case sensitive
                        $members{lc ($member)} = 1;
                    }


                    The loop makes each enumeration member exist as the key of a hash
                    element. The hash key is
                    what's important here; the value associated with it
                    is irrelevant. (The example shown sets the value to
                    1, but you could use undef,
                    0, or any other value.) Note that the code
                    converts the hash keys to lowercase before storing them. This is
                    done because hash key lookups in Perl are
                    case sensitive. That's fine if the values that
                    you're checking also are case sensitive, but
                    ENUM columns are not. By converting the
                    enumeration values to a given lettercase before storing them in the
                    hash, then converting the values you want to check similarly, you
                    in effect perform a case-insensitive key existence test:



                    $valid = exists ($members{lc ($val)});


                    The preceding example converts enumeration values and input values to
                    lowercase. You could just as well use uppercase, as long as you
                    do so for all values consistently.



                    Note that the existence test may fail if the input value is the empty
                    string. You'll have to decide how to handle that
                    case on a column-by-column basis. For example, if the column allows
                    NULL values, you might interpret the empty string
                    as equivalent to NULL and thus as being a legal
                    value.



                    The validation procedure for
                    SET values is
                    similar to that for ENUM values, except that an
                    input value might consist of any number of SET
                    members, separated by commas. For the value to be legal, each element
                    in it must be legal. In addition, because "any
                    number of members" includes
                    "none," the empty string is a legal
                    value for any SET column.



                    For one-shot testing of individual input values, you can use a
                    utility routine check_set_value( )
                    that is similar to check_enum_value( ):



                    sub check_set_value
                    {
                        my ($dbh, $tbl_name, $col_name, $val) = @_;

                        my $valid = 0;
                        my $info = get_enumorset_info ($dbh, $tbl_name, $col_name);
                        if ($info && $info->{type} eq "set")
                        {
                            #return 1 if $val eq ""; # empty string is legal element
                            # use case-insensitive comparison; SET
                            # columns are not case sensitive
                            $valid = 1; # assume valid until we find out otherwise
                            foreach my $v (split (/,/, $val))
                            {
                                if (!grep (/^$v$/i, @{$info->{values}}))
                                {
                                    $valid = 0; # value contains an invalid element
                                    last;
                                }
                            }
                        }
                        return ($valid);
                    }


                    For bulk testing, construct a hash from the legal
                    SET members. The
                    procedure is the same as for producing a hash from
                    ENUM elements:



                    my $ref = get_enumorset_info ($dbh, $tbl_name, $col_name);
                    my %members;
                    foreach my $member (@{$ref->{values}})
                    {
                        # convert hash key to consistent case; SET isn't case sensitive
                        $members{lc ($member)} = 1;
                    }


                    To validate a given input value against the SET
                    member hash, convert it to the same lettercase as the hash keys,
                    split it at commas to get a list of the individual elements of the
                    value, then check each one. If any of the elements are invalid, the
                    entire value is invalid:



                    $valid = 1;         # assume valid until we find out otherwise
                    foreach my $elt (split (/,/, lc ($val)))
                    {
                        if (!exists ($members{$elt}))
                        {
                            $valid = 0; # value contains an invalid element
                            last;
                        }
                    }


                    After the loop terminates, $valid is true if the
                    value is legal for the SET column, and false
                    otherwise. Empty strings are always legal SET
                    values, but this code doesn't perform any
                    special-case test for an empty string. No such test is necessary,
                    because in that case the split( ) operation
                    returns an empty list, the loop never executes, and
                    $valid remains true.





















                      Chapter 4. The Linux Kernel: A Different Perspective




                      In this chapter


                      • Background

                      • Linux Kernel Construction

                      • Kernel Build System

                      • Obtaining a Linux Kernel

                      • Chapter Summary


                      If you want to learn about kernel internals, many good books are available on kernel design and operation. Several are presented in Section 4.5.1, "Suggestions for Additional Reading," in this and other chapters throughout the book. However, very little has been written about how the kernel is organized and structured from a project perspective. What if you're looking for the right place to add some custom support for your new embedded project? How do you know which files are important for your architecture?


                      At first glance, it might seem an almost impossible task to understand the Linux kernel and how to configure it for a specific platform or application. In a recent Linux kernel snapshot, the Linux kernel source tree consists of more than 20,000 files that contain more than six million lines, and that's just the beginning. You still need tools, a root file system, and many Linux applications to make a usable system.


                      This chapter introduces the Linux kernel and covers how the kernel is organized and how the source tree is structured. We then examine the components that make up the kernel image and discuss the kernel source tree layout. Following this, we present the details of the kernel build system and the files that drive the kernel configuration and build system. This chapter concludes by examining what is required for a complete embedded Linux system.
























                      "Am I Affected?"


                      If you're using libraries built for possible multithreaded use and performance matters to you, you can and should do the same thing we did. Run profiles of your code to determine the data structures you use most. Do this by member function hit count, not necessarily by time spent in the function. Create sample programs that manipulate those data structures in a loop, and time those programs running with single- and multithreaded versions of the corresponding libraries. If there's a substantial difference, tell your vendor, especially if you can tell from the size of the difference whether it's less efficient than it needs to be.
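                      As a concrete sketch of such a sample program (assuming a modern compiler with <chrono>; the iteration count, string size, and variable names are arbitrary), build it once against the single-threaded library and once against the multithreaded one, then compare the times:

                      #include <chrono>
                      #include <iostream>
                      #include <string>

                      int main()
                      {
                          std::string src(64, 'x');

                          std::chrono::steady_clock::time_point start =
                              std::chrono::steady_clock::now();

                          for (long i = 0; i < 10000000; ++i) {
                              std::string copy = src;   // a COW library defers the deep copy here...
                              copy[0] = 'y';            // ...and pays for it (plus any locking) here
                          }

                          std::chrono::milliseconds elapsed =
                              std::chrono::duration_cast<std::chrono::milliseconds>(
                                  std::chrono::steady_clock::now() - start);
                          std::cout << elapsed.count() << " ms\n";
                      }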


                      Library implementers are generally reasonable people. Certainly, all those I've met are. It's their business to provide the highest-quality code to their customers. If there's a competitive advantage they can gain over other library vendors, they're usually happy to hear about it and provide it. Getting rid of COW implementations falls into the "good optimizations" category in most cases (and general-purpose libraries care most about "most cases").


                      So why have COW and similar "optimizations" been so prevalent historically? Probably the biggest reason is inertia and the weight of years. COW has traditionally been a common optimization for decades. Even at the end of the 1990s, despite the volume of trade press about multithreaded code, it was far from clear how much multithreaded business-application code we as an industry were really writing. That is finally changing. Now that it is becoming clearer to library vendors that a significant portion of their customers are using multithreaded builds of their libraries, a few major vendors have already decided to abandon possible false optimizations like copy-on-write. To me, and to many, this is a Good Thing. It is to be hoped that the rest of the vendors will follow suit.


                      The performance penalty COW exacts in a multithreaded program can be startling. Remember, though, the problem isn't just with COW. Any shared-information optimization that joins two objects in some way "under the covers" will have the same problems.


















                        12.4. Calling Stored Programs from Application Code


                        Most languages used to build applications that interact with MySQL are able to fully exploit stored programs, although in some languages, support for advanced features such as multiple result sets is a recent addition. In the following chapters we will explain in detail how to use stored programs from within PHP, Java, Perl, Python, and the .NET languages VB.NET and C#. In this section we want to give you an introduction to the general process of calling a stored program from an external programming language.


                        In general, the techniques for using stored programs differ from those for standard SQL statements in two significant respects:


                        • While SQL statements may take input parameters, stored programs can also have OUT or INOUT parameters. This means that you need to understand how to access the value of an OUT or INOUT parameter once the stored program execution completes.

                        • A SELECT statement can return only one result set, while a stored program can return any number of result sets, and you might not be able to anticipate the number or structure of these result sets.


                        So, calling a stored program requires a slightly different program flow from standard SQL processing. The overall sequence of events is shown in the UML "retro" diagram (e.g., flowchart) in Figure 12-3.


                        Here's a brief description of each of these steps. Remember that in the next five chapters, we will be showing you how to follow these steps in various languages.



                        12.4.1. Preparing a Stored Program Call for Execution






                        We'll normally want to call a stored program more than once in our application. Typically, we first create a statement handle for the stored program. We then iteratively execute the program, perhaps providing different values for the program's parameters with each execution.


                        It's usually possible to bypass the preparation stage and execute a stored program directly, at least if the stored program returns no result sets. However, if the stored program takes parameters and you execute the stored program more than once in your program, we recommend that you go to the extra effort of preparing the statement that includes your stored program call.




                        12.4.2. Registering Parameters


                        We can pass parameters into stored programs that require them as literals (e.g., by concatenating the text of the parameter values into the stored program CALL statement).



                        Figure 12-3. General processing flow when calling a stored program from an external language



                        However, in all of the languages we discuss in subsequent chapters, there are specific parameter-handling methods that allow us to re-execute a stored program with new parameters without having to re-prepare the stored program call. As we said previously, it's best to use these explicit methods if you are going to execute the stored program more than once, both because it is slightly more efficient and because, in some cases, only the prepared statement methods offer full support for bidirectional parameters and multiple result sets.


                        The methods for passing parameters to stored programs are usually the same as the methods used to pass parameters (or "bind variables") to normal SQL statements.




                        12.4.3. Setting Output Parameters


                        Some languages allow us to specifically define and process output parameters. In other languages, we can only access the values of OUT or INOUT parameters by employing "user variables" (variables prefixed with @) to set and retrieve the parameter values.


                        Both techniques, the direct API calls provided by .NET and JDBC and the session-variable solution required by other languages, are documented in the relevant language-specific chapters that follow.




                        12.4.4. Executing the Stored Program




                        Once the input parameters are set and, in the case of .NET and Java, once the output parameters are registered, we can execute the stored program. The method for executing a stored program is usually the same as the method for executing a standard SQL statement.


                        If the stored program returns no result sets, output parameters can immediately be accessed. If the stored program returns one or more result sets, all of those result sets must be processed before the output parameter values can be retrieved.




                        12.4.5. Retrieving Result Sets


                        The process of retrieving a single result set from a stored program is identical to the process of retrieving a result set from other SQL statements, such as SELECT or SHOW, that return result sets.


                        However, unlike SELECT and SHOW statements, a stored program may return multiple result sets, and this requires a different flow of control in our application. To correctly process all of the result sets that may be returned from a stored program, the programming language API must include a method to switch to the "next" result set and possibly a separate method for determining if there are any more result sets to return.


                        JDBC and ADO.NET languages have included these methods since their earliest incarnations (for use with SQL Server and other RDBMSs that support multiple result sets), and these interfaces have been fully implemented for use with MySQL stored programs. Methods exist to retrieve multiple result sets in PHP, Perl, and Python, but these methods are relatively immature; in some cases, they were implemented only in response to the need to support stored programs in MySQL 5.0.




                        12.4.6. Retrieving Output Parameters


                        Once all result sets have been retrieved, we are able to retrieve any stored program output parameters. Not all languages provide methods for directly retrieving the values of output parameterssee the "Setting Output Parameters" section earlier for a description of a language-independent method of retrieving output parameters indirectly through user variables.


                        JDBC and ADO.NET provide specific calls that allow you to directly retrieve the value of an output parameter.




                        12.4.7. Closing or Re-Executing the Stored Program







                        Now that we have retrieved the output parameters, the current stored program execution is complete. If we are sure that we are not going to re-execute the stored program, we should close it using language-specific methods to release all resources associated with the stored program execution. This usually means closing the prepared statement object associated with the stored program call. If we want to re-execute the stored program, we can modify the input parameters and use the language-specific execute method to run the stored program as many times as needed, closing the prepared statement and releasing its resources when we are done.
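                        To tie the preceding steps together, here is a hedged sketch using the MySQL C API (one concrete binding, callable from C++; it is not one of the five languages covered in the following chapters). The procedure name get_stats and its parameters are hypothetical, the connection is assumed to have been opened with the CLIENT_MULTI_RESULTS flag, and error handling is trimmed. It uses the user-variable technique described earlier for the OUT parameter:

                        #include <mysql/mysql.h>    // header location varies by platform
                        #include <cstdio>

                        // Hypothetical procedure: get_stats(IN dept INT, OUT total INT)
                        void call_get_stats(MYSQL *conn)
                        {
                            // Steps 1-4: execute the call, passing the IN value as a literal
                            // and capturing the OUT value in a user variable.
                            if (mysql_query(conn, "CALL get_stats(42, @total)") != 0)
                                return;

                            // Step 5: drain every result set the procedure may return.
                            do {
                                MYSQL_RES *res = mysql_store_result(conn);
                                if (res != NULL) {
                                    for (MYSQL_ROW row = mysql_fetch_row(res); row != NULL;
                                         row = mysql_fetch_row(res))
                                        std::printf("%s\n", row[0] != NULL ? row[0] : "NULL");
                                    mysql_free_result(res);
                                }
                            } while (mysql_next_result(conn) == 0);  // 0 means another result exists

                            // Step 6: only now is the OUT parameter available.
                            if (mysql_query(conn, "SELECT @total") == 0) {
                                MYSQL_RES *res = mysql_store_result(conn);
                                if (res != NULL) {
                                    MYSQL_ROW row = mysql_fetch_row(res);
                                    if (row != NULL && row[0] != NULL)
                                        std::printf("total = %s\n", row[0]);
                                    mysql_free_result(res);
                                }
                            }
                        }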




                        12.4.8. Calling Stored Functions


                        In some languages, JDBC and .NET in particular, stored functions can be invoked directly, and you have language-specific techniques for obtaining the stored function return value. However, in other languages, you would normally need to embed the stored function in a statement that supports an appropriate expression, such as a single-line SELECT statement.