Friday, November 13, 2009

19.6 Lockset Analysis















Data races are hard to reveal with testing because of the nondeterministic interleaving of threads in a concurrent program. Statically exploring the execution space is computationally expensive and suffers from an approximated model of computation, as discussed in Chapter 8. Dynamic analysis can greatly amplify the sensitivity of testing to potential data races, avoiding the pessimistic inaccuracy of finite state verification while reducing the optimistic inaccuracy of testing.


Data races are commonly prevented by imposing a locking discipline, such as the rule that every variable shared between threads must be protected by a mutual exclusion lock. Dynamic lockset analysis reveals potential data races by detecting violations of the locking discipline.


Lockset analysis identifies the set of mutual exclusion locks held by threads when accessing each shared variable. Initially, each shared variable is associated with all available locks. When a thread accesses a shared variable v, lockset analysis intersects the current set of candidate locks for v with the locks held by that thread. The set of candidate locks that remains after executing a set of test cases is the set of locks that were always held by threads accessing that variable. An empty set of locks for a shared variable v indicates that no lock consistently protects v.



The analysis of the two threads in Figure 19.4 starts with two locks associated with variable x. When thread A locks lck1 to access x, the lockset of x is intersected with the locks held by A, leaving {lck1}. When thread B locks lck2 to access x, the intersection of the lockset of x with the locks held by B becomes empty, indicating that no lock consistently protects x.






Thread      Program trace     Locks held     Lockset(x)
                              { }            {lck1, lck2}
thread A    lock(lck1)        {lck1}
            x=x+1;                           {lck1}
            unlock(lck1)      { }
thread B    lock(lck2)        {lck2}
            x=x+1;                           { }
            unlock(lck2)      { }

Figure 19.4: Threads accessing the same shared variable with different locks. (Adapted from Savage et al. [SBN+97])
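The refinement described above can be sketched in a few lines of Python. This is a minimal illustration, not Eraser's implementation: it assumes the set of all locks is known up front and that lock, unlock, and access events arrive as method calls. The class and method names are hypothetical. Replaying the trace of Figure 19.4 reproduces the lockset column of the table.

```python
from dataclasses import dataclass, field


@dataclass
class LocksetAnalyzer:
    """Basic lockset refinement: one candidate lockset per shared variable."""

    all_locks: frozenset  # every lock in the program (assumed known up front)
    held: dict = field(default_factory=dict)        # thread -> locks it currently holds
    candidates: dict = field(default_factory=dict)  # variable -> candidate lockset
    races: list = field(default_factory=list)       # variables left with no protecting lock

    def lock(self, thread, lck):
        self.held.setdefault(thread, set()).add(lck)

    def unlock(self, thread, lck):
        self.held.setdefault(thread, set()).discard(lck)

    def access(self, thread, var):
        # A variable starts out associated with all available locks ...
        current = self.candidates.get(var, self.all_locks)
        # ... and each access intersects that set with the locks the thread holds.
        refined = current & frozenset(self.held.get(thread, set()))
        self.candidates[var] = refined
        if not refined and var not in self.races:
            self.races.append(var)  # no lock consistently protects var
        return refined


# Replay the trace of Figure 19.4.
analyzer = LocksetAnalyzer(frozenset({"lck1", "lck2"}))
analyzer.lock("A", "lck1")
print(sorted(analyzer.access("A", "x")))  # ['lck1']
analyzer.unlock("A", "lck1")
analyzer.lock("B", "lck2")
print(sorted(analyzer.access("B", "x")))  # []
analyzer.unlock("B", "lck2")
print(analyzer.races)                      # ['x']
```

Because the candidate set only shrinks, the order in which accesses occur does not matter. Any test run in which A and B each touch x under their own lock exposes the violation, even if the two accesses never actually overlap in time.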

This simple locking discipline is violated by some common programming practices: shared variables are frequently initialized without holding a lock; shared variables written only during initialization can be safely accessed without locks; and multiple readers can be allowed to access a variable concurrently, as long as they are mutually excluded from a single writer. Lockset analysis can be extended to accommodate these idioms.


Initialization can be handled by delaying analysis until after initialization. There is no easy way of knowing when initialization is complete, but we can consider it complete when the variable is accessed by a second thread.


Safe simultaneous reads of unprotected shared variables can also be handled very simply by reporting lockset violations only when the variable is written by more than one thread. Figure 19.5 shows the state transition diagram that enables lockset analysis and determines race reports. The initial virgin state indicates that the variable has not been referenced yet. The first access moves the variable to the exclusive state. Additional accesses by the same thread do not modify the variable state, since they are considered part of the initialization procedure. Accesses by other threads move the variable to the shared and shared-modified states, which record the type of access: a read by another thread leads to shared, while a write by another thread, or any write while in shared, leads to shared-modified. The variable lockset is updated in both the shared and shared-modified states, but violations of the policy are reported only if they occur in the shared-modified state. In this way, read-only concurrent accesses do not produce warnings.






Figure 19.5: The state transition diagram for lockset analysis with multiple read accesses.
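The state machine of Figure 19.5 can be sketched as a per-variable tracker. This is a minimal illustration under stated assumptions: each access reports the accessing thread, the set of locks it holds, and whether it is a write. The transitions follow the description above and Eraser's published design, but `State`, `VarTracker`, and their fields are hypothetical names.

```python
from enum import Enum, auto


class State(Enum):
    VIRGIN = auto()           # not referenced yet
    EXCLUSIVE = auto()        # accessed by one thread only (initialization)
    SHARED = auto()           # read by other threads, no write since sharing
    SHARED_MODIFIED = auto()  # written while shared: violations are reported


class VarTracker:
    """Figure 19.5-style state machine combined with lockset refinement."""

    def __init__(self, all_locks):
        self.state = State.VIRGIN
        self.owner = None                  # first thread to touch the variable
        self.lockset = frozenset(all_locks)
        self.race_reported = False

    def access(self, thread, held, is_write):
        held = frozenset(held)
        if self.state is State.VIRGIN:
            self.state, self.owner = State.EXCLUSIVE, thread
            return
        if self.state is State.EXCLUSIVE:
            if thread == self.owner:
                return  # still initialization: state and lockset unchanged
            self.state = State.SHARED_MODIFIED if is_write else State.SHARED
        elif self.state is State.SHARED and is_write:
            self.state = State.SHARED_MODIFIED
        # The lockset is refined in both shared states ...
        self.lockset &= held
        # ... but an empty lockset is reported only in shared-modified.
        if self.state is State.SHARED_MODIFIED and not self.lockset:
            self.race_reported = True


# Initialized by T1 without a lock, then only read by other threads: no report.
v = VarTracker({"m"})
v.access("T1", set(), is_write=True)
v.access("T2", set(), is_write=False)
v.access("T3", set(), is_write=False)
print(v.state, v.race_reported)  # State.SHARED False
# A later unprotected write moves the variable to shared-modified and reports.
v.access("T2", set(), is_write=True)
print(v.state, v.race_reported)  # State.SHARED_MODIFIED True
```

Note that the lockset is already empty after T2's first unprotected read, but nothing is reported until the write, which is exactly how read-only sharing after initialization avoids false warnings.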

To allow multiple readers to access a shared variable and still report writers' data races, we can simply distinguish the set of locks held in all accesses from the set of locks held in write accesses.
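One way to realize this distinction is the read-write lock refinement described by Savage et al. for Eraser. A read intersects the candidate set with the locks the thread holds in any mode. A write intersects it only with the locks held in write mode. The sketch below assumes the analysis is told the mode in which each lock is acquired. The class and method names are hypothetical.

```python
class RWLocksetAnalyzer:
    """Lockset refinement that tells read-mode holds apart from write-mode holds."""

    def __init__(self, all_locks):
        self.all_locks = frozenset(all_locks)
        self.held_any = {}    # thread -> locks held in read or write mode
        self.held_write = {}  # thread -> locks held in write mode only
        self.candidates = {}  # variable -> candidate lockset
        self.warnings = set()

    def acquire(self, thread, lck, write_mode):
        self.held_any.setdefault(thread, set()).add(lck)
        if write_mode:
            self.held_write.setdefault(thread, set()).add(lck)

    def release(self, thread, lck):
        self.held_any.get(thread, set()).discard(lck)
        self.held_write.get(thread, set()).discard(lck)

    def access(self, thread, var, is_write):
        # Writes must be protected by a lock held in write mode; reads by any mode.
        held = self.held_write if is_write else self.held_any
        current = self.candidates.get(var, self.all_locks)
        self.candidates[var] = current & frozenset(held.get(thread, set()))
        if not self.candidates[var]:
            self.warnings.add(var)


a = RWLocksetAnalyzer({"rw"})
for reader in ("R1", "R2"):           # concurrent readers share the lock in read mode
    a.acquire(reader, "rw", write_mode=False)
    a.access(reader, "x", is_write=False)
    a.release(reader, "rw")
a.acquire("W", "rw", write_mode=True)  # the writer takes it in write mode
a.access("W", "x", is_write=True)
a.release("W", "rw")
print(a.warnings)  # set()
a.acquire("R1", "rw", write_mode=False)
a.access("R1", "x", is_write=True)   # a write under a read-mode lock
a.release("R1", "rw")
print(a.warnings)  # {'x'}
```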














Tests Passing; Solution Doesn't Work












One of the more frustrating project situations is to find that tests are reported as passing but the solution under test still doesn't work for observers outside the test team. In these cases, you want to identify why the tests do not seem to find the same issues that other users do. Figures 9.15 through 9.18 are examples of this case.



High Bug Find Rate


Frequently you see a high test pass rate but still see a large incoming bug rate (or worse, customers or beta users are reporting lots of bugs that testing seems to be missing).


This can occur for several reasons:


  • The tests might be too gentle for this stage of the solution. In early iterations, gentle tests are good, but as the solution matures, tests should exercise broader scenarios and integrations. These tests might be missing.

  • Tests might be stale or be testing the wrong functionality.

  • It might be time to switch test techniques. (See Chapter 7.)


Consider Figures 9.15, 9.16, and 9.17.




Figure 9.15.

On the Quality Indicators chart, the test pass rate is high, but active bugs are also high.












Figure 9.16.

Correspondingly, on the Bug Rates chart, active bugs are high because find rate stays high.











Figure 9.17.

Tests aren't finding the bugs. On this report, many of the bugs found have no corresponding test. This might be a sign that testing is looking elsewhere. And if you expect regression testing to keep these bugs from recurring undetected, you will need regression tests that you don't have yet.








Tests Are Stale


Tests do not necessarily evolve at the same rate as the code under test. This risk is present especially when tests are heavily automated. In this situation, you see high test pass rates with ongoing code churn and diminishing code coverage (see Figure 9.18).




Figure 9.18.

This Quality Indicators chart shows a high rate of code churn and a low rate of code coverage from testing, yet test pass rates remain high. This suggests that the tests being run are not exercising the new code. Don't be lulled by the high test pass rate; these tests are clearly not testing all the new development work.
















Identifying Database Servers








Identifying database servers is even trickier than identifying front-end and internal application servers. Front-end and internal application servers are easier to identify because both communicate in HTTP, so their signatures work their way into various elements of HTTP, such as the URL, HTTP headers, and cookies.



In contrast, database servers communicate
with internal application servers in SQL. The only elements of a URL that get
passed to the database interface are the values being exchanged by means of
various input fields and URL parameters. Thus the only way to identify back-end
databases through URLs is to force them to generate errors that are reflected
by the application server and end up being sent back to the Web browser.



Let's consider two URLs:



  • http://www.example.com/public/index.php?ID=27

  • http://www.example.org/Profile.cfm?id=3&page=1



The first URL has a PHP script, index.php, which seems to make use of a database, as suggested by the URL parameter "ID=27." The second URL is a ColdFusion application, which again seems to perform database queries based on the parameter id.



Forcing the database servers to return an error involves tampering with the values passed to the parameters in both cases. For the first URL, we substitute a nonnumeric ID value for "27." For the second URL, we prematurely truncate the query by replacing the value 3 with a single quotation mark. Figures 6-11 and 6-12, respectively, show how the errors appear.
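The parameter tampering described above amounts to rewriting one query-string value while leaving the rest of the URL intact. The Python sketch below uses the standard `urllib.parse` module. It only constructs the modified URLs and sends nothing. The helper name `tamper` and the replacement value "abc" are illustrative choices, not part of the original text.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def tamper(url, param, new_value):
    """Return url with the value of `param` replaced; other parameters are kept."""
    parts = urlsplit(url)
    query = [
        (key, new_value if key == param else value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
    ]
    # urlencode percent-encodes the new value (a single quote becomes %27).
    return urlunsplit(parts._replace(query=urlencode(query)))


# Nonnumeric ID for the PHP script; a lone single quote for the ColdFusion page.
print(tamper("http://www.example.com/public/index.php?ID=27", "ID", "abc"))
print(tamper("http://www.example.org/Profile.cfm?id=3&page=1", "id", "'"))
```

The encoded quote is decoded by the server before it reaches the SQL statement, so the effect is the same as typing the quotation mark directly into the browser's address bar.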



Figure 6-11. Forcing a database error with PHP




Figure 6-12. Forcing a database error with ColdFusion




We leave it to you to figure out how much
damage is done by simply obtaining the types of information displayed in these
error messages! Hint: The ColdFusion SQL Server
error message contains enough information to launch a URL request that possibly
could cause remote command execution with Administrator privileges on the
database server of the Web application.



 





Chapter 8. Bottlenecks












You've created a killer application. You store all your data as UTF-8, you receive and process email like it was candy, your data is well filtered, and you use more external services than you can count. It's going great, your users love you, and the venture capitalists are circling. And then your application. Grinds. To. A. Halt.


As applications grow, weak spots reveal themselves. Techniques that worked for 10 requests a second start to fail for 100 requests a second. Databases with 10,000 rows work fine but start to choke when they reach 100,000 or 1,000,000. In an ideal world, we would find and solve all of these problems before they happen in production, but there's always a chance we'll miss something.


In this chapter we'll look at techniques for identifying and fixing bottlenecks in our architecture, both before they happen and when they start to bog our systems down. We'll talk about ways to increase the performance we can get out of our existing hardware so we're making the most of what we have, before we move on to talking about scaling in the next chapter.












Have We Tested the Changes?












Throughout the lifecycle, the application changes. Regressions are bugs in the software under test that did not appear in previous versions. Regression testing is the term for testing a new version of the software to find those bugs. Almost all types of tests can be used as regression tests, but in keeping with the tenet of "Important problems fast," your regression testing strategy must be very efficient.


Ideally, you should test the most recent changes first. Not only does this mitigate the risk of unforeseen side effects of the changes, but if you do find and report bugs, the changes are still fresh in everyone's memory.


One of the challenges in most test teams is identifying what exactly the changes are. Fortunately, the daily build report shows you exactly what changesets have made it into the build and what work items (scenarios, QoS, tasks, and bugs) have been resolved, thereby identifying the functionality that should be tested first (see Figure 7.13). Moreover, if you have reasonable build verification tests (BVTs), then you can check their results and code coverage.





Figure 7.13.

One of the many purposes of the daily build report is to focus testing activity on the newly built functionality. This list of work items resolved in the build is like an automatic release note, showing what new functionality needs to be tested.















How Fool's Gold Pans Out





In conclusion, we hold the following software truths to be self-evident (or evident after careful examination, anyway):


  • The success of a software project depends on not writing source code too early in the project.

  • You can't trade defect count for cost or schedule unless you're working on life-critical systems. Focus on defect count; cost and schedule will follow.

  • Silver bullets are hazardous to a project's health, though software industry history suggests that vendors will continue to claim otherwise.

  • Half-hearted process improvement is an especially damaging kind of silver bullet because it undermines future improvement attempts.

  • Despite its name, software isn't soft, unless it's made that way in the first place, and making it soft is expensive.


The software world has had 50 years to learn these lessons. The most successful people and organizations have taken them to heart. Learning to resist software's fool's gold consistently is one of the first steps the software industry needs to take on the road to creating a true profession of software engineering.








    Defining the Process Domain













    Domain processes are the interrelated activities that are specific to the functioning of the organization for which a software project is developed. The first step is to determine the domain in which the product will eventually be used. For all domain analysis, the critical point of view is that of the ultimate end-user. Valid analyses and evaluations of options must be done from this view. If there is no "real" customer available, then we must rely on a secondary "voice of the customer." This is usually someone from the marketing organization. Even when a matrix is viewed from the "developer's" perspective, the customer is ever present.



    The measure of quality for a software project is based on how well the software solves specific domain-related problems. For the software customer, the view is from his business domain, not that of a computer scientist or software engineer. For this reason, to deliver quality software, the project manager must understand the domain for which the software solves specific needs. For software product development, there are six classes of product domains:



    1. Consumer

    2. Business

    3. Industrial

    4. Real-time

    5. Really timely

    6. Scientific



    Individuals buy consumer products for personal use. This use could be at home, while traveling, or at work. The key here is that the consumer market is a mass market and is usually addressing a price-sensitive purchaser. Examples of consumer products utilizing software are cellular phones, automobiles, televisions, personal computers, and personal digital assistants.



    The majority of software products are targeted at the business product domain. Here the key is providing a cost-effective product to the business customer that will improve overall business profits. These products are usually expensive compared to consumer products and have maintenance, service, and installation services available as a necessary part of the product. Examples of these types of products are database tools such as Oracle™, enterprise resource planning products such as PeopleSoft™, development tool suites such as WebSphere™ and VisualAge™, and operating systems such as Solaris™.



    Industrial products are a specific subclass of business products. These are software tools that are purchased for the specific purposes of machine automation, factory automation and integration, and embedded control software. These are special-purpose and usually focused on a specific industry such as automotive, food processing, and semiconductor fabrication. This domain has the highest percentage of product customization and integration with legacy systems. Examples of these products are factory automation software from FactoryWorks™, embedded development systems from Wind River, and process modeling tools such as Hewlett-Packard's Vee™.



    Real-time products are used to control processes that have a defined and finite time budget. Real-time systems are used for data collection of events that last less than a microsecond. Real-time products control embedded medical devices such as pacemakers, where information must literally be processed between heartbeats. These products also work in the interface between the collection of analog data such as voice or music and its conversion to digital data that can be stored on a magnetic disk or CD-ROM. All real-time software is written specifically for the target hardware on which it executes.



    Really timely, as opposed to real-time, software products must execute within a time budget that does not irritate the end user. Examples are the software that runs ATMs and the software that performs credit card verification for orders placed over the Internet. Most really timely software products are part of either business or industrial software products. They are broken out as a subclass because they can irritate customers if they do not function effectively.



    Scientific products simulate real-world activities using mathematics. Real-world objects are turned into mathematical models, and executing formulas simulates the actions of the real-world objects. For example, some of an airplane's flight characteristics can be simulated in the computer. Rivers, lakes, and mountains can be simulated. Virtually any object with known characteristics can be modeled and simulated. Simulations involve enormous numbers of calculations and often require supercomputer speed. As personal computers become more powerful, more laboratory experiments will be converted into computer models that students can examine interactively without the risk and cost of the actual experiments. Members of this product domain are MATLAB™ for large mathematical formula development, Analytica™ for developing large-scale business models, and Expert Choice for developing large-scale decision support systems. Scientific software products are usually special-purpose tool kits for problem solving.



    The question now arises, "What about the government market?" For the six classes of software product domains as defined, all of them could be "government" customers. Where the separation of government from private customers comes into play is in the areas of business plans, legal concerns, contracting, and risk analysis.



    Four classes of product systems look at ways that the software product will be built and delivered from the developer's perspective. These four have different product development plans and life cycles. Although all product development is an iterative process, in the real business world there is usually an existing product portfolio. During the conceptual stage, the project manager will have worked on the product concept and selected a preliminary life cycle. That earlier work influences the selection of one or more of these product system classes:



    1. New software product

    2. Re-engineering of existing product

    3. Component integration

    4. Heroic maintenance



    A new software product starts with a set of requirements and moves through its development life cycle to delivery. It will use some sort of development tools and possibly object libraries, where appropriate. This is building a truly new software product for taking advantage of a new technology such as the Internet or using a new set of programming tools such as Java. It may also be a new market opportunity because of changes in government regulations such as telecommunications or banking deregulation.



    Re-engineering an existing product is just what the name suggests. The product already exists in a form that may use outmoded software technology or be hosted on obsolete hardware. An example would be a DOS-based data collection system re-engineered to run on Linux.



    Taking available commercial-off-the-shelf (COTS) products and integrating them into a product is component integration. An example of this type of product is taking an available embedded database tool along with a script-generation tool and a graphical user interface (GUI) generator to produce a new product that is used for integrating factory equipment into the overall manufacturing execution system.



    Heroic maintenance occurs when a company wants to wring the last bit of revenue out of an existing software product that has been passed over for re-engineering. Software product companies take great care in the management and timing of the releases of new capabilities within their product portfolios. When completely new products or massively re-engineered products are released, there is always a potential to cannibalize existing customer sales instead of having new customers buy the new product. Timing decisions may result in the delay of the newest product and the release of the old product with heroic work done to dress it up in new clothes. An example of this is once again our DOS system: Instead of re-engineering the entire system, the command-line interface was replaced with a pseudo-GUI. This is known in the software industry as "same dog, new collar!"



    The first matrix to be developed by the project manager involves identifying the product domain type, illustrated in Figure 5-3. This product domain type resides at the intersection of the six product domain classes and the product type classes. A software product can be defined to exist in multiple cells on this matrix. For example, suppose that there is a new, Web-based software product for registering personal DVD movies in a trading club in which points are earned to borrow (or a small fee is paid, if points are insufficient). This product would "live" in the consumer and really timely product domain classes as both a new software product and component integration. Although the concept of the product is new and new software would be developed, there are many libraries of components available for use. This example is represented in the matrix with an X in the relevant cell.



    Figure 5-3. Step 1: Identify the Product Domain Type



    Another example product is an enhancement to an existing factory-integration product to take information from past process steps and determine the optimum process for the product through the factory based on past production yield information and customer orders. We can tell immediately that this will be re-engineering an existing product, but some new software also will be developed. This product may touch four of the product domain classes: business, industrial, real-time, and scientific. Business could be touched because of accessing historic information on production yields. Industrial and real-time apply because it will be operating on factory automation equipment. The scientific piece comes in with the optimization algorithms necessary for determining the best individual product flow through the factory. This example is represented in the matrix with an O in the relevant cell.



    The third part of defining the process domain is the product component classes. This set of classes is also viewed from the perspective of the end-user. There are six members of the class, and the key question to ask is, "What does the customer expect to have delivered?" The software development project manager must discover whether the end-user has a product expectation. Six product component classes exist:



    1. Software

    2. Hardware

    3. People

    4. Database

    5. Documentation

    6. Procedures



    If a project is to develop a "pure" software product, the end-user has an expectation that he will get an installation set of media or an access key to a remote site to download the product. This is the way most consumer software is purchased: the only items received are the media or a digital file.



    Many products are turnkey: The developed software is hosted on hardware. Buying a cellular phone usually dictates the software running within the hardware. Although the software is a critical system component, the customer purchases the hardware.



    People are a critical part of many software products. Enterprise-wide software systems used for financial management, factory control, and product development may require consulting services to be "purchased" along with the software to aid in product installation, integration, and adoption into a specific environment.



    Database products, although most definitely software, are separated as a distinct class because of the expectations that accompany the purchase of this class of complex software. A database product is usually purchased as a separate, general-purpose tool kit to be used as an adjunct to all of a company's other information systems. Increasingly, "software" products are delivered with an embedded database package within the product. It is important for the customer to realize that he is purchasing not only the "new" software, but also a database product.



    Documentation is almost always a part of the product. In some cases, it may be books and manuals purchased as a "shrink-wrapped," off-the-shelf software product. Many complex enterprise software products have third-party authors writing usage and tips books sold through commercial bookstores. If the product is downloaded, the digital files may include a "readme" file and possibly complete soft copy documentation. Acquiring software from some sources such as SourceForge (www.SourceForge.com) may provide no documentation other than the software's source code.



    Procedures or business rules are a final component class. In situations in which the customer is buying systems and software used for decision support, equipment control, and component integration, it is important for the customer to understand the procedure deliverables. Usually the custom development of an organization's business rules is done either by the organization itself or by consultants hired from the software company. This can be a very gray area, and it is important that the project manager understand all of the project deliverables early in the development life cycle, especially those that can cause customer dissatisfaction and demonstrate a lack of quality.



    Now that the third set of domain classes has been defined, the project manager can fill out the last two matrices. The next one to complete is the identification of the critical product components, shown in Figure 5-4. This matrix is a table of the product component classes matched with the classes of product systems. This matrix provides us with the deliverables for the defined product based on whether it is new software, re-engineered software, a component integration product, or the heroic maintenance of a legacy product within the company's portfolio. Remember, the Web example is the X and the factory integration is the O.



    Figure 5-4. Step 2: Identify Critical Product Components



    For example, our Web-based software product for registering personal DVD movies was determined to be both a new software product and component integration. The critical product components for this product are software and documentation. It is Web-based and runs from a browser running on the customer's personal hardware. The customer will see no database, people, or procedures. The only documentation may be the instructions on the Web page itself.



    Our other example product, an enhancement to an existing factory integration product, involves re-engineering an existing product and some new software development. Based on how the product is to be marketed, the customer will see all the component classes except hardware. He will expect software to be delivered along with a field applications engineer to do the installation and acceptance testing within the customer's factory. The customer will also expect a database to keep the real-time product status and yield information along with the procedures for running the optimization algorithms. Documentation will be critical to both a company's engineers doing the installation and the customers after the product is accepted.



    The third matrix that the project manager produces to define the domain is to link the product domains to the delivered components. This matrix shown in Figure 5-5 is a table of the product component classes matched with the product domain classes. This matrix provides us with the deliverables for the defined product based on whether it is going to be installed into a consumer, business, industrial, real-time, really timely, or scientific domain.



    Figure 5-5. Step 3: Link Product Domains with Components



    Using our two examples, the Web-based software product for registering personal DVD movies in a trading club would "live" in the consumer and really timely product domain classes. The deliverables would be software and documentation. The second example, an enhancement to an existing factory integration product, touches four of the product domain classes: business, industrial, real-time, and scientific. The deliverable components are software, people, database, documentation, and procedures.



    Chapter 4, "Selecting Software Development Life Cycles," provided the descriptions of commonly used software development life cycles and the selection criteria for each. When compared to the overall company versus product life cycles, the software development life cycle is assumed within the acquisition phase of the product life cycle. Figure 5-1 shows this.



    A project manager must understand how, within his organization, software development fits into product life cycles. A typical product development life cycle begins with a development or acquisition phase during which the product is built or acquired. Figure 5-6 represents the product development phase. The project manager works hand in hand with the product manager to plan for the manufacturing of the product. This phase is the production ramp. Investment is made in the infrastructure for product manufacturing, and the first products are built. After the production ramp, the software portion of the product is out of the hands of the software project manager, except for problem fixes.



    Figure 5-6. Software Development Life Cycle



    Figure 5-7 shows the entire product life cycle plotted in months versus thousands of dollars invested. The dollars of investment on the left side of the graph and below the zero line is the estimated investment in the product. The dollars above the zero line are the estimated revenue dollars that the product will earn. This type of information is usually developed by marketing and is a critical part of the return on investment that the product will make.



    Figure 5-7. Software Product Life Cycle



    Finally, the relationship between the development and product life cycles is graphically represented in Figure 5-8. This relationship is critical to keep in mind as the project manager works through the product development process. In the product world, the product life cycle drives decisions and investment. Only the investment part of the software development life cycle is important to product managers planning product portfolios.



    Figure 5-8. Software Product and Development Life Cycles











