10.8 Summary





We have looked at a simple SCTP client and server spanning about 150 lines of code. Both the client and server used the one-to-many-style SCTP interface. The server was constructed in an iterative style, common when using the one-to-many interface, receiving each message and responding either on the same stream the message arrived on or on one stream higher. We then looked at the head-of-line blocking problem, modifying our client to demonstrate the problem and to show how SCTP streams can be used to avoid it. We saw how the number of streams can be manipulated with one of the many socket options that control SCTP behavior. Finally, we again modified our client and server so that they could either abort an association, including a user-specified upper-layer reason code, or, in the server's case, shut the association down gracefully after sending a message.


We will examine SCTP more deeply in Chapter 23.





































    Documenting Software Architectures: Views and Beyond
    By Paul Clements, Felix Bachmann, Len Bass, David Garlan, James Ivers, Reed Little, Robert Nord, Judith Stafford
    Appendix A. Excerpts from a Software Architecture Documentation Package







    Chapter 1 Module Decomposition View


    The decomposition view consists of 14 view packets. View packet 1 shows the decomposition of the entire ECS system into a group of three segments, each of which is further decomposed into a number of subsystems. Subsequent view packets (2 through 14) show the further decomposition of each of the subsystems.


    1.1 Module Decomposition View Packet 1: The ECS System


    1.1.1 Primary Presentation[9]

    [9] This is an example of a textual primary presentation. Text, such as a table or an outline, is sometimes superior to graphical presentations, which can easily become cluttered and difficult to lay out when the number of elements is large or when more than one level of decomposition is shown. A tabular primary presentation also can be combined with the element catalog in many cases, although that option is not exercised in this example.





















    System    Segment
    ECS       Science Data Processing Segment (SDPS)
              Communications and System Management Segment (CSMS)
              Flight Operations Segment (FOS)


    1.1.2 Element Catalog

    1.1.2.1 Elements and Their Properties

    Properties of ECS modules are



    • Name, given in the following table


    • Responsibility, given in the following table


    • Visibility: all elements are visible across the entire system


    • Implementation information: See the implementation view in Volume II, Chapter 9






















    Element Name: SDPS
    Element Responsibility: The Science Data Processing Segment (SDPS) receives, processes, archives, and manages all data from EOS and other NASA Probe flight missions. It provides support to the user community in accessing the data and products resulting from research activities that use this data. SDPS also promotes, through advertisement services, the effective use and exchange of data within the user community. Finally, the SDPS plays a central role in providing the science community with the proper infrastructure for development, experimental use, and quality checking of new Earth science algorithms. SDPS is a distributed system, and its components are located at eight Distributed Active Archive Centers (DAACs).

    Element Name: CSMS
    Element Responsibility: The Communications and System Management Segment (CSMS) focuses on the system components involved with the interconnection of user and service providers and with system management of the ECS components. The CSMS is composed of three major subsystems; here, they are introduced simply to explain the system configuration. They are the Communications Subsystem (CSS), the Internetworking Subsystem (ISS), and the System Management Subsystem (MSS). The MSS, which includes several decentralized local system management capabilities at the DAAC sites and the mission operation center, provides system management services for the EOS ground system resources. The services provided by the MSS, even though they rely on the CSS-provided services, are largely allocable to the application domain.

    Element Name: FOS
    Element Responsibility: The Flight Operations Segment (FOS) manages and controls the EOS spacecraft and instruments. The FOS is responsible for mission planning, scheduling, control, monitoring, and analysis in support of mission operations for U.S. EOS spacecraft and instruments. The FOS also provides investigator-site ECS software (the Instrument Support Terminal (IST) toolkit) to connect a Principal Investigator (PI) or Team Leader (TL) facility to the FOS in remote support of instrument control and monitoring. PI/TL facilities are outside the FOS but connected to it by way of the EOSDIS Science Network (ESN). The FOS focuses on the command and control of the flight segment of EOS and the interaction it has with the ground operations of the ECS.


    1.1.2.2 Relations and Their Properties

    The relation type in this view is is-part-of. There are no exceptions or additions to the relations shown in the primary presentation.
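    For illustration, the is-part-of relation documented in this view can be captured in a small data structure. The following Python sketch is hypothetical, not part of the ECS documentation itself; the parent-child pairs are taken from the decomposition tables in this appendix:

```python
# Hypothetical sketch: the ECS module decomposition as nested mappings.
# Keys are parent elements; values are their parts in the is-part-of relation.
DECOMPOSITION = {
    "ECS": ["SDPS", "CSMS", "FOS"],
    "SDPS": ["Client", "Interoperability", "Ingest", "Data Management",
             "Data Processing", "Data Server", "Planning"],
    "CSMS": ["System Management", "Communications", "Internetworking"],
    "FOS": ["Planning and Scheduling", "Data Management", "Command Management",
            "Commanding", "Resource Management", "Telemetry",
            "User Interface", "Analysis"],
}

def parent_of(element: str) -> list[str]:
    """Return every parent that lists `element` as a part."""
    return [parent for parts_owner, parts in DECOMPOSITION.items()
            if element in parts
            for parent in [parts_owner]]

print(parent_of("Ingest"))           # ['SDPS']
print(parent_of("Data Management"))  # ['SDPS', 'FOS']
```

    Note that a module *name* such as Data Management legitimately recurs under different segments (they are distinct subsystems), so lookups in a real element catalog should be qualified by the owning segment.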


    1.1.2.3 Element Interfaces

    Element interfaces for segments are given in subsequent decompositions.


    1.1.2.4 Element Behavior

    Not applicable.


    1.1.3 Context Diagram

    [omitted]


    1.1.4 Variability Guide

    None.


    1.1.5 Architecture Background

    1.1.5.1 Design Rationale



    • Rationale for three segments:
      In the system design phase, broadly scoped decisions were made about the overall architecture of the ECS. Three major activities of the system were established and influenced by the system specification. These activities were Flight Operations, Science Data Processing, and Communications and Systems Management; each was allocated as the responsibility of a segment. During this activity, the ECS was organized into subsystems under each segment, based primarily on the analysis of ECS requirements. Note that, after taking into account the science and evolutionary requirements of the ECS, not all segment-designated requirements were allocated to the implied segment; rather, requirements were allocated to the design subsystems that would logically implement the designated functional and performance requirements.




    [etc.]


    1.1.5.2 Results of Analysis



    • Change analysis:
      In June 2001, a change analysis was performed on the ECS architecture, using the Architecture Trade-off Analysis Method (ATAM). ATAM is a scenario-based method. Several scenarios dealt with likely changes and were applied to the module decomposition view. Results of the analysis can be found in the ATAM final report, available at http://www.ourinternalwebsite/ECS/documentation/architecture/atam_final_report.




    [etc.]


    1.1.5.3 Assumptions


    • Future network upgrades will provide performance and bandwidth equal or superior to current capabilities. Given that, changes in system communication and management functions are unlikely to have any detrimental impact on the amount or type of science data processing that can be performed by ECS.


    • Future needs for science data processing will require more, not less, capacity for computation, data storage, and communication.


    • Communications and system management functions are independent of specific data processing algorithms used and data products produced by ECS.


    • Communications and system management functions are independent of specific flight operation functions, such as planning and command transmission.



    [etc.]


    1.1.6 Other Information

    [omitted]


    1.1.7 Related View Packets


    • Parent: None.


    • Children


      - Module Decomposition View Packet 2: The Science Data Processing Segment (Volume II, Section 1.2, page 418)

      - Module Decomposition View Packet 10: The Communications and System Management Segment (CSMS), (Volume II, Section 1.10, page 422)

      - Module Decomposition View Packet 14: Flight Operations Segment (FOS), (Volume II, Section 1.14, page 424)


    • Siblings: None in this view. View packets in other views that express the same scope as this one (namely, the whole system) include


      - Module Layered View Packet 1: The ECS System, (Volume II, Section 4.1, page 435)

      - C&C Pipe-and-Filter View Packet 1: The ECS System, (Volume II, Section 5.1, page 439)

      - Allocation Deployment View Packet 1: The ECS System, (Volume II, Section 8.1, page 457)

      - Allocation Implementation View Packet 1: The ECS System, (Volume II, Section 9.1, page 461)

      - Allocation Work Assignment View Packet 1: The ECS System, (Volume II, Section 10.1, page 464)



    1.2 Module Decomposition View Packet 2: The Science Data Processing Segment


    1.2.1 Primary Presentation






























    Segment                                    Subsystem
    Science Data Processing Segment (SDPS)     Client
                                               Interoperability
                                               Ingest
                                               Data Management
                                               Data Processing
                                               Data Server
                                               Planning


    1.2.2 Element Catalog

    1.2.2.1 Elements and Their Properties

    Properties of SDPS modules are



    • Name, given in the following table


    • Responsibility, given in the following table


    • Visibility: all elements are visible across the entire system


    • Implementation information: See the implementation view in Volume II, Chapter 9.






































    Element Name: Client
    Element Responsibility: The SDPS Client subsystem provides a collection of components through which users access the services and data available in ECS and other systems interoperable with ECS. The Client subsystem also includes the services needed to interface an application, such as a science algorithm, with ECS for data access or to make use of ECS-provided toolkits.

    Element Name: Interoperability
    Element Responsibility: In general, support for the communication between SDPS clients and services is provided by CSMS, as described elsewhere. Any additional functions SDPS may require to support the interoperation of its components form part of the SDPS Interoperability subsystem and involve primarily the advertising service, which advertises service offerings.

    Element Name: Ingest
    Element Responsibility: A provider site within EOSDIS will normally need to ingest a wide variety of data types to support the services it wishes to offer. This data may be delivered through a wide variety of interfaces (network file transfer, machine-to-machine transfer, media, hard copy, and so on), with a wide variety of management approaches to these interfaces. This interface heterogeneity, and the need to support extensibility and new data/interfaces as algorithms and provider functionality change, is handled within the Ingest subsystem.

    Element Name: Data Management
    Element Responsibility: The Data Management subsystem is responsible for supporting the location, search, and access of data and service objects made available in the SDPS. The components of the subsystem decouple the location, search, and access functions from the components performing the data server and client interface functions, in order to accommodate the anticipated variety of users' search and access needs and to provide a growth path as capabilities evolve.

    Element Name: Data Processing
    Element Responsibility: The Data Processing subsystem is responsible for managing, queuing, and executing processes on the processing resources at a provider site. Requests for processing are submitted by the Planning subsystem; these requests in turn are triggered by data arrival or user request (Data Server), or originate within Planning itself, as with reprocessing.

    Element Name: Data Server
    Element Responsibility: This subsystem has the responsibility for storing Earth science and related data in a persistent fashion, providing search and retrieval access to this data, and supporting the administration of the data and the supporting hardware devices and software products. As part of its retrieval function, the subsystem also provides for the distribution of data on physical media.

    Element Name: Planning
    Element Responsibility: The Planning subsystem supports the operations staff in developing a production plan based on a locally defined strategy, reserving the resources needed to achieve the plan, and implementing the plan as data and processing requests are received. It also allows the site operations staff to negotiate on a common basis with other provider sites and EOSDIS management, via MSS, if any change to their production plan causes conflict with other provider sites' plans, such as where dependencies between processing algorithms cannot be fulfilled.


    1.2.2.2 Relations and Their Properties

    The relation type of concern in this view is is-part-of. Every subsystem is part of exactly one segment, namely, the Science Data Processing Segment, as shown in the primary presentation.


    1.2.2.3 Element Interfaces

    [omitted][10]


    [10] For examples of interface specifications, see Chapter 7.


    1.2.2.4 Element Behavior

    Not applicable.


    1.2.3 Context Diagram

    [omitted]


    1.2.4 Variability Guide

    None.


    1.2.5 Architecture Background

    [omitted]


    1.2.6 Other Information

    [omitted]


    1.2.7 Related View Packets


    • Parent: Module Decomposition View Packet 1: The ECS System (Volume II, Section 1.1, page 414)


    • Children


      - Module Decomposition View Packet 3: The Client Subsystem (Volume II, Section 1.3, page 422)

      - Module Decomposition View Packet 4: The Interoperability Subsystem (Volume II, Section 1.4, page 422)

      - Module Decomposition View Packet 5: The Ingest Subsystem (Volume II, Section 1.5, page 422)

      - Module Decomposition View Packet 6: The Data Management Subsystem (Volume II, Section 1.6, page 422)

      - Module Decomposition View Packet 7: The Data Processing Subsystem (Volume II, Section 1.7, page 422)

      - Module Decomposition View Packet 8: The Data Server Subsystem (Volume II, Section 1.8, page 422)

      - Module Decomposition View Packet 9: The Planning Subsystem (Volume II, Section 1.9, page 422)


    • Siblings


      - Module Decomposition View Packet 10: The Communications and System Management Segment (CSMS) (Volume II, Section 1.10, page 422)

      - Module Decomposition View Packet 14: Flight Operations Segment (FOS) (Volume II, Section 1.14, page 424)



    [The following view packets are omitted from the example. Each of them would refer to Module Decomposition View Packet 2: The Science Data Processing Segment as their parent view packet and to one another as their sibling view packets. Finally, each could be further decomposed into finer-grained modules; in fact, for a system the size of ECS, this would be highly likely.]


    1.3 Module Decomposition View Packet 3: The Client Subsystem


    1.4 Module Decomposition View Packet 4: The Interoperability Subsystem


    1.5 Module Decomposition View Packet 5: The Ingest Subsystem


    1.6 Module Decomposition View Packet 6: The Data Management Subsystem


    1.7 Module Decomposition View Packet 7: The Data Processing Subsystem


    1.8 Module Decomposition View Packet 8: The Data Server Subsystem


    1.9 Module Decomposition View Packet 9: The Planning Subsystem


    1.10 Module Decomposition View Packet 10: The Communications and System Management Segment (CSMS)


    1.10.1 Primary Presentation


















    Segment                                                Subsystem
    Communications and System Management Segment (CSMS)    System Management
                                                           Communications
                                                           Internetworking


    1.10.2 Element Catalog

    1.10.2.1 Elements and Their Properties

    Properties of CSMS modules are



    • Name, given in the following table


    • Responsibility, given in the following table


    • Visibility: all elements are visible across the entire system


    • Implementation information: For this information, see the implementation view in Volume II, Chapter 9.






















    Element Name: System Management
    Element Responsibility: The System Management subsystem is made of two classes: the manager and the managed objects. The manager uses management applications (typically fault, performance, accounting, configuration, and security management); communications services (agents that manage the communications traffic between the manager and the services); and an information model that defines the information flow between the manager and the managed objects. The manager also uses several applications to monitor and to configure system resources, or managed objects, as required.

    Element Name: Communications
    Element Responsibility: The Communications subsystem consists of the session, presentation, and application layers from an Open Systems Interconnection (OSI) reference model perspective. The Communications subsystem provides support for peer-to-peer, advanced distributed, messaging, management, and event-handling communications facilities. The Communications subsystem is functionally dependent on the services provided by the Internetworking subsystem.

    Element Name: Internetworking
    Element Responsibility: The Internetworking subsystem consists of the physical, data link, network, and transport layers, according to the OSI reference model specified by ISO 7498:1994, Open System Interconnection. The Internetworking subsystem supports alternative transports between communicating end stations; alternative networking methods between end systems and intermediate systems; and alternative circuit, packet, or cell-based LAN and WAN distribution services.


    [The remainder of this view packet is omitted from the example.]


    1.11 Module Decomposition View Packet 11: The System Management Subsystem


    [omitted]


    1.12 Module Decomposition View Packet 12: The Communications Subsystem


    [omitted]


    1.13 Module Decomposition View Packet 13: The Internetworking Subsystem


    [omitted]


    1.14 Module Decomposition View Packet 14: Flight Operations Segment (FOS)


    1.14.1 Primary Presentation

































    Segment                           Subsystem
    Flight Operations Segment (FOS)   Planning and Scheduling
                                      Data Management
                                      Command Management
                                      Commanding
                                      Resource Management
                                      Telemetry
                                      User Interface
                                      Analysis


    1.14.2 Element Catalog

    1.14.2.1 Elements and Their Properties

    Properties of FOS modules are



    • Name, given in the following table


    • Responsibility, given in the following table


    • Visibility: all elements are visible across the entire system


    • Implementation information: See the implementation view in Volume II, Chapter 9.










































    Element Name: Planning and Scheduling
    Responsibility: The Planning and Scheduling subsystem integrates plans and schedules for spacecraft, instrument, and ground operations and coordinates DARs for U.S. instruments and multi-instrument observations, if any. Planning and Scheduling provides the operational staff with a common set of capabilities to perform what-if analyses and to visualize plans and schedules.

    Element Name: Data Management
    Responsibility: The Data Management subsystem is responsible for maintaining and updating the Project Database (PDB) and the FOS history log.

    Element Name: Command Management
    Responsibility: The Command Management subsystem manages the preplanned command data for the spacecraft and instruments. Based on inputs received from Planning and Scheduling, Command Management collects and validates the commands, software memory loads, table loads, and instrument memory loads necessary to implement the instrument and spacecraft scheduled activities.

    Element Name: Commanding
    Responsibility: The Commanding subsystem is responsible for transferring command data (real-time commands or command loads) to EDOS for uplink to the spacecraft during each real-time contact. Command data can be received in real time from the operational staff or as preplanned command groups generated by the Command Management subsystem. The Commanding subsystem is also responsible for verifying command execution on board the spacecraft.

    Element Name: Resource Management
    Responsibility: The Resource Management subsystem provides the capability to manage and monitor the configuration of the EOC: configuring EOC resources for multimission support, facilitating failure recovery during real-time contacts, and managing the real-time interface with the NCC.

    Element Name: Telemetry
    Responsibility: The Telemetry subsystem receives and processes housekeeping telemetry in CCSDS packets from EDOS. After packet decommutation, the telemetry data is converted to engineering units and checked against boundary limits.

    Element Name: User Interface
    Responsibility: The User Interface subsystem provides character-based and graphical display interfaces for FOS operators interacting with all the previously described FOS subsystems.

    Element Name: Analysis
    Responsibility: The Analysis subsystem is responsible for managing the on-board systems and for overall mission monitoring. Its functions include performance analysis and trend analysis. It also cooperates with the Telemetry subsystem to support fault detection and isolation.


    [The remainder of this view packet is omitted from the example. Subsequent view packets that further decompose this segment's eight subsystems (Planning and Scheduling, Data Management, Command Management, Commanding, Resource Management, Telemetry, User Interface, and Analysis) are also omitted.]















      3.4. Bug Tracker























      Bug tracking is a broad topic; various aspects of it are discussed throughout this book. Here I'll try to concentrate mainly on setup and technical considerations, but to get to those, we have to start with a policy question: exactly what kind of information should be kept in a bug tracker?





      The term bug tracker is misleading. Bug tracking systems are also frequently used to track new feature requests, one-time tasks, unsolicited patches: really anything that has distinct beginning and end states, with optional transition states in between, and that accrues information over its lifetime. For this reason, bug trackers are also called issue trackers, defect trackers, artifact trackers, request trackers, trouble ticket systems, etc. See Appendix B for a list of software.





      In this book, I'll continue to use bug tracker for the software that does the tracking, because that's what most people call it, but will use issue to refer to a single item in the bug tracker's database. This allows us to distinguish between the behavior or misbehavior that the user encountered (that is, the bug itself) and the tracker's record of the bug's discovery, diagnosis, and eventual resolution. Keep in mind that although most issues are about actual bugs, issues can be used to track other kinds of tasks too.





      The classic issue life cycle looks like this:





      1. Someone files the issue. She provides a summary, an initial description (including a reproduction recipe, if applicable; see Section 8.1.5 in Chapter 8 for how to encourage good bug reports), and whatever other information the tracker asks for. The person who files the issue may be totally unknown to the project: bug reports and feature requests are as likely to come from the user community as from the developers. Once filed, the issue is in what's called an open state. Because no action has been taken yet, some trackers also label it as unverified and/or unstarted. It is not assigned to anyone; or, in some systems, it is assigned to a fake user to represent the lack of real assignation. At this point, it is in a holding area: the issue has been recorded, but not yet integrated into the project's consciousness.

      2. Others read the issue, add comments to it, and perhaps ask the original filer for clarification on some points.

      3. The bug gets reproduced. This may be the most important moment in the life cycle. Although the bug is not actually fixed yet, the fact that someone besides the original filer was able to make it happen proves that it is genuine and, no less importantly, confirms to the original filer that she's contributed to the project by reporting a real bug.

      4. The bug gets diagnosed: its cause is identified, and if possible, the effort required to fix it is estimated. Make sure these things get recorded in the issue; if the person who diagnosed the bug suddenly has to step away from the project for a while (as can often happen with volunteer developers), someone else should be able to pick up where she left off. In this stage, or sometimes the previous one, a developer may "take ownership" of the issue and assign it to herself (Section 8.1.1.1 in Chapter 8 examines the assignment process in more detail). The issue's priority may also be set at this stage. For example, if it is so severe that it should delay the next release, that fact needs to be identified early, and the tracker should have some way of noting it.

      5. The issue gets scheduled for resolution. Scheduling doesn't necessarily mean naming a date by which it will be fixed. Sometimes it just means deciding which future release (not necessarily the next one) the bug should be fixed by, or deciding that it need not block any particular release. Scheduling may also be dispensed with if the bug is quick to fix.

      6. The bug gets fixed (or the task completed, or the patch applied, or whatever). The change or set of changes that fixed it should be recorded in a comment in the issue, after which the issue is closed and/or marked as resolved.
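      The six steps above amount to a small state machine. A rough, hypothetical Python sketch follows; the state names mirror the list, but real trackers use their own vocabularies and allow more transitions than shown:

```python
# Hypothetical issue life-cycle state machine mirroring the six steps above.
# State names and transitions are illustrative, not those of any real tracker.
TRANSITIONS = {
    "open":       {"reproduced", "closed"},   # closed early if invalid or duplicate
    "reproduced": {"diagnosed"},
    "diagnosed":  {"scheduled", "resolved"},  # scheduling may be skipped
    "scheduled":  {"resolved"},
    "resolved":   {"open"},                   # reopened if the fix is rejected
    "closed":     {"open"},                   # reopened for a regression
}

class Issue:
    def __init__(self, summary: str):
        self.summary = summary
        self.state = "open"
        self.history = ["open"]

    def move_to(self, new_state: str) -> None:
        """Advance the issue, rejecting transitions the life cycle forbids."""
        if new_state not in TRANSITIONS[self.state]:
            raise ValueError(f"cannot go from {self.state} to {new_state}")
        self.state = new_state
        self.history.append(new_state)

issue = Issue("crash on startup")
for state in ("reproduced", "diagnosed", "scheduled", "resolved"):
    issue.move_to(state)
print(issue.history)  # ['open', 'reproduced', 'diagnosed', 'scheduled', 'resolved']
```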



      There are some common variations on this life cycle. Sometimes an issue is closed very soon after being filed, because it turns out not to be a bug at all, but rather a misunderstanding on the part of the user. As a project acquires more users, more and more such invalid issues will come in, and developers will close them with increasingly short-tempered responses. Try to guard against the latter tendency. It does no one any good, as the individual user in each case is not responsible for all the previous invalid issues; the statistical trend is visible only from the developers' point of view, not the user's. (In Section 3.4.2 later in this chapter, we'll look at techniques for reducing the number of invalid issues.) Also, if different users are experiencing the same misunderstanding over and over, it might mean that aspect of the software needs to be redesigned. This sort of pattern is easiest to notice when there is an issue manager monitoring the bug database; see Section 8.2.4 in Chapter 8.





      Another common life cycle variation is for the issue to be closed as a duplicate soon after Step 1. A duplicate is when someone files an issue that's already known to the project. Duplicates are not confined to open issues: it's possible for a bug to come back after having been fixed (this is known as a regression), in which case the preferred course is usually to reopen the original issue and close any new reports as duplicates of the original one. The bug tracking system should keep track of this relationship bidirectionally, so that reproduction information in the duplicates is available to the original issue, and vice versa.
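      The bidirectional duplicate relationship can be sketched as follows; the class and field names are hypothetical, not those of any particular tracker:

```python
class Issue:
    """Minimal hypothetical issue record; only duplicate links are modeled."""
    def __init__(self, number: int):
        self.number = number
        self.duplicate_of = None   # the original this issue duplicates, if any
        self.duplicates = []       # issues later closed as duplicates of this one

def close_as_duplicate(dup: Issue, original: Issue) -> None:
    """Record the relationship in BOTH issues, so reproduction information
    filed on either side stays reachable from the other."""
    dup.duplicate_of = original
    original.duplicates.append(dup)

original = Issue(1001)
latecomer = Issue(1042)
close_as_duplicate(latecomer, original)
print(latecomer.duplicate_of.number)            # 1001
print([d.number for d in original.duplicates])  # [1042]
```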





      A third variation is for the developers to close the issue, thinking they have fixed it, only to have the original reporter reject the fix and reopen it. This is usually because the developers simply don't have access to the environment necessary to reproduce the bug, or because they didn't test the fix using the exact same reproduction recipe as the reporter.





      Aside from these variations, there may be other small details of the life cycle that vary depending on the tracking software. But the basic shape is the same, and while the life cycle itself is not specific to open source software, it has implications for how open source projects use their bug trackers.





      As Step 1 implies, the tracker is as much a public face of the project as the mailing lists or web pages. Anyone may file an issue, anyone may look at an issue, and anyone may browse the list of currently open issues. It follows that you never know how many people are waiting to see progress on a given issue. While the size and skill of the development community constrain the rate at which issues can be resolved, the project should at least try to acknowledge each issue the moment it appears. Even if the issue lingers for a while, a response encourages the reporter to stay involved, because she feels that a human has registered what she has done (remember that filing an issue usually involves more effort than, say, posting an email). Furthermore, once an issue is seen by a developer, it enters the project's consciousness, in the sense that the developer can be on the lookout for other instances of the issue, can talk about it with other developers, etc.





      The need for timely reactions implies two things:





      • The tracker must be connected to a mailing list, such that every change to an issue, including its initial filing, causes a mail to go out describing what happened. This mailing list is usually different from the regular development list, since not all developers may want to receive automated bug mails, but (just as with commit mails) the Reply-To header should be set to the development mailing list.

      • The form for filing issues should capture the reporter's email address, so she can be contacted for more information. (However, it should not require the reporter's email address, as some people prefer to report issues anonymously. See Section 3.7.1.2 later in this chapter for more on the importance of anonymity.)
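      The first point can be illustrated with Python's standard email module. The addresses and the subject format below are placeholders of my own, not a convention from any particular tracker:

```python
from email.message import EmailMessage

def issue_change_mail(issue_id: int, change: str) -> EmailMessage:
    """Build a notification for the tracker's bug-mail list. Replies are
    steered to the development list, just as with commit mails."""
    msg = EmailMessage()
    msg["From"] = "tracker@example.org"      # placeholder addresses throughout
    msg["To"] = "issue-changes@example.org"  # automated bug-mail list
    msg["Reply-To"] = "dev@example.org"      # discussion goes to the dev list
    msg["Subject"] = f"[issue {issue_id}] {change}"
    msg.set_content(change)
    return msg

mail = issue_change_mail(1042, "New issue filed: crash on startup")
print(mail["Reply-To"])  # dev@example.org
```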





      3.4.1. Interaction with Mailing Lists





      Make sure the bug tracker doesn't turn into a discussion forum. Although it is important to maintain a human presence in the bug tracker, it is not fundamentally suited to real-time discussion. Think of it rather as an archiver, a way to organize facts and references to other discussions, primarily those that take place on mailing lists.





      There are two reasons to make this distinction. First, the bug tracker is more cumbersome to use than the mailing lists (or than real-time chat forums, for that matter). This is not because bug trackers have bad user interface design; it's just that their interfaces were designed for capturing and presenting discrete states, not free-flowing discussions. Second, not everyone who should be involved in discussing a given issue is necessarily watching the bug tracker. Part of good issue management (see "Share Management Tasks as Well as Technical Tasks" in Chapter 8) is to make sure each issue is brought to the right people's attention, rather than requiring every developer to monitor all issues. In Section 6.5 in Chapter 6, we'll look at ways to make sure people don't accidentally siphon discussions out of appropriate forums and into the bug tracker.





      Some bug trackers can monitor mailing lists and automatically log all

      emails that are about a known issue. Typically they do this by

      recognizing the issue's identifying number in the

      subject line of the mail, as part of a special string; developers

      learn to include these strings in their mails to attract the

      tracker's notice. The bug tracker may either save

      the entire email, or (even better) just record a link to the mail in

      the regular mailing list archive. Either way, this is a very useful

      feature; if your tracker has it, make sure both to turn it on and to

      remind people to take advantage of it.









      3.4.2. Prefiltering the Bug Tracker





      Most issue databases eventually suffer from the same problem: a

      crushing load of duplicate or invalid issues filed by well-meaning

      but inexperienced or ill-informed users. The first step in combating

      this trend is usually to put a prominent notice on the front page of

      the bug tracker, explaining how to tell if a bug is really a bug, how

      to search to see if it's already been filed, and

      finally, how to effectively report it if one still thinks

      it's a new bug.





      This will reduce the noise level for a while, but as the number of

      users increases, the problem will eventually come back. No individual

      user can be blamed for it. Each one is just trying to contribute to

      the project's well-being, and even if their first

      bug report isn't helpful, you still want to

      encourage them to stay involved and file better issues in the future.

      In the meantime, though, the project needs to keep the issue database

      as free of junk as possible.





      The two things that will do the most to prevent this problem are:

      making sure there are people watching the bug tracker who have enough

      knowledge to close issues as invalid or duplicates the moment they

      come in, and requiring (or strongly encouraging) users to confirm

      their bugs with other people before filing them in the tracker.





      The first technique seems to be used universally. Even projects with

      huge issue databases (say, the Debian bug tracker at http://bugs.debian.org/, which contained

      315,929 issues as of this writing) still arrange things so that

      someone sees each issue that comes in. It may be

      a different person depending on the category of the issue. For

      example, the Debian project is a collection of software packages, so

      Debian automatically routes each issue to the appropriate package

      maintainers. Of course, users can sometimes misidentify an

      issue's category, with the result that the issue is

      sent to the wrong person initially, who may then have to reroute it.

      However, the important thing is that the burden is still

      shared: whether the user guesses right or wrong when filing,

      issue watching is still distributed more or less evenly among the

      developers, so each issue is able to receive a timely response.





      The second technique is less widespread, probably because

      it's harder to automate. The essential idea is that

      every new issue gets "buddied" into

      the database. When a user thinks he's found a

      problem, he is asked to describe it on one of the mailing lists, or

      in an IRC channel, and get confirmation from someone that it is

      indeed a bug. Bringing in that second pair of eyes early can prevent

      a lot of spurious reports. Sometimes the second party is able to

      identify that the behavior is not a bug, or is fixed in recent

      releases. Or she may be familiar with the symptoms from a previous

      issue, and can prevent a duplicate filing by pointing the user to the

      older issue. Often it's enough just to ask the user

      "Did you search the bug tracker to see if

      it's already been reported?" Many

      people simply don't think of that, yet are happy to

      do the search once they know someone's

      expecting them to.





      The buddy system can really keep the issue database clean, but it has

      some disadvantages too. Many people will file solo anyway, either

      through not seeing, or through disregarding, the instructions to find

      a buddy for new issues. Thus it is still necessary for volunteers to

      watch the issue database. Furthermore, because most new reporters

      don't understand how difficult the task of

      maintaining the issue database is, it's not fair to

      chide them too harshly for ignoring the guidelines. Thus the

      volunteers must be vigilant, and yet exercise restraint in how they

      bounce unbuddied issues back to their reporters. The goal is to train

      each reporter to use the buddying system in the future, so that there

      is an ever-growing pool of people who understand the issue-filtering

      system. On seeing an unbuddied issue, here are the ideal steps:





      1. Immediately respond to the issue, politely thanking the user for

        filing, but pointing them to the buddying guidelines (which should,

        of course, be prominently posted on the web site).

      2. If the issue is clearly valid and not a duplicate, approve it anyway,

        and start it down the normal life cycle. After all, the

        reporter's now been informed about buddying, so

        there's no point wasting the work done so far by

        closing a valid issue.

      3. Otherwise, if the issue is not clearly valid, close it, but ask the

        reporter to reopen it if they get confirmation from a buddy. When
        they do, they should include a reference to the confirmation thread
        (e.g., a URL into the mailing list archives) in the reopened issue.



      Remember that although this system will improve the signal/noise

      ratio in the issue database over time, it will never completely stop

      the misfilings. The only way to prevent misfilings entirely is to

      close off the bug tracker to everyone but developers, a cure

      that is almost always worse than the disease. It's

      better to accept that cleaning out invalid issues will always be part

      of the project's routine maintenance, and to try to

      get as many people as possible to help.





      See also Section 8.2.4 in Chapter

      8.






















        Acknowledgments













        Even though this book is essentially "my" book, it has been influenced in many ways (all of them good) by multiple individuals. Because the roles that each of these individuals played in the creative process were very significant, I would like to take the time to thank as many of them as I can remember here.


        Mary Ann Woychowsky, for understanding my "zoning out" when writing and for asking, "I guess the book is finished, right?" after catching me playing Morrowind when I should have been writing. Benjamin Woychowsky, for asking, "Shouldn't you be writing?" whenever I played a computer game. Crista Woychowsky, for disappearing with entire seasons of Stargate SG-1, after catching me watching them when I should have been writing.


        My mother, Nan Gerling, for sharing her love of reading and keeping me in reading materials.


        Eric Garulay, of Prentice Hall, for marketing this book and putting me in touch with Catherine Nolan. Catherine Nolan, of Prentice Hall, for believing in this book and for her assistance in getting started with a book. Bruce Perens, for his belief that because I use Firefox, I had not tread too far down the path that leads to the dark side. Denise Mickelson, of Prentice Hall, for making sure that I kept sending in chapters. Chris Zahn, of Prentice Hall, for his editing, for answering my often bizarre questions, and for his knowledge of things in general. Thanks to George Nedeff for managing the editorial and production workflow and Heather Fox for keeping this project in the loop and on track. Any errors remaining are solely my own.


        I would like to thank the late Jack Chalker for his assistance with what to look for in writing contracts and for essentially talking me through the process using words that I could understand. Also for his writing a number of science-fiction novels that have influenced the way that I look upon the world. After all, in the end, everything is about how we look upon the world.


        Dossy Shiobara, for answering several bizarre questions concerning MySQL.


        Richard Behrens, for his assistance in formulating my thoughts.


        Joan Susski, for making sure that I didn't go totally off the deep end when developing many of the techniques used in this book.


        Premkumar Ekkaladevi, who was instrumental in deciding just how far to push the technology.


        Jon (Jack) Foreman, for explaining to me that I can't know everything.


        David Sarisohn, who years ago gave a very understandable reason for why code shouldn't be obscure.


        Finally, to Francis Burke, Shirley Tainow, Thomas Dunn, Marion Sackrowitz, Frances Mundock, Barbara Hershey, Beverly Simon, Paul Bhatia, Joseph Muller, Rick Good, Jane Liefert, Joan Litt, Albert Nicolai, and Bill Ricker for teaching me how to learn.












        3.3 UNIX Process Creation and 'fork'











        A process can create a new process by calling fork. The calling process becomes the parent, and the created process is called the child. The fork function copies the parent's memory image so that the new process receives a copy of the address space of the parent. Both processes continue at the instruction after the fork statement (executing in their respective memory images).



        SYNOPSIS

        #include <unistd.h>

        pid_t fork(void);
        POSIX

        Creation of two completely identical processes would not be very useful. The fork function return value is the critical characteristic that allows the parent and the child to distinguish themselves and to execute different code. The fork function returns 0 to the child and returns the child's process ID to the parent. When fork fails, it returns -1 and sets errno. If the system does not have the necessary resources to create the child or if limits on the number of processes would be exceeded, fork sets errno to EAGAIN. In case of a failure, fork does not create a child.



        Example 3.5 simplefork.c

        In the following program, both parent and child execute the x = 1 assignment statement after returning from fork.



        #include <stdio.h>
        #include <unistd.h>

        int main(void) {
        int x;

        x = 0;
        fork();
        x = 1;
        printf("I am process %ld and my x is %d\n", (long)getpid(), x);
        return 0;
        }


        Before the fork of Example 3.5, one process executes with a single x variable. After the fork, two independent processes execute, each with its own copy of the x variable. Since the parent and child processes execute independently, they do not execute the code in lock step or modify the same memory locations. Each process prints a message with its respective process ID and x value.


        The parent and child processes execute the same instructions because the code of Example 3.5 did not test the return value of fork. Example 3.6 demonstrates how to test the return value of fork.



        Example 3.6 twoprocs.c

        After fork in the following program, the parent and child output their respective process IDs.



        #include <stdio.h>
        #include <unistd.h>
        #include <sys/types.h>

        int main(void) {
        pid_t childpid;

        childpid = fork();
        if (childpid == -1) {
        perror("Failed to fork");
        return 1;
        }
        if (childpid == 0) /* child code */
        printf("I am child %ld\n", (long)getpid());
        else /* parent code */
        printf("I am parent %ld\n", (long)getpid());
        return 0;
        }


        The original process in Example 3.6 has a nonzero value of the childpid variable, so it executes the second printf statement. The child process has a zero value of childpid and executes the first printf statement. The output from these processes can appear in either order, depending on whether the parent or the child executes first. If the program is run several times on the same system, the order of the output may or may not always be the same.



        Exercise 3.7 badprocessID.c

        What happens when the following program executes?



        #include <stdio.h>
        #include <unistd.h>
        #include <sys/types.h>

        int main(void) {
        pid_t childpid;
        pid_t mypid;

        mypid = getpid();
        childpid = fork();
        if (childpid == -1) {
        perror("Failed to fork");
        return 1;
        }
        if (childpid == 0) /* child code */
        printf("I am child %ld, ID = %ld\n", (long)getpid(), (long)mypid);
        else /* parent code */
        printf("I am parent %ld, ID = %ld\n", (long)getpid(), (long)mypid);
        return 0;
        }

        Answer:


        The parent sets the mypid value to its process ID before the fork. When fork executes, the child gets a copy of the process address space, including all variables. Since the child does not reset mypid, the value of mypid for the child does not agree with the value returned by getpid.



        Program 3.1 creates a chain of n processes by calling fork in a loop. On each iteration of the loop, the parent process has a nonzero childpid and hence breaks out of the loop. The child process has a zero value of childpid and becomes a parent in the next loop iteration. In case of an error, fork returns -1 and the calling process breaks out of the loop. The exercises in Section 3.8 build on this program.


        Figure 3.2 shows a graph representing the chain of processes generated for Program 3.1 when n is 4. Each circle represents a process labeled by its value of i when it leaves the loop. The edges represent the is-a-parent relationship: an edge from A to B means process A is the parent of process B.



        Figure 3.2. Chain of processes generated by Program 3.1 when called with a command-line argument of 4.







        Program 3.1 simplechain.c

        A program that creates a chain of n processes, where n is a command-line argument.



        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main (int argc, char *argv[]) {
        pid_t childpid = 0;
        int i, n;

        if (argc != 2){ /* check for valid number of command-line arguments */
        fprintf(stderr, "Usage: %s processes\n", argv[0]);
        return 1;
        }
        n = atoi(argv[1]);
        for (i = 1; i < n; i++)
        if (childpid = fork())
        break;

        fprintf(stderr, "i:%d process ID:%ld parent ID:%ld child ID:%ld\n",
        i, (long)getpid(), (long)getppid(), (long)childpid);
        return 0;
        }



        Exercise 3.8

        Run Program 3.1 for large values of n. Will the messages always come out ordered by increasing i?


        Answer:


        The exact order in which the messages appear depends on the order in which the processes are selected by the process scheduler to run. If you run the program several times, you should notice some variation in the order.




        Exercise 3.9

        What happens if Program 3.1 writes the messages to stdout, using printf, instead of to stderr, using fprintf?


        Answer:


        By default, the system buffers output written to stdout, so a particular message may not appear immediately after the printf returns. Messages to stderr are not buffered, but instead written immediately. For this reason, you should always use stderr for your debugging messages.



        Program 3.2 creates a fan of n processes by calling fork in a loop. On each iteration, the newly created process breaks from the loop while the original process continues. In contrast, the process that calls fork in Program 3.1 breaks from the loop while the newly created process continues for the next iteration.



        Program 3.2 simplefan.c

        A program that creates a fan of n processes where n is passed as a command-line argument.



        #include <stdio.h>
        #include <stdlib.h>
        #include <unistd.h>

        int main (int argc, char *argv[]) {
        pid_t childpid = 0;
        int i, n;

        if (argc != 2){ /* check for valid number of command-line arguments */
        fprintf(stderr, "Usage: %s processes\n", argv[0]);
        return 1;
        }
        n = atoi(argv[1]);
        for (i = 1; i < n; i++)
        if ((childpid = fork()) <= 0)
        break;

        fprintf(stderr, "i:%d process ID:%ld parent ID:%ld child ID:%ld\n",
        i, (long)getpid(), (long)getppid(), (long)childpid);
        return 0;
        }


        Figure 3.3 shows the process fan generated by Program 3.2 when n is 4. The processes are labeled by the value of i at the time they leave the loop. The original process creates n-1 children. The exercises in Section 3.9 build on this example.



        Figure 3.3. Fan of processes generated by Program 3.2 with a command-line argument of 4.







        Exercise 3.10

        Explain what happens when you replace the test



        (childpid = fork()) <= 0

        of Program 3.2 with



        (childpid = fork()) == -1

        Answer:


        In this case, all the processes remain in the loop unless the fork fails. Each iteration of the loop doubles the number of processes, forming a tree configuration illustrated in Figure 3.4 when n is 4. The figure represents each process by a circle labeled with the i value at the time it was created. The original process has a 0 label. The lowercase letters distinguish processes that were created with the same value of i. Although this code appears to be similar to that of Program 3.1, it does not distinguish between parent and child after fork executes. Both the parent and child processes go on to create children on the next iteration of the loop, hence the population explosion.




        Exercise 3.11

        Run Program 3.1, Program 3.2, and a process tree program based on the modification suggested in Exercise 3.10. Carefully examine the output. Draw diagrams similar to those of Figure 3.2 through Figure 3.4, labeling the circles with the actual process IDs. Use edges to designate the is-a-parent relationship. Do not use large values of the command-line argument unless you are on a dedicated system. How can you modify the programs so that you can use ps to see the processes that are created?


        Answer:


        In their current form, the programs complete too quickly for you to view them with ps. Insert the sleep(30); statement immediately before return in order to have each process block for 30 seconds before exiting. In another command window, continually execute ps -l. Section 3.4 explains why some of the processes may report a parent ID of 1 when sleep is omitted.




        Figure 3.4. Tree of processes produced by the modification of Program 3.2 suggested in Exercise 3.10.






        The fork function creates a new process by making a copy of the parent's image in memory. The child inherits parent attributes such as environment and privileges. The child also inherits some of the parent's resources such as open files and devices.


        Not every parent attribute or resource is inherited by the child. For instance, the child has a new process ID and of course a different parent ID. The child's times for CPU usage are reset to 0. The child does not get locks that the parent holds. If the parent has set an alarm, the child is not notified when the parent's alarm expires. The child starts with no pending signals, even if the parent had signals pending at the time of the fork.


        Although a child inherits its parent's process priority and scheduling attributes, it competes for processor time with other processes as a separate entity. A user running on a crowded time-sharing system can obtain a greater share of the CPU time by creating more processes. A system manager on a crowded system might restrict process creation to prevent a user from creating processes to get a bigger share of the resources.









          8.5 CODE DIVISION MULTIPLE ACCESS
























          Code division multiple access (CDMA) technology was developed for defense applications where secure communication is very important. In CDMA systems, a very large bandwidth channel is required, many times more than the bandwidth occupied by the information to be transmitted. For instance, if the actual bandwidth required is 1MHz, a CDMA system might be allocated 80MHz. Such large bandwidths were available only to defense organizations, and hence CDMA was used initially only for defense applications. Because the spectrum is spread, these systems are also known as spread spectrum multiple access (SSMA) systems. In this category, there are two types of techniques: frequency hopping and direct sequence.










          In spread spectrum multiple access, a wide bandwidth channel is used. Frequency hopping and direct sequence CDMA are the two types of SSMA techniques.


















          Note 

          CDMA requires a large radio bandwidth. Because radio spectrum is a precious natural resource, CDMA systems did not become commercially popular and were used only in defense communication systems. However, in recent years, commercial CDMA systems have been widely deployed.


          Wireless local loops are the wireless links between subscriber terminals and the base stations connected to the telephone switches. CDMA is widely used in wireless local loops.





          8.5.1 Frequency Hopping (FH)


          Consider a system in which 1MHz bandwidth is required to transmit the data. Instead of allocating a radio channel of 1MHz only, a number of radio channels (say 79) will be allocated, each channel with 1MHz bandwidth. We need a very large spectrum, 79 times that of the actual requirement. When a station has to transmit its data, it will send the data in one channel for some time, switch over to another channel and transmit some more data, and again switch over to another channel and so on. This is known as frequency hopping (FH). When the transmitting station hops its frequency of transmission, only those stations that know the hopping sequence can receive the data. This will be a secure communication system if the hopping sequence is kept a secret between the transmitting and the receiving stations.


          Frequency hopping, as used in Bluetooth radio system, is illustrated in Figure 8.8. Here the frequency hopping is done at the rate of 1600 hops per second. Every 0.625 milliseconds, the frequency of operation will change. The terminal will receive the data for 0.625 msec in frequency f1, for 0.625 msec in f20, for 0.625 msec in f32, and so on. The hopping sequence (f1, f20, f32, f41) is decided between the transmitting and receiving stations and is kept secret.






          Figure 8.8: Frequency hopping.









          In frequency hopping (FH) systems, each packet of data is transmitted using a different frequency. A pseudo-random sequence generation algorithm decides the sequence of hopping.














          Frequency hopping is used in Global System for Mobile Communications (GSM) and Bluetooth radio systems.






          Note 


          The Bluetooth radio system, which interconnects devices such as desktops, laptops, mobile phones, headphones, and modems within a range of 10 meters, uses the frequency hopping technique.







          8.5.2 Direct Sequence CDMA


          In direct sequence CDMA (DS-CDMA), each bit to be transmitted is represented by multiple bits. For instance, instead of transmitting a 1, a pattern of, say, 16 ones and zeros is transmitted, and instead of transmitting a 0, another pattern of 16 ones and zeros is transmitted. Effectively, we are increasing the data rate, and hence the bandwidth requirement, by 16 times. The number of bits transmitted in place of each 1 or 0 is known as the chipping rate. If the chipping code is kept a secret, only those stations that have the chipping code can decode the information. When multiple stations have to transmit, the chipping codes will be different for each station. If they are chosen in such a way that they are orthogonal to each other, then the data from different stations can be pushed onto the channel simultaneously without interference.


          As shown in Figure 8.9, in DS-CDMA, multiple terminals transmit onto the channel simultaneously. Because these terminals have different chipping codes, there will be no interference.






          Figure 8.9: DS-CDMA.

          CDMA systems are now being widely deployed for cellular communications as well as 3G systems for accessing the Internet through wireless networks. CDMA systems are used in wireless local loops.










          In DS-CDMA, multiple terminals transmit on the same channel simultaneously, with different chipping codes. If the chipping code length is, say, 11 bits, both 1 and 0 are replaced by an 11-bit sequence of ones and zeros. This sequence is unique for each terminal.


















          Note 

          In the IEEE 802.11 wireless local area network standard, an 11-bit chipping code is used.

























          3.8 Consistency and Organization

















          Good style is only one element in creating a high-quality program.
          Consistency is also a factor. This book is organized with the table
          of contents at the front and the index at the back. Almost every book
          printed has a similar organization. This consistency makes it easy to
          look up a word in the index or find a chapter title in the table of
          contents.



          Unfortunately, the programming community has developed a variety of
          coding styles. Each has its own advantages and disadvantages. The
          trick to efficient programming in a group is to pick one style and
          use it consistently. That way you can avoid the problems and
          confusion that arise when programs written in different styles are
          combined.



          Good style is nice, but consistency is better.












            8.6. Forks























            In Section 4.1 in Chapter 4, we saw how the

            potential to fork has important effects on how

            projects are governed. But what happens when a fork actually occurs?

            How should you handle it, and what effects can you expect it to have?

            Conversely, when should you initiate a fork?





            The

            answers depend on what kind of fork it is. Some forks are due to

            amicable but irreconcilable disagreements about the direction of the

            project; perhaps more are due to both technical disagreements and

            interpersonal conflicts. Of course, it's not always

            possible to tell the difference between the two, as technical

            arguments may involve personal elements as well. What all forks have

            in common is that one group of developers (or sometimes even just one

            developer) has decided that the costs of working with some or all of

            the others now outweigh the benefits.





            Once a project forks, there is no definitive answer to the question

            of which fork is the "true" or

            "original" project. People will

            colloquially talk of fork F coming out of project P, as though P is

            continuing unchanged down some natural path while F diverges into new

            territory, but this is, in effect, a declaration of how that

            particular observer feels about it. It is fundamentally a matter of

            perception: when a large enough percentage of observers agree, the

            assertion starts to become true. It is not the case that there is an

            objective truth from the outset, one that we are only imperfectly

            able to perceive at first. Rather, the perceptions

            are the objective truth, since ultimately a

            project (or a fork) is an entity that exists only in

            people's minds anyway.





            If those initiating the fork feel that they are sprouting a new

            branch off the main project, the perception question is resolved

            immediately and easily. Everyone, both developers and users, will

            treat the fork as a new project, with a new name (perhaps based on

            the old name, but easily distinguishable from it), a separate web

            site, and a separate philosophy or goal. Things get messier, however,

            when both sides feel they are the legitimate guardians of the

            original project and therefore have the right to continue using the

            original name. If there is some organization with trademark rights to

            the name, or legal control over the domain or web pages, that usually

            resolves the issue by fiat: that organization will decide who is the

            project and who is the fork, because it holds all the cards in a

            public relations war. Naturally, things rarely get that far: since

            everyone already knows what the power dynamics are, they will avoid

            fighting a battle whose outcome is known in advance, and just jump

            straight to the end.





            Fortunately, in most cases there is little doubt as to which is the

            project and which is the fork, because a fork is, in essence, a vote

            of confidence. If more than half of the developers are in favor of

            whatever course the fork proposes to take, usually there is no need

            to fork; the project can simply go that way itself, unless it is

            run as a dictatorship with a particularly stubborn dictator. On the

            other hand, if fewer than half of the developers are in favor, the

            fork is a clearly minority rebellion, and both courtesy and common

            sense indicate that it should think of itself as the divergent branch

            rather than the main line.







            8.6.1. Handling a Fork





            If

            someone threatens a fork in your project, keep calm and remember your

            long-term goals. The mere existence of a fork

            isn't what hurts a project; rather,

            it's the loss of developers and users. Your real

            aim, therefore, is not to squelch the fork, but to minimize these

            harmful effects. You may be mad, you may feel that the fork was

            unjust and uncalled for, but expressing that publicly can only

            alienate undecided developers. Instead, don't force

            people to make exclusive choices, and be as cooperative as is

            practicable with the fork. To start with, don't

            remove someone's commit access in your project just

            because she decided to work on the fork. Work on the fork

            doesn't mean that person has suddenly lost her

            competence to work on the original project; committers before should

            remain committers afterward. Beyond that, you should express your

            desire to remain as compatible as possible with the fork, and say

            that you hope developers will port changes between the two whenever

            appropriate. If you have administrative access to the

            project's servers, publicly offer the forkers

            infrastructure help at startup time. For example, offer them a

            complete, deep-history copy of the version control repository, if

            there's no other way for them to get it, so that

            they don't have to start off without historical data

            (this may not be necessary depending on the version control system).

            Ask them if there's anything else they need, and

            provide it if you can. Bend over backward to show that you are not

            standing in the way, and that you want the fork to succeed or fail on

            its own merits and nothing else.
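
            For instance, if the project happens to use Git (an assumption here;
            the mechanics differ for other version control systems, and the
            paths below are purely illustrative), a complete, deep-history copy
            can be produced with a mirror clone and handed over as a single
            bundle file. A minimal sketch, using a throwaway repository as a
            stand-in for the real one:

```shell
# A minimal sketch of handing forkers a complete, deep-history copy
# of the repository. Assumes Git; a real project would substitute its
# own repository path for the throwaway one created here.
set -e
tmp=$(mktemp -d)

# Stand-in for the original project's repository:
git init -q "$tmp/project.git"
git -C "$tmp/project.git" -c user.name=maint -c user.email=m@example.org \
    commit -q --allow-empty -m "initial commit"

# A mirror clone copies every branch, tag, and other ref, not just HEAD:
git clone -q --mirror "$tmp/project.git" "$tmp/project-history.git"

# Bundle the entire history into one file that is easy to hand over:
git -C "$tmp/project-history.git" bundle create \
    "$tmp/full-history.bundle" --all

# Confirm the bundle is self-contained (no missing prerequisites);
# the forkers can clone from it and keep the full history:
git -C "$tmp/project-history.git" bundle verify "$tmp/full-history.bundle"
```

            The bundle approach keeps the handoff to one file, but simply giving
            the forkers read access to the repository itself works just as well.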





            The reason to do all this (and do it publicly) is not to

            actually help the fork, but to persuade developers that your side is

            a safe bet, by appearing as non-vindictive as possible. In war it

            sometimes makes sense (strategic sense, if not human sense) to force

            people to choose sides, but in free software it almost never does. In

            fact, after a fork some developers often openly work on both

            projects, and do their best to keep the two compatible. These

            developers help keep the lines of communication open after the fork.

            They allow your project to benefit from interesting new features in

            the fork (yes, the fork may have things you want), and also increase

            the chances of a merger down the road.





            Sometimes a fork becomes so successful that, even though it was

            regarded even by its own instigators as a fork at the outset, it

            becomes the version everybody prefers, and eventually supplants the

            original by popular demand. A famous instance of this was the

            GCC/EGCS fork. The GNU C Compiler, or more

            recently, the GNU Compiler Collection (GCC), is the most popular open

            source native-code compiler, and also one of the most portable

            compilers in the world. Due to disagreements between the

            GCC's official maintainers and Cygnus

            Software,[5]

            one of GCC's most active developer groups, Cygnus

            created a fork of GCC called EGCS. The fork

            was deliberately non-adversarial: the EGCS developers did not, at any

            point, try to portray their version of GCC as a new official version.

            Instead, they concentrated on making EGCS as good as possible,

            incorporating patches at a faster rate than the official GCC

            maintainers. EGCS gained in popularity, and eventually some major

            operating system distributors decided to package EGCS as their

            default compiler instead of GCC. At this point, it became clear to

            the GCC maintainers that holding on to the

            "GCC" name while everyone switched

            to the EGCS fork would burden everyone with a needless name change,

            yet do nothing to prevent the switchover. So GCC adopted the EGCS

            codebase, and there is once again a single GCC, but greatly improved

            because of the fork.

            [5] Now part of RedHat (http://www.redhat.com/).





            This example shows why you cannot always regard a fork as an

            unadulteratedly bad thing. A fork may be painful and unwelcome at the

            time, but you cannot necessarily know whether it will succeed.

            Therefore, you and the rest of the project should keep an eye on it,

            and be prepared not only to absorb features and code where possible,

            but in the most extreme case, to even join the fork if it gains the

            bulk of the project's mindshare. Of course, you will

            often be able to predict a fork's likelihood of

            success by seeing who joins it. If the fork is started by the

            project's biggest complainer and joined by a handful

            of disgruntled developers who weren't behaving

            constructively anyway, they've essentially solved a

            problem for you by forking, and you probably don't

            need to worry about the fork taking momentum away from the original

            project. But if you see influential and respected developers

            supporting the fork, you should ask yourself why. Perhaps the project

            was being overly restrictive, and the best solution is to adopt into

            the mainline project some or all of the actions contemplated by the

            fork; in essence, to avoid the fork by becoming it.









            8.6.2. Initiating a Fork





            All

            the advice here assumes that you are forking as a last resort.

            Exhaust all other possibilities before starting a fork. Forking

            almost always means losing developers, with only an uncertain promise

            of gaining new ones later. It also means starting out with

            competition for users' attention: everyone

            who's about to download the software has to ask

            themselves: "Hmm, do I want that one or the other

            one?" Whichever one you are, the situation is messy,

            because a question has been introduced that wasn't

            there before. Some people maintain that forks are healthy for the

            software ecosystem as a whole, by a standard natural selection

            argument: the fittest will survive, which means that, in the end,

            everyone gets better software. This may be true from the

            ecosystem's point of view, but it's

            not true from the point of view of any individual project. Most forks

            do not succeed, and most projects are not happy to be forked.





            A corollary is that you should not use the threat of a fork as an

            extremist debating technique ("Do things my way

            or I'll fork the

            project!") because everyone is aware that a

            fork that fails to attract developers away from the original project

            is unlikely to survive long. All observers, not just developers

            but users and operating system packagers too, will make their

            own judgement about which side to choose. You should therefore appear

            extremely reluctant to fork, so that if you finally do it, you can

            credibly claim it was the only route left.





            Do not neglect to take all factors into account

            in evaluating the potential success of your fork. For example, if

            many of the developers on a project have the same employer, then even

            if they are disgruntled and privately in favor of a fork, they are

            unlikely to say so out loud if they know that their employer is

            against it. Many free software programmers like to think that having

            a free license on the code means no one company can dominate

            development. It is true that the license is, in an ultimate sense, a

            guarantor of freedom: if others want badly enough to fork the

            project, and have the resources to do so, they can. But in practice,

            some projects' development teams are mostly funded

            by one entity, and there is no point pretending that the

            entity's support doesn't matter. If

            it is opposed to the fork, its developers are unlikely to take part,

            even if they secretly want to.





            If you still conclude that you must fork, line up support privately

            first, then announce the fork in a non-hostile tone. Even if you are

            angry at, or disappointed with, the current maintainers,

            don't say that in the message. Just dispassionately

            state what led you to the decision to fork, and that you mean no ill

            will toward the project from which you're forking.

            Assuming that you do consider it a fork (as opposed to an emergency

            preservation of the original project), emphasize that

            you're forking the code and not the name, and choose

            a name that does not conflict with the project's

            name. You can use a name that contains or refers to the original

            name, as long as it does not open the door to identity confusion. Of

            course it's fine to explain prominently on the

            fork's home page that it descends from the original

            program, and even that it hopes to supplant it. Just

            don't make users' lives harder by

            forcing them to untangle an identity dispute.





            Finally, you can get things started on the right foot by

            automatically granting all committers of the original project commit

            access to the fork, including even those who openly disagreed with

            the need for a fork. Even if they never use the access, your message

            is clear: there are disagreements here, but no enemies, and you

            welcome code contributions from any competent source.





