Tuesday, November 3, 2009

A Review of STL


3 4



A Review of STL



Complete coverage or deep analysis of STL is well beyond the scope of this chapter. It's the kind of subject that easily could fill an entire book. One such book is the STL Tutorial and Reference Guide.4 If you are looking for coverage of C++ language features, including templates other than STL, have a look at the C++ Primer 5 or The C++ Programming Language.6 The latter does cover STL in the context of the C++ Standard Library, of which it is a part. Having said that, in this section, I will provide a review of the core portions of STL. Having a firm grasp of this subject matter gives you these advantages:




  • STL architecture and the details of its specification are a model example of effective generic programming. Potential for reuse could hardly be maximized any further.


  • Knowledge of the STL specification allows you to use STL and compliant building blocks easily and effectively. You also can write your own STL-compliant software building blocks to interoperate with those of STL and third-party providers.


  • Understanding the crucial portions of STL is essential to the comprehension of the STL/COM+ integration approach that I will outline later in the chapter. Many of the details can be gleaned from the previous references as well as MSDN online documentation, but it is important to see the big picture of STL architecture before you ponder the merits of a COM+ integration approach.





You need to realize that STL is not just a specification for a half-implemented standard. The library is mature, and its concepts and implementations have been rigorously tested. As I mentioned before, it has become a part of the internationally adopted C++ standard. Any vendor that claims to sell a standards-compliant compiler therefore must provide you with an STL implementation right out of the box. But even if your compiler does not include this, many free and commercial versions of STL implementations exist, and some implementations of the entire C++ Standard Library are also available for use with a multitude of C++ compilers. There is an implementation of STL out there for you, and you can start using it today.



Because STL's design is so clean, its specification so effective, and the potential for reuse of STL-compliant building blocks so tremendous, the STL flavor for interoperability of software building blocks is beginning to establish itself as the basis for the design of reusable C++ code. Realizing how much more useful their products become and how many more project environments they can reach, makers of previously independent and otherwise not connectable C++ software libraries have begun to adopt the STL specification as the basis for their algorithms and data structures. These makers also have released new versions of their libraries with fully STL-compliant software building blocks. The vision of plug-and-play interoperability of generic programming thus is being realized in the C++ space. Understanding STL's architecture lets you be a part of this new and exciting world.



I will talk about containers, iterators, and generic algorithms in this section. Less relevant to the discussion here are the allocator, adapter, and function object STL constructs, so these terms will not be much in evidence.



Containers



STL containers are data structures that hold collections of elements on whose type they are parameterized. There are two basic kinds of containers: sequence containers and sorted associative containers. Sequence containers hold their elements through linear arrangement without imposing any particular order. The order of retrieval when traversing the container is usually the result of the particular insertion operations that brought the elements into the container. The following are STL sequence containers:




  • The ordinary C++ array type. That's right—it's not a class type at all. But the specification for generic algorithms is formulated so that they can operate on built-in C++ types the same way as on class types, wherever possible and sensible. This particular theme will reoccur in my discussion because it is important to the COM+ integration approach.


  • The vector is an array abstraction with built-in dynamic memory management. Element insertion time is O(n), with O(c) at the end.7


  • The deque provides random access element retrieval like the vector but extends its performance guarantee by delivering constant time for element insertions also at the beginning.


  • The list is a simple abstraction of a doubly linked list.





Sorted associative containers maintain their elements in an order imposed by the container. They provide for fast element retrieval via keys. The key type must allow for key value comparison so that the container can establish the ordering. These are STL's sequence containers:




  • The set uses the collected elements themselves to hold key values and applies a user-supplied function object to pairs of elements to establish their relative order. The elements' key values must be unique in the set.


  • The multiset is identical to the set but drops the key uniqueness requirement.


  • The map separates key values from elements and stores them separately. As with the set, the programmer can supply a function object the set will use to compare key values. These key values must be unique in the map.


  • The multimap is identical to the map but drops the uniqueness requirement for the keys.





All class-based containers must present a unique interface consisting of type definitions and member functions. For example, container_type::value_type must specify the container's element type. All containers must implement a size member function that will return the number of elements in the container. Please refer to STL documentation for further details.



All sequence containers then must augment the common container interface with their own type definitions and member functions. The same is true for sorted associative containers, but the interface varies a bit from the sequence container interface. For example, all sequence and sorted associative containers must support various overloads of an insert member function to add new elements to the container, but the functions' signatures vary somewhat among the two container groups. This classification of interface requirements from the all-encompassing general and spreading out into different groups of specificity (similar to a hierarchy of interfaces with inheritance) allows algorithms to remain general with respect to the container group, when possible. Different branches need to be taken for different container types by algorithms only if they need to exercise container functionality that is not common to all containers in the same syntactic form, such as insert. Again, this is akin to deciding whether to place a certain member function in a base or derived interface. A client can remain completely impartial to the implementation only as long as a certain method is callable via the base interface. If the method isn't callable, the client's logic will need to branch. But of course with STL containers no interfaces exist at all, let alone a hierarchy of types. The requirements are purely syntactic. For a class-based sequence container cCont, the statement cCont.insert(m, n) is required to compile and produce the intended effect if m is a valid iterator to cCont and n is an element value. That's all.



STL also provides various container adapters that adapt the basic containers just shown to serve in the role of other data structures. For example, the stack container adapter turns a vector, deque, or list into a stack with just push and pop operations. There are also queue and priority queue adapters.



Iterators



An iterator is a construct used to identify a position in a container. That position might be occupied by an element, in which case you can use the iterator to retrieve the element's value. You might recognize this device as an abstraction of a pointer. STL specifies the iterator interface in such a way that class as well as raw C++ pointer types can satisfy it. The syntactic power of the C++ language, operator overloading in this case, makes this possible. Therefore, you can hand either an iterator object or a raw C++ pointer to all generic algorithms. Again we observe the remarkable flexibility that pervades all of STL's architecture.



All object-based containers return an iterator to the first element in the container from their begin method and an iterator to the position just past the last element from their end method. If the container is empty, begin and end return iterators of the same value.



Iterators are classified with respect to two independent functional categories. The first category concerns the direction in which elements can flow between iterator and client.




  • Input iterators can be used only to retrieve the value of the element in the container to which they refer.


  • Output iterators can be used only to set the value of the element in the container to which they refer.





An iterator can be both an input and an output iterator. Performing operations on the container might make reading and writing element value an illegal operation. For example, deleting an element via the container's erase method likely will invalidate all iterators that refer to this element. But for a vector, all iterators referring to elements past the deleted element are also invalidated. However, a list does not operate in this manner. The semantics are entirely container specific.



The second category classifies the kinds of movements an iterator can make over a container. This category is hierarchical because it specifies a progressive set of capabilities.




  1. The forward iterator is an input as well as an output iterator. When traversing the container, this iterator moves from element to element in one direction only, one element at a time. Its operator ++ advances the position.


  2. The bidirectional iterator is a forward iterator and can be positioned to the previous element via its operator --.


  3. The random-access iterator is a bidirectional iterator and is capable of moving forward or backward from the current element to another element at an arbitrary distance by using pointer arithmetic syntax (operators +, -, +=, and -=). Random-access iterators also can be compared to one another for relative positioning in the same container by using operators <, >, <=, and >=. Direct addressing of an element by using at-relative-index syntax is supported via operator []. An expression iterat[nIndex] is required to produce the same result as *(iterat + nIndex).



It is clear that a raw C++ pointer or array type is capable of serving as an STL iterator with the highest function: input and output (unless the type it references is const) as well as random-access. An iterator type's level of function is container dependent. While a vector produces a random-access iterator, only bidirectionality is compatible with the performance guarantee of constant time insertion in the middle of the list.



Because raw pointers can serve as iterators, STL cannot guarantee that certain checks are made when operating with its iterators. For example, advancing an iterator past-the-end or before-the-beginning is likely not to be caught. (Particular class-based iterator implementations could throw an exception, however; the CSB iterator adapter does so.) Dereferencing an iterator rendered invalid by such an operation likely will result in an access violation. Flexibility trumps safety with STL.



Some other templates that are not associated with any particular container do present iterator interfaces. Input stream iterators let algorithms retrieve elements from a C++ input stream, and output stream iterators write to output streams. STL also provides a set of iterator adapter templates: reverse iterators convert forward movement to backward movement and vice versa. It is therefore possible to change the direction of a generic algorithm without having to alter the algorithm itself in the least. It is clear that these templates can be applied only to iterators that have at least bidirectional capability. Insert iterator adapters convert write operations through the underlying iterator to insertions into its container. Therefore, they present output iterator syntactic capabilities but not output semantics. You select the manner in which the insertion into the container should take place by choosing one of the available variants of insert iterator adapters.



Generic Algorithms



A generic algorithm is a template function designed to be instantiated with built-in types or types that meet an STL specification. Some are universally useful in general-purpose programming (for example, functions that return the greater or smaller of two given values); others operate on STL containers. Many generic algorithms that operate on containers accept two iterators, one that marks the element at which to begin applying the algorithm and the other pointing just past the element at which to stop. A client that wants to apply a generic algorithm to all the elements in a container passes iterators retrieved from the begin and end methods of the class-based container to the generic algorithm.



Many generic algorithms are parameterized on the type of an argument that supplies a required external function to the algorithm. For example, the for_each algorithm whose use you have witnessed elsewhere in this book applies the operator () to the third parameter with which it is invoked, passing each element of the sequence over which it iterates exactly once. The algorithm is written so that it can be instantiated with either a plain function pointer type or a function object (a class type). Other algorithms, especially those that depend on predicates, accept such functions optionally. For example, the max algorithm invokes the binary operator < if invoked with two arguments. But if you pass a predicate Foo as the third argument, it returns the second argument only if Foo(first, second) evaluates to true.



Different algorithms make different demands on the iterators that are passed to them. Nonmutating sequence algorithms require only an input iterator, while mutating sequence algorithms require an output iterator, and most—but not all (such as the generatealgorithm)—require an input iterator as well. Some algorithms require
8 random-access iterators, while others use only bidirectional or forward iterator capabilities. Yet other algorithms adjust their behavior based on what kind of iterator they are given.9 For example, the binary_search algorithm makes a performance guarantee of O(log(n)) only when given a random access iterator. Otherwise, it adopts a strategy that lessens the performance guarantee to O(n).



I encourage you to browse through a description of the complete set of algorithms in the STL documentation. You won't realize that you can avoid implementing a certain function unless you are aware of the STL software building blocks that can do the job for you. In addition, existing generic algorithms give you an idea of what can be accomplished in as generic a fashion as possible and showcase good generic design principles that you might be able to apply to your own projects. Because the entire set of STL generic algorithms is too large to reproduce here, I have chosen a representative cross section. My intent is to communicate the spirit and pattern underlying generic algorithms that is necessary for proceeding with the STL/COM+ integration topic.




  • The for_each algorithm applies its third argument (on whose type it is parameterized) to each element in the sequence bounded by the given iterators (on whose common type it also is parameterized).


  • The find algorithm returns an iterator to the first element that is equal to a given value. The find_if algorithm extends this behavior by allowing you to pass your own predicate to take the place of operator ==.


  • The partition algorithm reorders elements within the bounding iterators so that all elements that satisfy the given predicate precede those that do not.


  • The sort algorithm provides the performance guarantees of quicksort.


  • The stable_sort algorithm is based on adaptive merge sorting.


  • The partial_sort algorithm provides an efficient implementation of heapsort.


  • The unique algorithm eliminates adjacent duplicates.


  • The search algorithm locates matching element sequences.


  • The replace algorithm and its custom predicate version, replace_if, replace all elements with matching values by a new element.


  • The remove algorithm eliminates all elements that satisfy a predicate. Like the unique algorithm, it accomplishes this without access to a container's erase method but solely through manipulation of the given forward iterators.


  • The merge algorithm merges two sorted ranges into one sorted range.


  • The make_heap, push_heap, pop_heap, and sort_heap generic algorithms work with element sequences in a heap arrangement.





As with containers and iterators, STL provides some adapters for generic algorithms that make new algorithms from existing ones. These include negators, binders, and function pointer adapters. A negator inverts the return value of a predicate. Negators are provided for unary and binary predicates. A function pointer adapter converts a raw function pointer or a pointer to a member function to a function object, also known as a functional.10 A binder converts a binary functional to a unary one by fixing one of the parameters, either the first or the second. You can compose n binders to fix n arguments in functionals that expect n + 1 arguments. To understand the use of function adapters, consider the following example. Say you have written a function to determine whether a number is equal to the value in either the first or second slot of an array. Such a function might look like this:





bool�IsNumberInTuple(short�anTuple[],�short�nVal)
{
����return�nVal�==�anTuple[0]�||�nVal�==�anTuple[1];
}



Now assume that you have a set of numbers, cSet. You want to find the first instance in the set of either the number 6 or 7. Instead of writing an entirely new predicate for find_if, you can reuse IsNumberInTuple by composing a binder with a function pointer adapter like this:





short�anVals[]�=�{�6,�7�};
std::set<short>::iterator�cFirstMatch�=
����std::find_if(cSet.begin(),�cSet.end(),
�����������������std::bind1st(std::ptr_fun(IsNumberInTuple),�anVals));



Now cFirstMatch references the first element in the set with the value 6 or 7, or it has the value of cSet.end() if there is no such element. Such elegant use of adapters and their composition is fairly typical when programming with STL. The reuse and maintainability it achieves is unparalleled.



Iterators are key to letting generic algorithms operate on the broadest input type set possible. Even if data exists in a medium that is not compliant with STL, construction of STL-compliant iterator classes allows generic algorithms to operate on the data. The iterator is the central piece in STL software building block interoperability. The istream_iterator and ostream_iterator make this case quite convincingly. Even though no STL-compliant container exists, generic algorithms can operate on C++ stream data because of these iterators. You therefore can apply the find algorithm, for example, directly to an istream. For this reason, the iterator is central to the STL/COM+ integration strategy of CSB.



No comments:

Post a Comment