23.1. Long-Living IP Peer Information

At the IP layer, there is no concept of a stateful connection. Because IP is a stateless protocol, there are no parameters or connection-related data structures to keep, except for statistics. (These are optional and are not required by the protocol itself.) However, to improve performance, the kernel keeps information about some parameters on a per-destination IP address basis. We will see an example in a moment.

Any host that has recently exchanged data with a Linux box is considered an IP peer. The kernel allocates a data structure for each peer to preserve some long-living information. At the moment, not many parameters are kept in the structure. The most important one is the IP packet ID. We saw in Chapter 18 that each IP packet is identified by a 16-bit field called ID. Instead of having a single shared ID, incremented for each IP packet regardless of the destination, one unique instance is kept for each IP peer. (This solution is an implementation choice; it is not imposed by any standard.)

Peers are represented by inet_peer structures. These structures, defined in include/net/inetpeer.h and described in the section "inet_peer Structure," are organized in an AVL tree, which is a well-known type of data structure optimized for lookups. I will not go into detail about the AVL data structure.
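As a rough illustration of the per-peer packet ID described above, the following user-space sketch keeps one 16-bit counter per destination address. It is not the kernel's code: struct peer, peer_get, and peer_next_id are hypothetical names, a plain linked list stands in for the AVL tree, there is no locking, and the initial ID is passed in by the caller (the kernel picks a random starting value instead).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in for the kernel's inet_peer. */
struct peer {
    uint32_t addr;         /* peer IPv4 address, used as the lookup key */
    uint16_t ip_id_count;  /* next IP packet ID to use for this peer */
    struct peer *next;     /* simple list link; the kernel uses an AVL tree */
};

static struct peer *peers; /* head of the (unsorted) peer list */

/*
 * Return the entry for addr, creating it with the given initial ID
 * if it does not exist yet. Returns NULL if allocation fails.
 */
static struct peer *peer_get(uint32_t addr, uint16_t initial_id)
{
    struct peer *p;

    for (p = peers; p != NULL; p = p->next)
        if (p->addr == addr)
            return p;

    p = malloc(sizeof(*p));
    if (p == NULL)
        return NULL;
    p->addr = addr;
    p->ip_id_count = initial_id;
    p->next = peers;
    peers = p;
    return p;
}

/* Hand out the current ID for this peer and advance its private counter. */
static uint16_t peer_next_id(struct peer *p)
{
    return p->ip_id_count++; /* wraps around at 65535, like the 16-bit ID field */
}
```

Because each peer has its own counter, traffic sent to one destination does not advance the IDs seen by another destination, which is the point of keeping the ID in a per-peer structure.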

The whole AVL tree and the associated global variables (such as peer_total) are protected by the peer_pool_lock lock. The lock can be acquired in both shared and exclusive modes. Lookups need only read access and therefore acquire the lock in shared mode, whereas insert and delete operations have to acquire the lock in exclusive mode.

23.1.1. Initialization

The peer subsystem is initialized by inet_initpeers, which is defined in net/ipv4/inetpeer.c and is invoked by ip_init when the IPv4 protocol is initialized at boot time. That function accomplishes three main tasks:

23.1.2. Lookups

The key for a search is the destination's IP address. There are two main functions:
inet_getpeer is passed the search key (the peer's IP address) and a flag (create) that can be used to ask for the creation of a new entry if the search fails. When a new entry is created, the IP packet ID is initialized to a random value by means of secure_ip_id.

Figure 23-1 shows the internals of inet_getpeer. The function is pretty simple and does not need much explanation. However, there is one point worth clarifying: why there are two lookups to see whether there is already an entry with the same destination address as the one being requested. The second check is not superfluous because a similar entry could have been created and added to the tree between the time the read lock was released and the time the write lock was acquired.

23.1.3. How the IP Layer Uses inet_peer Structures

Among the few fields of the inet_peer structure, only two are currently used by the IP layer: v4addr, which identifies the peer, and ip_id_count. The value of ip_id_count can be retrieved via inet_getid, which automatically increments its value at the same time. inet_getid is never called directly. The section "Selecting the IP Header's ID Field" offers a list of the wrappers that are used by the IP layer depending on the context.

23.1.4. Garbage Collection

Because the number of inet_peer instances that can be created is limited, there is a timer (peer_periodic_timer) that is started at subsystem initialization time (inet_initpeers) and that periodically causes the removal of entries that have not been used for a given amount of time. The timer handler is peer_check_expire.

The amount of idle time needed to classify an entry as old depends on how loaded the system is. A system is considered loaded when the number of elements (peer_total) is greater than or equal to the threshold (inet_peer_threshold). On a loaded system, entries are removed after an inactivity period of 120 seconds (inet_peer_minttl).
On a system that is not loaded, the value lies between 120 seconds and 10 minutes (inet_peer_maxttl) and shrinks as the number of outstanding inet_peer entries (peer_total) grows. To avoid making the timer a CPU hog, the number of elements removable at each timer expiration is limited to PEER_MAX_CLEANUP_WORK (30).

When the timer is first started, the timeout is set to expire after inet_peer_minttl, with a little perturbation to avoid synchronization with other timers started at boot time. After that, the timer does not really run at regular intervals. Instead, the expiration time is set to a value between 10 seconds (inet_peer_gc_mintime) and 120 seconds (inet_peer_gc_maxtime) that shrinks as the number of entries grows (see peer_check_expire), which means that the more entries there are, the faster they expire.

When an entry expires, it is inserted into the unused list, whose head and tail are pointed to by the two global variables inet_peer_unused_head and inet_peer_unused_tailp. The unused list is protected by the inet_peer_unused_lock lock. If an expired entry is still referenced (that is, the reference count is greater than 1), it cannot be freed and is kept in the unused list; otherwise, it is freed immediately.

Figure 23-1. inet_getpeer function

When an inet_peer structure is to be removed, because it expired or because it is not used anymore (i.e., its reference count dropped to 0), it is inserted into the unused list but is kept in the AVL tree, too. This means that subsequent lookups on the AVL tree can return inet_peer entries currently in the unused list.

Entries are purged by the cleanup_once function, which is called by the timer handler peer_check_expire, and by inet_getpeer when the number of entries passes the allowed limit. The input parameter to cleanup_once specifies how long an inet_peer instance must have spent on the unused list before being eligible for deletion.
The value 0, as used by inet_getpeer, means that any instance is eligible.

When an entry that is in the unused list is accessed (i.e., selected by a lookup on the AVL tree), it gets removed from that list. For this reason, an entry can join and leave the unused list several times during its life (see inet_getpeer).
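
The load-dependent timeouts described in this section can be sketched in a few lines of user-space C. This is only an illustration under stated assumptions: the helper names (scale_down, peer_ttl, gc_interval) are hypothetical, values are in seconds rather than kernel ticks, and the scaling between the two bounds is assumed to be a straight linear interpolation on the ratio of peer_total to inet_peer_threshold. The constants are the ones quoted in the text.

```c
#include <stdint.h>

/* Bounds quoted in the text, in seconds. */
enum {
    PEER_MIN_TTL    = 120, /* inet_peer_minttl */
    PEER_MAX_TTL    = 600, /* inet_peer_maxttl (10 minutes) */
    PEER_GC_MINTIME = 10,  /* inet_peer_gc_mintime */
    PEER_GC_MAXTIME = 120  /* inet_peer_gc_maxtime */
};

/*
 * Assumed scaling rule: start at hi when there are no entries and go
 * down linearly to lo as total approaches threshold. At or above the
 * threshold (a "loaded" system) the result is always lo.
 */
static unsigned int scale_down(unsigned int lo, unsigned int hi,
                               unsigned int total, unsigned int threshold)
{
    if (threshold == 0 || total >= threshold)
        return lo;
    return hi - (unsigned int)((uint64_t)(hi - lo) * total / threshold);
}

/* Idle time after which an entry is considered old. */
static unsigned int peer_ttl(unsigned int total, unsigned int threshold)
{
    return scale_down(PEER_MIN_TTL, PEER_MAX_TTL, total, threshold);
}

/* Delay before the garbage collection timer fires again. */
static unsigned int gc_interval(unsigned int total, unsigned int threshold)
{
    return scale_down(PEER_GC_MINTIME, PEER_GC_MAXTIME, total, threshold);
}
```

With a hypothetical threshold of 1000 entries, an empty table gives a TTL of 600 seconds and a timer interval of 120 seconds; 500 entries give 360 and 65 seconds; at 1000 entries or more both drop to their minimums of 120 and 10 seconds.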
Tuesday, October 27, 2009