Tuesday, October 27, 2009

Section 23.1.  Long-Living IP Peer Information















At the IP layer, there is no concept of a stateful connection. Because IP is a stateless protocol, there are no parameters or connection-related data structures to keep, except for statistics. (These are optional and are not required by the protocol itself.) However, to improve performance, the kernel keeps information about some parameters on a per-destination IP address basis. We will see an example in a moment.


Any host that has recently carried on an exchange of data with a Linux box is considered an IP peer. The kernel allocates a data structure for each peer to preserve some long-living information. At the moment, not many parameters are kept in the structure. The most important one is the IP packet ID. We saw in Chapter 18 that each IP packet is identified by a 16-bit field called ID. Instead of having a single shared ID, incremented for each IP packet regardless of the destination, one unique instance is kept for each IP peer. (This solution is an implementation choice; it is not imposed by any standard.)
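The per-peer counter can be pictured with a small user-space sketch. It is only an illustration of the idea: struct toy_peer and toy_next_ip_id are hypothetical names, not kernel symbols, and the real inet_peer structure carries more fields and needs locking that is left out here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified peer record (not the kernel's inet_peer). */
struct toy_peer {
    uint32_t addr;      /* peer IPv4 address, used as the lookup key */
    uint16_t next_id;   /* ID to place in the next packet sent to this peer */
};

/* Return the ID for the next packet to this peer and advance the counter.
 * The field is 16 bits wide, so the counter wraps from 65535 back to 0. */
static uint16_t toy_next_ip_id(struct toy_peer *peer)
{
    assert(peer != NULL);
    return peer->next_id++;
}
```

Because each peer owns its counter, traffic sent to one destination does not advance the IDs seen by another, which is the difference from a single counter shared by all destinations.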


Peers are represented by inet_peer structures. These structures, defined in include/net/inetpeer.h and described in the section "inet_peer Structure," are organized in an AVL tree, which is a well-known type of data structure optimized for lookups. I will not go into detail about the AVL data structure; you can find it in any programming book.[*] However, it is worthwhile to underline the trade-offs involved in an AVL tree. Essentially, the tree is kept balanced thanks to the way in which insert and delete operations are defined. Because the tree is balanced, a search will always take O(lg n) time, where n is the number of elements in the tree. Generally speaking, because keeping the tree balanced comes at a cost, this kind of data structure is usually used when there are many lookups relative to insert/delete/change operations, and when the speed of these lookups is particularly important.

[*] The comment at the top of net/ipv4/inetpeer.c is quite clear and self-explanatory.


The whole AVL tree and the associated global variables (such as peer_total) are protected by the peer_pool_lock lock. The lock can be acquired in both shared and exclusive modes. Lookups need only read privilege and therefore will acquire the lock in shared mode, whereas insert/delete operations have to acquire the lock in exclusive mode.



23.1.1. Initialization










The peer subsystem is initialized by inet_initpeers, which is defined in net/ipv4/inetpeer.c and is invoked by ip_init when the IPv4 protocol is initialized at boot time.


That function accomplishes three main tasks:


  • Allocates the cache that will be used to hold inet_peer structures, which will be allocated as peers are recognized.

  • Defines a threshold (inet_peer_threshold) that will be used to limit the amount of memory used by inet_peer structures. Its value is computed based on the amount of RAM in the system. When a new entry is created, the global counter peer_total is incremented; it is decremented when an element is removed. If peer_total reaches or exceeds the threshold, the least recently used element is removed (see inet_getpeer).

  • Starts the garbage collection timer. We describe this task in the section "Garbage Collection."




23.1.2. Lookups


The key for a search is the destination's IP address. There are two main functions:



lookup


This is a macro local to net/ipv4/inetpeer.c that implements a simple search in an AVL tree.


inet_getpeer


This function can be used from other subsystems, such as TCP and routing, to search for a given entry. It is built on top of lookup.


inet_getpeer is passed the search key (the peer's IP address) and a flag (create) that can be used to ask for the creation of a new entry in case the search failed. When a new entry is created, the initial IP packet ID is initialized to a random value by means of secure_ip_id.


Figure 23-1 shows the internals of inet_getpeer. The function is pretty simple and does not need much explanation. However, there is one point worth clarifying: why there are two lookups to see whether there is already an entry with the same destination address as the one being requested. The second check is not superfluous because a similar entry could have been created and added to the tree between the time the read lock was released and the write lock was acquired.




23.1.3. How the IP Layer Uses inet_peer Structures

















Among the few fields of the inet_peer structure, only two are currently used by the IP layer: v4addr, which identifies the peer, and ip_id_count.


The value of ip_id_count is retrieved via inet_getid, which increments it as part of the same operation. inet_getid itself is never called directly; the section "Selecting the IP Header's ID Field" lists the wrappers that the IP layer uses depending on the context.




23.1.4. Garbage Collection



Because the number of inet_peer instances that can be created is limited, there is a timer (peer_periodic_timer) that is started at subsystem initialization time (inet_initpeers) and that at regular intervals causes the removal of entries that have not been used for a given amount of time. The timer handler is peer_check_expire.


The idle time after which an entry is considered old depends on how loaded the system is. A system is considered loaded when the number of elements (peer_total) is greater than or equal to the threshold (inet_peer_threshold). On a loaded system, entries are removed after an inactivity period of 120 seconds (inet_peer_minttl). On a system that is not loaded, the value lies between 120 seconds and 10 minutes (inet_peer_maxttl) and decreases as the number of outstanding inet_peer entries (peer_total) grows. To keep the timer from becoming a CPU hog, at most PEER_MAX_CLEANUP_WORK (30) elements can be removed at each timer expiration.
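The load-dependent lifetime can be sketched as a small function. The text gives only the two bounds and the direction of the dependency, so the linear interpolation below is an assumption made for illustration; TOY_MINTTL, TOY_MAXTTL, and toy_entry_ttl are hypothetical names, and values are in seconds rather than kernel ticks.

```c
#include <assert.h>

/* Bounds taken from the text, in seconds; the names are illustrative. */
#define TOY_MINTTL 120   /* loaded system */
#define TOY_MAXTTL 600   /* empty pool */

/* Idle time after which an entry counts as old, given the current number
 * of entries and the threshold. Assumes a linear decrease between bounds. */
static int toy_entry_ttl(int peer_total, int threshold)
{
    assert(threshold > 0 && peer_total >= 0);
    if (peer_total >= threshold)
        return TOY_MINTTL;
    return TOY_MAXTTL
        - (int)((long long)(TOY_MAXTTL - TOY_MINTTL) * peer_total / threshold);
}
```

With a threshold of 1000 entries, an empty pool yields 600 seconds, a half-full pool 360 seconds, and a pool at or beyond the threshold 120 seconds.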


When the timer is first started, the timeout is set to expire after inet_peer_minttl, with a little perturbation to avoid synchronization with other timers started at boot time. After that, the timer does not really run at regular intervals. Instead, the expiration time is set to a value between 10 seconds (inet_peer_gc_mintime) and 120 seconds (inet_peer_gc_maxtime) that shrinks as the number of entries grows (see peer_check_expire), which means that the more entries there are, the more often the cleanup runs.


When an entry expires, it is inserted into the unused list, whose head and tail are pointed to by the two global variables inet_peer_unused_head and inet_peer_unused_tailp. The unused list is protected by the inet_peer_unused_lock lock. If an expired entry is still referenced (that is, the reference count is greater than 1), it cannot be freed and it is kept in the unused list; otherwise, it is freed right away.



Figure 23-1. inet_getpeer function



When an inet_peer structure is to be removed, because it expired or because it is not used anymore (i.e., its reference count dropped to 0), it is inserted into the unused list but is kept in the AVL tree, too. This means that subsequent lookups on the AVL tree can return inet_peer entries currently in the unused list.


The way entries are purged is through the cleanup_once function, which is called by the timer handler peer_check_expire, and by inet_getpeer when the number of entries passes the allowed limit. The input parameter to cleanup_once specifies how long an inet_peer instance must have spent on the unused list before being eligible for deletion. The value 0, as used by inet_getpeer, means that any instance is eligible.


When an entry that is in the unused list is accessed (i.e., selected by a lookup on the AVL tree), it gets removed from that list. For this reason, an entry can join and leave the unused list several times during its life (see inet_getpeer).
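The life cycle of an entry on the unused list can be sketched as follows. This is a single-threaded illustration under stated assumptions: the toy_ names and the field names are hypothetical, the lock that protects the list is omitted, and toy_cleanup_once only detaches the oldest entry instead of also removing it from the tree and freeing it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical peer record: only the fields needed for the unused list. */
struct toy_peer {
    struct toy_peer *unused_next;
    struct toy_peer **unused_prevp;  /* NULL while not on the list */
    long dtime;                      /* time the entry joined the list */
};

/* Head pointer plus a pointer to the last "next" link (the tail). */
static struct toy_peer *toy_unused_head;
static struct toy_peer **toy_unused_tailp = &toy_unused_head;

/* Append an entry that is no longer referenced to the tail of the list. */
static void toy_link_unused(struct toy_peer *p, long now)
{
    assert(p->unused_prevp == NULL);
    p->dtime = now;
    p->unused_next = NULL;
    p->unused_prevp = toy_unused_tailp;
    *toy_unused_tailp = p;
    toy_unused_tailp = &p->unused_next;
}

/* Take an entry off the list, for example because a lookup returned it. */
static void toy_unlink_unused(struct toy_peer *p)
{
    if (p->unused_prevp == NULL)
        return;                      /* not on the list */
    if (p->unused_next != NULL)
        p->unused_next->unused_prevp = p->unused_prevp;
    else
        toy_unused_tailp = p->unused_prevp;
    *p->unused_prevp = p->unused_next;
    p->unused_next = NULL;
    p->unused_prevp = NULL;
}

/* Detach the oldest entry if it has been unused for at least ttl.
 * Returns the detached entry, or NULL if none qualifies; ttl 0 means any. */
static struct toy_peer *toy_cleanup_once(long now, long ttl)
{
    struct toy_peer *p = toy_unused_head;

    if (p == NULL || now - p->dtime < ttl)
        return NULL;
    toy_unlink_unused(p);
    return p;
}
```

Appending at the tail keeps the list ordered by the time each entry became unused, so the head is always the best candidate for removal and the cleanup step never has to scan the list.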












