Sunday, October 25, 2009

Section 35.3.  The Table Lookup: fn_hash_lookup










35.3. The Table Lookup: fn_hash_lookup







All routing table lookups
, regardless of the tables provided by Policy Routing and the direction of the traffic, are done with fn_hash_lookup. This function is registered as the handler for the tb_lookup function pointer of the fib_table structure in fib_hash_init (see the section "Routing Table Initialization" in Chapter 34).


The function's lookup algorithm uses the LPM algorithm introduced in Chapter 30. The execution of this algorithm is facilitated by the organization of routes into per-netmask hash tables, as shown in Figure 34-1 in Chapter 34. fn_hash_lookup searches for the fib_node instance that has the information to route packets to a particular destination.


The prototype for fn_hash_lookup is:



static int
fn_hash_lookup(struct fib_table *tb, const struct flowi *flp, struct fib_result *res)



Here is the meaning of its input parameters:



tb


The routing table to search. Because fn_hash_lookup is a generic lookup routine that runs on one table at a time, the tables to search are decided by the caller, depending on Policy Routing support and related factors.


flp


Search key.


res


Upon success, res is initialized with the routing information.


And these are the possible return values:



0: success


res has been initialized (by fib_semantic_match) with the forwarding information.


1: failure


No route matched the search key.


Less than 0: Administrative failure


This means the lookup cannot succeed because the route found is of no value: for instance, the associated host may be flagged as unreachable.


The LPM algorithm loops over the routes, starting with the zone that represents the longest netmask. This is because longer netmasks mean more specific routes, which in turn means that the packet is likely to get closer to the final destination. (For instance, a /27 netmask that can cover only 30 hosts is preferred over a /24 netmask that potentially covers 254.) Thus, the search browses all the active zones, starting from the ones with the longest netmasks. As we saw in the section "Organization of Routing Hash Tables" in Chapter 34, all the active zones are sorted by netmask length and fn_zone_list stores the head of the list.



struct fn_hash *t = (struct fn_hash*)tb->tb_data;
read_lock(&fib_hash_lock);
for (fz = t->fn_zone_list; fz; fz = fz->fz_next) {
struct hlist_head *head;
struct hlist_node *node;
struct fib_node *f;



The function ANDs the destination IP address with the netmask of the active zone being checked, and uses the result as a search key. For example, if the function is currently checking the /24 zone, and the destination address flp->fl4_dst is 10.0.1.2, the search key k is initialized to 10.0.1.2 & 255.255.255.0, which comes out to 10.0.1.0. This means that the following piece of code searches for a route to the subnet 10.0.1.0/24:



u32 k = fz_key(flp->fl4_dst, fz);



Because routes are stored in a hash table (fz_hash), head selects the right bucket of the table by applying a hash function to the key k. The next step is to browse the list of routes (fib_node structures) associated with the selected table's bucket and look for one that matches k.



head = &fz->fz_hash[fn_hash(k, fz)];
hlish_for_each_entry(f, node, head, fn_hash) {
if (f->fn_key != k)) {
continue;
 
err = fib_semantic_match(&f->fn_alias,
flp, res,
f->fn_key, fz->fz_mask,
fz->fz_order);
 
if (err < 0)
goto out;
}
}
err = 1;
out:
read_unlock(&fib_hash_lock);
return err;
}



We saw in the section "Organization of Routing Hash Tables" in Chapter 34 that a fib_node covers all the routes that lead to the same subnet but that could differ on other fields such as TOS. Now, if fn_hash_lookup manages to find a fib_node that matches the search key k, the function still needs to check each potential route to find one that also matches the other search key fields received in input through the flp parameter. This detailed check is taken care of by fib_semantic_match, described in the next section.


If fib_semantic_match returns success, it also initializes the input parameter res that stores the result of the lookup, and fn_hash_lookup returns this result to its caller. fn_hash_lookup loops through all the zones until fib_semantic_match either returns a successful result or discovers that the table's routes are unusable (i.e., they do not match).



35.3.1. Semantic Matching on Subsidiary Criteria





fib_semantic_match is called to find whether any routes (fib_alias structures) among the ones associated with a given fib_node match all the required search key fields. We saw in the previous section that the main field, the final destination IP address to which the packet must be routed, was matched by fn_hash_lookup before invoking this function. So it falls to fib_semantic_match to check the other criteria.


Once fib_semantic_match has identified the right instance of fib_alias, it simply needs to extract the routing information from the associated fib_node. The only additional task required is the selection of the next hop. This last task is needed only when the matching route uses Multipath, and it can be handled in two ways:


  • By fib_semantic_match, when the search key provides an egress device.

  • By fib_select_multipath, when the search key does not provide an egress device. fib_select_multipath is called by the ip_route_input_slow or ip_route_output_slow routine.


The logic of fib_semantic_match is shown in Figure 35-2.



Figure 35-2. fib_semantic_match function




35.3.1.1. Criteria for rejecting routes

While browsing fib_alias structures, fib_semantic_match rejects the ones that:


  • Do not match the TOS. Note that when routes are not configured with a TOS value, they can be used to route packets with any TOS.

  • Have a narrower scope than the one specified with the search key. For example, if the routing subsystem is looking for a route with scope RT_SCOPE_UNIVERSE, it cannot use one with scope RT_SCOPE_LINK.


Furthermore, the function must check whether a route or the desired next hop has gone away, in which case the routing subsystem has marked it for deletion by setting its RTNH_F_DEAD flag. The section "Helper Routines" in Chapter 32 shows how the RTNH_F_DEAD flag can be set for an entire route or for a single next hop of a route.


Once an eligible fib_alias instance has been identified, and supposing the associated fib_info structure is usable (i.e., not marked RTNH_F_DEAD), fib_semantic_match needs to browse all the next hops' fib_nh instances to find one that also matches the search key's device, if a device was specified. It is possible that none of the next hops can actually be used. This could happen for one of two main reasons:


  • All the next hops are unusable (that is, they have their RTNH_F_DEAD flags set).

  • The search key specifies an egress device that does not match any of the next hop configurations.


When there is no support for Multipath, there can be only one next hop.


While browsing fib_alias instances, fib_semantic_match sets the FA_S_ACCESSED flag on those that meet the scope and TOS requirements mentioned earlier in this section. The flag is set regardless of whether the fib_alias is selected. If and when the fib_alias instance is removed, this flag will be taken into account to decide whether the cache should be flushed.




35.3.1.2. Return value from fib_semantic_match


As stated earlier, the return value from fib_semantic_match can take one of three meanings:


  • 1 means there is no matching route.

  • 0 means success. In this case, the result of the lookup is stored in the input parameter res. The result includes a pointer to the matching fib_info instance.

  • A negative value represents an administrative failure.


Both 0 and the negative return values are determined from the type (fa->fa_type) of the matching route (fa) found by fib_semantic_match. Examples of the type value are RTN_UNICAST and RTN_LOCAL. From this type, fib_semantic_match can decide whether the lookup should succeed or fail, and can pass back an error code that allows the kernel to take the proper action in case of failure.


For example, a route of type RTN_UNREACHABLE causes fib_semantic_match to return the error -EHOSTUNREACH, which then leads the kernel to generate an ICMP unreachable message. A route of type RTN_THROW causes fib_semantic_match to return the error-EAGAIN, which instructs the Policy Routing version of fib_lookup in net/ipv4/fib_rules.c to retry the lookup with the next routing table.


Because the fa->fa_type type field drives the value returned, the error codes are embodied in a fib_props array, defined and initialized in the file net/ipv4/fib_semantics.c (see the section "rtable Structure" in Chapter 36). The array contains an element for each possible route type that specifies the associated error code and an RT_SCOPE_XXX scope. Deriving the error code and scope is as simple as referencing the element of fib_props corresponding to the index fa->fa_type.


Table 35-1 shows how fib_props is initialized.


Table 35-1. Initialization of fib_props

Route type

Error

Scope

RTN_UNSPEC

0

RT_SCOPE_NOWHERE

RTN_UNICAST

0

RT_SCOPE_UNIVERSE

RTN_LOCAL

0

RT_SCOPE_HOST

RTN_BROADCAST

0

RT_SCOPE_LINK

RTN_ANYCAST

0

RT_SCOPE_LINK

RTN_MULTICAST

0

RT_SCOPE_UNIVERSE

RTN_BLACKHOLE

-EINVAL

RT_SCOPE_UNIVERSE

RTN_UNREACHABLE

-EHOSTUNREACH

RT_SCOPE_UNIVERSE

RTN_PROHIBIT

-EACCES

RT_SCOPE_UNIVERSE

RTN_THROW

-EAGAIN

RT_SCOPE_UNIVERSE

RTN_NAT

-EAGAIN

RT_SCOPE_NOWHERE

RTN_XRESOLVE

-EINVAL

RT_SCOPE_NOWHERE



Note that the first few elements have a value of 0 for error: in these cases, fib_semantic_match returns success. The others have an error code used by the routing code to handle the routing failure correctly.














No comments:

Post a Comment