22.7 in_pcbbind Function

The next function, in_pcbbind, binds a local address and port number to a socket. It is called from five functions:

1. from bind for a TCP socket (normally to bind a server's well-known port);
2. from bind for a UDP socket (either to bind a server's well-known port or to bind an ephemeral port to a client's socket);
3. from connect for a TCP socket, if the socket has not yet been bound to a nonzero port (this is typical for TCP clients);
4. from listen for a TCP socket, if the socket has not yet been bound to a nonzero port (this is rare, since listen is called by a TCP server, which normally binds a well-known port, not an ephemeral port); and
5. from in_pcbconnect (Section 22.8), if the local IP address and local port number have not been set (typical for a call to connect for a UDP socket or for each call to sendto for an unconnected UDP socket).
In cases 3, 4, and 5, an ephemeral port number is bound to the socket and the local IP address is not changed (in case it is already set). We call cases 1 and 2 explicit binds and cases 3, 4, and 5 implicit binds. We also note that although it is normal in case 2 for a server to bind a well-known port, servers invoked using remote procedure calls (RPC) often bind ephemeral ports and then register their ephemeral port with another program that maintains a mapping between the server's RPC program number and its ephemeral port (e.g., the Sun port mapper described in Section 29.4 of Volume 1).

We'll show the in_pcbbind function in three sections. Figure 22.20 is the first section.

64-67 The first two tests verify that at least one interface has been assigned an IP address and that the socket is not already bound. You can't bind a socket twice.

68-71 This if statement is confusing. The net result sets the variable wild to INPLOOKUP_WILDCARD if neither SO_REUSEADDR nor SO_REUSEPORT is set. The second test is true for UDP sockets since PR_CONNREQUIRED is false for connectionless sockets and true for connection-oriented sockets. The third test is where the confusion lies [Torek 1992]. The socket flag SO_ACCEPTCONN is set only by the listen system call (Section 15.9), which is valid only for a connection-oriented server. In the normal scenario, a TCP server calls socket, bind, and then listen. Therefore, when in_pcbbind is called by bind, this socket flag is cleared. Even if the process calls socket and then listen, without calling bind, TCP's PRU_LISTEN request calls in_pcbbind to assign an ephemeral port to the socket before the socket layer sets the SO_ACCEPTCONN flag. This means the third test in the if statement, testing whether SO_ACCEPTCONN is not set, is always true. The if statement is therefore equivalent to

    if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) == 0 &&
        ((so->so_proto->pr_flags & PR_CONNREQUIRED) == 0 || 1))
        wild = INPLOOKUP_WILDCARD;
Since anything logically ORed with 1 is always true, this is equivalent to

    if ((so->so_options & (SO_REUSEADDR|SO_REUSEPORT)) == 0)
        wild = INPLOOKUP_WILDCARD;
which is simpler to understand: if either of the REUSE socket options is set, wild is left as 0. If neither of the REUSE socket options is set, wild is set to INPLOOKUP_WILDCARD. In other words, when in_pcblookup is called later in the function, a wildcard match is allowed only if neither of the REUSE socket options is on.

The next section of the in_pcbbind function, shown in Figure 22.22, processes the optional nam argument.
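To make the simplification of the wild test concrete, here is a minimal sketch that compares the three-part test with the reduced form. The SK_ names and their numeric values are hypothetical stand-ins for the kernel constants, and the socket is reduced to two integers; the only assumption carried over from the text is that SO_ACCEPTCONN is never set when in_pcbbind runs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel flag values (illustration only). */
#define SK_SO_ACCEPTCONN      0x0002
#define SK_SO_REUSEADDR       0x0004
#define SK_SO_REUSEPORT       0x0200
#define SK_PR_CONNREQUIRED    0x04
#define SK_INPLOOKUP_WILDCARD 1

/* The three-part test as the text describes it. */
int sk_wild_original(int so_options, int pr_flags)
{
    int wild = 0;

    if ((so_options & (SK_SO_REUSEADDR | SK_SO_REUSEPORT)) == 0 &&
        ((pr_flags & SK_PR_CONNREQUIRED) == 0 ||
         (so_options & SK_SO_ACCEPTCONN) == 0))
        wild = SK_INPLOOKUP_WILDCARD;
    return wild;
}

/* The reduced form: only the two REUSE options matter. */
int sk_wild_simplified(int so_options)
{
    if ((so_options & (SK_SO_REUSEADDR | SK_SO_REUSEPORT)) == 0)
        return SK_INPLOOKUP_WILDCARD;
    return 0;
}

/* Check every option/protocol combination in which SO_ACCEPTCONN is clear. */
bool sk_wild_forms_agree(void)
{
    const int reuse[] = { 0, SK_SO_REUSEADDR, SK_SO_REUSEPORT,
                          SK_SO_REUSEADDR | SK_SO_REUSEPORT };
    const int proto[] = { 0, SK_PR_CONNREQUIRED };

    for (size_t i = 0; i < sizeof reuse / sizeof reuse[0]; i++)
        for (size_t j = 0; j < sizeof proto / sizeof proto[0]; j++)
            if (sk_wild_original(reuse[i], proto[j]) !=
                sk_wild_simplified(reuse[i]))
                return false;
    return true;
}
```

The two forms can differ only for a connection-oriented socket that already has SO_ACCEPTCONN set, which is exactly the combination the text argues never reaches in_pcbbind.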
72-75 The nam argument is a nonnull pointer only when the process calls bind explicitly. For an implicit bind (a side effect of connect, listen, or in_pcbconnect, cases 3, 4, and 5 from the beginning of this section), nam is a null pointer. When the argument is specified, it is an mbuf containing a sockaddr_in structure. Figure 22.21 shows the four cases for the nonnull nam argument.

76-83 The test for the correct address family is commented out, yet the identical test in the in_pcbconnect function (Figure 22.25) is performed. We would expect either both to be in or both to be out.
85-94 Net/3 tests whether the IP address being bound is a multicast group. If so, the SO_REUSEADDR option is considered identical to SO_REUSEPORT.

95-99 Otherwise, if the local address being bound by the caller is not the wildcard, ifa_ifwithaddr verifies that the address corresponds to a local interface. The comment "yech" is probably there because the port number in the socket address structure must be set to 0 first: ifa_ifwithaddr does a binary comparison of the entire structure, not just a comparison of the IP addresses. This is one of the few instances where the process must zero the socket address structure before issuing the system call. If bind is called and the final 8 bytes of the socket address structure (sin_zero[8]) are nonzero, ifa_ifwithaddr will not find the requested interface, and in_pcbbind will return an error.
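The effect of a whole-structure comparison can be sketched in a few lines. The structure below is a hypothetical 16-byte layout modeled on a BSD-style sockaddr_in, and sk_addr_equal is an illustrative stand-in for the comparison the text attributes to ifa_ifwithaddr, not the kernel routine itself. The address value is simply 140.252.13.35 written as a 32-bit constant; byte order does not matter for the point being made.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout modeled on a BSD-style sockaddr_in (16 bytes). */
struct sk_sockaddr_in {
    uint8_t  sin_len;
    uint8_t  sin_family;
    uint16_t sin_port;
    uint32_t sin_addr;
    uint8_t  sin_zero[8];
};

/* Binary comparison of the entire structure, not just the IP address. */
bool sk_addr_equal(const struct sk_sockaddr_in *a,
                   const struct sk_sockaddr_in *b)
{
    return memcmp(a, b, sizeof *a) == 0;
}

/* Build an address the careful way: zero every byte first. */
struct sk_sockaddr_in sk_make_addr(uint32_t ip)
{
    struct sk_sockaddr_in sin;

    memset(&sin, 0, sizeof sin);      /* clears sin_port and sin_zero too */
    sin.sin_len = (uint8_t)sizeof sin;
    sin.sin_family = 2;               /* assumed AF_INET value */
    sin.sin_addr = ip;
    return sin;
}

/* True when one stale byte in sin_zero defeats an otherwise equal address. */
bool sk_stale_padding_breaks_match(void)
{
    struct sk_sockaddr_in ifaddr = sk_make_addr(0x8cfc0d23u);
    struct sk_sockaddr_in req = sk_make_addr(0x8cfc0d23u);
    bool clean_match = sk_addr_equal(&ifaddr, &req);

    req.sin_zero[3] = 0x5a;           /* simulate an uninitialized buffer */
    return clean_match && !sk_addr_equal(&ifaddr, &req);
}
```

A process that fills in only the family, port, and address of a stack-allocated structure leaves sin_zero holding whatever was on the stack, which is why zeroing the whole structure before bind is the safe habit.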
100-105 The next if statement is executed when the caller is binding a nonzero port, that is, the process wants to bind one particular port number (the second and fourth scenarios from Figure 22.21). If the requested port is less than 1024 (IPPORT_RESERVED) the process must have superuser privilege. This is not part of the Internet protocols, but a Berkeley convention. A port number less than 1024 is called a reserved port and is used, for example, by the rcmd function [Stevens 1990], which in turn is used by the rlogin and rsh client programs as part of their authentication with their servers.

106-109 The function in_pcblookup (Figure 22.16) is then called to check whether a PCB already exists with the same local IP address and local port number. The second argument is the wildcard IP address (the foreign IP address) and the third argument is a port number of 0 (the foreign port). The wildcard value for the second argument causes in_pcblookup to ignore the foreign IP address and foreign port in the PCB; only the local IP address and local port are compared to sin->sin_addr and lport, respectively. We mentioned earlier that wild is set to INPLOOKUP_WILDCARD only if neither of the REUSE socket options is set.

111 The caller's value for the local IP address is stored in the PCB. This can be the wildcard address, if that's the value specified by the caller. In this case the local IP address is chosen by the kernel, but not until the socket is connected at some later time. This is because the local IP address is determined by IP routing, based on the foreign IP address.

The final section of in_pcbbind handles the assignment of an ephemeral port when the caller explicitly binds a port of 0, or when the nam argument is a null pointer (an implicit bind).

113-122 The next ephemeral port number to use for this protocol (TCP or UDP) is maintained in the head of the protocol's PCB list: tcb or udb.
Other than the inp_next and inp_back pointers in the protocol's head PCB, the only other element of the inpcb structure that is used is the local port number. Confusingly, this local port number is maintained in host byte order in the head PCB, but in network byte order in all the other PCBs on the list! The ephemeral port numbers start at 1024 (IPPORT_RESERVED) and get incremented by 1 until port 5000 is used (IPPORT_USERRESERVED), then cycle back to 1024. The loop is executed until in_pcblookup does not find a match.

SO_REUSEADDR Examples

Let's look at some common examples to see the interaction of in_pcbbind with in_pcblookup and the two REUSE socket options.

A TCP or UDP server normally starts by calling socket and bind. Assume a TCP server that calls bind, specifying the wildcard IP address and its nonzero well-known port, say 23 (the Telnet server). Also assume that the server is not already running and that the process does not set the SO_REUSEADDR socket option. in_pcbbind calls in_pcblookup with INPLOOKUP_WILDCARD as the final argument. The loop in in_pcblookup won't find a matching PCB, assuming no other process is using the server's well-known TCP port, causing a null pointer to be returned. This is OK and in_pcbbind returns 0.

Assume the same scenario as above, but with the server already running when someone tries to start the server a second time. When in_pcblookup is called it finds the PCB with a local socket of {*, 23}. Since the wildcard counter is 0, in_pcblookup returns the pointer to this entry. Since reuseport is 0, in_pcbbind returns EADDRINUSE.

Assume the same scenario as the previous example, but when the attempt is made to start the server a second time, the SO_REUSEADDR socket option is specified. Since this socket option is specified, in_pcbbind calls in_pcblookup with a final argument of 0.
But the PCB with a local socket of {*, 23} is still matched and returned because wildcard is 0, since in_pcblookup cannot compare the two wildcard addresses (Figure 22.17). in_pcbbind again returns EADDRINUSE, preventing us from starting two instances of the server with identical local sockets, regardless of whether we specify SO_REUSEADDR or not.

Assume that a Telnet server is already running with a local socket of {*, 23} and we try to start another with a local socket of {140.252.13.35, 23}. Assuming SO_REUSEADDR is not specified, in_pcblookup is called with a final argument of INPLOOKUP_WILDCARD. When it compares the PCB containing {*, 23}, the counter wildcard is set to 1. Since a wildcard match is allowed, this match is remembered as the best match and a pointer to it is returned after all the TCP PCBs are scanned. in_pcbbind returns EADDRINUSE.

This example is the same as the previous one, but we specify the SO_REUSEADDR socket option for the second server that tries to bind the local socket {140.252.13.35, 23}. The final argument to in_pcblookup is now 0, since the socket option is specified. When the PCB with the local socket {*, 23} is compared, the wildcard counter is 1, but since the final flags argument is 0, this entry is skipped and is not remembered as a match. After comparing all the TCP PCBs, the function returns a null pointer and in_pcbbind returns 0.

Assume the first Telnet server is started with a local socket of {140.252.13.35, 23} when we try to start a second server with a local socket of {*, 23}. This is the same as the previous example, except we're starting the servers in reverse order this time. The first server is started without a problem, assuming no other socket has already bound port 23. When we start the second server, the final argument to in_pcblookup is INPLOOKUP_WILDCARD, assuming the SO_REUSEADDR socket option is not specified.
When the PCB with the local socket of {140.252.13.35, 23} is compared, the wildcard counter is set to 1 and this entry is remembered. After all the TCP PCBs are compared, the pointer to this entry is returned, causing in_pcbbind to return EADDRINUSE.

What if we start two instances of a server, both with a nonwildcard local IP address? Assume we start the first Telnet server with a local socket of {140.252.13.35, 23} and then try to start a second with a local socket of {127.0.0.1, 23}, without specifying SO_REUSEADDR. When the second server calls in_pcbbind, it calls in_pcblookup with a final argument of INPLOOKUP_WILDCARD. When the PCB with the local socket of {140.252.13.35, 23} is compared, it is skipped because the local IP addresses are not equal. in_pcblookup returns a null pointer, and in_pcbbind returns 0. From this example we see that the SO_REUSEADDR socket option has no effect on nonwildcard IP addresses. Indeed the test on the flags value INPLOOKUP_WILDCARD in in_pcblookup is made only when wildcard is greater than 0, that is, when either the PCB entry has a wildcard IP address or the IP address being bound is the wildcard.

As a final example, assume we try to start two instances of the same server, both with the same nonwildcard local IP address, say 127.0.0.1. When the second server is started, in_pcblookup always returns a pointer to the matching PCB with the same local socket. This happens regardless of the SO_REUSEADDR socket option, because the wildcard counter is always 0 for this comparison. Since in_pcblookup returns a nonnull pointer, in_pcbbind returns EADDRINUSE.
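The examples above can be replayed with a small model of the lookup. This is a sketch under simplifying assumptions, not the Net/3 code: every PCB is treated as unconnected (so only the local address and port are compared), the PCB list is a plain array, ports are kept in host byte order, and SO_REUSEPORT is ignored. The names sk_pcb, sk_lookup, and sk_bind_check are hypothetical.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

#define SK_INADDR_ANY         0u
#define SK_INPLOOKUP_WILDCARD 1

/* Hypothetical, reduced PCB: local address and local port only. */
struct sk_pcb {
    uint32_t laddr;
    uint16_t lport;
};

/* Return the best-matching PCB for {laddr, lport}, or NULL if none. */
const struct sk_pcb *sk_lookup(const struct sk_pcb *list, size_t n,
                               uint32_t laddr, uint16_t lport, int flags)
{
    const struct sk_pcb *match = NULL;
    int matchwild = 3;                  /* larger than any real count */

    for (size_t i = 0; i < n; i++) {
        int wildcard = 0;

        if (list[i].lport != lport)
            continue;                   /* ports must always be equal */
        if (list[i].laddr != SK_INADDR_ANY) {
            if (laddr == SK_INADDR_ANY)
                wildcard++;             /* caller is binding the wildcard */
            else if (list[i].laddr != laddr)
                continue;               /* two different specific addresses */
        } else if (laddr != SK_INADDR_ANY) {
            wildcard++;                 /* PCB holds the wildcard */
        }
        if (wildcard && (flags & SK_INPLOOKUP_WILDCARD) == 0)
            continue;                   /* wildcard match not allowed */
        if (wildcard < matchwild) {
            match = &list[i];
            matchwild = wildcard;
            if (matchwild == 0)
                break;                  /* exact match: stop searching */
        }
    }
    return match;
}

/* Model of the bind decision when only SO_REUSEADDR is considered. */
int sk_bind_check(const struct sk_pcb *list, size_t n,
                  uint32_t laddr, uint16_t lport, int reuseaddr)
{
    int wild = reuseaddr ? 0 : SK_INPLOOKUP_WILDCARD;

    return sk_lookup(list, n, laddr, lport, wild) ? EADDRINUSE : 0;
}
```

With an existing PCB of {*, 23}, a call such as sk_bind_check(list, 1, 0x8cfc0d23u, 23, 1) models the second server binding {140.252.13.35, 23} with SO_REUSEADDR set, and the same call with a final argument of 0 models the case without the option.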
From these examples we can state the rules about the binding of local IP addresses and the SO_REUSEADDR socket option. These rules are shown in Figure 22.24. We assume that localIP1 and localIP2 are two different unicast or broadcast IP addresses valid on the local host, and that localmcastIP is a multicast group. We also assume that the process is trying to bind the same nonzero port number that is already bound to the existing PCB. We need to differentiate between a unicast or broadcast address and a multicast address, because we saw that in_pcbbind considers SO_REUSEADDR to be the same as SO_REUSEPORT for a multicast address.

SO_REUSEPORT Socket Option

The handling of SO_REUSEPORT in Net/3 changes the logic of in_pcbbind to allow duplicate local sockets as long as both sockets specify SO_REUSEPORT. In other words, all the servers must agree to share the same local port.
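The sharing rule can be sketched as a single decision made after in_pcblookup has already returned a matching PCB. This is an illustration of the behavior described in this section, not the Net/3 source: the function name, the option values, and the is_multicast parameter are hypothetical, and the sketch assumes the check is a bitwise test of the new socket's reuseport value against the existing socket's options.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the socket option bits (illustration only). */
#define SK_SO_REUSEADDR 0x0004
#define SK_SO_REUSEPORT 0x0200

/*
 * Decide the result of a bind that collides with an existing PCB.
 * new_opts:      options set on the socket being bound
 * existing_opts: options set on the socket that owns the matching PCB
 * is_multicast:  whether the address being bound is a multicast group
 */
int sk_collision_result(int new_opts, int existing_opts, bool is_multicast)
{
    int reuseport = new_opts & SK_SO_REUSEPORT;

    /* For a multicast group, SO_REUSEADDR counts the same as SO_REUSEPORT. */
    if (is_multicast && (new_opts & SK_SO_REUSEADDR))
        reuseport = SK_SO_REUSEADDR | SK_SO_REUSEPORT;

    /* Both sockets must have agreed to share, or the bind fails. */
    if ((reuseport & existing_opts) == 0)
        return EADDRINUSE;
    return 0;
}
```

Under these assumptions, two unicast sockets that both set only SO_REUSEADDR still collide, which matches the earlier example of two servers with identical local sockets, while two sockets that both set SO_REUSEPORT are allowed to share the port.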