Friday, November 13, 2009

Stateful and Stateless Data










Perl for System AdministrationSearch this book









9.3. Stateful and Stateless Data








In addition to the format in which
the log file presents its data, it is important to think about the
contents of these files because what the data
represents and how it is represented both
contribute to our plan of attack when programming. With log file
contents, often a distinction can be made between data that is
stateful and data that is
stateless. Let's see a couple of
examples that will make this distinction clear.







Here's a three-line snippet from an Apache web server log. Each
line represents a request answered by the web server:








esnet-118.dynamic.rpi.edu - - [13/Dec/1998:00:04:20 -0500] "GET home/u1/tux/
tuxedo05.gif

HTTP/1.0" 200 18666 ppp-206-170-3-49.okld03.pacbell.net - - [13/Dec/1998:00:04:21
-0500] "GET home/u2/news.htm

HTTP/1.0" 200 6748 ts007d39.ftl-fl.concentric.net - - [13/Dec/1998:00:04:22 -0500]
"GET home/u1/bgc.jpg HTTP/1.1" 304 -







Here are a few lines from a printer daemon log file:








Aug 14 12:58:46 warhol  printer: cover/door open 
Aug 14 12:58:58 warhol printer: error cleared
Aug 14 17:16:26 warhol printer: offline or intervention needed
Aug 14 17:16:43 warhol printer: error cleared
Aug 15 20:40:45 warhol printer: paper out
Aug 15 20:40:48 warhol printer: error cleared







In both cases, each line of the log file is independent of every
other line in the file. We can find patterns or aggregate lines
together gathering statistics, but there's nothing inherent in
the data that connects the log file entries to each other.







Now consider some slightly doctored entries from a
sendmail mail log:








Dec 13 05:28:27 mailhub sendmail[26690]: FAA26690:
from=<user@has.a.godcomplex.
com>, size=643, class=0, pri=30643, nrcpts=1,
msgid=<199812131032.CAA22824@has.a.
godcomplex.com>, proto=ESMTP, relay=user@has.a.godcomplex.com [216.32.32.176]

Dec 13 05:29:13 mailhub sendmail[26695]: FAA26695: from=<root@host.ccs.neu.edu>,
size=9600, class=0, pri=39600, nrcpts=1, msgid=<199812131029.FAA15005@host.ccs.
neu.edu>, proto=ESMTP, relay=root@host.ccs.neu.edu [129.10.116.69]

Dec 13 05:29:15 mailhub sendmail[26691]: FAA26690: to=<user@ccs.neu.edu>,
delay=00:00:02, xdelay=00:00:01, mailer=local, stat=Sent

Dec 13 05:29:19 mailhub sendmail[26696]: FAA26695: to="|IFS=' '&&exec /usr/bin/
procmail -f-||exit 75 #user", ctladdr=user (6603/104), delay=00:00:06,
xdelay=00:
00:06, mailer=prog, stat=Sent







Unlike the previous examples, there is a definite connection between
the lines in the file. Figure 9-2 makes that
connection explicit.










Figure 9.2. Related entries in the sendmail log



Each line has at least one partner entry that shows the source and
destinations of each message. When a message enters the system it is
assigned a unique "Message-ID," highlighted above, which
identifies that message while it is in play. This Message-ID allows
us to associate related lines in an interleaved log file, essentially
giving a message an existence or "state" in between
entries of a log file.







Sometimes we care about the "distance" between state
transitions. Take, for instance, the wtmp file
we saw earlier in this chapter. Not only are we interested in when
someone logs in and when they log out (the two state transitions in
the log), but in the time between these two events, i.e., how long
they were logged in.







The most sophisticated log files can add another twist. Here are some
excerpts from a POP (Post Office Protocol) server's log file
while the server is in debug mode. The names and IP addresses have
been changed to protect the innocent:








Jan 14 15:53:45 mailhub popper[20243]: Debugging turned on
Jan 14 15:53:45 mailhub popper[20243]: (v2.53) Servicing request from
"client" at 129.X.X.X
Jan 14 15:53:45 mailhub popper[20243]: +OK QPOP (version 2.53) at mailhub starting.
Jan 14 15:53:45 mailhub popper[20243]: Received: "USER username"
Jan 14 15:53:45 mailhub popper[20243]: +OK Password required for username.
Jan 14 15:53:45 mailhub popper[20243]: Received: "pass xxxxxxxxx"
Jan 14 15:53:45 mailhub popper[20243]: +OK username has 1 message (26627 octets).
Jan 14 15:53:46 mailhub popper[20243]: Received: "LIST"
Jan 14 15:53:46 mailhub popper[20243]: +OK 1 messages (26627 octets)
Jan 14 15:53:46 mailhub popper[20243]: Received: "RETR 1"
Jan 14 15:53:46 mailhub popper[20243]: +OK 26627 octets
<message text appears here>
Jan 14 15:53:56 mailhub popper[20243]: Received: "DELE 1"
Jan 14 15:53:56 mailhub popper[20243]: Deleting message 1 at offset 0 of length 26627
Jan 14 15:53:56 mailhub popper[20243]: +OK Message 1 has been deleted.
Jan 14 15:53:56 mailhub popper[20243]: Received: "QUIT"
Jan 14 15:53:56 mailhub popper[20243]: +OK Pop server at mailhub signing off.
Jan 14 15:53:56 mailhub popper[20243]: (v2.53) Ending request from "user" at (client) 129.X.X.X







Not only do we encounter connections ("Servicing request
from...") and disconnections ("Ending request
from..."), but we have information detailing what took place in
between these state transitions.







Each of these middle events also provides potentially useful
"distance" information. If there was a problem with our
POP server, we might look to see how long each step in the above
output took.







In the case of an FTP server, you may be able to draw some
conclusions from this data about how people interact with your site.
On average, how long do people stay connected before they transfer
files? Do they pause between commands for a long time? Do they always
travel from one part of your site to another before downloading the
same file? The interstitial data can be a rich source of
information.















Copyright © 2001 O'Reilly & Associates. All rights reserved.







No comments:

Post a Comment