29.4 Rule Set 3

Rule set 3 is the first to process every address. It puts each into a form that simplifies the tasks of other rule sets. The most common method is to have rule set 3 focus an address (place angle brackets around the host part). Then later rules don't have to search for the host part, because it is already highlighted. For example, consider trying to spot the recipient host in this mess:

uuhost!user%host1%host2

Here, user is eventually intended to receive the mail message on the host host1. But where should sendmail send the message first? As it happens, sendmail selects uuhost. Focusing this address therefore results in the following:

user%host1%host2<@uuhost.uucp>

Note that uuhost was moved to the end, the ! was changed to an @, and .uucp was appended. The @ is there so that all focused parts uniformly contain an @ just before the targeted host. Later, when we take up postprocessing, we'll show how rule set 4 moves the uuhost back to the beginning and replaces the !.
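To make the transformation concrete, here is a minimal Python sketch of the same reshuffling. It is an illustration only: sendmail does this work with rewriting rules that operate on tokens, and the function name focus_uucp is our own invention.

```python
def focus_uucp(address: str) -> str:
    """Move a leading UUCP host to the end as a focused <@host.uucp> part."""
    host, bang, rest = address.partition("!")
    if not (host and bang and rest):
        return address  # not in host!rest form, so leave it alone
    return f"{rest}<@{host}.uucp>"


print(focus_uucp("uuhost!user%host1%host2"))  # user%host1%host2<@uuhost.uucp>
```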

29.4.1 A Special Case: From:<>

The first rule in a typical rule set 3 handles addresses that are composed of empty angle brackets. These represent the special case of an empty or nonexistent address. Empty addresses should be turned into the address of the pseudo-user that bounces mail, Mailer-Daemon:

# handle "from:<>" special case
R$*<>$*      $@<@>       empty becomes special

Here, empty angle brackets, no matter what surrounds them ($*), are rewritten to be a lone @ inside angle brackets (<@>), and the $@ prefix in the RHS causes rule set 3 to return immediately. Other rule sets later turn this special token into $n (which contains Mailer-Daemon as its value).

29.4.2 Basic Textual Canonicalization

Addresses can be legally expressed in only four formats: [2]

address
address (full name)
<address>
full name <address>

[2] Actually, we are fudging for simplicity. Addresses can appear in various permutations of those shown and we completely ignore the list:members; form of address.

When sendmail preprocesses an address that is in the second format, it removes (and saves for later use) the full name from within the parentheses. The last two formats, however, contain additional characters and information that are not discarded during preprocessing. As a consequence, rule set 3 must take on the job of discarding the unwanted information:

# basic textual canonicalization
R$*<$*<$*<$+>$*>$*>$*   $4        3-level <> nesting
R$*<$*<$+>$*>$*         $3        2-level <> nesting
R$*<$+>$*               $2        basic RFC821/822 parsing

Here, we discard everything outside of and including the innermost pair of angle brackets. Three rules are required to do this because of the minimal-matching nature of the LHS operators (see Section 8.7.2, "Minimal Matching"). Consider trying to de-nest a three-level workspace using only a rule like the third:

the workspace   A < B < C < D > C > B > A
$*  matches  A
<   matches  <
$+  matches  B < C < D
>   matches  >
$*  matches  C > B > A

Clearly, the result B<C<D is not the value between the innermost pair of angle brackets and will result in an address that produces the error message:

Unbalanced '<'
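The same failure can be reproduced outside sendmail. The sketch below uses Python's non-greedy regular-expression operators as a stand-in for minimal matching. This is an analogy under our own assumptions: sendmail matches tokens rather than characters, and the names BASIC and basic_parse are hypothetical.

```python
import re

# Non-greedy (.*?) and (.+?) play the part of the minimal-matching $* and $+.
# Analogy only: sendmail matches tokens, not characters.
BASIC = re.compile(r"(.*?)<(.+?)>(.*)")


def basic_parse(workspace: str) -> str:
    """Imitate a single 'basic RFC821/822 parsing' rule on its own."""
    m = BASIC.fullmatch(workspace)
    return m.group(2) if m else workspace


print(basic_parse("full name <user@host>"))  # user@host
print(basic_parse("A<B<C<D>C>B>A"))          # B<C<D
```

With one level of angle brackets the single rule is enough; with three levels it returns the unbalanced B<C<D, which is why the traditional technique needs one rule per nesting level.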

John Halleck designed a clever alternative to the above traditional technique that is now included with V8 sendmail:

R$*                     $: < $1 >                       housekeeping <>
R$+ < $* >                 < $2 >                       strip excess on left
R< $* > $+                 < $1 >                       strip excess on right
R<>                     $@ < @ >                        MAIL FROM:<> case
R< $+ >                 $: $1                           remove housekeeping <>

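Here, the address is first wrapped in a housekeeping pair of angle brackets. Excess text and brackets are then stripped from the left of the address, then from the right, and finally whatever is left inside the housekeeping brackets must be the address.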
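The following Python sketch traces the same five steps on plain strings. It is a character-level approximation of the rules above, not sendmail's token-based implementation, and the function name strip_to_innermost is hypothetical.

```python
def strip_to_innermost(address: str) -> str:
    """Approximate the five rules above using plain string operations."""
    ws = "<" + address + ">"      # housekeeping <>
    # strip excess on left: drop everything before the next "<", repeatedly
    while True:
        i = ws.find("<", 1)
        if i == -1:
            break
        ws = ws[i:]
    # strip excess on right: drop everything after the first ">"
    j = ws.find(">")
    if j != len(ws) - 1:
        ws = ws[: j + 1]
    if ws == "<>":                # MAIL FROM:<> case
        return "<@>"
    return ws[1:-1]               # remove housekeeping <>


print(strip_to_innermost("A<B<C<D>C>B>A"))          # D
print(strip_to_innermost("full name <user@host>"))  # user@host
print(strip_to_innermost("<>"))                     # <@>
```

Note that no fixed nesting depth is built in: the left-stripping loop simply runs until no inner < remains.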

29.4.3 Handling Routing Addresses

The sendmail program must be able to handle addresses that are in route address syntax. Such addresses are in the form @A,@B:user@C (which means that mail should be sent first to A, then from A to B, and finally from B to C). [3] The commas are converted to colons for easier design of subsequent rules. They must be converted back to commas by rule set 4. Rule set 3 uses a simple rule to convert all commas to colons:

# make sure list syntax is easy to parse
R@ $+ , $+        @ $1 : $2           change all "," to ":"

[3] Also see the DontPruneRoutes option in Section 34.8.20, DontPruneRoutes (R) and the F=d delivery agent flag in Section 30.8.16, F=d.

The iterative nature of rules comes into play here. As long as the workspace contains an @ followed by one or more tokens ($+), then a comma, then one or more further tokens, this rule repeats, converting one comma to a colon on each pass. The result is then carried down to the next rule, which focuses the first host in the route:

R@ $+ : $+         $@ <@ $1> : $2      focus route-addr

Once that host has angle brackets placed around it (is focused), the job of rule set 3 ends, and it exits (the $@ prefix in the RHS).
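As a rough illustration, the two route-address rules can be imitated in Python as follows. The sketch assumes a character-level view of the workspace (sendmail works on tokens), and the function name focus_route_addr is our own.

```python
def focus_route_addr(address: str) -> str:
    """Imitate the comma-to-colon rule followed by the focus route-addr rule."""
    if not address.startswith("@"):
        return address                     # not a route address
    workspace = address.replace(",", ":")  # change all "," to ":"
    host, colon, rest = workspace[1:].partition(":")
    if not (host and colon and rest):
        return workspace                   # nothing to focus
    return f"<@{host}>:{rest}"             # focus on the first hop


print(focus_route_addr("@A,@B:user@C"))  # <@A>:@B:user@C
```

The first hop is focused because that is where the message must be sent next; the remainder of the route travels along behind it.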

29.4.4 Handling Specialty Addresses

A whole book is dedicated to the myriad forms of addressing that might face a site administrator: !%@:: A Directory of Electronic Mail Addressing & Networks by Donnalyn Frey and Rick Adams (O'Reilly & Associates, 1993). We won't duplicate that work here; rather, we point out that most such addresses are handled nicely by existing configuration files. Consider the format of a DECnet address:

host::user

One approach to handling such an address in rule set 3 is to convert it into the Internet user@host.domain form:

R$+ :: $+        $@ $2 @ $1.decnet

Here, we reverse the host and user and put them into Internet form. The .decnet can later be used by rule set 0 to select an appropriate delivery agent.
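For readers who find the rule terse, the same swap can be written as a short Python sketch. It is an illustration only, and the function name decnet_to_internet is hypothetical.

```python
def decnet_to_internet(address: str) -> str:
    """Rewrite host::user as user@host.decnet, imitating the rule above."""
    host, sep, user = address.partition("::")
    if not (host and sep and user):
        return address  # not a DECnet address
    return f"{user}@{host}.decnet"


print(decnet_to_internet("host::user"))  # user@host.decnet
```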

This is a simple example of a special address problem from the many that can develop. In addition to DECnet, for example, your site may have to deal with Xerox Grapevine addresses, X.400 addresses, or UUCP addresses. The best way to handle such addresses is to copy what others have done.

29.4.5 Focusing for @ Syntax

The last few rules in our illustration of rule set 3 are used to process the Internet-style user@domain address:

# find focus for @ syntax addresses
R$+ @ $+                $: $1 <@ $2>        focus on domain
R$+ < $+ @ $+ >            $1 $2 <@ $3>     move gaze right
R$+ <@ $+ >             $@ $1 <@ $2>        already focused

For an address like something@something, the first rule focuses on all the tokens following the first @ as the name of the host. Recall that the $: prefix to the RHS prevents potentially infinite recursion. Assuming that the workspace started with:

user@host1@host2

this first rewrite results in

user<@host1@host2>

The second rule (move gaze right) then attempts to fine-tune the focus by making sure only the rightmost @host is selected. This rule can move the focus right, using recursion, and can handle addresses that are as extreme as the following:

user<@host1@host2@host3@host4> becomes  user@host1@host2@host3<@host4>

The third rule checks to see whether the workspace has been focused. If it has, it returns the focused workspace (the $@ prefix in the RHS), and its job is done.
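The sketch below imitates these three rules in Python and records the workspace after each rewrite. It assumes that the workspace contains no angle brackets on entry (the earlier rules have already stripped them) and works on characters rather than tokens; the function name focus_at_syntax is hypothetical.

```python
def focus_at_syntax(address: str) -> list[str]:
    """Return the successive workspaces produced while focusing an address."""
    steps = [address]
    user, _, rest = address.partition("@")
    if not (user and rest):
        return steps                      # no @ syntax: left unfocused
    steps.append(f"{user}<@{rest}>")      # focus on domain
    while True:
        hop, at, tail = rest.partition("@")
        if not (hop and at and tail):
            break                         # already focused
        user, rest = f"{user}@{hop}", tail
        steps.append(f"{user}<@{rest}>")  # move gaze right
    return steps


for workspace in focus_at_syntax("user@host1@host2"):
    print(workspace)
# user@host1@host2
# user<@host1@host2>
# user@host1<@host2>
```

A bare username such as user comes back as a single, unfocused entry.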

Any address that has not been handled by rule set 3 is unchanged and probably not focused. Since rule set 0 expects all addresses to be focused so that it can select appropriate delivery agents, such unfocused addresses may bounce. Many configuration files allow local addresses (just a username) to be unfocused.

