Learning Perl on Win32 Systems

Chapter 15
Other Data Transformation
 

15.5 Transliteration

When you want to take a string and replace every instance of some character with some new character, or delete every instance of some character, you can do so with carefully selected s/// commands. But suppose you had to change all of the a's into b's, and all of the b's into a's? You can't do that with two s/// commands because the second one would undo all of the changes that the first one made.

Perl provides a tr operator that does the trick:

tr/ab/ba/;
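To make the difference concrete, here is a minimal sketch that runs both approaches on the same sample string. The sample text and the variable names $after_s and $after_tr are made up for illustration, and the s/// commands carry the g flag so that they replace every instance rather than only the first.

```perl
use strict;
use warnings;

# Attempt the swap with two substitutions.
$_ = "abba cab";
s/a/b/g;              # every a becomes b: "bbbb cbb"
s/b/a/g;              # every b, old and new alike, becomes a
my $after_s = $_;     # "aaaa caa" (the original a's and b's are lost)

# Do the swap with a single tr, which maps each character exactly once.
$_ = "abba cab";
tr/ab/ba/;
my $after_tr = $_;    # "baab cba"

print "$after_s\n$after_tr\n";
```

Because tr looks at each character only once, a character it has just produced is never translated a second time.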

The tr operator takes two arguments: an old string and a new string. These arguments work like the two arguments to s///; in other words, there's some delimiter that appears immediately after the tr keyword that separates and terminates the two arguments (in this case, a slash, but nearly any character will do).
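As a small sketch of alternative delimiters, the following uses a vertical bar and then paired braces. The paired-brace form follows the same quoting rules that s/// accepts; the variable names $swapped and $upper are illustrative only.

```perl
use strict;
use warnings;

$_ = "fred and barney";

tr|fb|bf|;            # vertical bar as the delimiter
my $swapped = $_;     # "bred and farney"

tr{a-z}{A-Z};         # paired braces: one pair per argument
my $upper = $_;       # "BRED AND FARNEY"

print "$swapped\n$upper\n";
```

A different delimiter is mostly useful when the old or new string itself contains a slash.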

The tr operator modifies the contents of the $_ variable (just like s///), looking for characters of the old string within the $_ variable. All such characters found are replaced with the corresponding characters in the new string. Here are some examples:

$_ = "fred and barney";
tr/fb/bf/;        # $_ is now "bred and farney"
tr/abcde/ABCDE/;  # $_ is now "BrED AnD fArnEy"
tr/a-z/A-Z/;      # $_ is now "BRED AND FARNEY"

Notice how a range of characters can be indicated by two characters separated by a dash. If you need a literal dash in either string, precede it with a backslash.
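The difference between a range and a backslashed dash can be sketched as follows. The sample string and the names $escaped and $ranged are assumptions chosen for the illustration.

```perl
use strict;
use warnings;

# Backslashed dash: the old string is the three characters a, -, c.
$_ = "a-c b";
tr/a\-c/A_C/;         # a => A, - => _, c => C; b is not listed
my $escaped = $_;     # "A_C b"

# Plain dash: the old string is the range a, b, c.
$_ = "a-c b";
tr/a-c/A-C/;          # a, b, c => A, B, C; the dash in $_ is untouched
my $ranged = $_;      # "A-C B"

print "$escaped\n$ranged\n";
```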

If the new string is shorter than the old string, the last character of the new string is repeated enough times to make the strings equal length, like so:

$_ = "fred and barney";
tr/a-z/x/; # $_ is now "xxxx xxx xxxxxx"

To prevent this behavior, append a d to the end of the tr/// operator, which means delete. In this case, the last character is not replicated. Any character that matches in the old string without a corresponding character in the new string is simply removed from the string. For example:

$_ = "fred and barney";
tr/a-z/ABCDE/d; # $_ is now "ED AD BAE"

Notice how any letter after e disappears because there's no corresponding letter in the new list, and that spaces are unaffected because they don't appear in the old list.
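A common use of the d option is an empty new string, which deletes every listed character. The sketch below strips digits from one string and carriage returns from another; the sample data and variable names are hypothetical.

```perl
use strict;
use warnings;

# Delete all digits: nothing in the new string, so every match is removed.
$_ = "call 555-1234 now";
tr/0-9//d;
my $no_digits = $_;     # "call - now"

# Delete carriage returns, leaving only the newlines.
$_ = "one\r\ntwo\r\n";
tr/\r//d;
my $newline_only = $_;  # "one\ntwo\n"

print "$no_digits\n";
```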

If the new list is empty and there's no d option, the new list is the same as the old list. This default may seem silly. Why replace a 1 with a 1 and a 2 with a 2? But the command actually does something useful. The return value of the tr/// operator is the number of characters matched by the old string, and by changing characters into themselves, you can get the count of that kind of character within the string.[3] For example:

$_ = "fred and barney";
$count = tr/a-z//;      # $_ unchanged, but $count is 13
$count2 = tr/a-z/A-Z/;  # $_ is uppercased, and $count2 is 13

[3] This method works only for single characters. To count strings, use the /g flag to a pattern match:

while (/pattern/g) {
    $count++;
}
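Counting a single character with tr and counting a longer string with a /g match can be put side by side. This is only a sketch: the sample text and the names $a_count and $and_count are invented for the example.

```perl
use strict;
use warnings;

$_ = "fred and barney and fred";

# Single character: tr with an empty new string counts without changing $_.
my $a_count = tr/a//;     # 3 (one in each "and", one in "barney")

# Longer string: step through the matches with /g and count them.
my $and_count = 0;
while (/and/g) {
    $and_count++;         # ends at 2
}

print "a: $a_count, and: $and_count\n";
```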

If you append a c (like appending the d), you complement the old string with respect to all 256 characters. Any character you list in the old string is removed from the set of all possible characters; the remaining characters, taken in sequence from lowest to highest, form the resulting old string. So, a way to count or change the nonletters in our string could be:

$_ = "fred and barney";
$count = tr/a-z//c; # $_ unchanged, but $count is 2
tr/a-z/_/c;         # $_ is now "fred_and_barney" (non-letters => _)
tr/a-z//cd;         # $_ is now "fredandbarney" (delete non-letters)

Notice that the options can be combined, as shown in that last example, where we first complement the set (the list of letters becomes the list of all nonletters) and then use the d option to delete any character in that set.
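The same c and cd combinations apply to any character set, not just letters. As a hedged illustration, this sketch counts and then deletes everything that is not a digit in a made-up phone number; the variable names are assumptions.

```perl
use strict;
use warnings;

$_ = "Tel: (555) 123-4567";

# c alone with an empty new string: count the characters NOT in 0-9.
my $non_digits = tr/0-9//c;   # 9; $_ is unchanged

# c and d together: delete every character NOT in 0-9.
tr/0-9//cd;
my $digits_only = $_;         # "5551234567"

print "$non_digits non-digits removed, leaving $digits_only\n";
```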

The final option for tr/// is s, which squeezes multiple consecutive copies of the same resulting translated letter into one copy. As an example, look at this:

$_ = "aaabbbcccdefghi";
tr/defghi/abcddd/s; # $_ is now "aaabbbcccabcd"

Note that def became abc, and ghi (which would have become ddd without the s option) became a single d. Also note that the consecutive letters in the first part of the string are not squeezed because they didn't result from a translation. Here are some more examples:

$_ = "fred and barney, wilma and betty";
tr/a-z/X/s;   # $_ is now "X X X, X X X"
$_ = "fred and barney, wilma and betty";
tr/a-z/_/cs;  # $_ is now "fred_and_barney_wilma_and_betty"

In the first example, each word (consecutive letters) was squeezed down to a single letter X. In the second example, all chunks of consecutive nonletters became a single underscore.
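Two further uses of the s option, sketched under the assumption of made-up sample strings: collapsing runs of spaces and tabs into one space, and squeezing doubled letters by leaving the new list empty so that each letter translates to itself.

```perl
use strict;
use warnings;

# Spaces and tabs both translate to a space; s squeezes each run to one.
$_ = "fred   and \t barney";
tr/ \t/ /s;
my $single_spaced = $_;   # "fred and barney"

# Empty new list: letters map to themselves, and s squeezes repeats.
$_ = "bookkeeper";
tr/a-z//s;
my $squeezed = $_;        # "bokeper"

print "$single_spaced\n$squeezed\n";
```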

Like s///, the tr operator can be targeted at another string besides $_ using the =~ operator:

$names = "fred and barney";
$names =~ tr/aeiou/X/; # $names now "frXd Xnd bXrnXy"
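The return value is available with =~ as well, and the binding works on each element of a list in a loop. The following is a minimal sketch; @words, $vowels, and the loop variable $word are illustrative names.

```perl
use strict;
use warnings;

my $names = "fred and barney";

# Count vowels in $names without changing it (empty new string).
my $vowels = ($names =~ tr/aeiou//);   # 4

# Uppercase every element of an array, one string at a time.
my @words = ("fred", "barney");
foreach my $word (@words) {
    $word =~ tr/a-z/A-Z/;              # $word aliases the array element
}

print "$vowels vowels; @words\n";      # prints "4 vowels; FRED BARNEY"
```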

