Learning Perl

Chapter 7: Regular Expressions

7.6 The split and join Functions

Regular expressions can be used to break a string into fields. The split function does this, and the join function glues the pieces back together.

7.6.1 The split Function

The split function takes a regular expression and a string, and looks for all occurrences of the regular expression within that string. The parts of the string that don't match the regular expression are returned in sequence as a list of values. For example, here's something to parse colon-separated fields, such as in UNIX /etc/passwd files:

$line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";
@fields = split(/:/,$line); # split $line, using : as delimiter
# now @fields is ("merlyn","","118","10","Randal",
#                 "/home/merlyn","/usr/bin/perl")

Note how the empty second field became an empty string. If you don't want this, match all of the colons in one fell swoop:

@fields = split(/:+/, $line);

This matches one or more adjacent colons together, so there is no empty second field.
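The two patterns can be compared side by side with a small, self-contained sketch. It reuses the sample passwd line from above. It also shows one side effect worth keeping in mind: once the empty field disappears, every later field moves one position to the left. So /:+/ suits cases where you care about the non-empty values rather than their positions.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

# Single-colon delimiter: the empty password field survives as "".
my @exact = split(/:/, $line);

# One-or-more colons: "::" counts as a single delimiter, so the
# empty field vanishes and the fields after it shift left.
my @collapsed = split(/:+/, $line);

print scalar(@exact), " fields: ", join("|", @exact), "\n";          # 7 fields
print scalar(@collapsed), " fields: ", join("|", @collapsed), "\n";  # 6 fields
print "third field with /:/  is $exact[2]\n";      # 118 (the uid)
print "third field with /:+/ is $collapsed[2]\n";  # 10 (now the gid)
```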

One common string to split is the $_ variable, and that turns out to be the default:

$_ = "some string";
@words = split(/ /); # same as @words = split(/ /, $_);

For this split, consecutive spaces in the string to be split will cause null fields (empty strings) in the result. A better pattern would be / +/, or ideally /\s+/, which matches one or more whitespace characters together. In fact, this pattern is the default pattern,[8] so if you're splitting the $_ variable on whitespace, you can use all the defaults and merely say:

@words = split; # same as @words = split(/\s+/, $_);

[8] Actually, the default pattern is the special string " ". It behaves like /\s+/, except that leading whitespace is ignored instead of producing an empty first field, but that's still close enough for this discussion.
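To see the difference between the three choices, here is a minimal sketch that splits one string, with leading and repeated spaces, using a single space, /\s+/, and the bare default. The sample string is made up for illustration. Note that /\s+/ still produces one leading empty field, which is exactly the case the footnote about " " describes.

```perl
use strict;
use warnings;

$_ = "  some   string  here";

my @single = split(/ /);     # each single space is a delimiter
my @ws     = split(/\s+/);   # a run of whitespace is one delimiter
my @dflt   = split;          # default: like split(" ", $_)

print scalar(@single), ": [", join("][", @single), "]\n";  # 8 fields, several empty
print scalar(@ws), ": [", join("][", @ws), "]\n";          # 4 fields, first one empty
print scalar(@dflt), ": [", join("][", @dflt), "]\n";      # 3 fields, no empties
```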

Empty trailing fields do not normally become part of the list, but this is rarely a concern. An assignment like this,

$line = "merlyn::118:10:Randal:/home/merlyn:";
($name,$password,$uid,$gid,$gcos,$home,$shell) =
    split(/:/,$line); # split $line, using : as delimiter

gives $shell an empty string when the last field is empty, and leaves it undef when the line doesn't have enough fields; either way, $shell ends up false. (When split is assigned directly to a list of scalars, Perl supplies a limit one larger than the number of variables, so an empty final field is kept rather than stripped. Extra fields are silently ignored, because list assignment works that way.)
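The following sketch shows how trailing empty fields behave, assuming a Perl 5 interpreter. Splitting into an array drops the empty field after the final colon. When split is assigned straight to a list of seven scalars, Perl supplies an implicit limit of eight, and with a positive limit that final empty field is kept as "". The short-line case shows where undef actually comes from. The variables $n2 through $sh2 are just illustrative names.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:";

# Plain split into an array: no limit, so the trailing empty field is stripped.
my @fields = split(/:/, $line);
print "array gets ", scalar(@fields), " fields\n";    # 6

# Direct list assignment: Perl passes an implicit limit of 8 (7 variables + 1),
# so the empty final field is kept and $shell becomes "".
my ($name, $password, $uid, $gid, $gcos, $home, $shell) = split(/:/, $line);
print "shell is ", (defined $shell ? "'$shell'" : "undef"), "\n";

# A line with too few fields leaves the missing variables undef.
my ($n2, $p2, $u2, $g2, $c2, $h2, $sh2) = split(/:/, "merlyn::118");
print "short-line shell is ", (defined $sh2 ? "'$sh2'" : "undef"), "\n";
```

If you want the trailing empty fields kept even when splitting into an array, split also accepts a negative limit, as in split(/:/, $line, -1).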

7.6.2 The join Function

The join function takes a list of values and glues them together with a glue string between each list element. It looks like this:

$bigstring = join($glue,@list);

For example, to rebuild the password line, try something like:

$outline = join(":", @fields);

Note that the glue string is not a regular expression; it's just an ordinary string of zero or more characters.
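The following minimal sketch illustrates a split-then-join round trip on the passwd line. The change to the gcos field is a hypothetical edit chosen only for illustration. The last line shows that characters such as . and * in the glue are copied literally, because join never treats its first argument as a pattern.

```perl
use strict;
use warnings;

my $line = "merlyn::118:10:Randal:/home/merlyn:/usr/bin/perl";

my @fields = split(/:/, $line);     # break the record apart
$fields[4] = "Randal Schwartz";     # hypothetical edit to the gcos field
my $outline = join(":", @fields);   # glue it back with a literal ":"

print "$outline\n";   # merlyn::118:10:Randal Schwartz:/home/merlyn:/usr/bin/perl

# The glue is a plain string, so regex metacharacters are copied as-is.
my $literal = join(".*", "a", "b", "c");
print "$literal\n";   # a.*b.*c
```

The empty password field survives the round trip because split keeps interior empty fields. A line ending in an empty field would not round-trip exactly, though, since a plain split drops that trailing field before join ever sees it.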

If you need to get glue ahead of every item instead of just between items, a simple cheat suffices:

$result = join ("+", "", @fields);

Here, the extra "" is treated as an empty element, to be glued together with the first data element of @fields. This results in glue ahead of every element. Similarly, you can get trailing glue with an empty element at the end of the list, like so:

$output = join ("\n", @data, "");
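Both tricks can be checked with a short sketch. The contents of @fields and @data here are made-up sample values. The leading empty element puts glue before each real element, and the trailing empty element gives each line its own newline, which is handy when printing a list of lines.

```perl
use strict;
use warnings;

my @fields = ("a", "b", "c");
my @data   = ("line one", "line two");

# Leading "" element: glue appears before every real element.
my $result = join("+", "", @fields);

# Trailing "" element: glue appears after every real element.
my $output = join("\n", @data, "");

print "$result\n";   # +a+b+c
print $output;       # "line one", then "line two", each ending in a newline
```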
