a question regarding the use of Split operator in Perl
Ben Scott
dragonhawk at gmail.com
Tue Sep 11 14:35:24 EDT 2007
On 9/11/07, Jerry <greenmt at gmail.com> wrote:
> I want to separate the phrases by 2+ whitespace to
> separate these phrases, so I think the perl code
> should look like this
> @list = split ( /\s{2,}/, $_);
This may do what you want:
@list = split ( /\s{2,}|\s\*\s/, $_);
That will split on either "two or more whitespace characters" OR "a
whitespace character, a star, and a whitespace character". (Note that
\s matches tabs and newlines as well as spaces.) Someone wise once
gave me the tip, "Use split() when you want to specify what you are
throwing away"; in this case, we're throwing away either of those two
things.
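As a rough sketch of what that pattern does, here is a small test
script. The sample string is made up for illustration; it is not
Jerry's actual data.

```perl
use strict;
use warnings;

# Hypothetical input: phrases separated by runs of spaces or by " * ".
my $line = "first phrase   second phrase * third phrase  fourth";

# Split on two-or-more whitespace characters, or whitespace-star-whitespace.
# The single spaces inside each phrase match neither alternative, so they stay.
my @list = split /\s{2,}|\s\*\s/, $line;

print "[$_]\n" for @list;
```

Each phrase should come out as its own list element, with the single
spaces inside a phrase left alone.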
Some other tips (unrelated to your actual question):
(1) Alternate pattern matching delimiters
The use of alternate pattern delimiters can make regular expressions
more readable. For example:
@list = split ( m{\s{2,}|\s\*\s}, $_);
That uses {braces} instead of the default /slashes/ to identify the
pattern. You can use almost any punctuation character as a delimiter
(I tend to use <> and {} a lot). Bracketing delimiters nest, which is
why the {2,} inside the pattern does not end it early. The "m" prefix
signifies a matching pattern (as opposed to s//, which is a
substitution pattern). The "m" is optional for m// but required for
anything else.
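To illustrate, the following sketch writes the same pattern with three
different delimiters and prints the result of each. The sample string
and the variable names are made up for the example.

```perl
use strict;
use warnings;

my $line = "alpha  beta * gamma";   # hypothetical sample input

my @slashes = split /\s{2,}|\s\*\s/,  $line;   # default delimiters, "m" omitted
my @braces  = split m{\s{2,}|\s\*\s}, $line;   # braces; the inner {2,} nests
my @bangs   = split m!\s{2,}|\s\*\s!, $line;   # other delimiters need the "m"

# All three spellings are the same pattern, so the output lines match.
print join("|", @$_), "\n" for \@slashes, \@braces, \@bangs;
```

The choice of delimiter is purely about readability; pick one that
does not appear in the pattern itself.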
(2) Optional syntax
You don't need to put parentheses around the arguments to split, and
you don't need to explicitly specify the default pattern match target
($_). The result is the following, which I personally find more
readable:
@list = split m{\s{2,}|\s\*\s};
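The short form only makes sense where $_ already holds the line, such
as inside a for or while loop. A minimal sketch, assuming some made-up
input lines (the names @lines and @all are just for this example):

```perl
use strict;
use warnings;

# Hypothetical sample lines standing in for real input.
my @lines = ("one  two * three", "four * five  six");

my @all;
for (@lines) {                           # each line is aliased to $_
    my @list = split m{\s{2,}|\s\*\s};   # no parentheses, no explicit $_
    push @all, @list;
}

print join(", ", @all), "\n";
```

Under "use strict" the list needs a "my" declaration, which is the
only difference from the one-liner above.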
(3) The /x modifier
You can use the /x modifier to allow whitespace and comments to be
embedded in a pattern. (Literal whitespace becomes syntactically
insignificant.) So now we get (you'll need to view in a monospace
font for it to line up properly):
@list = split m{    # split on and discard ...
    \s{2,}          # ... two or more whitespace characters in a row ...
    |               # ... or ...
    \s\*\s          # ... on a space, a star, and a space (exactly; in that order)
}x;
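One thing to watch with /x: because unescaped whitespace in the
pattern is ignored, a literal space has to be written as "\ " or
"[ ]" if you want to match it. The sketch below compares the plain
pattern, the commented /x form, and a variant that matches only
literal spaces. The sample string is made up for illustration.

```perl
use strict;
use warnings;

my $line = "red  green * blue";   # hypothetical sample input

my @plain = split m{\s{2,}|\s\*\s}, $line;

my @commented = split m{
      \s{2,}     # two or more whitespace characters
    |            # or
      \s \* \s   # whitespace, a star, whitespace
}x, $line;

# Under /x a bare space in the pattern is skipped, so a literal space
# is written here as a one-character class, "[ ]".
my @literal = split m{ [ ]{2,} | [ ] \* [ ] }x, $line;

print join("|", @$_), "\n" for \@plain, \@commented, \@literal;
```

Also keep the pattern delimiter out of the comments: Perl finds the
end of the pattern before it looks at /x, so a stray closing brace in
a comment would end the pattern early.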
Hope this helps!
-- Ben