searching/grepping for words "near" each other
Ben Scott
dragonhawk at gmail.com
Thu Apr 30 16:06:08 EDT 2009
On Thu, Apr 30, 2009 at 12:02 PM, Kevin D. Clark
<kevin_d_clark at comcast.net> wrote:
> ("0777" causes Perl to "undef $/" (go into "slurp mode"),
It took me a minute, and an RTFM moment, to figure that out. For
those who, like me, didn't get it: That's a zero, not a capital
letter "oh". The "-0" switch to Perl specifies the input record
separator, which is basically the line separator. Normally it's a
newline. You can specify an octal or hex value for the character.
But there are some magic values:
-00 (Two zeros.) Paragraph mode, separating records at runs of one or
more blank lines.
-0 (Nothing after the zero.) ASCII NUL separator. Useful with
"find -print0".
-0777 No record separator.
With no record separator, the entire file gets sucked in as the
first and only record, newlines and all. So it then becomes useful to
use the /s regexp modifier, which lets "." match newlines too.
(Normally, the newline will only be at the end of the record.
Matching that is rather boring. Especially if you use "chomp".)
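
For the "words near each other" problem, that might look something
like the following. The sample text and the 20-character window are
my own assumptions, picked only for illustration. The two words sit
on different lines, so the match succeeds only when "." is allowed to
cross the newline.

```shell
# Made-up sample: "brown" and "jumps" are on adjacent lines.
text='the quick brown\nfox jumps over\nthe lazy dog\n'

# Slurp mode plus /s: "." matches the newline, so the words are "near".
with_s=$(printf "$text" | perl -0777 -ne 'print "match" if /brown.{0,20}jumps/s')

# Slurp mode without /s: "." stops at the newline, so nothing matches.
without_s=$(printf "$text" | perl -0777 -ne 'print "match" if /brown.{0,20}jumps/')

echo "with /s: [$with_s]  without /s: [$without_s]"
# prints: with /s: [match]  without /s: []
```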
I presume 777 was used because 777 is too big to be a valid one-byte
character in either hex or octal. But then Unicode happened, and
characters could be bigger than one byte. So TFM says Unicode
characters have to be specified in hex (-0xHHH), not octal.
In code, you can set $/ to multi-character strings.
-- Ben