Help with sed script?

Kevin D. Clark kclark at elbrysnetworks.com
Wed Sep 6 09:25:01 EDT 2006


Michael ODonnell writes:

> Hmmmm.  I'd heard that sed's RE parser is a type known as
> "greedy" meaning that every expression matches the longest
> possible string in the input.  I therefore can't understand
> how after all the leading whitespace has been matched there
> can be any whitespace "left over" to match the not-a-hashmark
> expression, but apparently there is.  Anyway, changing that part
> to be not-a-hashmark-or-whitespace seems to solve the problem:
>
>   sed -r -e '/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}'

Regular expression engines are typically greedy because this goes
along with the formalism of what a regular expression is:  regular
expressions describe the languages accepted by finite state automata
(FSAs).

The regexp engine, which tries to mimic an FSA and also perform useful
work, employs a technique called backtracking.  Backtracking is the
thing that seems to be causing you some surprise here:  with your
previous regexp, the greedy [[:space:]]* first consumes all of the
leading whitespace.  If the rest of the pattern then fails to match
(say, because the next character is a "#"), the engine gives back one
whitespace character and retries, and that space [ ] matches your
"not a hash mark" regexp [^#].
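
Here is a small sketch of that behavior.  The sample line is made up,
and the "old" pattern is my assumption about what the earlier regexp
looked like (the revised one with [^#[:space:]] changed back to [^#]);
the original script isn't shown in this message.

```shell
# Hypothetical sample line: an indented comment that mentions xyz.
line='  # comment xyz '

# Assumed earlier pattern: after [[:space:]]* gives back one space,
# [^#] matches that space and .* swallows the "#".
old=$(printf '%s\n' "$line" |
  sed -r -e '/^[[:space:]]*[^#].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}')

# Revised pattern from the quoted message: [^#[:space:]] can match
# neither a space nor the "#", so the address never matches.
new=$(printf '%s\n' "$line" |
  sed -r -e '/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}')

printf 'old: [%s]\nnew: [%s]\n' "$old" "$new"
```

The first command rewrites the comment line; the second leaves it
alone.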

Not all regexp engines are always greedy.  Perl's, for example, can be
directed to not be greedy.  You can do this by appending "?" to a
quantifier.  For example, /ab*?/ matches an "a" followed by as few
"b"s as possible.

Are you sure that your new regexp does what you want?  Try:

echo 'xyz' | \
 sed -r -e '/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}'

You specify "surrounded by whitespace" in your original description,
but "occurs at the beginning of a line" might be reasonable too (I
don't know what your file looks like...).
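
If the beginning-of-line case does matter, one possible sketch
follows.  It assumes the goal is "rewrite any non-comment line that
contains xyz as a whitespace-delimited word", which is my guess at the
requirement and not something you stated.  The function name rewrite
and the sample inputs are made up for illustration.

```shell
# Hypothetical helper: skip lines whose first non-blank character is
# "#", then rewrite lines where xyz is bounded by whitespace or by the
# start/end of the line.
rewrite() {
  sed -r -e '/^[[:space:]]*#/!{/(^|[[:space:]])xyz([[:space:]]|$)/s/^.*$/REWRITTEN/}'
}

bare=$(echo 'xyz' | rewrite)              # xyz alone on the line
mid=$(echo 'foo xyz bar' | rewrite)       # xyz surrounded by whitespace
comment=$(echo '  # foo xyz ' | rewrite)  # comment line
glued=$(echo 'abcxyz' | rewrite)          # xyz is only part of a word

printf '%s\n' "$bare" "$mid" "$comment" "$glued"
```

Splitting the comment test into its own negated address keeps the
second regexp short, at the cost of nesting one address inside the
other.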

Regards,

--kevin
-- 
GnuPG ID: B280F24E                     And the madness of the crowd
alumni.unh.edu!kdc                     Is an epileptic fit
                                       -- Tom Waits




More information about the gnhlug-discuss mailing list