Help with sed script?
Kevin D. Clark
kclark at elbrysnetworks.com
Wed Sep 6 09:25:01 EDT 2006
Michael ODonnell writes:
> Hmmmm. I'd heard that sed's RE parser is a type known as
> "greedy" meaning that every expression matches the longest
> possible string in the input. I therefore can't understand
> how after all the leading whitespace has been matched there
> can be any whitespace "left over" to match the not-a-hashmark
> expression, but apparently there is. Anyway, changing that part
> to be not-a-hashmark-or-whitespace seems to solve the problem:
>
> sed -r -e '/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}'
Regular expression engines are typically greedy because this goes
along with the formalism of what a regular expression is: regular
expressions describe the languages accepted by finite state automata
(FSA). The regexp engine, which tries to mimic an FSA and also perform
useful work, employs a technique called backtracking to do so.

The backtracking process is the thing that seems to be causing you
some surprise here. With your previous regexp, the leading whitespace
expression first consumes all of the leading whitespace; when the rest
of the pattern then fails to match, the engine backs up and lets the
whitespace expression give back one character. That leaves a space
[ ] available, and a space satisfies your "not a hash mark" regexp
[^#].
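
Here is a small sketch of that effect. Your previous regexp isn't quoted above, so I am assuming it differed from the new one only in using [^#] where the new one uses [^#[:space:]]; the sample line is made up as well.

```shell
# A commented-out line with leading whitespace (made-up sample input).
line='  # comment xyz'

# ASSUMED previous pattern: "not a hash mark" written as [^#].
# The leading [[:space:]]* gives back one space, which [^#] then matches.
old=$(printf '%s\n' "$line" |
  sed -r -e '/^[[:space:]]*[^#].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}')

# Revised pattern from the quoted message: [^#[:space:]] rejects both the
# spaces and the hash mark, so the commented line is left alone.
new=$(printf '%s\n' "$line" |
  sed -r -e '/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}')

printf 'old: %s\nnew: %s\n' "$old" "$new"
```

Under that assumption, the old pattern rewrites the commented line and the new one leaves it untouched.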
Not all regexp engines are always greedy. Perl's, for example, can be
directed to not be greedy. You do this by appending "?" to a
quantifier. For example, in /ab*?/ the "b*?" matches as few b's as
possible.
Are you sure that your new regexp does what you want? Try this:
echo 'xyz' | \
sed -r -e '/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}'
You specify "surrounded by whitespace" in your original description,
but "occurs at the beginning of a line" might be reasonable too (I
don't know what your file looks like...).
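
A few made-up sample lines show where the new regexp does and does not fire. This is only a sketch; the inputs are my own guesses, not lines from your file.

```shell
# The revised pattern from the quoted message, unchanged.
pattern='/^[[:space:]]*[^#[:space:]].*[[:space:]]+xyz[[:space:]]*/{s/^.*$/REWRITTEN/}'

# Hypothetical inputs: xyz alone, xyz in the middle of a line, and xyz as
# the first word after leading whitespace.
result=$(printf '%s\n' 'xyz' 'foo xyz bar' '  xyz bar' | sed -r -e "$pattern")

printf '%s\n' "$result"
```

Only the middle line is rewritten. The pattern requires at least one non-whitespace character and then whitespace before xyz, so a line where xyz is the first word is passed through unchanged.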
Regards,
--kevin
--
GnuPG ID: B280F24E And the madness of the crowd
alumni.unh.edu!kdc Is an epileptic fit
-- Tom Waits