global search and replace

bscott at ntisys.com
Tue May 13 11:01:20 EDT 2003


  For those who just want a solution:

  find . -type f -print0 | xargs -0 perl -i.bak -pe 's/foo/bar/g'

  The above will replace all instances of string "foo" with string "bar" in
all files under the current directory.  The original files are saved with a
".bak" extension added.  You can further specify options to "find" to narrow
the search, should you wish.

  Read on for a more detailed explanation and analysis.

On Tue, 13 May 2003, at 10:21am, pll at lanminds.com wrote:
> Well, assuming that all the files are under a single hierarchy, you 
> could do something like this:
> 
>     for i in `find ./ -type f -print | xargs grep foo | cut -f1 -d: | sort -u `
>     do
>        perl -i.bak -anpe '$_ =~ s/foo/bar/g' $i
>     done

  Wow, that's... weird.

  grep'ing each file for the string to replace just slows things down.  You
end up grovelling through every file at least once, and every matching file
gets read twice.  Just let Perl do the search.

  The entire 'for' loop is unneeded, and results in Perl getting invoked
once per file; far better to just use xargs, which invokes Perl once (or as
few times as the system's command-line length limit allows).

  Your example breaks on "unsafe" file names, such as those containing
spaces or newlines.  (Consider this parenthetical remark to take the place of
the re-hash of the file naming debate; such file names exist, whether anyone
likes them or not.)
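  To make the "unsafe file names" point concrete, here is a sketch using one
hypothetical file named "my notes.txt".  The backtick loop splits the name
on the space, while the NUL-separated pipeline passes it through whole.
(printf is used only to show each argument in brackets.)

```shell
# One file whose name contains a space (hypothetical name).
demo=$(mktemp -d) || exit 1
cd "$demo" || exit 1
printf 'foo\n' > 'my notes.txt'

# Backtick output is word-split, so the loop sees two bogus names.
for i in `find . -type f -print`; do
    printf '[%s]\n' "$i"
done
# [./my]
# [notes.txt]

# NUL-terminated names survive intact.
find . -type f -print0 | xargs -0 printf '[%s]\n'
# [./my notes.txt]
```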

  The '-p' switch and the '-n' switch conflict in their semantics.  The Perl
documentation says '-p' overrides '-n', but that's not something I would
like to depend on.

  I cannot figure out why you would want to use the '-a' switch at all.

  The '=~' binding operator is unnecessary; 's///' operates on the default
pattern space... well, by default.  :-)

  This, IMNSHO, is much better:

  find . -type f -print0 | xargs -0 perl -i.bak -pe 's/foo/bar/g'

  The above uses "find" to find all regular files ("-type f") under the
current directory.  The "-print0" argument instructs find to output
null-terminated strings, so file names containing spaces, newlines, quotes,
or other characters that xargs would otherwise split on or interpret arrive
intact.

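  The "xargs" command reads the file names from its input and runs the given
Perl command with them as arguments, packing as many names into each
invocation as the command-line length limit allows (often all of them at
once).  The "-0" argument tells "xargs" to expect null-terminated strings (as
output by "find -print0").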
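  To watch xargs batch its arguments, this sketch substitutes "sh -c" for
Perl so the command just reports how many arguments it received ("$#").
The "-n 1" variant forces one argument per invocation, which is roughly what
the 'for' loop does.

```shell
# Three NUL-terminated names in; count arguments per invocation.
# The trailing "sh" fills $0, so $# counts only the names.
printf '%s\0' a 'b c' d | xargs -0 sh -c 'echo "$# args"' sh
# 3 args          (one invocation, three arguments)

printf '%s\0' a 'b c' d | xargs -0 -n 1 sh -c 'echo "$# args"' sh
# 1 args, printed three times (three invocations)
```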

  The "-i" switch tells Perl to do the replace operation "in place".
The ".bak" part tells Perl to keep the old file with a ".bak" (backup)
extension added.

  The "-p" switch tells Perl to put an implicit read/process/print loop
around the entire program.  See the "perlrun" document for details.

  The "-e" switch gives Perl a program to execute as an argument.

  The "s/foo/bar/g" does the actual replacement.  It uses the default
pattern space (which is also used by the "-p" switch).  The "s" stands for
substitute.  The "/g" switch means "global", and will cause every occurrence
of "foo" in a given line of the file to be replaced.  Otherwise, just the
first instance is replaced.  (I always forget the "/g" switch and end up
needing to run my command twice; thanks to Paul for reminding me.)

  Finally, xargs appends the file names to the end of the Perl command line,
so Perl receives each one as an ordinary file name argument.  See the
"xargs" man page for details.

  The only thing the above does not do that Paul's example did is sort the
filenames prior to running the search-and-replace operation.  But, since the
entire thing is intended to be non-interactive, I'm not really sure what the
benefit of that sort is.  :-)

> Oh, btw, the above will result in all the original files being moved to
> *.bak.  If you do not want this, remove the '-i.bak' above.

  Remove the ".bak" part; leave "-i", or you will get a lot of output on
stdout instead.  :-)
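  A sketch of both cases in a scratch directory (the file name "f.txt" is
just for the demo):

```shell
demo=$(mktemp -d) || exit 1
printf 'foo\n' > "$demo/f.txt"

# No -i: the edited text goes to stdout and the file is left alone.
perl -pe 's/foo/bar/g' "$demo/f.txt"      # prints: bar
cat "$demo/f.txt"                         # still: foo

# -i with no suffix: edited in place, no backup kept.
perl -i -pe 's/foo/bar/g' "$demo/f.txt"
cat "$demo/f.txt"                         # bar
ls "$demo"                                # f.txt
```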

-- 
Ben Scott <bscott at ntisys.com>
| The opinions expressed in this message are those of the author and do  |
| not represent the views or policy of any other person or organization. |
| All information is provided without warranty of any kind.              |






More information about the gnhlug-discuss mailing list