extract string from filename

Bill McGonigle bill at bfccomputing.com
Fri Jan 13 10:31:00 EST 2006


On Jan 12, 2006, at 19:40, Zhao Peng wrote:

> I also downloaded an e-book called "Learning Perl" (OReilly, 
> 4th.Edition), and had a quick look thru its Contents of Table, but did 
> not find any chapter which looks likely addressing any issue related 
> to my question.

Good start.  Read these sections: 'A Stroll Through Perl', 'The Split 
and Join Functions', 'Lists and Arrays', 'Hashes', 'Directory Access', 
and 'File Manipulation'.

Your description is the outline of the algorithm.  Take this script 
where I've filled in the requisite perl and figure out how it works:

#!/usr/bin/perl -w
use strict;                   # show stupid errors
use warnings FATAL=>'all';    # don't let you get away with them

#I have almost 1k small files within one folder. The only pattern of
#the file names is:
my $dirname = shift; # take the command line parameter as the directory name
opendir(DIRECTORY, $dirname) or die "Cannot open directory $dirname: $!";
my @files = readdir(DIRECTORY);
closedir DIRECTORY;

#string1_string2_string3_string4.sas7bdat

#Note:
#1, string2 often repeat itself across each file name
#2, All 4 strings contain no underscores.
#3, 4 strings are separated by  3 underscores (as you can see)
#4, The length of all 4 strings are not fixed.

my (@part_2s);  # we'll keep the second parts here
foreach my $file (@files) {
     next if (($file eq '.') or ($file eq '..')); # the directory will contain . and .. which we don't want
#My goal is to :
#1, extract string2 from each file name
     my ($filename,$extension) = split('\.',$file); # don't forget to escape the . since this is a regex
     my @strings = split('_',$filename);
     my $part_2 = $strings[1]; # remember, arrays in perl are zero-indexed
     push(@part_2s,$part_2);   # store the data we want on the end of the array
}

#2, keep only unique ones
# perl trick using a hash to easily get unique items
my (%temp_hash);
foreach my $part (@part_2s) {
     $temp_hash{$part} = 1;
}
my @uniques = (keys %temp_hash);

# and then sort them
my @sorted = sort { $a cmp $b }  (@uniques);  # cmp for string sorting

#3, then output them to a .txt file. (one unique string2 per line)
open(OUTFILE, ">output.txt") or die "Cannot write output.txt: $!";
foreach my $item (@sorted) {
     print OUTFILE $item . "\n";
}
close OUTFILE;
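
To try the split-and-hash steps without pointing the script at a real directory, here is a minimal sketch that runs the same logic on a hard-coded list. The sample file names and the helper name unique_part_2s are made up for illustration; they are not part of the script above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the sorted, unique second fields
# of a list of names shaped like string1_string2_string3_string4.ext
sub unique_part_2s {
    my @names = @_;
    my %seen;
    foreach my $name (@names) {
        my ($base) = split(/\./, $name);   # drop the extension
        next unless defined $base;         # '.' and '..' split to nothing
        my @strings = split(/_/, $base);   # string1 .. string4
        next unless defined $strings[1];   # skip names with no second field
        $seen{ $strings[1] } = 1;          # hash keys are unique by construction
    }
    return sort { $a cmp $b } keys %seen;
}

# Made-up sample names, for illustration only.
my @sample = (
    'abc_red_x_1.sas7bdat',
    'def_blue_y_22.sas7bdat',
    'ghi_red_zz_3.sas7bdat',
);
print "$_\n" foreach unique_part_2s(@sample);   # prints "blue" then "red"
```

The two "next unless defined" guards are an addition: with warnings made fatal, as in the script above, a name with no underscore would otherwise stop the run.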

When you understand each line you'll be able to solve future similar 
problems easily.  Note Kevin's Perl solution is equally valid and 
probably faster, but you're not going to grok it until you exercise 
the Perl part of your brain for a while.
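
Kevin's script is not reproduced in this message, so the following is not his solution. It is only a sketch of how the same three steps look in a more condensed map/grep style, run on made-up sample names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Made-up sample names, for illustration only.
my @files = (
    'abc_red_x_1.sas7bdat',
    'def_blue_y_22.sas7bdat',
    'ghi_red_zz_3.sas7bdat',
);

my %seen;
# 1, extract the second underscore-separated field from each name
my @part_2s = map { my @f = split /_/, $_; $f[1] } @files;
# 2, keep only the first occurrence of each value
my @uniques = grep { !$seen{$_}++ } @part_2s;
# 3, sort as strings (the default for sort)
my @sorted  = sort @uniques;

print "$_\n" foreach @sorted;   # prints "blue" then "red"
```

The extension is not stripped here because it can only end up in the last field, so it never touches string2. Once the long version makes sense, each line of this one should map onto a loop you have already read.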

-Bill
-----
Bill McGonigle, Owner           Work: 603.448.4440
BFC Computing, LLC              Home: 603.448.1668
bill at bfccomputing.com           Cell: 603.252.2606
http://www.bfccomputing.com/    Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf



