extract string from filename
Bill McGonigle
bill at bfccomputing.com
Fri Jan 13 10:31:00 EST 2006
On Jan 12, 2006, at 19:40, Zhao Peng wrote:
> I also downloaded an e-book called "Learning Perl" (OReilly,
> 4th.Edition), and had a quick look thru its Contents of Table, but did
> not find any chapter which looks likely addressing any issue related
> to my question.
Good start. Read these sections: 'A Stroll Through Perl', 'The Split
and Join Functions', 'Lists and Arrays', 'Hashes', 'Directory Access',
and 'File Manipulation'.
Your description is the outline of the algorithm. Take this script
where I've filled in the requisite perl and figure out how it works:
#!/usr/bin/perl -w
use strict; # show stupid errors
use warnings FATAL=>'all'; # don't let you get away with them
#I have almost 1k small files within one folder. The only pattern of
#the file names is:
# take the command line parameter as the directory name
my $dirname = shift;
opendir(DIRECTORY, $dirname) or die "can't open directory $dirname: $!";
my @files = readdir(DIRECTORY);
closedir DIRECTORY;
#string1_string2_string3_string4.sas7bdat
#Note:
#1, string2 often repeat itself across each file name
#2, All 4 strings contain no underscores.
#3, 4 strings are separated by 3 underscores (as you can see)
#4, The length of all 4 strings are not fixed.
my (@part_2s); # we'll keep the second parts here
foreach my $file (@files) {
    # the directory will contain . and .. which we don't want
    next if (($file eq '.') or ($file eq '..'));
    #My goal is to :
    #1, extract string2 from each file name
    # don't forget to escape the . since this is a regex
    my ($filename,$extension) = split('\.',$file);
    my @strings = split('_',$filename);
    my $part_2 = $strings[1]; # remember, arrays in perl are zero-indexed
    # store the data we want on the end of the array
    push(@part_2s,$part_2);
}
#2, keep only unique ones
# perl trick using a hash to easily get unique items
my (%temp_hash);
foreach my $part (@part_2s) {
    $temp_hash{$part} = 1;
}
my @uniques = (keys %temp_hash);
# and then sort them
my @sorted = sort { $a cmp $b} (@uniques); # cmp for string sorting
#3, then output them to a .txt file. (one unique string2 per line)
open(OUTFILE, '>', 'output.txt') or die "can't write output.txt: $!";
foreach my $item (@sorted) {
    print OUTFILE $item . "\n";
}
close OUTFILE;
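
If the two split calls are the confusing part, it may help to try them on a single name first. This is only a small standalone sketch: the file name and the sub name second_part are made up for illustration and are not from the original question.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return string2 from a name shaped like
# string1_string2_string3_string4.sas7bdat
sub second_part {
    my ($file) = @_;
    # escaped dot: split on a literal '.', not on "any character"
    my ($filename, $extension) = split('\.', $file);
    my @strings = split('_', $filename);
    return $strings[1]; # zero-indexed, so [1] is the second piece
}

# Made-up example name, not one of the real files
my $example = 'north_sales_2005_final.sas7bdat';
print second_part($example), "\n";

# With an unescaped dot the pattern matches every character, so every
# field is empty and split hands back an empty list
my @pieces = split('.', $example);
print scalar(@pieces), "\n";
```

Running it should print the second piece of the example name, then the number of fields the unescaped split produced.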
When you understand each line you'll be able to solve future similar
problems easily. Note Kevin's perl solution is equally valid and
probably faster, but you're not going to grok it until you exercise
the perl part of your brain for a while.
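
The hash trick for uniqueness is probably the piece most worth practising on its own. Here is a minimal sketch of just that step, assuming made-up sample values and a hypothetical sub name unique_sorted. It is not Kevin's solution, only the same idea as the script above in isolation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the distinct values of a list,
# sorted as strings (call it in list context)
sub unique_sorted {
    my @items = @_;
    my %seen;
    # storing a key that already exists just overwrites it,
    # so each distinct value ends up as exactly one key
    $seen{$_} = 1 foreach @items;
    return sort { $a cmp $b } keys %seen;
}

# Made-up sample data standing in for the extracted string2 values
my @sample = ('sales', 'costs', 'sales', 'budget', 'costs');
print "$_\n" foreach unique_sorted(@sample);
```

The hash values are never read; only the keys matter, which is why any true value such as 1 will do.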
-Bill
-----
Bill McGonigle, Owner Work: 603.448.4440
BFC Computing, LLC Home: 603.448.1668
bill at bfccomputing.com Cell: 603.252.2606
http://www.bfccomputing.com/ Page: 603.442.1833
Blog: http://blog.bfccomputing.com/
VCard: http://bfccomputing.com/vcard/bill.vcf