extract string from filename
Dan Jenkins
dan at rastech.com
Fri Jan 13 21:02:02 EST 2006
Zhao Peng wrote:
> string1_string2_string3_string4.sas7bdat
>
> abc_st_nh_num.sas7bdat
> abc_st_vt_num.sas7bdat
> abc_st_ma_num.sas7bdat
> abcd_region_NewEngland_num.sas7bdat
> abcd_region_South_num.sas7bdat
>
> My goal is to :
> 1, extract string2 from each file name
> 2, then sort them and keep only unique ones
> 3, then output them to a .txt file. (one unique string2 per line)
Solution #1:
ls -1 *sas7bdat|awk -F_ '{print $2}'|sort -fu|cat -n >output.txt
Take output of ls, 1 file per line (ls -1) - only files ending with sas7bdat
Feed into awk, splitting on _, print the 2nd field
Sort ignoring case, eliminating duplicates (sort options: f "folds
case", u "keeps only uniques")
Number the lines (cat -n); leave this stage off if you want only the
bare strings, one per line
Put output in file named output.txt
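To try the pipeline without touching real data, here is a minimal sketch. It assumes a scratch directory from mktemp (the name tmpdir is only for illustration) filled with empty files named like the examples in the question, and it drops the cat -n stage so the file holds just the strings.

```shell
# Hypothetical scratch directory holding the sample file names from the question.
tmpdir=$(mktemp -d)
touch "$tmpdir/abc_st_nh_num.sas7bdat" \
      "$tmpdir/abc_st_vt_num.sas7bdat" \
      "$tmpdir/abc_st_ma_num.sas7bdat" \
      "$tmpdir/abcd_region_NewEngland_num.sas7bdat" \
      "$tmpdir/abcd_region_South_num.sas7bdat"

# Same pipeline as Solution #1, run in a subshell, minus the cat -n stage.
(cd "$tmpdir" && ls -1 *sas7bdat | awk -F_ '{print $2}' | sort -fu > output.txt)

# Show the result: one unique string2 per line.
cat "$tmpdir/output.txt"
```

With those five sample names, output.txt should contain two lines, region and st.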
Solution #2:
ls -1 *sas7bdat|sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/'|sort -fu|cat -n >output.txt
Use sed (stream editor) to break each filename into pieces separated by
_, and output the 2nd one (the \2). Regular expressions (regex) can be
very handy. ^ matches the beginning of the string, [a-zA-Z0-9]*_ matches
a run of letters and digits ending with _, and the backslashed
parentheses group the patterns, so the 2nd group can be extracted.
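A quick way to see what the regex does is to feed it one name at a time. This is only a sketch: the variable names second and unmatched, and the file name plainname.sas7bdat, are made up for illustration.

```shell
# Feed one sample name through the sed expression from Solution #2.
second=$(printf '%s\n' abcd_region_NewEngland_num.sas7bdat |
  sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/')
echo "$second"      # prints: region

# A name with no underscores does not match the pattern,
# so sed passes the line through unchanged.
unmatched=$(printf '%s\n' plainname.sas7bdat |
  sed 's/^\([a-zA-Z0-9]*_\)\([a-zA-Z0-9]*\)_.*$/\2/')
echo "$unmatched"   # prints: plainname.sas7bdat
```

The second case is where the two solutions differ: awk would print an empty second field for such a name, while sed leaves the whole name in the output.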
There are many solutions to the problem, as you can see.
--
Dan Jenkins (dan at rastech.com)
Rastech Inc., Bedford, NH, USA --- 1-603-206-9951
*** Technical Support Excellence for over a quarter century