Data conversion
Michael O'Donnell
mod+gnhlug at std.com
Wed Jan 8 13:09:38 EST 2003
In-Reply-To: Your message of "Wed, 08 Jan 2003 12:35:15 EST."
<000801c2b73c$4e7a16a0$301216cf at winbox>
References: <000801c2b73c$4e7a16a0$301216cf at winbox>
--------
I cobbled the attached script together a while back
and it may solve at least part of your problem by
allowing you to tabularize your data. It's a hack
that I never intended to be seen by actual humans so,
please, cut me at least as much slack about its
(lack of) readability as you would if it were coded
in Perl... ;->
The script is just a hack to allow you to turn
glop like this:
23553065.s 23553065.w Cache News
US XUL.mfasl abook.mab bookmarks.html
cert7.db chrome cookies.txt cookies.txtORIG
cookperm.txt crunchBookmarks diffBookmarks downloads.rdf
genAlphaFolder history.dat history.mab install.log
key3.db localstore.rdf mimeTypes.rdf panacea.dat
panels.rdf prefs.bak prefs.js search.rdf
...into glop like this
23553065.s     23553065.w      Cache         News
US             XUL.mfasl       abook.mab     bookmarks.html
cert7.db       chrome          cookies.txt   cookies.txtORIG
cookperm.txt   crunchBookmarks diffBookmarks downloads.rdf
genAlphaFolder history.dat     history.mab   install.log
key3.db        localstore.rdf  mimeTypes.rdf panacea.dat
panels.rdf     prefs.bak       prefs.js      search.rdf
...by saying:
lineup <unalignedStuff >alignedStuff
-------------- next part --------------
# This shell script employs AWK to read through a text file, treating
# the Nth whitespace-separated token (as recognized by AWK) in each
# line as an element of the corresponding Nth tabular column. That
# is, the Nth elements of all lines are regarded as being members of
# the Nth column of the input.
#
# After capturing stdin in a temp file, we note the widest token in
# each column, using that information to generate an AWK format string
# suitable for use during a second pass, in which we actually emit (as
# stdout) the text from that same temp file, tabularized.
#
# This version of this script is my first attempt at it - improvements
# are undoubtedly possible...
#
# HACK: supplying ANY args on the command line is regarded as a
# request that the format string itself be displayed before
# the results, in a form suitable for use inside VI.
#
timeStamp=`date '+%Y%m%d%H%M%S'`
tempFile=/tmp/$$tempFile$timeStamp
formatString=/tmp/$$format$timeStamp
cat > $tempFile
nawk '
BEGIN { fieldMax = 0 }
{
    if( NF > fieldMax ) { fieldMax = NF }
    for( i = 1; i <= NF; ++i ) {
        width = length( $i );
        if( width > widest[ i ] ) { widest[ i ] = width; }
    }
}
END {
    if( fieldMax > 0 ) {
        printf( "{ printf( \"%%-%ds", widest[ 1 ] );
        if( fieldMax > 1 ) {
            for( i = 2; i <= fieldMax; ++i ) {
                printf( " %%-%ds", widest[ i ] );
            }
        }
        printf( "\\n\"" );
        for( i = 1; i <= fieldMax; ++i ) {
            printf( ", $%d", i );
        }
        printf( " ); }\n" );
    }
}
' <$tempFile >$formatString
#
# Presence of args means Please Show Format String,
# so generate a complete nawk commandline from it...
#
if [ ! -z "$1" ]
then
    sed -e 's/%/\\%/g' -e "s/^/nawk '/" -e "s/$/ '/" <$formatString
fi
nawk -f $formatString <$tempFile | sed -e 's/[ ][ ]*$//'
rm -f $formatString $tempFile
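
For comparison, here is a minimal sketch of the same column-alignment idea done in a single AWK invocation. It is not the script above: instead of a temp file and a generated format string, it buffers every field in AWK arrays and pads them in the END block. The function name lineup_sketch is made up for illustration, and it assumes a plain POSIX-ish `awk` rather than `nawk`.

```shell
# Hypothetical helper, not part of the original script: aligns
# whitespace-separated columns from stdin, buffering in memory.
lineup_sketch() {
    awk '
    {
        # Remember how many fields this line had, keep each field,
        # and track the widest token seen so far in each column.
        fieldCount[ NR ] = NF
        for( i = 1; i <= NF; ++i ) {
            cell[ NR, i ] = $i
            if( length( $i ) > widest[ i ] ) { widest[ i ] = length( $i ) }
        }
    }
    END {
        for( r = 1; r <= NR; ++r ) {
            line = ""
            for( i = 1; i <= fieldCount[ r ]; ++i ) {
                if( i < fieldCount[ r ] ) {
                    # Left-justify to the column width, plus one separator space.
                    line = line sprintf( "%-" widest[ i ] "s ", cell[ r, i ] )
                } else {
                    # Last field on the line: no padding, so no trailing blanks.
                    line = line cell[ r, i ]
                }
            }
            print line
        }
    }
    '
}
```

It would be used the same way, e.g. `lineup_sketch <unalignedStuff >alignedStuff`. The trade-off is that the whole input sits in memory, which is fine for directory listings and the like but not something I'd point at a huge file.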