Data conversion

Michael O'Donnell mod+gnhlug at std.com
Wed Jan 8 13:09:38 EST 2003




I cobbled the attached script together a while back
and it may solve at least part of your problem by
allowing you to tabularize your data.  It's a hack
that I never intended to be seen by actual humans so,
please, cut me at least as much slack about its
(lack of) readability as you would if it were coded
in Perl...  ;->

The script is just a hack to allow you to turn
glop like this:

   23553065.s 23553065.w Cache News
   US XUL.mfasl abook.mab bookmarks.html
   cert7.db chrome cookies.txt cookies.txtORIG
   cookperm.txt crunchBookmarks diffBookmarks downloads.rdf
   genAlphaFolder history.dat history.mab install.log
   key3.db localstore.rdf mimeTypes.rdf panacea.dat
   panels.rdf prefs.bak prefs.js search.rdf

...into glop like this:

   23553065.s     23553065.w      Cache         News
   US             XUL.mfasl       abook.mab     bookmarks.html
   cert7.db       chrome          cookies.txt   cookies.txtORIG
   cookperm.txt   crunchBookmarks diffBookmarks downloads.rdf
   genAlphaFolder history.dat     history.mab   install.log
   key3.db        localstore.rdf  mimeTypes.rdf panacea.dat
   panels.rdf     prefs.bak       prefs.js      search.rdf


...by saying:

   lineup <unalignedStuff >alignedStuff
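
For anyone who wants the gist without wading through the
attachment, here is a minimal sketch of the same two-pass idea
(measure the widest token per column, then pad to it).  It is
not the attached script: it assumes a POSIX awk is installed
as "awk", it buffers all of stdin in memory rather than using
a temp file, and the function name lineup_sketch is made up
purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper (not part of the attached script).
# Assumes a POSIX awk; holds the whole input in memory.
lineup_sketch() {
    awk '
        # Pass 1: remember each line, the largest field count,
        # and the widest token seen in each column.
        {
            line[ NR ] = $0
            if( NF > fieldMax ) { fieldMax = NF }
            for( i = 1; i <= NF; ++i ) {
                if( length( $i ) > widest[ i ] ) { widest[ i ] = length( $i ) }
            }
        }

        # Pass 2: replay the stored lines, left-justifying every
        # token except the last one on each line, so no trailing
        # whitespace is produced.
        END {
            for( n = 1; n <= NR; ++n ) {
                count = split( line[ n ], tok )
                out = ""
                for( i = 1; i <= count; ++i ) {
                    if( i < count ) {
                        out = out sprintf( "%-" widest[ i ] "s ", tok[ i ] )
                    } else {
                        out = out tok[ i ]
                    }
                }
                print out
            }
        }
    '
}
```

With that function defined in your shell, it should be usable
the same way:

   lineup_sketch <unalignedStuff >alignedStuff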

-------------- next part --------------
# This shell script employs AWK to read through a text file, treating
# the Nth whitespace-separated token (as recognized by AWK) in each
# line as an element of the corresponding Nth tabular column.  That
# is, the Nth elements of all lines are regarded as being members of
# the Nth column of the input.
#
# After capturing stdin in a temp file, we note the widest token in
# each column, using that information to generate an AWK format string
# suitable for use during a second pass, in which we actually emit (as
# stdout) the text from that same temp file, tabularized.
#
# This version of this script is my first attempt at it - improvements
# are undoubtedly possible...
#
# HACK: supplying ANY args on the command line is regarded as a
#       request that the format string itself be displayed before
#       the results, in a form suitable for use inside VI.
#

   timeStamp=`date '+%Y%m%d%H%M%S'`
    tempFile=/tmp/$$tempFile$timeStamp
formatString=/tmp/$$format$timeStamp

cat > $tempFile

nawk '
    BEGIN { fieldMax = 0 }

    # Pass 1: note the field count and the widest token in each column.
    {
        if( NF > fieldMax ) { fieldMax = NF }
        for( i = 1; i <= NF; ++i ) {
            width = length( $i )
            if( width > widest[ i ] ) { widest[ i ] = width }
        }
    }

    # Emit a one-line AWK program of the form:
    #   { printf( "%-Ws %-Ws ...\n", $1, $2, ... ); }
    # where each W is the width of the corresponding column.
    END {
        if( fieldMax > 0 ) {
            printf( "{ printf( \"%%-%ds", widest[ 1 ] )
            for( i = 2; i <= fieldMax; ++i ) {
                printf( " %%-%ds", widest[ i ] )
            }
            printf( "\\n\"" )
            for( i = 1; i <= fieldMax; ++i ) {
                printf( ", $%d", i )
            }
            printf( " ); }\n" )
        }
    }
' <$tempFile >$formatString

#
# Presence of args means Please Show Format String,
# so generate a complete nawk commandline from it...
#
if [ $# -gt 0 ]
then
    sed -e 's/%/\\%/g' -e "s/^/nawk '/" -e "s/$/ '/" <$formatString
fi

nawk -f $formatString <$tempFile | sed -e 's/[	 ][	 ]*$//'

rm   -f $formatString  $tempFile


