extract string
Zhao Peng
greenmt at gmail.com
Wed Jan 11 01:35:01 EST 2006
Hi All,
First, I cannot be more grateful for the answers to my question from
all of you; I appreciate your help and time. I'm especially touched by
the outpouring of responses on this list, which I have never
experienced anywhere else.
Secondly, I'm sorry for the big stir-up about "homework problems" that
flooded the list, since I'm the origin of it.
Kenny, your "grep univ abc.txt | cut -f3 -d, | sed s/\"//g >> dev.txt"
works. I misread /\ as a similar-looking sign on top of the "6" key on
the keyboard, instead of a forward slash followed by a backslash. (When
I typed that sign, it struck me as strange that it was much smaller
than /\, but I didn't realize that they are simply not the same thing.)
I feel really embarrassed about my stupid mistake. //blush
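For anyone following along, here is a minimal sketch of that pipeline
run against a small sample. The sample rows are adapted from the
example data quoted in this thread, and the temporary directory is only
an illustrative assumption. The sed expression is written in single
quotes here, which passes the same s/"//g to sed as the unquoted
s/\"//g form and may be easier to read.

```shell
# Hypothetical sample file, adapted from the example rows in this thread.
tmpdir=$(mktemp -d)
cat > "$tmpdir/abc.txt" <<'EOF'
"name","age","school"
"jerry","21","univ of Vermont"
"jack","18","univ of Penn"
"john","20","univ of south Florida"
EOF

# Keep lines containing "univ", take the 3rd comma-separated field,
# then delete every double quote. Inside single quotes the shell leaves
# the expression alone, so sed receives s/"//g exactly as written.
grep univ "$tmpdir/abc.txt" | cut -f3 -d, | sed 's/"//g' >> "$tmpdir/dev.txt"

cat "$tmpdir/dev.txt"
```

Note that >> appends, so running the command twice against the same
dev.txt would duplicate the extracted lines.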
Kenny, regarding the missing-column issue, let me try to explain it
again. Below is a quote from my original post:
============================================
Also, if one column is missing, and "," is used to indicate that missing
column, like the following (2nd column of 3rd line is missing):
"name","age","school"
"jerry" ,"21","univ of Vermont"
"jesse",,,"Dartmouth college"
"jack","18","univ of Penn"
"john","20","univ of south Florida"
===========================================
You said that "there is an extra column in the 3rd line". From my
perspective, I disagree. As you can see, there are 3 commas between
"jesse" and "Dartmouth college". If we treat the 2nd of these 3 commas
as merely an indication that the value for the age column is missing,
then the 3rd line will be read as ["jesse", MISSING, "Dartmouth
college"], not ["jesse",empty,empty, "Dartmouth college"] as you
suggested.
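To see how a tool that splits strictly on commas treats that row, the
following sketch counts and prints its fields with awk. It only
illustrates what cut and awk do with -d, or -F, and says nothing about
how the file's author intended the row to be read.

```shell
# The third sample row from the quoted post.
line='"jesse",,,"Dartmouth college"'

# Count the comma-separated fields, as cut -d, or awk -F, would see them.
nfields=$(printf '%s\n' "$line" | awk -F, '{ print NF }')
echo "fields: $nfields"

# Print each field on its own numbered line; empty fields show as [].
printf '%s\n' "$line" | awk -F, '{ for (i = 1; i <= NF; i++) printf "%d=[%s]\n", i, $i }'

# With this splitting, field 3 is empty and the school lands in field 4.
third=$(printf '%s\n' "$line" | cut -f3 -d,)
echo "field 3: [$third]"
```

Under that splitting rule, "cut -f3 -d," would return nothing for this
row, so rows written this way would need separate handling.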
Paul, as to your "simplest by what measurement" question: I was
thinking of both "easiest to remember" and "easiest to understand" when
I posted my question. Now I want the "most efficient" approach. I know
that will be my homework.
BTW,
A bit about me: I'm a junior SAS programmer at Dartmouth Medical School.
(FYI: the core strength of SAS lies in statistical analysis, I think, so
you could call it statistical software; see www.sas.com.) We run SAS on
a RedHat server, but I knew basically nothing about Linux before I
started in this position (July 2005). Fortunately, SAS programming
doesn't require much Linux knowledge. However, as you can imagine, I
need to know at least some basic Linux commands since I work on a
Linux platform.
Part of my primary job responsibilities is to convert raw data into SAS
data sets. My "extract string" question comes from processing a raw data
file in .txt format, which doesn't have any documentation except the
variable list. By looking at the raw data, I know that the variables are
separated by commas. For one particular variable (column) called
"school", some of the values are quite long (like: Univ of Wisconsin
at Madison, Health Sci Ctr), but I don't know the maximum length. I
need to know it, because if the length I specify is not enough, only
partial values will be read. Many of its values contain "univ", so I
thought that if I could extract all strings containing "univ" from that
variable (column), I would have a better chance of figuring out the
length of "school". That's why I had this question.
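One possible way to get at the length directly, rather than by
inspecting the extracted strings, is to let awk track the longest value
in the 3rd column. This is a sketch under assumptions: the sample rows
and the temporary path are illustrative, the first line is taken to be
a header, and no value is assumed to contain a comma.

```shell
# Hypothetical sample file, adapted from the example rows in this thread.
tmpdir=$(mktemp -d)
cat > "$tmpdir/abc.txt" <<'EOF'
"name","age","school"
"jerry","21","univ of Vermont"
"jack","18","univ of Penn"
"john","20","univ of south Florida"
EOF

# Skip the header (NR > 1), strip the quotes from field 3, and remember
# the largest length seen. "max + 0" prints 0 if there are no data rows.
maxlen=$(awk -F, 'NR > 1 { v = $3; gsub(/"/, "", v); if (length(v) > max) max = length(v) }
                  END { print max + 0 }' "$tmpdir/abc.txt")
echo "longest school value: $maxlen characters"
```

A value with an embedded comma, such as "Univ of Wisconsin at Madison,
Health Sci Ctr", would be split in two by -F, and so would be
undercounted; a CSV-aware parser would be needed for such rows.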
Thank you all again!
Zhao