Searching for what is not there using REGEX in only a single step

Greg Rundlett greg at freephile.com
Fri May 28 13:07:01 EDT 2004


NOTE:  I know how to solve this problem by processing the text in two 
steps, first finding all occurrences of /A(.*)C/ and then searching for 
B in $1, but I'm wondering if there is some advanced expression for 
doing it in only one step.
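
A minimal sketch of the two-step approach described in the note, written 
in Python purely for illustration (the post itself uses PCRE through PHP). 
The token values "<a>", "<b>" and "<c>" and the function name 
chunks_without_b are hypothetical placeholders, and the non-greedy (.*?) 
with DOTALL is an assumption made so that each capture stops at the 
nearest C and may span several lines.

```python
import re

# Placeholder tokens: the names A, B, C come from the post,
# the values are invented for this sketch.
A, B, C = "<a>", "<b>", "<c>"


def chunks_without_b(text):
    """Step 1: capture the text between each A and the next C.
    Step 2: keep only the captures that do not contain B."""
    pattern = re.compile(re.escape(A) + r"(.*?)" + re.escape(C), re.DOTALL)
    return [m.group(1) for m in pattern.finditer(text) if B not in m.group(1)]


sample = "<a>one<b>two<c> <a>three<c>"
print(chunks_without_b(sample))
```

Only the second chunk survives the filter, because the first one contains 
the B token.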

I have an interesting little problem that I'm wondering if someone knows 
how to solve using regular expressions:

Given some larger text, where you have many subsections that are made up 
of a token A followed by an indeterminate amount of text NOT including 
token B and then token C, how can you find those chunks of text?  I've 
been trying with Perl-compatible Regular Expressions through PHP, but 
can't come up with a way to do it.

For example, I have an XML file with a bunch of records.  Some records 
are fine.  Others are missing a chunk.  I want to find the broken records 
and insert the missing tags.
Broken Record
  </fh>

    30101 Agoura Ct., #115<br /></location_addr1>
    <location_addr2></location_addr2>

Fixed Record
  </fh>
  <location id="">
    <location_name>

    </location_name>
    <location_addr1>30101 Agoura Ct., #115<br /></location_addr1>
    <location_addr2></location_addr2>

I thought I would be able to find </fh> followed by </location_addr1> 
and use a negative lookbehind assertion to say that <location_addr1> was 
not present.  However, not knowing the length of the text between </fh> 
and </location_addr1> seems to make this impossible.
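
One direction that might avoid the lookbehind length problem is a negative 
lookahead applied at every character between the two tags. The sketch 
below is written in Python for illustration and has not been run against 
the real file. The (?!...) syntax is also available in PCRE, so the 
pattern should carry over to PHP with the "s" modifier, though that port 
is an assumption. The names FH_CLOSE, ADDR_OPEN, ADDR_CLOSE and broken are 
hypothetical, and the sample text is invented from the records shown above.

```python
import re

FH_CLOSE = "</fh>"
ADDR_OPEN = "<location_addr1>"
ADDR_CLOSE = "</location_addr1>"

# After </fh>, consume one character at a time, but only while the opening
# <location_addr1> tag does not start at the current position. The lazy
# quantifier stops at the first </location_addr1>.
broken = re.compile(
    re.escape(FH_CLOSE)
    + r"((?:(?!" + re.escape(ADDR_OPEN) + r").)*?)"
    + re.escape(ADDR_CLOSE),
    re.DOTALL,
)

# Invented sample: one intact record followed by one broken record.
sample = (
    "<fh>ok</fh>\n<location_addr1>1 Main St.</location_addr1>\n"
    "<fh>bad</fh>\n30101 Agoura Ct., #115<br /></location_addr1>\n"
)

matches = [m.group(1) for m in broken.finditer(sample)]
print(matches)
```

In this sample only the second record matches, since the first one has 
its opening <location_addr1> tag between </fh> and </location_addr1>.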

-- 
FREePHILE
We are 'Open' for Business
Free and Open Source Software
http://www.freephile.com
(978) 270-2425
"Paul Lynde to block..."
-- a contestant on "Hollywood Squares"



