Solved

Problems w/ String, Regex, and infinite loop

Posted on 1997-08-12
5
498 Views
Last Modified: 2010-07-27
The current project I work on deals with submitting certain HTML-like
resources to a homemade database we have made.  The HTML-like resources
have tags in them like
<Filename> </Filename>   and
<Title> </Title>
Along the way, we create HTML files from these resources by parsing the
file.

Recently, a coworker of mine created a template language (as a C++
class) that, given one of these resources and a template which he
created, turns one of these resources into an actual HTML file.  (let me
know if I'm boring anyone ;)

Anyway, to get to the point, he reads in each line of the template file
and stores it in a variable of type String( the "super" string class ).
He then uses the contains() method in the Regex class to determine if
any keywords are located in it.  The problem is that, during the course
of looping through this file, a line is read in that causes Regex() to
go into an infinite loop (or darn near close).

The line read in is the following (he reads into the String() buffer he
has set up until he sees a ">"):

<dbmlitem
  usetable="/homeb/jdavis/testdir/FileFormat.icons"
  tagname="UNITEResourceFile/Descriptions/FileDescription/FileFormat"
  ending=""
  instance="1">

My coworker reads this in a buffer called line (a member of the String()
class) and then issues the following command:
    ending = line.at ((Regex)"[Ee][Nn][Dd][Ii][Nn][Gg]=\"[^\"]*\"");
to determine if the line has an ending attribute in it.

Unfortunately, this decides to go into the "infinite" loop on this
line.  The output from gdb is the following:

#0  rx_bitset_difference (size=8, a=0x11fffece0, b=0x140125b64) at
rx.c:563
#1  0x120022cc8 in compute_super_edge (rx=0x14001a7a0,
dfout=0x11ffff0e0,
    csetout=0x11fffece0, superstate=0x1400243a0, chr=224 'ý') at
rx.c:3706
#2  0x120022f80 in rx_handle_cache_miss (rx=0x14001a7a0,
super=0x1400243a0,
    chr=0 '\000', data=0x1400243a0) at rx.c:3853
#3  0x12001ef90 in rx_search (rxb=0x14001a7a0, startpos=0, range=161,
    stop=161, total_size=161, get_burst=0x1200260d0
<re_search_2_get_burst>,
    back_check=0x120026250 <re_search_2_back_check>,
    fetch_char=0x120026440 <re_search_2_fetch_char>,
app_closure=0x11ffff330,
    regs=0x140010ac0, resume_state=0x0, save_state=0x0) at rx.h:3589
#4  0x120026548 in re_search_2 (rxb=0x8,
    string1=0x11fffece0 "*", '*' <repeats 31 times>, size1=1,
    string2=0x1400243a0 "\201", size2=0, startpos=537026768, range=161,
    regs=0x140010ac0, stop=161) at rx.c:6444
#5  0x12001ccec in Regex::search (this=0xffffffffffffffff,
    s=0x140000658 '\001' <repeats 200 times>..., len=-3,
matchlen=0x11ffff3b0,
    startpos=0) at Regex.cc:105
#6  0x120019664 in String::at () at String.h:640
#7  0x12000eec0 in createHTML (dbmlLoc={rep = 0x1400071e0}, hdtLoc={
      rep = 0x140009d10}, htmlLoc={rep = 0x140009ef0}) at hdt.cxx:163
#8  0x1200109ec in main (argc=4, argv=0x11ffffce8) at createHTML.cxx:34

So, in other words, I have no idea what's crashing.

I know this is probably a fairly specific question.  If any one has any
idea what could be crashing this, I do have the entire source of the
program to send (I thought no one would want to see all of it on
Usenet).  BTW, this is using g++ 2.7.2 on OSF 4.0.
0
Comment
Question by:jessed
5 Comments
 
LVL 2

Expert Comment

by:kellyjj
ID: 1167135
An "infinite" loop is cause when you have the wrong termination cause for your loop.

for example:
for (int x=2; x==1; x++) printf("");
would never end. I have not seen your source code but something like is usually the reason I get infinite loops.  A annoying, but easily fixed prob.
0
 

Author Comment

by:jessed
ID: 1167136
This comment is completely unrelated to my question.  Yeah, most 10-year olds know what an infinite loop is!
0
 
LVL 2

Expert Comment

by:kellyjj
ID: 1167137
You said that you are getting stuck in a infinite loop.  Which you also said that you did not know where it is crashing. A 10 year old would know that if you are getting stuck in a infinite loop then that is proboly where you are crashing.    

I don't mean to insult or anything, but sometimes the most obvious answers are just too hard to see until someone points them out.  Believe me I have been there.
0
 
LVL 1

Accepted Solution

by:
peter_vc earned 200 total points
ID: 1167138
As an interesting test, I'd like to see what happens when you change the regex expression to match the actual case of the input string so that it reads ending= instead of [Ee][Nn]...

Anyway, I suspect the problem is in the attempt to match the double quotes.  Before I get into the answer, anyone doing regex stuff should stop reading this and go buy Mastering Regular Expressions by Jeffrey E.F. Friedl.  

Jeff talks about this problem extensively throughout the book.  It would appear that your infinte loop is the result of backtracking.  He goes on to explain how "unrolling the loop" avoids the neverending match (pg. 162-176).

Jeff offers this as a solution to match a double quoted string with possibly escaped characters imbeded:

"[^"\\]*(\\.[^"\\]*)*"

You will have to "translate" this to your flavor of regex.



0
 
LVL 84

Expert Comment

by:ozo
ID: 1167139
That regexp /[Ee][Nn][Dd][Ii][Nn][Gg]="[^"]*"/
doesn't seem like it ought to cause excessive backtracking.
What is the line which causes it to loop?

0

Featured Post

How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

Join & Write a Comment

In days of old, returning something by value from a function in C++ was necessarily avoided because it would, invariably, involve one or even two copies of the object being created and potentially costly calls to a copy-constructor and destructor. A…
Introduction This article is the first in a series of articles about the C/C++ Visual Studio Express debugger.  It provides a quick start guide in using the debugger. Part 2 focuses on additional topics in breakpoints.  Lastly, Part 3 focuses on th…
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
The viewer will learn additional member functions of the vector class. Specifically, the capacity and swap member functions will be introduced.

757 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

21 Experts available now in Live!

Get 1:1 Help Now