Solved

Problems w/ String, Regex, and infinite loop

Posted on 1997-08-12
5
499 Views
Last Modified: 2010-07-27
The current project I work on deals with submitting certain HTML-like
resources to a homemade database we have made.  The HTML-like resources
have tags in them like
<Filename> </Filename>   and
<Title> </Title>
Along the way, we create HTML files from these resources by parsing the
file.

Recently, a coworker of mine created a template language (as a C++
class) that, given one of these resources and a template which he
created, turns one of these resources into an actual HTML file.  (let me
know if I'm boring anyone ;)

Anyway, to get to the point, he reads in each line of the template file
and stores it in a variable of type String( the "super" string class ).
He then uses the contains() method in the Regex class to determine if
any keywords are located in it.  The problem is that, during the course
of looping through this file, a line is read in that causes Regex() to
go into an infinite loop (or darn near close).

The line read in is the following (he reads into the String() buffer he
has set up until he sees a ">"):

<dbmlitem
  usetable="/homeb/jdavis/testdir/FileFormat.icons"
  tagname="UNITEResourceFile/Descriptions/FileDescription/FileFormat"
  ending=""
  instance="1">

My coworker reads this in a buffer called line (a member of the String()
class) and then issues the following command:
    ending = line.at ((Regex)"[Ee][Nn][Dd][Ii][Nn][Gg]=\"[^\"]*\"");
to determine if the line has an ending attribute in it.

Unfortunately, this decides to go into the "infinite" loop on this
line.  The output from gdb is the following:

#0  rx_bitset_difference (size=8, a=0x11fffece0, b=0x140125b64) at
rx.c:563
#1  0x120022cc8 in compute_super_edge (rx=0x14001a7a0,
dfout=0x11ffff0e0,
    csetout=0x11fffece0, superstate=0x1400243a0, chr=224 'ý') at
rx.c:3706
#2  0x120022f80 in rx_handle_cache_miss (rx=0x14001a7a0,
super=0x1400243a0,
    chr=0 '\000', data=0x1400243a0) at rx.c:3853
#3  0x12001ef90 in rx_search (rxb=0x14001a7a0, startpos=0, range=161,
    stop=161, total_size=161, get_burst=0x1200260d0
<re_search_2_get_burst>,
    back_check=0x120026250 <re_search_2_back_check>,
    fetch_char=0x120026440 <re_search_2_fetch_char>,
app_closure=0x11ffff330,
    regs=0x140010ac0, resume_state=0x0, save_state=0x0) at rx.h:3589
#4  0x120026548 in re_search_2 (rxb=0x8,
    string1=0x11fffece0 "*", '*' <repeats 31 times>, size1=1,
    string2=0x1400243a0 "\201", size2=0, startpos=537026768, range=161,
    regs=0x140010ac0, stop=161) at rx.c:6444
#5  0x12001ccec in Regex::search (this=0xffffffffffffffff,
    s=0x140000658 '\001' <repeats 200 times>..., len=-3,
matchlen=0x11ffff3b0,
    startpos=0) at Regex.cc:105
#6  0x120019664 in String::at () at String.h:640
#7  0x12000eec0 in createHTML (dbmlLoc={rep = 0x1400071e0}, hdtLoc={
      rep = 0x140009d10}, htmlLoc={rep = 0x140009ef0}) at hdt.cxx:163
#8  0x1200109ec in main (argc=4, argv=0x11ffffce8) at createHTML.cxx:34

So, in other words, I have no idea what's crashing.

I know this is probably a fairly specific question.  If any one has any
idea what could be crashing this, I do have the entire source of the
program to send (I thought no one would want to see all of it on
Usenet).  BTW, this is using g++ 2.7.2 on OSF 4.0.
0
Comment
Question by:jessed
5 Comments
 
LVL 2

Expert Comment

by:kellyjj
ID: 1167135
An "infinite" loop is cause when you have the wrong termination cause for your loop.

for example:
for (int x=2; x==1; x++) printf("");
would never end. I have not seen your source code but something like is usually the reason I get infinite loops.  A annoying, but easily fixed prob.
0
 

Author Comment

by:jessed
ID: 1167136
This comment is completely unrelated to my question.  Yeah, most 10-year olds know what an infinite loop is!
0
 
LVL 2

Expert Comment

by:kellyjj
ID: 1167137
You said that you are getting stuck in a infinite loop.  Which you also said that you did not know where it is crashing. A 10 year old would know that if you are getting stuck in a infinite loop then that is proboly where you are crashing.    

I don't mean to insult or anything, but sometimes the most obvious answers are just too hard to see until someone points them out.  Believe me I have been there.
0
 
LVL 1

Accepted Solution

by:
peter_vc earned 200 total points
ID: 1167138
As an interesting test, I'd like to see what happens when you change the regex expression to match the actual case of the input string so that it reads ending= instead of [Ee][Nn]...

Anyway, I suspect the problem is in the attempt to match the double quotes.  Before I get into the answer, anyone doing regex stuff should stop reading this and go buy Mastering Regular Expressions by Jeffrey E.F. Friedl.  

Jeff talks about this problem extensively throughout the book.  It would appear that your infinte loop is the result of backtracking.  He goes on to explain how "unrolling the loop" avoids the neverending match (pg. 162-176).

Jeff offers this as a solution to match a double quoted string with possibly escaped characters imbeded:

"[^"\\]*(\\.[^"\\]*)*"

You will have to "translate" this to your flavor of regex.



0
 
LVL 84

Expert Comment

by:ozo
ID: 1167139
That regexp /[Ee][Nn][Dd][Ii][Nn][Gg]="[^"]*"/
doesn't seem like it ought to cause excessive backtracking.
What is the line which causes it to loop?

0

Featured Post

Is Your Active Directory as Secure as You Think?

More than 75% of all records are compromised because of the loss or theft of a privileged credential. Experts have been exploring Active Directory infrastructure to identify key threats and establish best practices for keeping data safe. Attend this month’s webinar to learn more.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Unlike C#, C++ doesn't have native support for sealing classes (so they cannot be sub-classed). At the cost of a virtual base class pointer it is possible to implement a pseudo sealing mechanism The trick is to virtually inherit from a base class…
IntroductionThis article is the second in a three part article series on the Visual Studio 2008 Debugger.  It provides tips in setting and using breakpoints. If not familiar with this debugger, you can find a basic introduction in the EE article loc…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…
The viewer will be introduced to the member functions push_back and pop_back of the vector class. The video will teach the difference between the two as well as how to use each one along with its functionality.

895 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now