

Problems w/ String, Regex, and infinite loop

Posted on 1997-08-12
Medium Priority
Last Modified: 2010-07-27
The project I currently work on deals with submitting certain HTML-like
resources to a homemade database.  The HTML-like resources
have tags in them like
<Filename> </Filename>   and
<Title> </Title>
Along the way, we create HTML files from these resources by parsing the

Recently, a coworker of mine created a template language (as a C++
class) that, given one of these resources and a template which he
created, turns one of these resources into an actual HTML file.  (let me
know if I'm boring anyone ;)

Anyway, to get to the point, he reads in each line of the template file
and stores it in a variable of type String (the "super" string class).
He then uses the contains() method in the Regex class to determine if
any keywords are located in it.  The problem is that, during the course
of looping through this file, a line is read in that causes Regex() to
go into an infinite loop (or darn near close).

The line read in is the following (he reads into the String() buffer he
has set up until he sees a ">"):


My coworker reads this into a buffer called line (an instance of the String
class) and then issues the following command:
    ending = ((Regex)"[Ee][Nn][Dd][Ii][Nn][Gg]=\"[^\"]*\"");
to determine if the line has an ending attribute in it.

Unfortunately, this decides to go into the "infinite" loop on this
line.  The output from gdb is the following:

#0  rx_bitset_difference (size=8, a=0x11fffece0, b=0x140125b64) at
#1  0x120022cc8 in compute_super_edge (rx=0x14001a7a0,
    csetout=0x11fffece0, superstate=0x1400243a0, chr=224 'ý') at
#2  0x120022f80 in rx_handle_cache_miss (rx=0x14001a7a0,
    chr=0 '\000', data=0x1400243a0) at rx.c:3853
#3  0x12001ef90 in rx_search (rxb=0x14001a7a0, startpos=0, range=161,
    stop=161, total_size=161, get_burst=0x1200260d0
    back_check=0x120026250 <re_search_2_back_check>,
    fetch_char=0x120026440 <re_search_2_fetch_char>,
    regs=0x140010ac0, resume_state=0x0, save_state=0x0) at rx.h:3589
#4  0x120026548 in re_search_2 (rxb=0x8,
    string1=0x11fffece0 "*", '*' <repeats 31 times>, size1=1,
    string2=0x1400243a0 "\201", size2=0, startpos=537026768, range=161,
    regs=0x140010ac0, stop=161) at rx.c:6444
#5  0x12001ccec in Regex::search (this=0xffffffffffffffff,
    s=0x140000658 '\001' <repeats 200 times>..., len=-3,
    startpos=0) at
#6  0x120019664 in String::at () at String.h:640
#7  0x12000eec0 in createHTML (dbmlLoc={rep = 0x1400071e0}, hdtLoc={
      rep = 0x140009d10}, htmlLoc={rep = 0x140009ef0}) at hdt.cxx:163
#8  0x1200109ec in main (argc=4, argv=0x11ffffce8) at createHTML.cxx:34

So, in other words, I have no idea what's crashing.

I know this is probably a fairly specific question.  If any one has any
idea what could be crashing this, I do have the entire source of the
program to send (I thought no one would want to see all of it on
Usenet).  BTW, this is using g++ 2.7.2 on OSF 4.0.
Question by:jessed

Expert Comment

ID: 1167135
An "infinite" loop is caused when you have the wrong termination condition for your loop.

for example:
for (unsigned x=2; x!=1; x+=2) printf("");
would never end. I have not seen your source code, but something like this is usually the reason I get infinite loops.  An annoying but easily fixed problem.

Author Comment

ID: 1167136
This comment is completely unrelated to my question.  Yeah, most 10-year-olds know what an infinite loop is!

Expert Comment

ID: 1167137
You said that you are getting stuck in an infinite loop.  You also said that you did not know where it is crashing. A 10-year-old would know that if you are getting stuck in an infinite loop, then that is probably where you are crashing.

I don't mean to insult or anything, but sometimes the most obvious answers are just too hard to see until someone points them out.  Believe me I have been there.

Accepted Solution

peter_vc earned 400 total points
ID: 1167138
As an interesting test, I'd like to see what happens when you change the regular expression to match the actual case of the input string, so that it reads ending= instead of [Ee][Nn]...

Anyway, I suspect the problem is in the attempt to match the double quotes.  Before I get into the answer, anyone doing regex stuff should stop reading this and go buy Mastering Regular Expressions by Jeffrey E.F. Friedl.  

Jeff talks about this problem extensively throughout the book.  It would appear that your infinite loop is the result of backtracking.  He goes on to explain how "unrolling the loop" avoids the never-ending match (pg. 162-176).

Jeff offers this as a solution to match a double-quoted string with possibly escaped characters embedded:


You will have to "translate" this to your flavor of regex.
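
For readers on a modern toolchain, here is a minimal sketch of the "unrolling the loop" idea for a double-quoted string with escaped characters. It assumes C++11 std::regex (ECMAScript syntax) rather than the libg++ Regex class from the question, and the helper name first_quoted is hypothetical. The pattern is a commonly used unrolled form and is not necessarily character-for-character what the book prints.

```cpp
#include <regex>
#include <string>

// Hypothetical helper (not from the original post): returns the first
// double-quoted string found in `text`, quotes included, or "" if none.
std::string first_quoted(const std::string& text) {
    // Unrolled form:
    //   "              opening quote
    //   [^"\\]*        a run of ordinary characters
    //   (?:\\.[^"\\]*)*  zero or more (escaped char + another ordinary run)
    //   "              closing quote
    // Each character can be consumed by only one part of the pattern,
    // so the matcher has very few alternatives to retry on failure.
    static const std::regex quoted(R"re("[^"\\]*(?:\\.[^"\\]*)*")re");
    std::smatch m;
    return std::regex_search(text, m, quoted) ? m.str(0) : std::string();
}
```

Calling first_quoted on a line such as Title="a \"b\" c" should return the whole quoted value including the escaped quotes, and an unterminated quote yields an empty string instead of a partial match.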

Expert Comment

ID: 1167139
That regexp /[Ee][Nn][Dd][Ii][Nn][Gg]="[^"]*"/
doesn't seem like it ought to cause excessive backtracking.
What is the line which causes it to loop?
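
One way to check this claim outside the original program is to run the same pattern through a different regex engine. The sketch below assumes C++11 std::regex instead of the libg++ Regex class, and has_ending_attribute is a hypothetical name. Since the line that triggers the loop is not shown in the question, any input you feed it is your own test data.

```cpp
#include <regex>
#include <string>

// Hypothetical helper: true if `line` contains ending="..." in any
// letter case, using the same pattern as in the question.
bool has_ending_attribute(const std::string& line) {
    // [^"]* cannot step over a double quote, so after the opening quote
    // the only ways forward are "one more non-quote character" or
    // "the closing quote". That leaves little room for backtracking.
    static const std::regex ending(R"re([Ee][Nn][Dd][Ii][Nn][Gg]="[^"]*")re");
    return std::regex_search(line, ending);
}
```

If this returns promptly for the problem line while the original program still hangs, that would point at the library or at memory corruption (note the odd this=0xffffffffffffffff and len=-3 in the gdb trace) rather than at the pattern itself.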

