Solved

C++: how to profile code that uses a regex library?

Posted on 2012-12-30
Last Modified: 2013-04-01
I have a small test program that exercises a regex library.

The two key lines are:
...
    if (xregcomp(preg, pattern, REG_ICASE | REG_EXTENDED) == 0) {
        int ret = xregexec(preg, buffer, 1, &pmatch, 0);
        std::cout << "ret=" << ret << " pmatch.rm_so=" << pmatch.rm_so << ", pmatch.rm_eo=" << pmatch.rm_eo << std::endl;
    }

For some reason xregexec takes a few extra seconds, which it shouldn't at all!

My project references the xregex project, which uses the regex.c library from
Extended regular expression matching and search library,
   version 0.12.
...

The question: how can I profile this and find out where the time is being wasted?
regex.c
xregex.h
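Before reaching for a full profiler such as gprof or perf, a first step is often to time the compile and the match separately. The sketch below does that with std::chrono. It uses the standard POSIX regcomp/regexec as stand-ins for the poster's xregcomp/xregexec wrappers, whose exact behavior is not shown in the thread; the TimedMatch struct and time_regex function are hypothetical names introduced only for illustration.

```cpp
#include <regex.h>
#include <chrono>
#include <string>

// Result of one timed compile + match (hypothetical helper type).
struct TimedMatch {
    int compile_rc;     // return code of regcomp (0 on success)
    int exec_rc;        // return code of regexec (0 on match, REG_NOMATCH otherwise)
    long so;            // start offset of the match, -1 if none
    long eo;            // end offset of the match, -1 if none
    double compile_ms;  // wall-clock time spent in regcomp
    double exec_ms;     // wall-clock time spent in regexec
};

// Times regcomp and regexec separately so the slow phase can be identified.
TimedMatch time_regex(const std::string& pattern, const std::string& buffer) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    TimedMatch r{0, REG_NOMATCH, -1, -1, 0.0, 0.0};
    regex_t preg;

    const auto t0 = clock::now();
    r.compile_rc = regcomp(&preg, pattern.c_str(), REG_ICASE | REG_EXTENDED);
    const auto t1 = clock::now();
    r.compile_ms = ms(t1 - t0).count();
    if (r.compile_rc != 0) {
        return r;  // nothing to free when compilation fails
    }

    regmatch_t pmatch;
    const auto t2 = clock::now();
    r.exec_rc = regexec(&preg, buffer.c_str(), 1, &pmatch, 0);
    const auto t3 = clock::now();
    r.exec_ms = ms(t3 - t2).count();

    if (r.exec_rc == 0) {
        r.so = static_cast<long>(pmatch.rm_so);
        r.eo = static_cast<long>(pmatch.rm_eo);
    }
    regfree(&preg);
    return r;
}
```

Calling time_regex with the real pattern and progressively shorter slices of the slow buffer would show whether the time is spent in the match itself and roughly which part of the input triggers it.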
Question by:longjumps
10 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 38731028
What is in pattern and buffer when it takes the extra seconds?
 
LVL 1

Author Comment

by:longjumps
ID: 38731055
ozo, I have attached both.

Regex
(OR|[|][|]|AND|[&][&]|HAVING|WHERE)([[:space:]]*|/[*].*[*]/)*[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=[[:space:]]*[N]*[[:space:]]*[('\"]*[[:space:]]*\3

The buffer is attached as well.
rule122-regex.txt
toparsw.txt
 
LVL 84

Expert Comment

by:ozo
ID: 38731183
There can be exponentially many ways for
([[:space:]]*|/[*].*[*]/)*
to match. Checking all of them may take a long time.
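To make "exponentially many" concrete, here is a minimal sketch that counts how many ways a starred group can carve up a run of n whitespace characters when each iteration consumes a non-empty piece. The function name count_splits is hypothetical, and the model is a simplification: it ignores the comment alternative and the empty iterations that the real pattern also allows.

```cpp
#include <cstdint>
#include <vector>

// Counts the ways to cut a run of n characters into consecutive non-empty
// pieces, which models how many distinct iteration sequences a group such as
// ([[:space:]]+)* could use to consume n whitespace characters.
std::uint64_t count_splits(unsigned n) {
    std::vector<std::uint64_t> ways(n + 1, 0);
    ways[0] = 1;  // empty run: exactly one way (zero iterations)
    for (unsigned len = 1; len <= n; ++len) {
        // The first iteration may take 1..len characters; the rest of the
        // run is then split independently.
        for (unsigned first = 1; first <= len; ++first) {
            ways[len] += ways[len - first];
        }
    }
    return ways[n];
}
```

The count doubles with every extra character (2^(n-1) for n >= 1), so a run of only 20 spaces already gives 524288 splits that a naive backtracking engine might have to try before giving up.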
LVL 1

Author Comment

by:longjumps
ID: 38731197
Yes, but why does this slowness happen only with this buffer?
With any other buffer it does not happen.
 
LVL 84

Expert Comment

by:ozo
ID: 38731231
Does changing
([[:space:]]*|/[*].*[*]/)*
to
([[:space:]]|/[*][^*]*[*]+([^/*][^*]*[*]+)*/)*
make a difference?
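The comment part of the proposed replacement can be checked in isolation. The sketch below compares it with the original `/[*].*[*]/` on a few inputs, using standard POSIX regcomp/regexec rather than the poster's xregex wrappers; matches_whole, greedy_comment and unrolled_comment are hypothetical names, and the `^( ... )$` anchoring is added only so that the whole input must match.

```cpp
#include <regex.h>
#include <string>

// Returns true if `text` as a whole matches the POSIX extended regex `pattern`.
bool matches_whole(const std::string& pattern, const std::string& text) {
    const std::string anchored = "^(" + pattern + ")$";
    regex_t re;
    if (regcomp(&re, anchored.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;  // treat a pattern that fails to compile as "no match"
    }
    const bool ok = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return ok;
}

// Comment sub-pattern from the original regex: ".*" can run across "*/".
const std::string greedy_comment = "/[*].*[*]/";

// Comment sub-pattern from the proposed replacement: each step consumes
// characters that cannot be confused with the closing "*/".
const std::string unrolled_comment = "/[*][^*]*[*]+([^/*][^*]*[*]+)*/";
```

Both patterns accept a single comment such as `/* a * b */`, but only the original accepts `/* a */ b */` as one comment, because its `.*` may swallow an intermediate `*/`. The replacement leaves the engine only one way to match each comment, which is the property that limits backtracking.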
 
LVL 1

Author Comment

by:longjumps
ID: 38732042
I am checking your regex substitution proposal.

However, why does this specific attached buffer slow performance down so significantly?
The same expression runs very fast on other buffers, including ones that are megabytes in size. Why?
 
LVL 84

Expert Comment

by:ozo
ID: 38732084
With REG_ICASE, the "and" in
beland
matches (OR|[|][|]|AND|[&][&]|HAVING|WHERE)
We then match ([[:space:]]*|/[*].*[*]/)*
When the
[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=
fails to match, a human or a sufficiently clever match engine may realize that
backtracking to find a different way to match the ([[:space:]]*|/[*].*[*]/)*
won't make a difference to the success of the entire match,
but a more naive match engine
(such as one optimized for tight loops without extra complicated checks)
would go back and try them all.
(And, in theory, it could even find an infinite number of ways to match it.)
 
LVL 84

Accepted Solution

by:
ozo earned 1500 total points
ID: 38732096
[('\"]*[[:space:]]*
in combination with the preceding regexp clause looks like it could also contribute
to multiplying the number of ways to match.
Perhaps you could try instead
([('\"]+[[:space:]]*)?
The same goes for
[[:space:]]*[)'\"]*[[:space:]]*
which you might try replacing with
[[:space:]]*([)'\"]+[[:space:]])?
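One way to gain confidence in the first rewrite is to check that, when preceded by something that already absorbs whitespace, the old and new forms accept the same inputs. The sketch below does this with standard POSIX regcomp/regexec on simplified, anchored test patterns; posix_search, original_tail, rewritten_tail and the literal `x` standing in for the rest of the expression are all assumptions made for illustration. Note also that any added parenthesized group that opens before the original third group would presumably shift the numbering that the trailing \3 backreference relies on; that is not covered here.

```cpp
#include <regex.h>
#include <string>

// Returns true if the POSIX extended regex `pattern` matches somewhere in `text`.
bool posix_search(const std::string& pattern, const std::string& text) {
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;  // treat a pattern that fails to compile as "no match"
    }
    const bool ok = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return ok;
}

// Leading [[:space:]]* stands in for the preceding whitespace/comment group.
// Original form: optional quotes, then optional whitespace. Whitespace can be
// taken either by the leading run or by the trailing one.
const std::string original_tail = "^[[:space:]]*[('\"]*[[:space:]]*x$";

// Rewritten form: the trailing whitespace is only allowed after at least one
// quote or parenthesis, so plain whitespace has a single owner.
const std::string rewritten_tail = "^[[:space:]]*([('\"]+[[:space:]]*)?x$";

// True when both forms give the same answer for `text`.
bool same_result(const std::string& text) {
    return posix_search(original_tail, text) == posix_search(rewritten_tail, text);
}
```

On inputs such as `x`, `   x`, `'x`, `  (' x` and `  ( 'x` both forms agree, which suggests the rewrite removes ambiguity without changing what is accepted in this simplified setting.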
 
LVL 1

Author Comment

by:longjumps
ID: 38777197
Checking the solution.
 
LVL 1

Author Closing Comment

by:longjumps
ID: 39039530
Workaround solution: the fix is a change to the regex, not to the code.