Solved

C++: how to profile code that uses a regex library?

Posted on 2012-12-30
Last Modified: 2013-04-01
I have a small test program that exercises a regex library.

The two key lines are:
...
   if (xregcomp(preg, pattern, REG_ICASE | REG_EXTENDED) == 0) {
       int ret = xregexec(preg, buffer, 1, &pmatch, 0);
       std::cout << "ret=" << ret << " pmatch.rm_so=" << pmatch.rm_so << ", pmatch.rm_eo=" << pmatch.rm_eo << std::endl;
   }

For some reason xregexec takes several extra seconds, which it should not.

My project references the xregex project, which uses regex.c from the
Extended regular expression matching and search library,
   version 0.12.
...

The question: how can I profile this and find out where the time is being spent?
regex.c
xregex.h
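
One low-tech way to start, before reaching for a full profiler, is to time the call itself and then time it again with individual clauses of the pattern removed. The sketch below is only an illustration: it assumes the standard POSIX regcomp/regexec from <regex.h> in place of the project's xregcomp/xregexec wrappers, and time_regexec_ms is a hypothetical helper name, not part of any library.

```cpp
#include <regex.h>
#include <chrono>
#include <string>

// Hypothetical helper: compiles `pattern` with the same flags as the
// question (REG_ICASE | REG_EXTENDED) and returns the wall-clock time of
// a single regexec() call in milliseconds.
// Returns -1.0 if the pattern does not compile.
double time_regexec_ms(const std::string& pattern, const std::string& text) {
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_ICASE | REG_EXTENDED) != 0) {
        return -1.0;
    }
    regmatch_t pmatch;
    const auto start = std::chrono::steady_clock::now();
    regexec(&re, text.c_str(), 1, &pmatch, 0);  // only the duration matters here
    const auto stop = std::chrono::steady_clock::now();
    regfree(&re);
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Calling it once with the full pattern and then with one clause dropped at a time shows which clause the extra seconds follow.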
Question by:longjumps
 
LVL 84

Expert Comment

by:ozo
ID: 38731028
What is in pattern and buffer when it takes the extra seconds?
 
LVL 1

Author Comment

by:longjumps
ID: 38731055
ozo, I have attached both.

Regex:
(OR|[|][|]|AND|[&][&]|HAVING|WHERE)([[:space:]]*|/[*].*[*]/)*[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=[[:space:]]*[N]*[[:space:]]*[('\"]*[[:space:]]*\3

The buffer is attached as well.
rule122-regex.txt
toparsw.txt
 
LVL 84

Expert Comment

by:ozo
ID: 38731183
There can be exponentially many ways for
([[:space:]]*|/[*].*[*]/)*
to match. Checking all of them can take a long time.
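
As a rough illustration of that growth (a counting sketch, not a model of any particular regex engine): even if each iteration of the starred group were limited to a non-empty run of whitespace, a run of n whitespace characters could be divided among the iterations in 2^(n-1) ways. count_splits below is a hypothetical helper that counts them.

```cpp
#include <cstdint>
#include <vector>

// Counts the ways to cut a run of n characters into consecutive non-empty
// pieces, which is how many ways a group like ([[:space:]]+)* could divide
// a run of n whitespace characters among its iterations.
// The result is 2^(n-1) for n >= 1, so it overflows 64 bits beyond n = 64.
std::uint64_t count_splits(unsigned n) {
    std::vector<std::uint64_t> ways(n + 1, 0);
    ways[0] = 1;  // the empty run has exactly one (empty) split
    for (unsigned len = 1; len <= n; ++len) {
        // choose the length of the first piece, then split the remainder
        for (unsigned first = 1; first <= len; ++first) {
            ways[len] += ways[len - first];
        }
    }
    return ways[n];
}
```

For example, count_splits(10) is 512 and count_splits(20) is 524288, which is why a single long run of whitespace in one buffer can dominate the running time.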

 
LVL 1

Author Comment

by:longjumps
ID: 38731197
Yes, but why does the slowdown happen only with this buffer?
With any other buffer it does not happen.
 
LVL 84

Expert Comment

by:ozo
ID: 38731231
Does changing
([[:space:]]*|/[*].*[*]/)*
to
([[:space:]]|/[*][^*]*[*]+([^/*][^*]*[*]+)*/)*
make a difference?
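
A quick way to check that the replacement comment clause still accepts ordinary C-style comments is sketched below. It assumes POSIX <regex.h> rather than the xregex wrappers; matches_whole and kCommentClause are hypothetical names, and the ^...$ anchors are added only for this check.

```cpp
#include <regex.h>
#include <string>

// Hypothetical helper: true if the whole of `text` matches the POSIX
// extended regex `pattern`. The pattern is wrapped in ^( ... )$ here.
bool matches_whole(const std::string& pattern, const std::string& text) {
    const std::string anchored = "^(" + pattern + ")$";
    regex_t re;
    if (regcomp(&re, anchored.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;
    }
    const bool ok = regexec(&re, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&re);
    return ok;
}

// The comment clause from the suggested replacement, taken out of the
// surrounding ([[:space:]]| ... )* group.
const char* const kCommentClause = "/[*][^*]*[*]+([^/*][^*]*[*]+)*/";
```

Unlike /[*].*[*]/, this clause cannot run past the first closing delimiter, so each comment can be matched in only one way.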
 
LVL 1

Author Comment

by:longjumps
ID: 38732042
I am checking your proposed regex substitution.

However, why does this specific attached buffer slow performance down so much?
The same expression runs very fast on other buffers, including ones several MB in size. Why?
 
LVL 84

Expert Comment

by:ozo
ID: 38732084
With REG_ICASE, the "and" in
beland
matches (OR|[|][|]|AND|[&][&]|HAVING|WHERE)
We then match ([[:space:]]*|/[*].*[*]/)*
When the
[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=
part fails to match, a human or a sufficiently clever match engine may realize that
backtracking to find a different way to match ([[:space:]]*|/[*].*[*]/)*
won't make a difference to the success of the entire match,
but a more naive match engine
(such as one optimized for tight loops without extra complicated checks)
would go back and try them all.
(And, in theory, it could even find an infinite number of ways to match it)
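
The first step of this explanation is easy to verify in isolation. The sketch below assumes POSIX <regex.h> instead of the project's xregex wrappers, and first_match is a hypothetical helper name.

```cpp
#include <regex.h>
#include <string>
#include <utility>

// Hypothetical helper: returns the {start, end} byte offsets of the first
// match of `pattern` in `text`, or {-1, -1} if there is no match or the
// pattern does not compile.
std::pair<int, int> first_match(const std::string& pattern,
                                const std::string& text, int cflags) {
    regex_t re;
    if (regcomp(&re, pattern.c_str(), cflags) != 0) {
        return {-1, -1};
    }
    regmatch_t m;
    std::pair<int, int> result{-1, -1};
    if (regexec(&re, text.c_str(), 1, &m, 0) == 0) {
        result = {static_cast<int>(m.rm_so), static_cast<int>(m.rm_eo)};
    }
    regfree(&re);
    return result;
}
```

With REG_EXTENDED | REG_ICASE, the keyword clause matches "beland" at offsets 3 to 6; without REG_ICASE there is no match, so the expensive backtracking never starts at that word.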
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 38732096
[('\"]*[[:space:]]*
in combination with the preceding regexp clause looks like it could also contribute
to multiplying the number of ways to match.
Perhaps you could try instead
([('\"]+[[:space:]]*)?
The same goes for
[[:space:]]*[)'\"]*[[:space:]]*
which you might try replacing with
[[:space:]]*([)'\"]+[[:space:]]*)?
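
One caveat worth checking with these substitutions (an observation added here, not part of the suggestion above): a new parenthesized group placed before the captured operand shifts that operand's group number, so a trailing \3 back-reference would then need to become \4. A minimal sketch, assuming POSIX <regex.h> and simplified stand-in patterns rather than the full rule; capture_group is a hypothetical helper.

```cpp
#include <regex.h>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: returns the text captured by subexpression `group`
// when `pattern` (POSIX extended, case-insensitive) is matched against
// `text`. Returns "" if there is no match, the group did not participate,
// or the pattern has fewer groups than requested.
std::string capture_group(const std::string& pattern, const std::string& text,
                          std::size_t group) {
    regex_t re;
    if (regcomp(&re, pattern.c_str(), REG_EXTENDED | REG_ICASE) != 0) {
        return "";
    }
    std::vector<regmatch_t> m(re.re_nsub + 1);  // slot 0 is the whole match
    std::string out;
    if (group <= re.re_nsub &&
        regexec(&re, text.c_str(), m.size(), m.data(), 0) == 0 &&
        m[group].rm_so != -1) {
        out = text.substr(m[group].rm_so, m[group].rm_eo - m[group].rm_so);
    }
    regfree(&re);
    return out;
}
```

With the stand-in pattern (WHERE)([[:space:]])*[']*([^'[:space:]]+) the operand of "WHERE x" is group 3; once the quote clause becomes ([']+[[:space:]]*)? the operand moves to group 4 and group 3 holds the optional quotes.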
 
LVL 1

Author Comment

by:longjumps
ID: 38777197
Checking the solution.
 
LVL 1

Author Closing Comment

by:longjumps
ID: 39039530
Workaround: the solution changes the regex, not the code.