Solved

C++ how to profile code for regex lib?

Posted on 2012-12-30
Last Modified: 2013-04-01
I have a small test program that exercises a regex library.

The two main lines are:
...
    if (xregcomp(preg, pattern, REG_ICASE | REG_EXTENDED) == 0) {
        int ret = xregexec(preg, buffer, 1, &pmatch, 0);
        std::cout << "ret=" << ret << " pmatch.rm_so=" << pmatch.rm_so << ", pmatch.rm_eo=" << pmatch.rm_eo << std::endl;
    }

For some reason, xregexec takes several extra seconds here, which it should not.

My project references the xregex project, which uses the regex.c library from the
"Extended regular expression matching and search library,
   version 0.12".
...

The question: how can I profile this and find out where the time is being wasted?
regex.c
xregex.h
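
A first step before reaching for a full profiler (for example gprof or perf) is to time the match call on its own, so that compile time and I/O are excluded. The following is a minimal sketch, not the asker's code: it assumes the POSIX regcomp/regexec from <regex.h> as stand-ins for xregcomp/xregexec, and the TimedMatch struct and timed_regexec helper are hypothetical names introduced for illustration.

```cpp
#include <regex.h>   // POSIX regcomp/regexec, used here as stand-ins for xregcomp/xregexec
#include <chrono>
#include <string>

// Hypothetical helper type: result of one timed match attempt.
struct TimedMatch {
    int ret;            // return value of regexec (0 means a match was found), -1 if compile failed
    double seconds;     // wall-clock time spent inside regexec only
    regmatch_t pmatch;  // offsets of the overall match
};

// Compile `pattern` with the same flags as in the question, then time
// only the regexec call on `buffer`.
TimedMatch timed_regexec(const std::string& pattern, const std::string& buffer) {
    TimedMatch result{-1, 0.0, {}};
    regex_t preg;
    if (regcomp(&preg, pattern.c_str(), REG_ICASE | REG_EXTENDED) != 0) {
        return result;  // pattern did not compile
    }
    auto start = std::chrono::steady_clock::now();
    result.ret = regexec(&preg, buffer.c_str(), 1, &result.pmatch, 0);
    auto stop = std::chrono::steady_clock::now();
    result.seconds = std::chrono::duration<double>(stop - start).count();
    regfree(&preg);
    return result;
}
```

Calling timed_regexec with progressively shorter prefixes of the slow buffer should show which part of the input makes the time grow.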
Question by:longjumps
10 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 38731028
What is in pattern and buffer when it takes the extra seconds?
 
LVL 1

Author Comment

by:longjumps
ID: 38731055
ozo, I have attached both.

Regex
(OR|[|][|]|AND|[&][&]|HAVING|WHERE)([[:space:]]*|/[*].*[*]/)*[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=[[:space:]]*[N]*[[:space:]]*[('\"]*[[:space:]]*\3

The buffer is attached as well.
rule122-regex.txt
toparsw.txt
 
LVL 84

Expert Comment

by:ozo
ID: 38731183
There can be exponentially many ways for
([[:space:]]*|/[*].*[*]/)*
to match.   It may take a lot of time to check all of them.
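
To give a feel for "exponentially many", here is a small sketch that counts the ways a run of n whitespace characters can be cut into consecutive non-empty pieces, which is one way to picture how ([[:space:]]*)* can distribute the same characters across iterations. It is a simplified model, not the matcher itself: ways_to_split is a hypothetical helper, and empty iterations and the comment alternative are ignored.

```cpp
#include <cstdint>
#include <vector>

// Counts the ways to cut a run of n characters into consecutive non-empty
// pieces (each piece standing for one iteration of the outer '*').
// The result is 2^(n-1) for n >= 1; keep n below 64 to avoid overflow.
std::uint64_t ways_to_split(unsigned n) {
    std::vector<std::uint64_t> ways(n + 1, 0);
    ways[0] = 1;  // an empty run has exactly one (empty) split
    for (unsigned i = 1; i <= n; ++i) {
        // The last piece has length k; the first i-k characters are split freely.
        for (unsigned k = 1; k <= i; ++k) {
            ways[i] += ways[i - k];
        }
    }
    return ways[n];
}
```

For a run of 10 spaces this already gives 512 splits, and the count doubles with every additional space.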
 
LVL 1

Author Comment

by:longjumps
ID: 38731197
Yes, but why does this slowness happen for this particular buffer?
With any other buffer it does not happen.
 
LVL 84

Expert Comment

by:ozo
ID: 38731231
Does changing
([[:space:]]*|/[*].*[*]/)*
to
([[:space:]]|/[*][^*]*[*]+([^/*][^*]*[*]+)*/)*
make a difference?
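
The difference between the two comment sub-patterns can be checked in isolation. The sketch below assumes POSIX regcomp/regexec rather than the xregex wrappers from the question; matches_whole, loose_comment and strict_comment are hypothetical names. It anchors each sub-pattern and tests whether it accepts an entire string.

```cpp
#include <regex.h>
#include <string>

// The comment sub-pattern from the original regex, and the proposed replacement.
const std::string loose_comment  = "/[*].*[*]/";
const std::string strict_comment = "/[*][^*]*[*]+([^/*][^*]*[*]+)*/";

// Returns true if `pattern`, anchored at both ends, matches all of `text`.
bool matches_whole(const std::string& pattern, const std::string& text) {
    const std::string anchored = "^(" + pattern + ")$";
    regex_t preg;
    if (regcomp(&preg, anchored.c_str(), REG_EXTENDED | REG_NOSUB) != 0) {
        return false;  // treat a compile failure as "no match"
    }
    const bool ok = regexec(&preg, text.c_str(), 0, nullptr, 0) == 0;
    regfree(&preg);
    return ok;
}
```

With these definitions, the strict form accepts "/* a */" and "/* a ** b */" but rejects "/* a */ x /* b */", while the loose form accepts all three because .* can run across the first closing */. The strict form therefore leaves the engine far fewer alternatives to explore.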
 
LVL 1

Author Comment

by:longjumps
ID: 38732042
I am checking your proposed regex substitution.

However, why does the specific attached buffer slow performance down so significantly?
The same expression works super fast with other buffers, including ones that are megabytes in size. Why?
 
LVL 84

Expert Comment

by:ozo
ID: 38732084
With REG_ICASE, the "and" in
beland
matches (OR|[|][|]|AND|[&][&]|HAVING|WHERE)
We then match ([[:space:]]*|/[*].*[*]/)*
When the
[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=
fails to match, a human or a sufficiently clever match engine may realize that
backtracking to find a different way to match the ([[:space:]]*|/[*].*[*]/)*
won't make a difference to the success of the entire match,
but a more naive match engine
(such as one optimized for tight loops without extra complicated checks)
would go back and try them all.
(And, in theory, it could even find an infinite number of ways to match it,
since the group can match the empty string and be repeated.)
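
The first step of this explanation, a keyword match inside an ordinary word, is easy to reproduce. This is a minimal sketch assuming POSIX regcomp/regexec in place of xregcomp/xregexec; find_keyword is a hypothetical helper that returns the rm_so/rm_eo offsets printed in the question's test program.

```cpp
#include <regex.h>
#include <string>
#include <utility>

// Searches `text` for the keyword alternation from the thread's regex.
// Returns {rm_so, rm_eo} of the first match, or {-1, -1} if there is none.
std::pair<int, int> find_keyword(const std::string& text, int cflags) {
    regex_t preg;
    if (regcomp(&preg, "(OR|[|][|]|AND|[&][&]|HAVING|WHERE)", cflags) != 0) {
        return {-1, -1};  // pattern did not compile
    }
    regmatch_t pmatch;
    std::pair<int, int> span{-1, -1};
    if (regexec(&preg, text.c_str(), 1, &pmatch, 0) == 0) {
        span = {static_cast<int>(pmatch.rm_so), static_cast<int>(pmatch.rm_eo)};
    }
    regfree(&preg);
    return span;
}
```

With REG_EXTENDED | REG_ICASE, find_keyword("beland", ...) reports offsets 3 to 6, that is, the trailing "and". With REG_EXTENDED alone there is no match. Every such accidental keyword hit in the buffer is a place where the engine starts exploring the rest of the pattern.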
 
LVL 84

Accepted Solution

by:
ozo earned 1500 total points
ID: 38732096
[('\"]*[[:space:]]*
in combination with the preceding regexp clause looks like it could also contribute
to multiplying the number of ways to match.
Perhaps you could try instead
([('\"]+[[:space:]]*)?
The same goes for
[[:space:]]*[)'\"]*[[:space:]]*
which you might try replacing with
[[:space:]]*([)'\"]+[[:space:]]*)?
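
One thing worth checking when applying such a rewrite is that the new parentheses are capture groups, so they can shift the group that the trailing backreference points to. The sketch below assumes POSIX regcomp and its re_nsub field; count_groups and the two prefix strings are hypothetical names. The strings hold the thread's pattern up to the "=", without the backreference, once as posted and once with only the first suggested substitution applied.

```cpp
#include <regex.h>
#include <string>

// Pattern from the thread up to and including '=', as originally posted.
const std::string original_prefix =
    "(OR|[|][|]|AND|[&][&]|HAVING|WHERE)([[:space:]]*|/[*].*[*]/)*"
    "[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=";

// Same prefix with only the first suggested substitution applied.
const std::string rewritten_prefix =
    "(OR|[|][|]|AND|[&][&]|HAVING|WHERE)([[:space:]]*|/[*].*[*]/)*"
    "([('\"]+[[:space:]]*)?([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=";

// Returns the number of parenthesized subexpressions, or -1 on compile failure.
long count_groups(const std::string& pattern) {
    regex_t preg;
    if (regcomp(&preg, pattern.c_str(), REG_ICASE | REG_EXTENDED) != 0) {
        return -1;
    }
    const long groups = static_cast<long>(preg.re_nsub);
    regfree(&preg);
    return groups;
}
```

The original prefix has 3 groups and the rewritten one has 4, with the operand capture moving from group 3 to group 4. If the library numbers groups by opening parenthesis, as POSIX does, the final \3 would then need to become \4.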
 
LVL 1

Author Comment

by:longjumps
ID: 38777197
Checking the solution.
 
LVL 1

Author Closing Comment

by:longjumps
ID: 39039530
A workaround: the fix is a change to the regex, not to the code.