Solved

C++: how to profile code that uses a regex library?

Posted on 2012-12-30
Last Modified: 2013-04-01
I have a small test program that exercises a regex library.

The two key lines are:
...
    if (xregcomp(preg, pattern, REG_ICASE | REG_EXTENDED) == 0) {
        int ret = xregexec(preg, buffer, 1, &pmatch, 0);
        std::cout << "ret=" << ret << " pmatch.rm_so=" << pmatch.rm_so << ", pmatch.rm_eo=" << pmatch.rm_eo << std::endl;
    }

For some reason xregexec takes several extra seconds, which it should not.

My project references the xregex project, which uses the regex.c library from
"Extended regular expression matching and search library, version 0.12".
...

The question: how can I profile this and find out where the time is being spent?
regex.c
xregex.h
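
A first step before reaching for a full profiler is to time compilation and matching separately, so it is clear which call is slow. Below is a minimal sketch, assuming the POSIX regcomp/regexec from <regex.h> as stand-ins for the project's xregcomp/xregexec wrappers (whose exact signatures are not shown in the question). The names TimedMatch and time_regex are hypothetical helpers, not part of any library.

```cpp
#include <regex.h>   // POSIX regcomp/regexec, used here as stand-ins for xregcomp/xregexec
#include <chrono>
#include <string>

// Result of one timed compile + match attempt.
struct TimedMatch {
    int compile_ret;    // regcomp return value (0 on success)
    int exec_ret;       // regexec return value (0 = match, REG_NOMATCH = no match)
    double compile_ms;  // wall-clock time spent in regcomp
    double exec_ms;     // wall-clock time spent in regexec
    regmatch_t match;   // offsets of the overall match, valid only if exec_ret == 0
};

// Hypothetical helper: times regcomp and regexec separately with a monotonic clock.
TimedMatch time_regex(const std::string& pattern, const std::string& text) {
    using clock = std::chrono::steady_clock;
    using ms = std::chrono::duration<double, std::milli>;

    TimedMatch r{};
    r.exec_ret = REG_NOMATCH;
    regex_t preg;

    auto t0 = clock::now();
    r.compile_ret = regcomp(&preg, pattern.c_str(), REG_ICASE | REG_EXTENDED);
    r.compile_ms = ms(clock::now() - t0).count();
    if (r.compile_ret != 0) {
        return r;  // pattern did not compile, nothing to execute
    }

    auto t1 = clock::now();
    r.exec_ret = regexec(&preg, text.c_str(), 1, &r.match, 0);
    r.exec_ms = ms(clock::now() - t1).count();

    regfree(&preg);
    return r;
}
```

Calling time_regex with the contents of the pattern file and of the slow buffer, and printing compile_ms and exec_ms, should show whether the time goes into matching. To see which functions inside regex.c are responsible, a sampling profiler such as perf on Linux, or gprof on a build compiled with -pg, is the usual next step.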
Question by:longjumps
10 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 38731028
what is in pattern and buffer when it takes extra seconds?
 
LVL 1

Author Comment

by:longjumps
ID: 38731055
ozo, I have attached both.

Regex:
(OR|[|][|]|AND|[&][&]|HAVING|WHERE)([[:space:]]*|/[*].*[*]/)*[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=[[:space:]]*[N]*[[:space:]]*[('\"]*[[:space:]]*\3

The buffer is attached.
rule122-regex.txt
toparsw.txt
 
LVL 84

Expert Comment

by:ozo
ID: 38731183
There can be exponentially many ways for
([[:space:]]*|/[*].*[*]/)*
to match the same text, and a backtracking engine may spend a long time trying all of them.
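
To put a number on "exponentially many": the sketch below counts the ways a run of n whitespace characters can be divided among iterations of ([[:space:]]*)*, assuming every iteration consumes at least one character (empty iterations would only add more possibilities). The function name count_splits is hypothetical and exists only for this illustration; it does not call any regex engine.

```cpp
#include <cstdint>
#include <vector>

// Counts the ways a run of n whitespace characters can be split into
// consecutive non-empty chunks, one chunk per iteration of the outer star.
// The result is 2^(n-1) for n >= 1, so it overflows 64 bits for n > 64.
std::uint64_t count_splits(unsigned n) {
    std::vector<std::uint64_t> ways(n + 1, 0);
    ways[0] = 1;  // one way to consume nothing: zero iterations
    for (unsigned len = 1; len <= n; ++len) {
        // 'last' is the size of the final chunk; the rest is split recursively.
        for (unsigned last = 1; last <= len; ++last) {
            ways[len] += ways[len - last];
        }
    }
    return ways[n];
}
```

For example, count_splits(4) is 8 and count_splits(30) is 536870912, so a buffer with a few dozen consecutive spaces or newlines after a keyword is enough to give a naive backtracking matcher an enormous search space whenever the rest of the pattern fails.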

 
LVL 1

Author Comment

by:longjumps
ID: 38731197
Yes, but why does the slowdown happen only with this buffer?
With any other buffer it does not happen.
 
LVL 84

Expert Comment

by:ozo
ID: 38731231
Does changing
([[:space:]]*|/[*].*[*]/)*
to
([[:space:]]|/[*][^*]*[*]+([^/*][^*]*[*]+)*/)*
make a difference?
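
Note that the two comment sub-patterns are not equivalent: the original /[*].*[*]/ can run from the first /* to the last */ in the buffer, while the replacement stops at the first */. A small sketch of the difference follows, assuming the POSIX regcomp/regexec from <regex.h> rather than the project's xregex wrappers. The ^ anchor is added only for this test, and match_end, kGreedyComment and kStrictComment are hypothetical names.

```cpp
#include <regex.h>
#include <string>

// Comment sub-patterns from the thread, anchored with ^ for this illustration.
const std::string kGreedyComment = "^/[*].*[*]/";
const std::string kStrictComment = "^/[*][^*]*[*]+([^/*][^*]*[*]+)*/";

// Returns the end offset of the overall match of `pattern` in `text`,
// or -1 if the pattern fails to compile or does not match.
long match_end(const std::string& pattern, const std::string& text) {
    regex_t preg;
    if (regcomp(&preg, pattern.c_str(), REG_EXTENDED) != 0) {
        return -1;
    }
    regmatch_t m;
    int ret = regexec(&preg, text.c_str(), 1, &m, 0);
    regfree(&preg);
    return ret == 0 ? static_cast<long>(m.rm_eo) : -1;
}
```

On the text "/* a */ x /* b */" the original form should consume all 17 characters, while the replacement should consume only the first comment (7 characters). The replacement also drops the star inside the group ([[:space:]] instead of [[:space:]]*), which removes the nested repetition.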
 
LVL 1

Author Comment

by:longjumps
ID: 38732042
I am checking your proposed regex substitution.

However, why does this specific attached buffer slow matching down so significantly?
The same expression runs very fast on other buffers, including ones several MB in size. Why?
 
LVL 84

Expert Comment

by:ozo
ID: 38732084
With REG_ICASE, the "and" in
beland
matches (OR|[|][|]|AND|[&][&]|HAVING|WHERE)
We then match ([[:space:]]*|/[*].*[*]/)*
When the
[('\"]*[[:space:]]*([^('\"[:space:]]+)[[:space:]]*[)'\"]*[[:space:]]*=
that follows fails to match, a human or a sufficiently clever match engine may realize that
backtracking to find a different way to match the ([[:space:]]*|/[*].*[*]/)*
won't change whether the entire match succeeds,
but a more naive match engine
(such as one optimized for tight loops without extra complicated checks)
would go back and try them all.
(In theory it could even find infinitely many ways to match it, because [[:space:]]* can match the empty string any number of times.)
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 38732096
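[('\"]*[[:space:]]*
in combination with the preceding regexp clause looks like it could also contribute
to multiplying the number of ways to match.
Perhaps you could try instead
([('\"]+[[:space:]]*)?
The same goes for
[[:space:]]*[)'\"]*[[:space:]]*
which you might try replacing with
[[:space:]]*([)'\"]+[[:space:]]*)?
Note that the added parentheses (here and in the comment sub-pattern suggested earlier) shift the group numbering, so the \3 backreference at the end of the pattern has to be renumbered to keep pointing at ([^('\"[:space:]]+).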
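
Putting the suggestions from this thread together, the revised pattern might look like the sketch below. This is one possible assembly, not a tested drop-in for the original rule. It assumes the closing-quote group is meant to end in [[:space:]]* (mirroring the opening-quote group) and that the final backreference becomes \5, because the new parentheses shift the operand from group 3 to group 5. It also assumes a regex implementation that accepts backreferences in extended syntax, as GNU regex and glibc do (POSIX does not require this). The names kRevisedPattern and matches_revised are hypothetical.

```cpp
#include <regex.h>
#include <string>

// Groups: 1 keyword, 2 whitespace-or-comment, 3 comment tail,
// 4 opening quotes, 5 operand, 6 closing quotes. The backreference
// is \5 because the operand is now the fifth group.
const std::string kRevisedPattern =
    R"re((OR|[|][|]|AND|[&][&]|HAVING|WHERE))re"
    R"re(([[:space:]]|/[*][^*]*[*]+([^/*][^*]*[*]+)*/)*)re"
    R"re(([('\"]+[[:space:]]*)?)re"
    R"re(([^('\"[:space:]]+))re"
    R"re([[:space:]]*([)'\"]+[[:space:]]*)?)re"
    R"re(=[[:space:]]*[N]*[[:space:]]*[('\"]*[[:space:]]*\5)re";

// Returns true if the revised pattern matches anywhere in `text`
// (case-insensitive, extended syntax). Returns false if it fails to compile.
bool matches_revised(const std::string& text) {
    regex_t preg;
    if (regcomp(&preg, kRevisedPattern.c_str(), REG_ICASE | REG_EXTENDED) != 0) {
        return false;
    }
    regmatch_t m;
    int ret = regexec(&preg, text.c_str(), 1, &m, 0);
    regfree(&preg);
    return ret == 0;
}
```

A quick sanity check is that tautologies such as "x OR 1=1" or "WHERE 'a' = 'a'" still match while "x OR 1=2" does not. After that, the revised pattern should be timed against the buffer that was slow with the original.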
 
LVL 1

Author Comment

by:longjumps
ID: 38777197
Checking the solution.
 
LVL 1

Author Closing Comment

by:longjumps
ID: 39039530
A workaround: the fix is a change to the regex, not to the code.