
Regular expression optimization

Experts:
I am parsing the usual SIP (Session Initiation Protocol) logs.  I wrote a regular expression to extract certain values, and it works fine on normal records.
But with an unexpected record, would you believe it: the regex spends forever in recursion (backtracking) before it finds out that the pattern cannot match.


Exceptional Record:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<                                    BYE                                    <<<
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
SipTrxnManager::TrxnMsgReceivedEvHandler
sip:ujunwaonline%99yahoo.com@99.993.99.999:99995;transport=tls;pin=9999CC3A SIP/2.0
From: "iokongwu@company.com"<sip:iokongwu%99company.com@sip.voip.coname.com>;tag=9999999999
To: "ujuawnoilen@yahoo.com"<sip:ujunwaonline%99yahoo.com@sip.voip.coname.com>;tag=9999999999
Call-ID: s...
...aderjmimaupqdpyafutgw@sip.voip.coname.com
CSeq: 4 BYE
Via: SIP/2.0/TCP 992.99.994.99:9999;branch=z9hG4bK-9999999999992-999999999
Route: <sip:sip-robinhood5.voip5.coname:9999;transport=TCP;lr>;context=routing
Record-Route: <sip:sip-robinhood7.voip7.coname:9999;lr>
Record-Route: <sip:iokongwu%99company.com@sip.voip.coname.com;lr>
Route: <sip:ujunwaonline%99yahoo.com@sip.voip.coname.com;lr>
Contact: "iokongwu@company.com"<sip:iokongwu%99company.com@99.995.997.99:99998;transport=tls;pin=FFFF9999>
User-Agent: Video Chat Client v1.0.3
X-Completed: 7PP7 9999 9999 C0A99998
Reason: CSR;cause=1;text=Success;AFE=7PP7 9999 9999 C0A99998;VfxRxBitrate=991.99;VfxTxBitrate=998.99
Max-Forwards: 99
Content-Length: 0



Normal Record:
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
<<                                    BYE                                    <<<
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
SipTrxnManager::TrxnMsgReceivedEvHandler
sip:ujunwaonline%99yahoo.com@99.993.99.999:99995;transport=tls;pin=9999CC3A SIP/2.0
From: "iookwu@company.com"<sip:iokongwu%99company.com@sip.voip.coname.com>;tag=9999999999
To: "ujunwaonline@yahoo.com"<sip:ujunwaonline%99yahoo.com@sip.voip.coname.com>;tag=9999999999
Call-ID: s...
...adieulcroxxacela@sip.voip.coname.com
CSeq: 5 BYE
Via: SIP/2.0/TCP 992.99.994.99:9999;branch=z9hG4bK-9999999999999999999999
Route: <sip:sip-roundrobin5.voip5.coname:9999;transport=TCP;lr>;context=routing
Record-Route: <sip:sip-roundrobin7.voip7.coname:9999;lr>
Record-Route: <sip:iokongwu%99company.com@sip.voip.coname.com;lr>
Route: <sip:ujunwaonline%99yhaoo.com@sip.voip.coname.com;lr>
Contact: "iokongwu@company.com"<sip:iokongwu%99company.com@99.995.997.99:99998;transport=tls;pin=FFFF9999>
User-Agent: Video Chat Client v1.0.3
X-Completed: 7PP7 99CA 99BB C0A99998
Reason: CSR;cause=1;text=Success;AFE=TPPT 1C26 01CF 000B 1A42 0001 0004F3E 0A89E8BC <;VfxRxBitrate=206.93;VfxTxBitrate=0.00
Max-Forwards: 99
Content-Length: 0



Regular Expression:
(@values) = $line =~ m/[^F]*?
                 From[:][^<]+<sip:([^>]+).*?           #1. Orig_Device_PIN (From)
                 To[:][^<]+<sip:([^>]+).*?             #2. Dest_Device_PIN (To)
                 Call-ID[:]\s*(\S+).*?                 #3. Call-ID
                 User-Agent[:].*?([0-9.]+).*?          #4. Agent version
                 Reason[:].*?cause=(\d+).*?            #5. Term_Cause
                 AFE=\S(\S\S)\S                        #6. Call_Type
                     \s(\w+)                           #7. Call Setup Time*
                     \s(\w+)                           #8. ICE check time*
                     \s(\w+)                           #9. Start to Invite Time*
                     \s(\w+)                          #10. Invite to 180*
                     \s(\w+)                          #11. Time 180 to 200*
                     \s(\w+)                          #12. Total Call Time*
                     [^V]*VfxRxBitrate=([.0-9]+)      #13. VfxRxBitrate
                     /sx;



Questions:

1.  How can I make it fail on the unexpected record immediately?
2.  Is there an "optimization tool", like the database query optimization tools, that would show me the recursions, etc., with suggestions to improve the regex?
3.  Suppose I matched a pattern and got $1, $2, etc.  Is there a special array variable that contains all the matches?
Asked by: farzanj
 
ozo Commented:
(@values) = $line =~ m/[^F]*?
                 From[:][^<]+<sip:([^>]+)>.*?           #1. Orig_Device_PIN (From)
                 To[:][^<]+<sip:([^>]+)>.*?             #2. Dest_Device_PIN (To)
                 Call-ID[:]\s*(\S+)\s.*?                 #3. Call-ID
                 User-Agent[:].*?([0-9.]+).*?          #4. Agent version
                 Reason[:].*?cause=(\d+).*?            #5. Term_Cause
                 AFE=\S(\S\S)\S                        #6. Call_Type
                     \s(\w+)                           #7. Call Setup Time*
                     \s(\w+)                           #8. ICE check time*
                     \s(\w+)                           #9. Start to Invite Time*
                     \s(\w+)                          #10. Invite to 180*
                     \s(\w+)                          #11. Time 180 to 200*
                     \s(\w+)                          #12. Total Call Time*
                     [^V]*VfxRxBitrate=([.0-9]+)      #13. VfxRxBitrate
                     /sx;
 
ozo Commented:
Re question 3 (a special array variable that contains all the matches):
besides @values, the offset arrays @- and @+ give the start and end of every capture, so
$1 is the same as substr($line, $-[1], $+[1] - $-[1]) or $values[0]
$2 is the same as substr($line, $-[2], $+[2] - $-[2]) or $values[1]
$3 is the same as substr($line, $-[3], $+[3] - $-[3]) or $values[2]
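
A minimal sketch of the equivalence above, assuming a made-up one-line sample shaped like the To: header (the sample string and the variable name @from_offsets are illustrative, not from the thread). It collects the captures once through the list-context match and once through @- and @+. As far as I know, Perl 5.26 and later also expose the captures of the last match as @{^CAPTURE}, but the sketch does not rely on it.

```perl
use strict;
use warnings;

# Hypothetical sample line, shaped like the To: header in the records above.
my $line = 'To: "a@b.example"<sip:a%99b.example@sip.host.example>;tag=42';

# List-context match: every capture group lands in @values.
my @values = $line =~ m/<sip:([^>]+)>;tag=(\d+)/;

# Rebuild the same captures from the offset arrays:
# $-[N] is where group N starts, $+[N] is where it ends,
# and $#+ is the number of groups in the last successful match.
my @from_offsets = map { substr($line, $-[$_], $+[$_] - $-[$_]) } 1 .. $#+;

print "values:       @values\n";
print "from offsets: @from_offsets\n";
```

Both arrays should hold the same strings; @values is usually the more convenient one because it survives later matches, while @- and @+ are overwritten by the next successful match.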
 
farzanj (Author) Commented:
Ozo, what did you change in my regex?  Everything appears to be exactly the same.

Second, I was thinking that I could put the match in an if condition: if I get a match, I proceed; otherwise, I skip printing the output.
 
farzanj (Author) Commented:
Is there a command to analyze the regular expression, e.g. to show how many recursions it does?
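
One thing that may help here, offered as a sketch rather than a full answer: Perl's re pragma has a debug mode that prints the compiled regex program and a step-by-step trace of each match attempt to STDERR. It is not an optimizer and makes no suggestions, but the trace shows where the engine backtracks. The sample string is hypothetical and kept short because the trace is verbose.

```perl
use strict;
use warnings;

# Tiny hypothetical input; real records produce a very long trace.
my $line = '<sip:user@host.example>;tag=1';

my $matched;
{
    # Lexically scoped: regexes compiled inside this block are traced.
    # The compiled program and the execution steps are written to STDERR.
    use re 'debug';
    $matched = $line =~ /<sip:([^>]+)>;tag=(\d+)/ ? 1 : 0;
}

print "matched: $matched\n";
```

The same trace is available for a one-off test from the command line by loading the pragma with -Mre=debug.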
 
ozo Commented:
In the To: header, <sip:([^>]+).*? first tries to match
[^>]+  = ujunwaonline%99yahoo.com@sip.voip.coname.com
.*? = >;tag=9999999999
When the rest of the pattern fails to produce a match, it backtracks and tries
[^>]+  = ujunwaonline%99yahoo.com@sip.voip.coname.co
.*? = m>;tag=9999999999
When that fails to produce a match, it tries
[^>]+  = ujunwaonline%99yahoo.com@sip.voip.coname.c
.*? = om>;tag=9999999999
...
and so on, one character at a time. The same happens for the other <sip:([^>]+).*? (the From: header),
in all possible combinations, before the regex gives up.


 
ozo Commented:
I added > after each ([^>]+), so that it can match in only one way:
[^>]+  = ujunwaonline%99yahoo.com@sip.voip.coname.com
>.*? = >;tag=9999999999
For the same reason I added \s after (\S+) in the Call-ID line.
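
The effect can be made visible with a small counting sketch. The assumptions: a shortened, made-up record whose AFE field has too few tokens (so every pattern must fail, as with the exceptional record), and a counter $tries bumped by an embedded code block each time the engine gets past the [^>]+ part. The third variant uses a possessive quantifier (++, Perl 5.10+), which is an alternative not mentioned in the thread.

```perl
use strict;
use warnings;

# Hypothetical, shortened record: only one token follows 7PP7,
# so the AFE part of every pattern below fails.
my $line = 'To: "x"<sip:user@host.example>;tag=1 Reason: AFE=7PP7 9999;Vfx';

our $tries;    # package variable, incremented from inside the patterns

my %pattern = (
    # Original style: nothing pins down where [^>]+ must stop.
    loose      => qr/<sip:([^>]+)(?{ $tries++ }).*?AFE=\S+\s\w+\s\w+/s,
    # The fix from the thread: a literal > right after the capture.
    tight      => qr/<sip:([^>]+)>(?{ $tries++ }).*?AFE=\S+\s\w+\s\w+/s,
    # Alternative: possessive quantifier, never gives characters back.
    possessive => qr/<sip:([^>]++)(?{ $tries++ }).*?AFE=\S+\s\w+\s\w+/s,
);

my (%count, %matched);
for my $name (qw(loose tight possessive)) {
    $tries = 0;
    $matched{$name} = ($line =~ $pattern{$name}) ? 1 : 0;
    $count{$name}   = $tries;
    printf "%-10s matched=%d tries=%d\n", $name, $matched{$name}, $count{$name};
}
```

The loose pattern should report many more tries than the other two, because it re-runs the rest of the pattern for every shorter length of [^>]+. In the full regex this cost multiplies across the From: and To: captures and each .*? that follows.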
 
farzanj (Author) Commented:
Solution?  Sorry, I could not follow your first post.
 
ozo Commented:
Re using the match in an if condition and skipping the output when there is no match: yes, like
if ( (@values) = $line =~ m/[^F]*?
..
                     /sx ) {
    # matched: print the output from @values
}
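
Another way to get the "fail immediately" behaviour from question 1, sketched under assumptions: instead of one large pattern joined by .*?, each field gets its own small regex and the routine returns at the first miss, so there is no cross-field backtracking. The function name parse_bye, the hash keys, and the trimmed records (only the From:, To: and Reason: lines copied from the samples above) are illustrative choices, not part of the thread.

```perl
use strict;
use warnings;

# Returns a hash reference on success, nothing on the first field that fails.
sub parse_bye {
    my ($rec) = @_;
    my %f;

    # Each list assignment yields the number of captures, so 0 means "no match".
    ($f{from})  = $rec =~ /^From:[^<\n]*<sip:([^>\n]+)>/m or return;
    ($f{to})    = $rec =~ /^To:[^<\n]*<sip:([^>\n]+)>/m   or return;
    ($f{cause}) = $rec =~ /^Reason:.*?cause=(\d+)/m       or return;

    # Call type, then six space-separated tokens, as in the original pattern.
    my @afe = $rec =~ /AFE=\S(\S\S)\S \s(\w+) \s(\w+) \s(\w+) \s(\w+) \s(\w+) \s(\w+)/x
        or return;
    @f{qw(type setup ice invite t180 t200 total)} = @afe;

    ($f{rx}) = $rec =~ /VfxRxBitrate=([.0-9]+)/ or return;
    return \%f;
}

my $normal = <<'END';
From: "iookwu@company.com"<sip:iokongwu%99company.com@sip.voip.coname.com>;tag=9999999999
To: "ujunwaonline@yahoo.com"<sip:ujunwaonline%99yahoo.com@sip.voip.coname.com>;tag=9999999999
Reason: CSR;cause=1;text=Success;AFE=TPPT 1C26 01CF 000B 1A42 0001 0004F3E 0A89E8BC <;VfxRxBitrate=206.93;VfxTxBitrate=0.00
END

my $odd = <<'END';
From: "iokongwu@company.com"<sip:iokongwu%99company.com@sip.voip.coname.com>;tag=9999999999
To: "ujuawnoilen@yahoo.com"<sip:ujunwaonline%99yahoo.com@sip.voip.coname.com>;tag=9999999999
Reason: CSR;cause=1;text=Success;AFE=7PP7 9999 9999 C0A99998;VfxRxBitrate=991.99;VfxTxBitrate=998.99
END

for my $rec ($normal, $odd) {
    if (my $f = parse_bye($rec)) {
        print join(',', @{$f}{qw(from to cause type total rx)}), "\n";
    }
    # No match: the record is skipped without printing anything.
}
```

The trade-off is that the fields are no longer forced to appear in a fixed order, which may or may not matter for these logs.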
 
farzanj (Author) Commented:
Ok, that is good :)

So there are no RE optimization tools or commands that do the analysis?
 
farzanj (Author) Commented:
Thanks for your time, Ma'am.
 
farzanj (Author) Commented:
No optimization tools :(

No other ways to prevent recursion :(