Complicated regular expression

From the text below I need to extract:
the deadlock id, which in this case is '121665'

From Node 1:
PAG: '7:1:2909316'
Statement Type, in this case: 'DELETE Line #: 48'
Event: 'PA_InsertVerificationResult'
Mode: 'IX' <- this may be 1 or 2 characters long

From Node 2:
PAG: '7:1:2925190'
Statement Type, in this case: 'DELETE Line #: 48'
Event: 'PA_InsertVerificationResult'
Mode: 'IX' <- this may be 1 or 2 characters long

The trickiest part is that I need one regexp that matches the full string and extracts all occurrences from a file.


2009-05-11 10:50:48.93 spid2     Starting deadlock search 121665
2009-05-11 10:50:48.93 spid2     Target Resource Owner:
2009-05-11 10:50:48.93 spid2      ResType:LockOwner Stype:'OR' Mode: IX SPID:139 ECID:0 Ec:(0x206DB590) Value:0x4a0baac0
2009-05-11 10:50:48.93 spid2      Node:1	 ResType:LockOwner Stype:'OR' Mode: IX SPID:139 ECID:0 Ec:(0x206DB590) Value:0x4a0baac0
2009-05-11 10:50:48.93 spid2      Node:2	 ResType:LockOwner Stype:'OR' Mode: IX SPID:163 ECID:0 Ec:(0x2A429588) Value:0x4c687a00
2009-05-11 10:50:48.93 spid2      Cycle:	 ResType:LockOwner Stype:'OR' Mode: IX SPID:139 ECID:0 Ec:(0x206DB590) Value:0x4a0baac0
2009-05-11 10:50:48.93 spid2     
2009-05-11 10:50:48.93 spid2     
2009-05-11 10:50:48.93 spid2     Deadlock cycle was encountered .... verifying cycle
2009-05-11 10:50:48.93 spid2      Node:1	 ResType:LockOwner Stype:'OR' Mode: IX SPID:139 ECID:0 Ec:(0x206DB590) Value:0x4a0baac0 Cost:(0/3C)
2009-05-11 10:50:48.93 spid2      Node:2	 ResType:LockOwner Stype:'OR' Mode: IX SPID:163 ECID:0 Ec:(0x2A429588) Value:0x4c687a00 Cost:(0/3C)
2009-05-11 10:50:48.93 spid2      Cycle:	 ResType:LockOwner Stype:'OR' Mode: IX SPID:139 ECID:0 Ec:(0x206DB590) Value:0x4a0baac0 Cost:(0/3C)
2009-05-11 10:50:48.93 spid2     
2009-05-11 10:50:48.93 spid2     
Deadlock encountered .... Printing deadlock information
2009-05-11 10:50:48.93 spid2     
2009-05-11 10:50:48.93 spid2     Wait-for graph
2009-05-11 10:50:48.93 spid2     
2009-05-11 10:50:48.93 spid2     Node:1
2009-05-11 10:50:48.93 spid2     PAG: 7:1:2909316               CleanCnt:2 Mode: SIU Flags: 0x2
2009-05-11 10:50:48.93 spid2      Grant List 0::
2009-05-11 10:50:48.93 spid2        Owner:0x5e8b0280 Mode: S        Flg:0x0 Ref:0 Life:00000001 SPID:163 ECID:0
2009-05-11 10:50:48.93 spid2        SPID: 163 ECID: 0 Statement Type: DELETE Line #: 48
2009-05-11 10:50:48.93 spid2        Input Buf: RPC Event: PA_InsertVerificationResult ;1
2009-05-11 10:50:48.93 spid2      Grant List 1::
2009-05-11 10:50:48.93 spid2      Requested By: 
2009-05-11 10:50:48.93 spid2        ResType:LockOwner Stype:'OR' Mode: IX SPID:139 ECID:0 Ec:(0x206DB590) Value:0x4a0baac0 Cost:(0/3C)
2009-05-11 10:50:48.93 spid2     
2009-05-11 10:50:48.93 spid2     Node:2
2009-05-11 10:50:48.93 spid2     PAG: 7:1:2925190               CleanCnt:2 Mode: SIU Flags: 0x2
2009-05-11 10:50:48.93 spid2      Grant List 0::
2009-05-11 10:50:48.93 spid2      Grant List 1::
2009-05-11 10:50:48.93 spid2        Owner:0x4e9948c0 Mode: S        Flg:0x0 Ref:0 Life:00000001 SPID:139 ECID:0
2009-05-11 10:50:48.93 spid2        SPID: 139 ECID: 0 Statement Type: DELETE Line #: 48
2009-05-11 10:50:48.93 spid2        Input Buf: RPC Event: PA_InsertVerificationResult ;1
2009-05-11 10:50:48.93 spid2      Requested By: 
2009-05-11 10:50:48.93 spid2        ResType:LockOwner Stype:'OR' Mode: IX SPID:163 ECID:0 Ec:(0x2A429588) Value:0x4c687a00 Cost:(0/3C)
2009-05-11 10:50:48.93 spid2     Victim Resource Owner:
2009-05-11 10:50:48.93 spid2      ResType:LockOwner Stype:'OR' Mode: IX SPID:163 ECID:0 Ec:(0x2A429588) Value:0x4c687a00 Cost:(0/3C)


BornForCode Asked:

abel Commented:
Not sure if I understand. There's only one occurrence of 121665, but you say you need all occurrences. Also, the two snippets in your text do not contain the sought-after string.

If I assume that the ID is all numeric and that it is always preceded by "Starting deadlock search " and followed by a newline, this should find your ID in #1:

/Starting deadlock search ([0-9]+)\n/m
The "m" flag is for multiline matching. How you set that option depends on the language you use.
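As an illustration of how that option is set in one particular language, here is a minimal sketch in C# (the language is only an assumption at this point, and `DeadlockIds.FindAll` is a hypothetical helper name). It uses `RegexOptions.Multiline` so that `$` matches at the end of every line, and `\r?` so that Windows line endings are tolerated:

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DeadlockIds
{
    // Multiline makes $ match before every "\n", not only at the end of the input.
    // The optional \r absorbs the carriage return of a "\r\n" line ending.
    static readonly Regex IdRe = new Regex(
        @"Starting deadlock search ([0-9]+)\r?$", RegexOptions.Multiline);

    // Hypothetical helper: returns every deadlock id found in the log text.
    public static List<string> FindAll(string log)
    {
        var ids = new List<string>();
        foreach (Match m in IdRe.Matches(log))
            ids.Add(m.Groups[1].Value);
        return ids;
    }
}
```

Calling `DeadlockIds.FindAll(logText)` on the whole file contents should return one entry per "Starting deadlock search" line, assuming the ID is always numeric and is the last thing on its line.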

-- Abel --
BornForCode (Author) Commented:
And the rest of the parameters? :)
abel Commented:
I don't think I understand... hmm... looking again at your explanation, these are not snippets but show what you want from that text dump? I guess so...

Also, and very important: what language or regex flavor are you using? This is going to be a multiline regex and many engines treat that differently.
BornForCode (Author) Commented:
I will probably use C# for it.
It is indeed multiline: from that text I have to get the needed values (listed above), and this text repeats itself throughout a large log file.
abel Commented:
C# it is ;-)

Here's a possible solution. I had to drop your requirement that it be a single regular expression. A single expression is possible, but it would make the overall matching process dreadfully slow (you can only do it with non-greedy regexes, which are an order of magnitude slower than greedy ones). We still need some non-greediness here (the alternative is to also split by node, which you may have to do anyway if there can be any number of nodes).

Here's what I did: I normalized the line endings in your string to make it regex-friendly. Then I inserted a split marker before the first line of each record and split the string on that marker. This creates chunks of log records, each of which starts with your first line.

Now matching becomes a breeze. It is still not trivial if you're not acquainted with regular expressions, but at least it is quite readable. The output of the below program snippet with your source above is:

Deadlock id: 121665
Node 1 PAG : 7:1:2909316
Node 1 stmt: DELETE Line #: 48
Node 1 evt : PA_InsertVerificationResult
Node 1 mode: IX
Node 2 PAG : 7:1:2925190
Node 2 stmt: DELETE Line #: 48
Node 2 evt : PA_InsertVerificationResult
Node 2 mode: IX
There's room for improvement with the below code, but on a large log (< 200MB) it should perform reasonably well. If it becomes larger, you should consider a different approach.

Improvements can be achieved with adding another split per node (which makes it possible to use quick greedy regexes) and, for performance, by not reading everything at once. But that would require quite some extra coding.

-- Abel --

PS: the match for "mode" was not clear. I assumed it was preceded by "ResType". There are two other modes in each node-section.

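// needs: using System.IO; using System.Diagnostics; using System.Text.RegularExpressions;
// File.ReadAllText opens, reads and closes the file in one call (the original StreamReader was never disposed)
string theLog = File.ReadAllText("Data/Q24441162.txt");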
 
// normalize line endings and split
theLog = theLog.Replace("\r\n", "\n");
Regex re = new Regex("^(.*Starting deadlock search [0-9]+)$", RegexOptions.Multiline);
theLog = re.Replace(theLog, "__SPLIT__$1");
string[] logBlocks = theLog.Split(new string[] { "__SPLIT__" }, StringSplitOptions.RemoveEmptyEntries);
 
// the actual regex we are going to use
re = new Regex("Starting deadlock search ([0-9]+)" +
    ".*Wait-for graph.*Node:1\n[^\n]+PAG: ([0-9:]+)" +
    ".*?Statement Type: ([^\n]+)" +
    ".*?Event: ([A-Za-z_0-9-]+)" +
    ".*?ResType:[^\n]+Mode: ([A-Za-z0-9]{1,2})" +
    ".*?Node:2\n[^\n]+PAG: ([0-9:]+)" +
    ".*?Statement Type: ([^\n]+)" +
    ".*?Event: ([A-Za-z_0-9-]+)" +
    ".*?ResType:[^\n]+Mode: ([A-Za-z0-9]{1,2})", 
    RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.Compiled);
foreach (string logBlock in logBlocks)
{
    foreach (Match match in re.Matches(logBlock))
    {
        Debug.WriteLine("Deadlock id: " + match.Groups[1]);
        Debug.WriteLine("Node 1 PAG : " + match.Groups[2]);
        Debug.WriteLine("Node 1 stmt: " + match.Groups[3]);
        Debug.WriteLine("Node 1 evt : " + match.Groups[4]);
        Debug.WriteLine("Node 1 mode: " + match.Groups[5]);
        Debug.WriteLine("Node 2 PAG : " + match.Groups[6]);
        Debug.WriteLine("Node 2 stmt: " + match.Groups[7]);
        Debug.WriteLine("Node 2 evt : " + match.Groups[8]);
        Debug.WriteLine("Node 2 mode: " + match.Groups[9]);
    }
}
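The two improvements mentioned above (splitting per node and not reading everything at once) could look roughly like the sketch below. It is not the code from this answer but an alternative under some assumptions: every record starts with a "Starting deadlock search" line, a node section starts with a line that ends in "Node:N", and the first PAG, Statement Type, Event and ResType mode after that header belong to the node. The names `NodeInfo`, `DeadlockInfo` and `DeadlockLogReader` are made up for illustration.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public sealed class NodeInfo
{
    public int Node;
    public string Pag, Statement, Event, Mode;
}

public sealed class DeadlockInfo
{
    public string Id;
    public List<NodeInfo> Nodes = new List<NodeInfo>();
}

public static class DeadlockLogReader
{
    // Every pattern works on a single line, so no Singleline option or lazy ".*?" is needed.
    static readonly Regex StartRe = new Regex(@"Starting deadlock search ([0-9]+)");
    static readonly Regex NodeRe  = new Regex(@"Node:([0-9]+)\s*$");
    static readonly Regex PagRe   = new Regex(@"PAG: ([0-9:]+)");
    static readonly Regex StmtRe  = new Regex(@"Statement Type: (.+?)\s*$");
    static readonly Regex EventRe = new Regex(@"Event: ([A-Za-z_0-9-]+)");
    static readonly Regex ModeRe  = new Regex(@"ResType:.*Mode: ([A-Za-z0-9]{1,2})");

    // Consumes the log line by line and yields one DeadlockInfo per record.
    public static IEnumerable<DeadlockInfo> Read(IEnumerable<string> lines)
    {
        DeadlockInfo block = null;
        NodeInfo node = null;
        foreach (string line in lines)
        {
            Match m = StartRe.Match(line);
            if (m.Success)
            {
                // A new record begins: hand out the previous one first.
                if (block != null) yield return block;
                block = new DeadlockInfo { Id = m.Groups[1].Value };
                node = null;
                continue;
            }
            if (block == null) continue;   // text before the first record

            m = NodeRe.Match(line);
            if (m.Success)
            {
                // A "Node:N" header line opens a new node section.
                node = new NodeInfo { Node = int.Parse(m.Groups[1].Value) };
                block.Nodes.Add(node);
                continue;
            }
            if (node == null) continue;    // not inside a node section yet

            // Keep only the first occurrence of each field within a node.
            if (node.Pag == null && (m = PagRe.Match(line)).Success)
                node.Pag = m.Groups[1].Value;
            else if (node.Statement == null && (m = StmtRe.Match(line)).Success)
                node.Statement = m.Groups[1].Value;
            else if (node.Event == null && (m = EventRe.Match(line)).Success)
                node.Event = m.Groups[1].Value;
            else if (node.Mode == null && (m = ModeRe.Match(line)).Success)
                node.Mode = m.Groups[1].Value;
        }
        if (block != null) yield return block;
    }
}
```

With .NET 4.0 or later, `DeadlockLogReader.Read(File.ReadLines("Data/Q24441162.txt"))` should process the file lazily, one line at a time. A record in which no deadlock was printed comes back with an empty `Nodes` list, so the caller can simply skip those.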
