Solved

Regular Expression hanging when used within .NET

Posted on 2008-10-26
Last Modified: 2012-08-14
Hi There All,

I am reasonably new to regular expressions and have hit a peculiar problem.  I built a regular expression using RegexBuddy (www.regexbuddy.com) and it matches a test string correctly.

The RegEx used is as follows:
P[ ]\d{4}[ ](?:\d|[ ]){3}[ ]\d{2}[ ](?:\w|[ ]){8}[ ](?:\d|[ ]){7}[ ]\d{8}[ ]\d{6}[ ]\d{6}[ ](?:.|[ ]){30}[ ](?:.|[ ]){35}[ ](?:\d|[ ]){8}[ ]\w{1}

A test string which will pass is as follows:
P 0001 071 14 201         1495 20050401 171224 003410 Internet                       Smith,Mr                                  15 M

The RegEx is for testing a message received via TCP to ensure it meets a specific format.

Now my dilemma is this: the RegEx passes OK when matched within C# .NET.  However, when I add the character code 0x02 at the start of the RegEx and a few other character codes at the end, calling IsMatch() hangs the Visual Studio debugger.  The RegEx never seems to finish evaluating, and I have to terminate the process (by ending the debug session).  Please note I am using the RegexBuddy "Use" tab, which outputs the correct syntax for C#: all backslashes are doubled, and the pattern begins with \A and ends with \z.

The RegEx which fails to execute is:
\x02P[ ]\d{4}[ ](?:\d|[ ]){3}[ ]\d{2}[ ](?:\w|[ ]){8}[ ](?:\d|[ ]){7}[ ]\d{8}[ ]\d{6}[ ]\d{6}[ ](?:.|[ ]){30}[ ](?:.|[ ]){35}[ ](?:\d|[ ]){8}[ ]\w{1}\x03\x13\x10

You can see the only changes are the \x02 at the beginning of the RegEx and the \x03, \x13 and \x10 at the end.  Please be aware that this RegEx DOES evaluate correctly in RegexBuddy with the test string I gave above (if it does not pass in RegexBuddy for you, remove \x13 and \x10), just not in .NET.  Even though it passes in RegexBuddy, I suspect it must be in bad form.
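
For reference, a minimal Python sketch of how the \xNN escapes in the pattern above are read. Python's re module is used here only as a stand-in for .NET, and the `frame` value is a hypothetical, shortened message. \xNN takes a hexadecimal value, so \x13 and \x10 are character codes 19 and 16, not 13 and 10. Whether the sender really uses codes 19 and 16 or a carriage return and line feed (\x0D\x0A) is not stated in this thread, so that is something to check against the protocol.

```python
import re

# Hypothetical, shortened frame: 0x02, a one-letter payload, then the three trailer codes.
frame = "\x02P\x03\x13\x10"

# A raw string is used so the regex engine, not Python, interprets the \xNN escapes.
pattern = re.compile(r"\x02P\x03\x13\x10")
frame_matches = pattern.fullmatch(frame) is not None

# \xNN is hexadecimal: 0x13 is 19 and 0x10 is 16 in decimal.
trailer_codes = [ord(c) for c in "\x03\x13\x10"]

# Carriage return and line feed (decimal 13 and 10) would be written \x0D\x0A.
crlf_codes = [ord(c) for c in "\x0D\x0A"]

print(frame_matches, trailer_codes, crlf_codes)
```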

I have not given you a string with the extended characters added, as it makes no difference: the RegEx will not even evaluate.  If required, simply add them to the string.

This is very odd, I'm sure I must need some extra characters in the RegEx or something.
Thanks for letting me take up your time and hope you can help! =)

Graham


Question by:Votech
 
LVL 1

Author Comment

by:Votech
ID: 22810239
RegEx snippet without browser interference attached.
regex.txt
 
LVL 1

Author Comment

by:Votech
ID: 22810260
OK, it actually appears .NET isn't even finding matches with my first RegEx; however, RegexBuddy is!

Are there differences between the two engines?  I also need to find out what is wrong with my RegEx without the character codes.

Thank you kindly.
 
LVL 1

Author Comment

by:Votech
ID: 22810345
My apologies, my first RegEx is working in .NET now.  I needed the C# code export from RegexBuddy called 'Test if the regex matches (part of) a string'.  For now I have removed the character codes from the RegEx.

Back to original problem now.
 
LVL 84

Expert Comment

by:ozo
ID: 22810387
(?:.|[ ]){35} may try to match in as many as 2^35 different possible ways before giving up.
Why not just .{35}?
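
A small Python sketch of why the two forms accept the same input: a space is already one of the characters . matches, so the [ ] branch adds no new matches, only extra paths to try. The repetition count is reduced to 5 and the trailing X is a hypothetical stand-in for the rest of the pattern, so that the failing cases finish instantly. Python's re module is used as a stand-in for the .NET engine.

```python
import re

N = 5  # reduced from 35 so non-matching inputs fail quickly

with_alternation = re.compile(r"(?:.|[ ]){%d}X" % N)
dot_only = re.compile(r".{%d}X" % N)

samples = [
    "abcdeX",    # five non-space characters, then the tail
    "     X",    # five spaces, then the tail
    "ab  cX",    # a mix of both
    "abcdX",     # one character short: no match
    "      Y",   # wrong tail: no match
]

# Each tuple holds (result with alternation, result with the plain dot).
results = [
    (with_alternation.fullmatch(s) is not None, dot_only.fullmatch(s) is not None)
    for s in samples
]
print(results)
```

Both patterns should agree on every sample, which is what makes the substitution safe for inputs without newlines.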

 
LVL 1

Author Comment

by:Votech
ID: 22810437
True, as a dot matches any character, including spaces.

That helps, I will make the change.
 
LVL 1

Author Comment

by:Votech
ID: 22817913
Can anybody help?

It seems that if I change "Smith,Mr" to "Smith0,Mr", adding one more character in an attempt to make the RegEx fail, it just hangs.
 
LVL 1

Author Comment

by:Votech
ID: 22818323
Wow, ozo, making that change alone seems to have fixed everything.  I wasn't going to implement it just yet, as I thought it was only an efficiency change, but it was the root of all my issues.  It seems my RegEx was hanging whenever it could not be matched, not just on the extended ASCII codes.
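
A minimal Python sketch of the corrected check, with both (?:.|[ ]) groups replaced by . as ozo suggested. Python's re module stands in for .NET here, fullmatch takes the place of the \A and \z anchors, and the helper `build_message` is hypothetical: it pads each field to the width the pattern expects instead of relying on the spacing of the sample string as it appears in this post.

```python
import re

FIXED = re.compile(
    r"P[ ]\d{4}[ ](?:\d|[ ]){3}[ ]\d{2}[ ](?:\w|[ ]){8}[ ](?:\d|[ ]){7}"
    r"[ ]\d{8}[ ]\d{6}[ ]\d{6}[ ].{30}[ ].{35}[ ](?:\d|[ ]){8}[ ]\w{1}"
)

def build_message(name):
    """Join the sample fields with single spaces.

    The name is always followed by 27 spaces, so a longer name
    lengthens the whole message instead of being truncated.
    """
    fields = [
        "P", "0001", "071", "14",
        "201".ljust(8), "1495".rjust(7),
        "20050401", "171224", "003410",
        "Internet".ljust(30),
        name + " " * 27,     # "Smith,Mr" plus 27 spaces fills the 35-character field
        "15".rjust(8), "M",
    ]
    return " ".join(fields)

good = build_message("Smith,Mr")
bad = build_message("Smith0,Mr")   # one extra character shifts every later field

good_ok = FIXED.fullmatch(good) is not None
bad_ok = FIXED.fullmatch(bad) is not None
print(good_ok, bad_ok)
```

The remaining alternations such as (?:\d|[ ]) are left as they are in this sketch because their branches cannot match the same character, so they do not multiply the paths the engine has to try.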

If nobody has any suggestions on my RegEx I will award these points in a couple of days once I can guarantee it is rock solid.

Thanks ozo.
Hopefully you can reply and let me know your thoughts.......
 
LVL 84

Accepted Solution

by:
ozo earned 500 total points
ID: 22819096
It is an efficiency change.
I expect that if you let the original RegEx run for a few trillion times as long, it will eventually report a failure to match,
but before that, it has to try to see if it can match as
........................................
.......................................[ ]
......................................[ ].
......................................[ ][ ]
.....................................[ ]..
.....................................[ ].[ ]
.....................................[ ][ ].
....................................[ ][ ][ ]
...................................[ ]...
...................................[ ]..[ ]
...................................[ ].[ ].
etc.
etc.
before it can finally give up and declare that there is no possible match with any combination of . matches and [ ] matches
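
The enumeration above can be counted with a short Python sketch. `count_paths` is a hypothetical helper, not a real regex engine: it walks the text the way a naive backtracking matcher would for (?:.|[ ]){n} and counts how many complete ways the group can be matched, assuming whatever follows the group always fails so that every way has to be tried. Each space can be taken by either branch, so the count doubles per space.

```python
def count_paths(text, n, pos=0, done=0):
    """Count the ways (?:.|[ ]){n} can consume text[pos:] one character at a time."""
    if done == n:
        return 1                      # one complete way to match the group
    if pos >= len(text):
        return 0                      # ran out of input before n repetitions
    ch = text[pos]
    paths = 0
    if ch != "\n":                    # branch 1: '.' matches anything but a newline
        paths += count_paths(text, n, pos + 1, done + 1)
    if ch == " ":                     # branch 2: '[ ]' matches a space only
        paths += count_paths(text, n, pos + 1, done + 1)
    return paths

# Small inputs only: the recursion itself is exponential in the number of spaces.
for n in (4, 8, 12, 16):
    print(n, count_paths(" " * n, n))

# Letters can only be matched by '.', so they add no extra paths.
print(count_paths("Smith,Mr" + " " * 8, 16))
```

By the same counting, and assuming the name field is padded with spaces to 35 characters as in the sample, "Smith,Mr" followed by 27 spaces allows 2^27 ways on its own, and the 30-character field before it multiplies that by its own count.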