Solved

RegEx Match issue - match not happening

Posted on 2008-10-15
Last Modified: 2008-10-15
Hi

I have a function that builds a RegEx and applies it against some incoming data.

The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";

The RegEx created:
pattern = "^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$"

Now when I run the following, the exception is thrown:

reg = new Regex(pattern);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Incoming Data differs from the pattern ");
}

As far as I understand, this happens because the match compares the pattern against the WHOLE incoming string, and the pattern only matches the first part of the incoming data.

What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I run:
groups = reg.Match(incomingData).Groups;

then I don't get an error, but instead get the group values matched by the regular expression for the UNH line.
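
The behaviour described above can be reproduced in isolation. The following is only a minimal sketch: the pattern is a shortened, hypothetical stand-in for the generated one (the group name rest is made up), and the data uses an explicit "\r\n" between segments, which is an assumption about the line endings of the real input. It suggests that the ^ and $ anchors, not IsMatch itself, are what tie the pattern to the whole string.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Shortened, hypothetical stand-in for the generated pattern.
    public const string Pattern = @"^UNH\+(?<unhCode1>.*)\+(?<rest>.*)'$";

    // Two segments joined by an explicit "\r\n" (assumed line ending).
    public const string Data =
        "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\nUNZ+1+000001546'";

    public static void Main()
    {
        // Default options: ^ is the start of the string and $ is the end of the string,
        // so the UNH line alone cannot satisfy the pattern.
        Console.WriteLine(Regex.IsMatch(Data, Pattern)); // False

        // Multiline: $ matches just before '\n', but a '\r' still sits
        // between the closing quote and the '\n', so this fails as well.
        Console.WriteLine(Regex.IsMatch(Data, Pattern, RegexOptions.Multiline)); // False

        // A single segment on its own matches.
        Console.WriteLine(Regex.IsMatch("UNH+00000154600001+CONTRL:D:3:UN+CONTRL'", Pattern)); // True
    }
}
```

If the real input used bare "\n" line endings, the second call would be expected to succeed, so it is worth checking which line endings the data actually contains.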

Thanks

Question by:djcheeky
LVL 27

Expert Comment

by:ddrudik
ID: 22721580
I don't get what you mean by:
What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I run:
groups = reg.Match(incomingData).Groups;

I assume you abbreviated your code for posting; as posted, it doesn't seem to be valid.
 

Author Comment

by:djcheeky
ID: 22721900
Hi - sorry for that - I was hoping to avoid having to add all this code. :)
(Values in [ ] are variable names in the code)

Basically, a Regex [pattern] is created from the [templateItem]. What I then want to do is match that [pattern] against the [incomingData] so that, in the example below, it matches the data "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].

Currently the code throws the exception because it compares my [pattern] against the [incomingData] and does not find a match. I would like to know how to modify this code so that the [pattern] matches the first occurrence of the pattern in the [incomingData].

Thanks




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }

        static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();

            // Pattern Start Character
            pattern = "^";

            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }

                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);

                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2

                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }

                    keys.Add(keyName);
                    //Version 5 (Sample 6): A value may be omitted so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;

                }
            }
            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");

            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);

            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
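
As a side note on the chained Replace calls used for escaping: .NET ships Regex.Escape, which also covers metacharacters the chain leaves out (for example ^, $, | and \). The following is a rough sketch of a simplified builder, not a drop-in replacement. TemplatePattern.Build is a hypothetical helper, it assumes every {key} is already a valid group name, it ends the pattern with a line-end lookahead using \r?\n (an assumption), and it deliberately leaves out the \s* insertion and the duplicate-key renaming from the code above.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class TemplatePattern
{
    // Hypothetical helper: turns "UNH+{a}+{b}'" into an anchored pattern with
    // one named group per {key}. Literal text is escaped with Regex.Escape.
    public static string Build(string template)
    {
        var sb = new StringBuilder("^");
        int pos = 0;
        foreach (Match key in Regex.Matches(template, @"\{(?<key>[^}]+)\}"))
        {
            // Escape the literal text between the previous key and this one.
            sb.Append(Regex.Escape(template.Substring(pos, key.Index - pos)));
            // Assumes the key is a valid .NET group name.
            sb.Append("(?<").Append(key.Groups["key"].Value).Append(">.*)");
            pos = key.Index + key.Length;
        }
        // Trailing literal text, then an assumed end-of-line lookahead.
        sb.Append(Regex.Escape(template.Substring(pos)));
        sb.Append(@"(?=\r?\n|$)");
        return sb.ToString();
    }

    public static void Main()
    {
        string template = "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
        string pattern = Build(template);
        Console.WriteLine("Pattern: " + pattern);

        Match m = Regex.Match("UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\nUNZ+1+000001546'",
                              pattern, RegexOptions.Multiline);
        Console.WriteLine(m.Groups["unhMessageType"].Value);
    }
}
```

Escaping the literal text in one call keeps the builder short, at the cost of the whitespace tolerance the original code adds between characters.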

 
LVL 27

Accepted Solution

by:
ddrudik earned 500 total points
ID: 22722085
It looks like your code is missing the (?=\r\n|$) lookahead that my earlier code examples had at the end of the pattern, in place of the plain $:

string pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

That would match the first line or the entire input (or any line when RegexOptions.Multiline is set).
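
A small sketch of the lookahead approach, for illustration only. It uses the pattern from this answer with one assumed tweak: \r?\n instead of \r\n, so that input with bare "\n" line endings would also be accepted. LookaheadDemo and FirstUnh are hypothetical names, and the sample data deliberately puts the UNH segment on the second line to show the effect of RegexOptions.Multiline.

```csharp
using System;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Pattern from the answer, with \r?\n as an assumed tweak for "\n"-only input.
    public const string Pattern =
        @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*)" +
        @":(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r?\n|$)";

    // Returns the first UNH segment found on any line of the input.
    public static Match FirstUnh(string data)
    {
        return Regex.Match(data, Pattern, RegexOptions.Multiline);
    }

    public static void Main()
    {
        string data = "UCI+00000000000443+ETRADEX+SARS+7'\r\n" +
                      "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
                      "UNZ+1+000001546'";
        Match m = FirstUnh(data);
        Console.WriteLine(m.Success);                   // True
        Console.WriteLine(m.Groups["unhCode1"].Value);  // 00000154600001
        Console.WriteLine(m.Groups["unhType"].Value);   // CONTRL
    }
}
```

Because the lookahead does not consume the line break, the matched text ends at the closing quote and the last group does not pick up a trailing "\r".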

 

Author Comment

by:djcheeky
ID: 22723037
Hi ddrudik.

Ok, adding (?=\r\n|$) to the end of my RegEx and adding the RegexOptions.Multiline option did the trick.
But I had to change the code I originally had:

FROM:

groups = reg.Match(input).Groups;
foreach (string key in keys)
{
    keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
    dct.Add(key, keyValue);
}

TO:

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}


THANK YOU YET AGAIN!!!!!
Oh, yes - Just one quick question...

I noticed that when I run this, I get the output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL

What is that first line with the [0], and why is it there? Is the first element of a match's groups, at index 0, always the full match or something?

Thanks!


//... excerpt from above code function ...
pattern += @"(?=\r\n|$)";
Console.WriteLine("Pattern: " + pattern + "\n");

//Value Extractor : Uses the generated Regex to extract values from the input
reg = new Regex(pattern, RegexOptions.Multiline);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
}

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
else
{
    Console.WriteLine("Pattern did not match: ({0}).", templateItem.templateBody);
}
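
If the goal is still to fill a dictionary, as in the FROM version, one possible sketch is to loop over the group names rather than the group indexes and skip purely numeric names such as "0". GroupDictionary.ToDictionary is a hypothetical helper, and the UNT pattern and its group names (count, reference) are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupDictionary
{
    // Copies every named group of a successful match into a dictionary.
    // Groups whose name is just a number (such as "0") are skipped.
    public static Dictionary<string, string> ToDictionary(Regex reg, Match m)
    {
        var result = new Dictionary<string, string>();
        if (!m.Success) return result;
        foreach (string name in reg.GetGroupNames())
        {
            int ignored;
            if (int.TryParse(name, out ignored)) continue; // numbered group
            result[name] = m.Groups[name].Value;
        }
        return result;
    }

    public static void Main()
    {
        // Illustrative pattern only; the group names are assumptions.
        var reg = new Regex(@"^UNT\+(?<count>.*)\+(?<reference>.*)'(?=\r?\n|$)",
                            RegexOptions.Multiline);
        Match m = reg.Match("UNT+4+00000154600004'\r\nUNT+5+00000154600005'");
        foreach (var pair in ToDictionary(reg, m))
            Console.WriteLine("[" + pair.Key + "] = " + pair.Value);
    }
}
```

Looking groups up by name avoids relying on the order of GetGroupNames lining up with the numeric group indexes.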

 
LVL 27

Expert Comment

by:ddrudik
ID: 22723092
[0] is the default capture group and is always equal to the entire match. If you had used unnamed capture groups, such as "test(test)test...", those would be numbered [1], [2], [3], etc.
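
To make the numbering concrete, here is a short sketch with one unnamed and one named group. The pattern and the group name reference are made up for illustration, and GroupZeroDemo.Run is a hypothetical helper.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupZeroDemo
{
    // Illustrative pattern: one unnamed group and one named group.
    public static Match Run()
    {
        return Regex.Match("UNZ+1+000001546'", @"UNZ\+(\d+)\+(?<reference>\d+)'");
    }

    public static void Main()
    {
        Match m = Run();
        Console.WriteLine(m.Groups[0].Value);            // UNZ+1+000001546'  (the whole match)
        Console.WriteLine(m.Value);                      // UNZ+1+000001546'  (same text as Groups[0])
        Console.WriteLine(m.Groups[1].Value);            // 1                 (first unnamed group)
        Console.WriteLine(m.Groups["reference"].Value);  // 000001546         (named group)
    }
}
```

So when printing only the captured values, starting the loop at index 1 (or looking groups up by name) leaves out the full match.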
 

Author Comment

by:djcheeky
ID: 22723133
Hahaha - WOW that was a fast reply Hahaha!
Cool - another lesson learnt! :)

Thanks again - time to continue :)
points awarded!!
 
LVL 27

Expert Comment

by:ddrudik
ID: 22723151
Thanks for the question and the points.