RegEx Match issue - match not happening

Hi

I have a function that builds a RegEx and applies it against some incoming data.

The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";

The RegEx created:
pattern = "^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$"

Now when I run the following, the exception is thrown:

 reg = new Regex(pattern);
 if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The Incoming Data differs from the pattern ");
             }

I think I understand why this is happening: as far as I can tell, the match compares the pattern to the WHOLE incoming string, and the pattern only matches the first part of the incoming data.

What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I go:
groups = reg.Match(incomingData).Groups;

Then I don't get an error, but instead get the group values matched by the Regular Expression for the UNH line.

Thanks

djcheeky Asked:
 
ddrudik Commented:
It looks like your code is missing the (?=\r\n|$) at the end of the pattern (it was in my earlier code examples):

string pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

This matches the first line of the input, or the entire input when it is a single line (or any matching line when RegexOptions.Multiline is set).
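
To illustrate the suggestion, here is a minimal sketch that applies the pattern above with RegexOptions.Multiline. The class name UnhLineMatch, the helper FirstUnh, and the shortened sample data with explicit "\r\n" line breaks are assumptions made for the example, not part of the original code.

```csharp
using System;
using System.Text.RegularExpressions;

public static class UnhLineMatch
{
    // Pattern from the answer above. The trailing lookahead accepts either
    // a CRLF line break or the end of the line/input after the closing quote.
    public const string Pattern =
        @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*)" +
        @":(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

    // Hypothetical helper: returns the first UNH segment found in the data.
    // Check Match.Success before reading any groups.
    public static Match FirstUnh(string data)
    {
        return new Regex(Pattern, RegexOptions.Multiline).Match(data);
    }

    public static void Main()
    {
        // Shortened sample; CRLF line endings are assumed here.
        string data = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
                      "UCI+00000000000443+ETRADEX+SARS+7'\r\n" +
                      "UNZ+1+000001546'";

        Match m = FirstUnh(data);
        if (m.Success)
        {
            Console.WriteLine(m.Groups["unhCode1"].Value);
            Console.WriteLine(m.Groups["unhType"].Value);
        }
    }
}
```

Because "." does not match "\n" in .NET, each ".*" stays within one line, so the match stops at the end of the UNH segment instead of running into the UCI line.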
 
ddrudik Commented:
I don't understand what you mean by:
"What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I go:
groups = reg.Match(incomingData).Groups;"

I assume you abbreviated your code for posting, because as written it doesn't appear to be valid.
 
djcheeky (Author) Commented:
Hi - sorry for that - I was hoping to avoid having to add all this code. :)
(Values in [ ] are variable names in the code)

Basically, a Regex [pattern] is created from the [templateItem]. What I then want to do is match that [pattern] against the [incomingData] so that, in the example below, it matches the data "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].

Currently the code throws the exception because it compares my [pattern] against the [incomingData] and does not find a match. I would like to know how to modify this code so that the [pattern] matches the first occurrence of the pattern in the [incomingData].

Thanks




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
 
namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }
          
 
         static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();
 
            // Pattern Start Character
            pattern = "^";
 
            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
 
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");
 
                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
 
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2
 
                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
 
                    keys.Add(keyName);
                    //Version 5 (Sample 6): A value may be omitted, so its matcher is optional (.* also matches an empty value)
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;
 
                }
            }
            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");
 
            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);
            
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
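
The chain of Replace calls in the listing above escapes regex metacharacters by hand. As a possible simplification, the sketch below does the same job with Regex.Escape. It assumes the whitespace-tolerant \s* handling is not needed, and the names TemplatePattern and Build are hypothetical. The trailing lookahead is borrowed from the accepted answer in place of the plain "$".

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TemplatePattern
{
    // Hypothetical helper: turns a template such as "UNH+{a}+{b}'" into an
    // anchored regex with one named group per {key} placeholder.
    public static string Build(string template)
    {
        // Same template grammar as the listing above: literal text,
        // optionally followed by a {key} placeholder.
        Regex token = new Regex(@"(?<text>[^{}]*)(\{(?<key>[^}]+)\})?");
        StringBuilder sb = new StringBuilder("^");

        foreach (Match m in token.Matches(template))
        {
            // Regex.Escape handles +, ., *, ?, (, ), [ and the other metacharacters.
            sb.Append(Regex.Escape(m.Groups["text"].Value));

            if (m.Groups["key"].Success)
                sb.Append("(?<" + m.Groups["key"].Value + ">.*)");
        }

        // Accept a CRLF line break or the end of the line/input.
        sb.Append(@"(?=\r\n|$)");
        return sb.ToString();
    }

    public static void Main()
    {
        string pattern = Build("UNH+{unhCode1}+{unhType}'");
        Console.WriteLine(pattern);

        Match m = new Regex(pattern, RegexOptions.Multiline)
            .Match("UNH+0001+CONTRL'\r\nUNZ+1+0001'");
        Console.WriteLine(m.Success ? m.Groups["unhType"].Value : "no match");
    }
}
```

Note that the key name is inserted unescaped, since a regex group name has to be a plain identifier anyway.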

 
djcheeky (Author) Commented:
Hi ddrudik.

OK, adding (?=\r\n|$) to the end of my RegEx and setting the RegexOptions.Multiline option did the trick.
But I had to change the code I originally had:

FROM:

groups = reg.Match(input).Groups;
                  foreach (string key in keys)
                  {
                        keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
                        dct.Add(key, keyValue);
                  }

TO:

 MatchCollection mc = reg.Matches(incomingData);
            if (mc.Count > 0)
            {
                foreach (Match m in mc)
                {
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                    }
                }
            }


THANK YOU YET AGAIN!!!!!
Oh, yes, just one quick question...

I noticed that when I run this, I get the following output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL

What is that first line with the [0], and why is it there? Is the element at index 0 of a match's groups always the full match or something?

Thanks!


//... excerpt from above code function ...
pattern += @"(?=\r\n|$)";
            Console.WriteLine("Pattern: " + pattern + "\n");
 
           
            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern,RegexOptions.Multiline);
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
             }
 
            MatchCollection mc = reg.Matches(incomingData);
            if (mc.Count > 0)
            {
                foreach (Match m in mc)
                {
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                    }
                }
            }
            else
            {
                Console.WriteLine("Pattern did not match: ({0}).", templateItem.templateBody);
            }

 
ddrudik Commented:
[0] is the default capture group and is equal to the entire match. If you had used unnamed capture groups, such as "test(test)test...", then those unnamed capture groups would be numbered [1], [2], [3], etc.
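
A small sketch of that numbering, using a made-up pattern with one unnamed and one named group. The class GroupZeroDemo and the helper NamedCaptures are hypothetical names chosen for the example. The helper skips group "0" so that only the actual capture groups end up in the dictionary.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupZeroDemo
{
    // Hypothetical helper: collects every capture group of the first match
    // except group "0", which always holds the entire match.
    public static Dictionary<string, string> NamedCaptures(Regex reg, string input)
    {
        Dictionary<string, string> result = new Dictionary<string, string>();
        Match m = reg.Match(input);
        if (!m.Success) return result;

        foreach (string name in reg.GetGroupNames())
        {
            if (name == "0") continue; // skip the whole-match group
            result[name] = m.Groups[name].Value;
        }
        return result;
    }

    public static void Main()
    {
        // One unnamed group (numbered 1) and one named group ("code").
        Regex reg = new Regex(@"(\w+)\+(?<code>\d+)");
        Match m = reg.Match("UNH+0001");

        Console.WriteLine(m.Groups[0].Value);      // the entire match
        Console.WriteLine(m.Groups[1].Value);      // the unnamed group
        Console.WriteLine(m.Groups["code"].Value); // the named group

        foreach (KeyValuePair<string, string> kv in NamedCaptures(reg, "UNH+0001"))
            Console.WriteLine("[" + kv.Key + "] = " + kv.Value);
    }
}
```

Looking groups up by name rather than by index avoids depending on how .NET orders named and unnamed groups.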
 
djcheeky (Author) Commented:
Hahaha, WOW, that was a fast reply!
Cool, another lesson learnt! :)

Thanks again, time to continue :)
Points awarded!!
 
ddrudik Commented:
Thanks for the question and the points.
Question has a verified solution.