djcheeky asked:

RegEx Match issue - match not happening

Hi

I have a function that builds a Regex and applies it to some incoming data.

The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";

The RegEx created:
pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$"

When I then run:

reg = new Regex(pattern);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Incoming Data differs from the pattern");
}

the exception is thrown.

As far as I understand, this happens because the match compares the pattern against the WHOLE incoming string, while the pattern only describes the first line of the incoming data.

What I want is for it to find the first occurrence of the pattern in the incoming data, so that when I call:
groups = reg.Match(incomingData).Groups;

I don't get an error, but instead get the group values matched by the regular expression for the UNH line.
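To make the anchor behavior concrete, here is a minimal, self-contained sketch. The class name `AnchorDemo`, the shortened three-line input, and its `\n` line endings are illustrative assumptions. In .NET, without `RegexOptions.Multiline`, `^` and `$` match only at the start and end of the whole string (`$` also before a final `\n`), and `.` never matches `\n`, so the anchored UNH pattern cannot match multi-line input. With `Multiline`, `^` and `$` also match at line boundaries, and `Regex.Match` returns only the first occurrence. With `\r\n` line endings `$` alone is not enough, because .NET's `$` does not match before `\r`.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: shows how ^/$ anchors behave on multi-line input.
public static class AnchorDemo
{
    // Shortened sample input with "\n" line endings (an assumption).
    public const string Input =
        "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\n" +
        "UCI+00000000000443+ETRADEX+SARS+7'\n" +
        "UNZ+1+000001546'";

    // The UNH pattern from the question, without the \s* padding.
    public const string Pattern =
        @"^UNH\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):" +
        @"(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$";

    // Default options: ^ = start of string, $ = end of string (or before a
    // final '\n'), and '.' cannot cross '\n', so this returns false.
    public static bool WholeStringMatch() => Regex.IsMatch(Input, Pattern);

    // Multiline: ^ and $ also match at each line start / before each '\n',
    // and Regex.Match returns only the first occurrence (the UNH line).
    public static Match FirstLineMatch() =>
        Regex.Match(Input, Pattern, RegexOptions.Multiline);
}
```

With this input, `FirstLineMatch().Groups["unhCode1"].Value` is `00000154600001` and `Groups["unhType"].Value` is `CONTRL`.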

Thanks!

ddrudik:

I don't get what you mean by:
What I want is for it to find the first occurrence of the pattern in the incoming data, so that when I call:
groups = reg.Match(incomingData).Groups;

I assume you abbreviated your code for posting; as shown, it doesn't seem to be valid.
djcheeky (asker):

Hi, sorry about that. I was hoping to avoid posting all this code. :)
(Values in [ ] are variable names in the code.)

Basically, a Regex [pattern] is created from the [templateItem]. I then want to match that [pattern] against the [incomingData] so that, in the example below, it matches the line "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].

Currently the code throws the exception because it compares my [pattern] against the [incomingData] and finds no match. I would like to know how to modify this code so that the [pattern] matches its first occurrence in the [incomingData].

Thanks




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
 
namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }
          
 
        static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();
 
            // Pattern Start Character
            pattern = "^";
 
            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
 
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");
 
                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
 
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2
 
                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
 
                    keys.Add(keyName);
                    // Version 5 (Sample 6): A value may be omitted, so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;
 
                }
            }
            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");
 
            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);
            
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
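The chained `Replace` calls that escape the template text can be replaced with `Regex.Escape`. Below is a hedged sketch of the same template-to-pattern idea, not the asker's code: `TemplateRegexBuilder` and `Build` are hypothetical names, the whitespace handling only approximates Versions 2 and 3 (every blank in the literal text is dropped and `\s*` is allowed between the remaining characters), and the `(?=\r\n|$)` ending is the one the asker adopted after the accepted solution. It assumes each `{key}` is a valid .NET group name (letters, digits, underscores).

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical helper: converts a "{key}" template into a .NET regex pattern.
public static class TemplateRegexBuilder
{
    // Same tokenizer idea as the original: literal text, then an optional {key}.
    private static readonly Regex Tokenizer =
        new Regex(@"(?<text>[^{}]*)(\{(?<key>[^}]+)\})?");

    public static string Build(string template)
    {
        var pattern = new StringBuilder("^");
        var keys = new List<string>();

        foreach (Match token in Tokenizer.Matches(template))
        {
            // Escape each non-blank literal character individually so that
            // \s* can be placed between consecutive characters.
            var chars = new List<string>();
            foreach (char c in token.Groups["text"].Value)
            {
                if (c != ' ' && c != '\t')
                    chars.Add(Regex.Escape(c.ToString()));
            }
            pattern.Append(string.Join(@"\s*", chars));

            string key = token.Groups["key"].Value;
            if (key.Length == 0) continue;

            // Disambiguate repeated keys: name, name_1, name_2, ... (as Version 4 does).
            string name = key;
            for (int i = 1; keys.Contains(name); i++) name = key + "_" + i;
            keys.Add(name);
            pattern.Append("(?<").Append(name).Append(">.*)");
        }

        // End at "\r\n" or at a line/string end, without consuming the break.
        pattern.Append(@"(?=\r\n|$)");
        return pattern.ToString();
    }
}
```

For the template in the question, `Build` returns `^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)`, which still has to be used with `RegexOptions.Multiline`.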


ASKER CERTIFIED SOLUTION (ddrudik): available to Experts Exchange members only.

djcheeky (asker):

Hi ddrudik.

OK, appending (?=\r\n|$) to the end of my Regex and passing the RegexOptions.Multiline option did the trick.
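Why the lookahead matters: a multi-line verbatim literal takes the source file's line endings, so in a Windows-authored file it will typically contain `\r\n`. Under `RegexOptions.Multiline`, .NET's `$` matches before `\n` but not before `\r`, so a pattern ending in `'$` fails when `\r` sits between the closing quote and the `\n`. The sketch below (hypothetical class `LineEndingDemo`, shortened `UNT` pattern) builds the input with explicit `\r\n` so the result does not depend on the source file.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo class: '$' vs. a CRLF-aware lookahead under Multiline.
public static class LineEndingDemo
{
    // Explicit "\r\n" so the result does not depend on the source file's line endings.
    public const string CrLfInput = "UNT+4+00000154600004'\r\nUNZ+1+000001546'";

    // Fails: with Multiline, $ matches before '\n' but not before '\r',
    // and here '\r' sits between the closing quote and '\n'.
    public static bool PlainDollar() =>
        Regex.IsMatch(CrLfInput, @"^UNT\+(?<count>.*)\+(?<ref>.*)'$", RegexOptions.Multiline);

    // Succeeds: the lookahead accepts "\r\n" or a plain line/string end,
    // without including the line break in the match.
    public static bool CrLfLookahead() =>
        Regex.IsMatch(CrLfInput, @"^UNT\+(?<count>.*)\+(?<ref>.*)'(?=\r\n|$)", RegexOptions.Multiline);
}
```

A shorter alternative ending is `'\r?$`, with the trade-off that the `\r` then becomes part of `Match.Value`.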
But I had to change the code I originally had:

FROM:

groups = reg.Match(input).Groups;
foreach (string key in keys)
{
    keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
    dct.Add(key, keyValue);
}

TO:

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
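The loop above pairs `reg.GetGroupNames()[gIdx]` with `m.Groups[gIdx]`. That lines up here because .NET lists group names in group-number order, but it calls `GetGroupNames()` once per group and also prints group `0`. A hedged alternative (hypothetical `GroupReader.ReadAll`) reads each group by name, skips `0`, and returns one dictionary per match, roughly what `incomingDataDictionary` in the original code seems intended for. Because `foreach` over an empty `MatchCollection` simply does nothing, an empty result list can stand in for the separate `IsMatch` / `Count > 0` checks.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper: one name -> value dictionary per match, group "0" excluded.
public static class GroupReader
{
    public static List<Dictionary<string, string>> ReadAll(Regex reg, string input)
    {
        var results = new List<Dictionary<string, string>>();
        string[] names = reg.GetGroupNames();   // look the names up once

        foreach (Match m in reg.Matches(input))
        {
            var values = new Dictionary<string, string>();
            foreach (string name in names)
            {
                if (name == "0") continue;      // group 0 is the entire match
                values[name] = m.Groups[name].Value;
            }
            results.Add(values);
        }
        return results;                         // empty list means "no match"
    }
}
```

The caller can then throw its "structure differs" exception when `ReadAll(...).Count == 0`.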


THANK YOU YET AGAIN!!!!!
Oh, yes, just one quick question...

I noticed that when I run this, I get the output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL

What is that first line with the [0]? Why is it there? Is the group at index 0 of a match always the full match or something?

Thanks!


// ... excerpt from the function above ...
pattern += @"(?=\r\n|$)";
Console.WriteLine("Pattern: " + pattern + "\n");

// Value Extractor: uses the generated Regex to extract values from the input
reg = new Regex(pattern, RegexOptions.Multiline);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
}

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
else
{
    // templateItem is a String in the function above, so print it directly
    Console.WriteLine("Pattern did not match: ({0}).", templateItem);
}


[0] is the default capture group and always holds the entire match. If you had used unnamed capture groups, such as "test(test)test...", those groups would be numbered [1], [2], [3], and so on.
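A small sketch of the numbering rule, using a made-up input and pattern (`GroupNumberingDemo` is hypothetical). In .NET, unnamed groups are numbered from 1 in the order of their opening parentheses, and named groups are numbered after all unnamed ones. Here `(UNH)` is group 1, `(\w+)` is group 2, and `(?<code>\d+)` is group 3, even though it appears second in the pattern.

```csharp
using System.Text.RegularExpressions;

// Hypothetical demo: two unnamed groups and one named group.
public static class GroupNumberingDemo
{
    // Groups[0] = whole match, [1] = (UNH), [2] = (\w+), [3] = (?<code>\d+)
    public static Match Run() =>
        Regex.Match("UNH+123+CONTRL", @"(UNH)\+(?<code>\d+)\+(\w+)");
}
```

`Run().Groups[0].Value` equals `Run().Value` (`UNH+123+CONTRL`), and `Groups[3].Value` equals `Groups["code"].Value` (`123`). If only named groups should capture, `RegexOptions.ExplicitCapture` turns plain parentheses into non-capturing groups.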
Hahaha, WOW, that was a fast reply!
Cool - another lesson learnt! :)

Thanks again - time to continue :)
points awarded!!
Thanks for the question and the points.