Solved

RegEx Match issue - match not happening

Posted on 2008-10-15
Last Modified: 2008-10-15
Hi

I have a function that builds a RegEx and applies it against some incoming data.

The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";

The RegEx created:
pattern = "^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$"

Now when I run the following, the exception is thrown:

 reg = new Regex(pattern);
 if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The Incoming Data differs from the pattern ");
             }

I think I understand why this is happening: as far as I can tell, the match tries to compare the pattern to the WHOLE incoming string, and the pattern only matches the first part of the incoming data.

What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I call:
groups = reg.Match(incomingData).Groups;

Then I don't get an error, but I get the group value matched by the Regular Expression for the UNH line.
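
A minimal sketch of why IsMatch returns false here, assuming the lines of the incoming data are separated by \r\n (typical for a verbatim string in a source file saved on Windows). The class and method names are made up for illustration. Without RegexOptions.Multiline, $ only matches at the very end of the input (or before a final newline), and . never matches \n, so the pattern cannot both start at UNH and end at $ when more lines follow.

```csharp
using System;
using System.Text.RegularExpressions;

public static class AnchorDemo
{
    // First two segments of the sample message, joined with an assumed Windows line break.
    public const string Data =
        "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
        "UCI+00000000000443+ETRADEX+SARS+7'";

    // The generated pattern from the question, without its trailing "$".
    public const string Body =
        @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*)" +
        @":(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'";

    // With "$" the match must run to the end of the whole input, which "." cannot reach.
    public static bool MatchesWithEndAnchor()
    {
        return Regex.IsMatch(Data, Body + "$");
    }

    // Without "$" the first line alone satisfies the pattern.
    public static bool MatchesWithoutEndAnchor()
    {
        return Regex.IsMatch(Data, Body);
    }

    public static void Main()
    {
        Console.WriteLine(MatchesWithEndAnchor());    // False
        Console.WriteLine(MatchesWithoutEndAnchor()); // True
    }
}
```

Dropping the $ entirely is only meant to isolate the cause; it also lets the pattern stop at an earlier quote character inside a line, so a tighter end-of-line condition is usually preferable.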

Thanks

Question by:djcheeky

Expert Comment

by:ddrudik
ID: 22721580
I don't get what you mean by:
What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I call:
groups = reg.Match(incomingData).Groups;

I assume you abbreviated your code for posting, because as written it doesn't seem to be valid.

Author Comment

by:djcheeky
ID: 22721900
Hi - sorry for that - I was hoping to avoid having to add all this code. :)
(Values in [ ] are variable names in the code)

Basically, a Regex [pattern] is created from the [templateItem]. What I want to do is match that [pattern] against the [incomingData], so that in the example I gave below it will match the data "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].

Currently the code throws the exception because it compares my [pattern] against the [incomingData] and does not find a match. I would like to know how to modify this code so that the [pattern] matches the first occurrence of the pattern in the [incomingData].

Thanks




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
 
namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }
          
 
         static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();
 
            // Pattern Start Character
            pattern = "^";
 
            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
 
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");
 
                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
 
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2
 
                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
 
                    keys.Add(keyName);
                    // Version 5 (Sample 6): A value may be omitted, so make its matcher optional (.* also matches an empty value)
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;
 
                }
            }
            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");
 
            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);
            
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
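
For comparison, a shorter sketch of the same template-to-pattern idea, using Regex.Escape for the literal text instead of the chained Replace calls. This is not the code above: it leaves out the \s* whitespace tolerance and the duplicate-key renaming, and the class and method names are hypothetical. It ends the pattern with the (?=\r\n|$) lookahead rather than a bare $.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

public static class TemplateDemo
{
    // Turns "UNH+{unhCode1}+..." into a regex with one named group per {key}.
    public static string BuildPattern(string template)
    {
        StringBuilder sb = new StringBuilder("^");
        // Same template grammar as above: literal text, optionally followed by {key}.
        foreach (Match m in Regex.Matches(template, @"(?<text>[^{}]*)(\{(?<key>[^}]+)\})?"))
        {
            // Regex.Escape handles +, ., *, ?, (, ), [ and the other metacharacters.
            sb.Append(Regex.Escape(m.Groups["text"].Value));
            if (m.Groups["key"].Success)
                sb.Append("(?<").Append(m.Groups["key"].Value).Append(">.*)");
        }
        // End of line (\r\n) or end of input, instead of a bare "$".
        sb.Append(@"(?=\r\n|$)");
        return sb.ToString();
    }

    public static void Main()
    {
        string template = "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
        string data = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\nUNZ+1+000001546'";

        string pattern = BuildPattern(template);
        Console.WriteLine(pattern);

        Match first = Regex.Match(data, pattern);
        Console.WriteLine(first.Groups["unhCode1"].Value); // 00000154600001
    }
}
```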


Accepted Solution

by:
ddrudik earned 500 total points
ID: 22722085
It seems your code is missing the (?=\r\n|$) lookahead that ended the pattern in my earlier code examples; it takes the place of the trailing $:

string pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

Without options this matches only the first line of the input (or the whole input when it is a single line). With RegexOptions.Multiline it matches any line that fits the pattern.
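
A small sketch of that difference, assuming \r\n line breaks. The second UNH line in the sample data is invented for illustration so that Multiline has something extra to find, and the class and method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LookaheadDemo
{
    // Two UNH segments (the second one is made up) followed by a non-matching segment.
    public const string Data =
        "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
        "UNH+00000154600002+CONTRL:D:3:UN+CONTRL'\r\n" +
        "UNZ+1+000001546'";

    // Pattern from the answer above.
    public const string Pattern =
        @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*)" +
        @":(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

    public static int CountMatches(RegexOptions options)
    {
        return new Regex(Pattern, options).Matches(Data).Count;
    }

    public static void Main()
    {
        // "^" matches only at the start of the input, so only the first line is found.
        Console.WriteLine(CountMatches(RegexOptions.None));      // 1
        // With Multiline, "^" also matches after each line break.
        Console.WriteLine(CountMatches(RegexOptions.Multiline)); // 2

        GroupCollection groups = new Regex(Pattern).Match(Data).Groups;
        Console.WriteLine(groups["unhCode1"].Value);             // 00000154600001
    }
}
```

If only the first UNH line is wanted, reg.Match(incomingData).Groups without Multiline is enough; Multiline together with Matches is for collecting every line that fits.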

Author Comment

by:djcheeky
ID: 22723037
Hi ddrudik.

OK, adding (?=\r\n|$) to the end of my RegEx and passing the RegexOptions.Multiline option did the trick.
But I had to change the code I originally had:

FROM:

groups = reg.Match(input).Groups;
                  foreach (string key in keys)
                  {
                        keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
                        dct.Add(key, keyValue);
                  }

TO:

 MatchCollection mc = reg.Matches(incomingData);
            if (mc.Count > 0)
            {
                foreach (Match m in mc)
                {
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                    }
                }
            }


THANK YOU YET AGAIN!!!!!
Oh, yes - Just one quick question...

I noticed that when I run this, I get the output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL

What is that first line with the [0]? And why is it there? Is the first element of a match's groups, at index 0, always the full match or something?

Thanks!


//... excerpt from above code function ...
pattern += @"(?=\r\n|$)";
            Console.WriteLine("Pattern: " + pattern + "\n");
 
           
            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern,RegexOptions.Multiline);
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
             }
 
            MatchCollection mc = reg.Matches(incomingData);
            if (mc.Count > 0)
            {
                foreach (Match m in mc)
                {
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
                    }
                }
            }
            else
            {
                Console.WriteLine("Pattern did not match: ({0}).", templateItem.templateBody);
            }


Expert Comment

by:ddrudik
ID: 22723092
[0] is the default capture group and is always equal to the entire match. If you had used unnamed capture groups, as in "test(test)test...", they would be numbered [1], [2], [3], and so on.
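
A short sketch of that numbering, using a made-up pattern and input that mix one named and one unnamed group. Looking each group up by the name returned from GetGroupNames() avoids relying on the names array and the Groups collection lining up by index.

```csharp
using System;
using System.Text.RegularExpressions;

public static class GroupZeroDemo
{
    // Hypothetical pattern: one named group (key) and one unnamed group for the digits.
    public static readonly Regex Sample = new Regex(@"(?<key>\w+)=(\d+)");

    public static Match MatchSample()
    {
        return Sample.Match("size=42");
    }

    public static void Main()
    {
        Match m = MatchSample();

        Console.WriteLine(m.Groups[0].Value);     // size=42 (the entire match)
        Console.WriteLine(m.Groups[1].Value);     // 42 (the unnamed group)
        Console.WriteLine(m.Groups["key"].Value); // size (the named group)

        // Print every group by name; "0" is always among the names.
        foreach (string name in Sample.GetGroupNames())
        {
            Console.WriteLine("[" + name + "] = " + m.Groups[name].Value);
        }
    }
}
```

To skip the full-match entry when filling a dictionary, ignore the name "0" in that loop.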

Author Comment

by:djcheeky
ID: 22723133
Hahaha - WOW that was a fast reply Hahaha!
Cool - another lesson learnt! :)

Thanks again - time to continue :)
points awarded!!

Expert Comment

by:ddrudik
ID: 22723151
Thanks for the question and the points.