Solved

RegEx Match issue - match not happening

Posted on 2008-10-15
873 Views
Last Modified: 2008-10-15
Hi

I have a function that builds a regex and applies it to some incoming data.

The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";

The RegEx created:
pattern = "^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$"

Now when I try this:

reg = new Regex(pattern);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Incoming Data differs from the pattern");
}

As far as I understand, this happens because the match compares the pattern against the WHOLE incoming string, while the pattern only matches the first part of the incoming data.
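A minimal C# sketch to check that explanation. It assumes the message lines are separated by CRLF, so the data is built explicitly with "\r\n" rather than relying on the line endings of a verbatim string in the source file, and it uses a simplified stand-in for the generated pattern. Without RegexOptions.Multiline, `$` only matches at the end of the string (or just before a final `\n`), and `.` never crosses `\n`, so `^UNH...'$` cannot match a multi-line message. Adding Multiline alone does not help with CRLF data either, because the `'` is followed by `\r`, not by the position where `$` matches.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Assumed sample data with CRLF line breaks (built explicitly).
    public const string Data =
        "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
        "UNT+4+00000154600004'\r\n" +
        "UNZ+1+000001546'";

    // Simplified stand-in for the generated pattern: "^UNH\+.*'" plus an end anchor.
    public static bool Check(string ending, RegexOptions options)
    {
        return Regex.IsMatch(Data, @"^UNH\+.*'" + ending, options);
    }

    static void Main()
    {
        // $ without Multiline: end of string only, and .* cannot cross "\n".
        Console.WriteLine(Check("$", RegexOptions.None));           // False
        // Multiline $: matches before any "\n", but here "'" is followed by "\r".
        Console.WriteLine(Check("$", RegexOptions.Multiline));      // False
        // Lookahead: accept a following CRLF or the end of the input.
        Console.WriteLine(Check(@"(?=\r\n|$)", RegexOptions.None)); // True
    }
}
```

On LF-only data, `'$` with RegexOptions.Multiline would already match; the `(?=\r\n|$)` lookahead is what makes CRLF line breaks work.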

What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I go:
groups = reg.Match(incomingData).Groups;

then I don't get an error, but instead get the group values matched by the regular expression for the UNH line.
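Note that `reg.Match(...)` never throws on a failed match; it returns a Match whose Success is false, and its groups come back with empty values. A sketch that calls Match once and checks Success, instead of calling IsMatch and then Match. `ReadGroup` is a hypothetical helper, and the pattern is the question's pattern without the `\s*` parts, ending in the `(?=\r\n|$)` lookahead from the accepted answer rather than `$`.

```csharp
using System;
using System.Text.RegularExpressions;

static class FirstMatchDemo
{
    // Simplified UNH pattern (assumption: no whitespace tolerance needed).
    public const string UnhPattern =
        @"^UNH\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

    // Hypothetical helper: returns one named group from the first match.
    public static string ReadGroup(string input, string pattern, string groupName)
    {
        // Match() scans from the left and stops at the first occurrence.
        Match m = Regex.Match(input, pattern, RegexOptions.Multiline);
        if (!m.Success)
            throw new Exception("The incoming data differs from the pattern.");
        return m.Groups[groupName].Value;
    }

    static void Main()
    {
        string data = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
                      "UNZ+1+000001546'";
        Console.WriteLine(ReadGroup(data, UnhPattern, "unhCode1")); // 00000154600001
    }
}
```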

Thanks!

Question by:djcheeky
 
LVL 27

Expert Comment

by:ddrudik
ID: 22721580
I don't get what you mean by:
What I want it to do is find the first occurance of the pattern, in the incoming data, so that if I go:
groups = reg.Match(incomingData).Groups;

I assume you abbreviated your code for posting; as written, it doesn't seem to be valid.
 

Author Comment

by:djcheeky
ID: 22721900
Hi - sorry for that - I was hoping to avoid having to add all this code. :)
(Values in [ ] are variable names in the code.)

Basically, a regex [pattern] is created from the [templateItem]. Then I want to match that [pattern] against the [incomingData], so that in the example below it matches the data "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].

Currently the code throws the exception because it compares my [pattern] against the [incomingData] and does not find a match. How can I modify this code so that the [pattern] matches its first occurrence in the [incomingData]?

Thanks




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
 
namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }
          
 
        static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();
 
            // Pattern Start Character
            pattern = "^";
 
            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
 
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");
 
                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
 
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2
 
                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 
                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
 
                    keys.Add(keyName);
                    // Version 5 (Sample 6): A value may be omitted, so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;
 
                }
            }
            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");
 
            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);
            
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
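The escaping chain in the code above lists special characters by hand; .NET's `Regex.Escape` already escapes the regex metacharacters (and white space). A minimal sketch of the template-to-pattern step using it, assuming placeholders always look like `{name}` and that the whitespace-tolerant `\s*` step (Version 2) and the duplicate-key renaming (Version 4) can be left out. `TemplateToPattern` is a hypothetical name, and the pattern ends with the `(?=\r\n|$)` lookahead from the accepted answer instead of `$`.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

static class TemplateSketch
{
    // Assumed placeholder syntax: {name}, no nested braces, name is a valid group name.
    private static readonly Regex KeyToken = new Regex(@"\{(?<key>[^}]+)\}");

    // Hypothetical helper: turns a {key} template into an anchored pattern with named groups.
    public static string TemplateToPattern(string template)
    {
        var sb = new StringBuilder("^");
        int last = 0;
        foreach (Match m in KeyToken.Matches(template))
        {
            // Literal text between placeholders: Regex.Escape handles +, ., *, ?, (, [ and so on.
            sb.Append(Regex.Escape(template.Substring(last, m.Index - last)));
            // The placeholder becomes a named group (same ".*" as the original code).
            sb.Append("(?<").Append(m.Groups["key"].Value).Append(">.*)");
            last = m.Index + m.Length;
        }
        sb.Append(Regex.Escape(template.Substring(last)));
        // Stop at a CRLF or at "$" (the accepted answer's lookahead).
        sb.Append(@"(?=\r\n|$)");
        return sb.ToString();
    }

    static void Main()
    {
        string template = "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
        string pattern = TemplateToPattern(template);
        Console.WriteLine(pattern);

        string data = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\nUNZ+1+000001546'";
        Match match = new Regex(pattern, RegexOptions.Multiline).Match(data);
        Console.WriteLine(match.Success ? match.Groups["unhCode1"].Value : "no match");
    }
}
```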


Accepted Solution

by:ddrudik (earned 500 total points)
ID: 22722085
It seems your code is missing the (?=\r\n|$) lookahead at the end of the pattern (it was in my earlier code examples):

string pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

With that lookahead, the pattern matches the UNH line when it is the first line (or the entire input, when that is a single line); in Multiline mode it can match on any line.
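A sketch of the "any line in Multiline mode" part, using the accepted pattern unchanged on assumed CRLF data where the UNH segment is the second line. Without RegexOptions.Multiline, `^` only matches at the start of the whole string; with it, `^` can also match right after each `\n`.

```csharp
using System;
using System.Text.RegularExpressions;

static class AnyLineDemo
{
    // The accepted answer's pattern, unchanged.
    public const string Pattern =
        @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

    // Assumed input: UNH is on the second line, lines separated by CRLF.
    public const string Data =
        "UCI+00000000000443+ETRADEX+SARS+7'\r\n" +
        "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\n" +
        "UNZ+1+000001546'";

    static void Main()
    {
        // No Multiline: ^ is tried only at position 0 (the UCI line).
        Console.WriteLine(Regex.IsMatch(Data, Pattern));     // False
        // Multiline: ^ also matches at the start of the UNH line.
        Match m = Regex.Match(Data, Pattern, RegexOptions.Multiline);
        Console.WriteLine(m.Groups["unhCode1"].Value);       // 00000154600001
    }
}
```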

 

Author Comment

by:djcheeky
ID: 22723037
Hi ddrudik.

OK, adding (?=\r\n|$) to the end of my regex and using the RegexOptions.Multiline option did the trick.
But I had to change the code I originally had:

FROM:

groups = reg.Match(input).Groups;
foreach (string key in keys)
{
    keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
    dct.Add(key, keyValue);
}

TO:

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}


THANK YOU YET AGAIN!!!!!
Oh yes, just one quick question...

I noticed that when I run this, I get the output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL

What is that first line with the [0]? Why is it there? Is the group at index 0 always the full match?

Thanks!


//... excerpt from the function above ...
pattern += @"(?=\r\n|$)";
Console.WriteLine("Pattern: " + pattern + "\n");

//Value Extractor : Uses the generated Regex to extract values from the input
reg = new Regex(pattern, RegexOptions.Multiline);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
}

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
else
{
    // templateItem is a String in the function above
    Console.WriteLine("Pattern did not match: ({0}).", templateItem);
}
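The loop above pairs `reg.GetGroupNames()[gIdx]` with `m.Groups[gIdx]`, which lines up here because the groups are numbered 0 to 6 with no gaps. A sketch that looks each group up by name instead, so it does not depend on the numbering, and skips group "0" (the whole match). `CollectGroups` is a hypothetical helper name, and it assumes group names are unique (the template code already appends `_1`, `_2`, ... to duplicate keys).

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GroupsByName
{
    // Hypothetical helper: every group of the first match, keyed by group name.
    public static Dictionary<string, string> CollectGroups(Regex reg, string input)
    {
        var result = new Dictionary<string, string>();
        Match m = reg.Match(input);
        if (!m.Success)
            return result; // no match: empty dictionary
        foreach (string name in reg.GetGroupNames())
        {
            if (name == "0")
                continue; // group 0 is the whole match, not a template key
            result.Add(name, m.Groups[name].Value);
        }
        return result;
    }

    static void Main()
    {
        var reg = new Regex(@"^UNH\+(?<unhCode1>.*)\+(?<unhType>.*)'(?=\r\n|$)", RegexOptions.Multiline);
        var values = CollectGroups(reg, "UNH+00000154600001+CONTRL'\r\nUNZ+1+000001546'");
        foreach (var pair in values)
            Console.WriteLine("[" + pair.Key + "] = " + pair.Value);
    }
}
```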


Expert Comment

by:ddrudik
ID: 22723092
Yes: [0] is the default capture group and always holds the entire match. If you had used unnamed capture groups, such as "test(test)test...", they would be numbered [1], [2], [3], etc.
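A sketch that makes the numbering visible with a made-up pattern containing one named group and two unnamed ones. In .NET, group 0 is always the whole match, unnamed groups are numbered 1, 2, ... from left to right, and named groups are numbered after all unnamed ones; `GetGroupNames()` returns the names in group-number order.

```csharp
using System;
using System.Text.RegularExpressions;

static class NumberingDemo
{
    // Made-up pattern: one named group, then two unnamed groups.
    public static readonly Regex Reg = new Regex(@"(?<first>t)(e)(s)t");

    static void Main()
    {
        Match m = Reg.Match("test");
        foreach (string name in Reg.GetGroupNames())
        {
            // GroupNumberFromName shows where each group sits in the numbering.
            int number = Reg.GroupNumberFromName(name);
            Console.WriteLine("[" + name + "] #" + number + " = " + m.Groups[name].Value);
        }
    }
}
```

This should print `[0] #0 = test`, `[1] #1 = e`, `[2] #2 = s` and `[first] #3 = t`: the named group comes first in the pattern but is numbered last.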
 

Author Comment

by:djcheeky
ID: 22723133
Hahaha - WOW that was a fast reply Hahaha!
Cool - another lesson learnt! :)

Thanks again - time to continue :)
points awarded!!

Expert Comment

by:ddrudik
ID: 22723151
Thanks for the question and the points.