Solved

RegEx Match issue - match not happening

Posted on 2008-10-15
Last Modified: 2008-10-15
Hi

I have a function that builds a RegEx and applies it to some incoming data.

The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";

The RegEx created:
pattern = "^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$"

Now when I try:

reg = new Regex(pattern);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Incoming Data differs from the pattern ");
}

I think I understand why this is happening: the pattern is anchored with ^ and $, so as far as I understand it has to match the WHOLE incoming string, but the pattern only matches the first line of the incoming data.

What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I go:
groups = reg.Match(incomingData).Groups;

Then I don't get an error, but instead get the group values matched by the regular expression for the UNH line.
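A minimal sketch of the diagnosis, assuming a modern .NET SDK (C# 9 or later, for top-level statements) and a shortened two-line version of the data with explicit CRLF line breaks: the same anchored pattern matches the UNH line on its own but not the multi-line string. The reason is that . does not match \n, and without RegexOptions.Multiline the $ anchor only matches at the end of the input (or just before a final \n), so the match can never reach it.

```csharp
// Assumes C# 9+ top-level statements; the shortened data is illustrative.
using System;
using System.Text.RegularExpressions;

// The pattern from the question, anchored with ^ ... $.
const string pattern =
    @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):" +
    @"(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$";

// First line on its own, and a two-line input joined with CRLF.
const string firstLine = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'";
const string incomingData = firstLine + "\r\n" + "UNZ+1+000001546'";

var reg = new Regex(pattern);

// The single line matches: $ is reached right after the closing quote.
bool matchesLine = reg.IsMatch(firstLine);
// The full data does not: '.' cannot cross '\n', and without Multiline
// '$' only matches at the very end of the input (or before a final '\n').
bool matchesData = reg.IsMatch(incomingData);

Console.WriteLine($"first line alone:   {matchesLine}"); // True
Console.WriteLine($"full incoming data: {matchesData}"); // False
```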

Thanks!

Question by:djcheeky

Expert Comment

by:ddrudik
I don't get what you mean by:
What I want it to do is find the first occurrence of the pattern, in the incoming data, so that if I go:
groups = reg.Match(incomingData).Groups;

I assume you abbreviated your code for posting; as written, it doesn't seem to be valid.
 

Author Comment

by:djcheeky
Hi - sorry for that - I was hoping to avoid having to add all this code. :)
(Values in [ ] are variable names in the code.)

Basically, a Regex [pattern] is created from the [templateItem]. Then I want to match that [pattern] against the [incomingData] so that, in the example below, it matches the data "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].

Currently the code throws the exception because it compares my [pattern] against the [incomingData] and does not find a match. I would like to know how to modify this code so that the [pattern] matches the first occurrence of the pattern in the [incomingData].

Thanks




using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }

        static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";

            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();

            // Pattern Start Character
            pattern = "^";

            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }

                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);

                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2

                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }

                    keys.Add(keyName);
                    //Version 5 (Sample 6): A value may be omitted so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;
                }
            }

            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");

            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);

            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
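The Replace chains above escape only a handful of regex metacharacters (for example, \, |, ^ and $ are left unescaped). A minimal sketch of the same template-to-pattern step using the framework's Regex.Escape instead. BuildPattern is a hypothetical helper name, the sketch assumes C# 9+ top-level statements, it assumes template keys are valid group names (word characters only), and it deliberately leaves out the whitespace tolerance (Versions 2 and 3) and duplicate-key renaming (Version 4) of the original.

```csharp
// Assumes C# 9+ top-level statements; BuildPattern is a hypothetical helper.
using System;
using System.Text;
using System.Text.RegularExpressions;

// Turns "literal{key}literal..." into a regex with one named group per key.
static string BuildPattern(string template)
{
    // Same grammar as the question: literal text, optionally followed by {key}.
    var grammar = new Regex(@"(?<text>[^{}]*)(\{(?<key>[^}]+)\})?");
    var sb = new StringBuilder("^");
    foreach (Match m in grammar.Matches(template))
    {
        // Regex.Escape escapes every regex metacharacter in the literal text.
        sb.Append(Regex.Escape(m.Groups["text"].Value));
        // Only add a named group when this piece actually had a {key}.
        if (m.Groups["key"].Success)
            sb.Append("(?<").Append(m.Groups["key"].Value).Append(">.*)");
    }
    return sb.ToString();
}

string templateItem =
    "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
string pattern = BuildPattern(templateItem);
Console.WriteLine("Pattern: " + pattern);
```

The end anchor is left to the caller, so the same builder can be combined with whichever ending the rest of the code needs.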


Accepted Solution

by:ddrudik
It seems your code is missing the (?=\r\n|$) lookahead at the end of the pattern (it was in my earlier code examples):

string pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)";

This matches the UNH segment when it is on the first line (whether that line is followed by a line break or is the entire input), or on any line when RegexOptions.Multiline is used.
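A minimal sketch of this fix, assuming C# 9+ top-level statements and CRLF line endings built explicitly with string.Join (a verbatim @"..." literal takes whatever line endings the source file happens to have). It also shows why the lookahead matters even with RegexOptions.Multiline: in that mode $ matches just before \n, not before \r\n, so a pattern ending in '$ fails on CRLF text because the \r sits between the quote and the line end.

```csharp
// Assumes C# 9+ top-level statements; data lines taken from the question.
using System;
using System.Text.RegularExpressions;

const string body =
    @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):" +
    @"(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'";

// Explicit CRLF line endings, independent of the source file's line endings.
string incomingData = string.Join("\r\n",
    "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'",
    "UCI+00000000000443+ETRADEX+SARS+7'",
    "UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'",
    "UNT+4+00000154600004'",
    "UNZ+1+000001546'");

// Plain '$' in Multiline mode: fails, because '\r' follows the quote.
var plain = new Regex(body + "$", RegexOptions.Multiline);
// Lookahead: accepts a CRLF line break or a '$' position after the quote.
var fixedReg = new Regex(body + @"(?=\r\n|$)", RegexOptions.Multiline);

Match m = fixedReg.Match(incomingData); // first occurrence only
Console.WriteLine($"plain '$' matches: {plain.IsMatch(incomingData)}"); // False
Console.WriteLine($"lookahead matches: {m.Success}");                   // True
Console.WriteLine($"unhCode1 = {m.Groups["unhCode1"].Value}");
Console.WriteLine($"unhType  = {m.Groups["unhType"].Value}");
```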

Author Comment

by:djcheeky
Hi ddrudik.

Ok, adding (?=\r\n|$) to the end of my RegEx and adding the RegexOptions.Multiline option did the trick.
But I had to change the code I originally had:

FROM:

groups = reg.Match(input).Groups;
foreach (string key in keys)
{
    keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
    dct.Add(key, keyValue);
}

TO:

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}


THANK YOU YET AGAIN!!!!!
Oh, yes - Just one quick question...

I noticed that when I run this, I get the output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL

What is that first line with [0]? Why is it there? Is the group at index 0 always the full match, or something like that?

Thanks!


//... excerpt from the code function above ...
pattern += @"(?=\r\n|$)";
Console.WriteLine("Pattern: " + pattern + "\n");

//Value Extractor : Uses the generated Regex to extract values from the input
reg = new Regex(pattern, RegexOptions.Multiline);

if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
}

MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
else
{
    // templateItem is a String in the function above, so print it directly
    Console.WriteLine("Pattern did not match: ({0}).", templateItem);
}
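The loop above prints every group by position, including group 0. If the goal is the key/value dictionary from the FROM code (dct), a minimal sketch is to take only the first match and look each group up by name, skipping group "0"; looking values up by name is a little more robust than pairing GetGroupNames() and Groups by position. ExtractFirst is a hypothetical helper, and the sketch assumes C# 9+ top-level statements and a hard-coded copy of the generated pattern.

```csharp
// Assumes C# 9+ top-level statements; ExtractFirst is a hypothetical helper.
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Maps each named group of the first match to its captured value.
static Dictionary<string, string> ExtractFirst(Regex regex, string input)
{
    var result = new Dictionary<string, string>();
    Match m = regex.Match(input); // first occurrence only
    if (!m.Success)
        return result;
    foreach (string name in regex.GetGroupNames())
    {
        if (name == "0") continue; // group 0 is the whole match, not a template key
        result.Add(name, m.Groups[name].Value);
    }
    return result;
}

var unhRegex = new Regex(
    @"^UNH\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):" +
    @"(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'(?=\r\n|$)",
    RegexOptions.Multiline);

string data = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\nUNZ+1+000001546'";
var dct = ExtractFirst(unhRegex, data);
foreach (var pair in dct)
    Console.WriteLine($"[{pair.Key}] = {pair.Value}");
```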


Expert Comment

by:ddrudik
[0] is the default capture group and is always equal to the entire match. If you had used unnamed capture groups, such as "test(test)test...", those groups would be numbered [1], [2], [3], etc.
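A small sketch of the numbering rule, using a made-up pattern with two unnamed groups around one named group (C# 9+ top-level statements assumed). In .NET, unnamed groups are numbered left to right starting at 1, named groups get the numbers after all unnamed ones, and GetGroupNames() returns the names in group-number order, starting with "0".

```csharp
// Assumes C# 9+ top-level statements; the pattern and input are illustrative.
using System;
using System.Text.RegularExpressions;

// One named group between two unnamed ones.
var reg = new Regex(@"(\d+)-(?<word>[a-z]+)-(\d+)");
Match m = reg.Match("id: 12-abc-34");

// Group 0 is always the entire match, the same text as m.Value.
Console.WriteLine(m.Groups[0].Value); // 12-abc-34
// Unnamed groups are numbered first, left to right...
Console.WriteLine(m.Groups[1].Value); // 12
Console.WriteLine(m.Groups[2].Value); // 34
// ...and named groups are numbered after them.
Console.WriteLine(m.Groups[3].Value); // abc
Console.WriteLine(string.Join(",", reg.GetGroupNames())); // 0,1,2,word
```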
 

Author Comment

by:djcheeky
Hahaha - WOW that was a fast reply Hahaha!
Cool - another lesson learnt! :)

Thanks again - time to continue :)
points awarded!!

Expert Comment

by:ddrudik
Thanks for the question and the points.