djcheeky
asked on
RegEx Match issue - match not happening
Hi
I have a function that builds a RegEx and applies it against some incoming data.
The incoming data:
incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
The RegEx created:
pattern = @"^U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'$";
Now when I try:
reg = new Regex(pattern);
if (!reg.IsMatch(incomingData))
{
throw new Exception("The Incoming Data differs from the pattern ");
}
I think I understand why this is happening: as far as I can tell, the match compares the pattern against the WHOLE incoming string, and the pattern only matches the first part of the incoming data.
What I want it to do is find the first occurrence of the pattern in the incoming data, so that if I write:
groups = reg.Match(incomingData).Groups;
I don't get an error, but instead get the group values matched by the regular expression for the UNH line.
Thanks
ASKER
Hi - sorry for that - I was hoping to avoid having to add all this code. :)
(Values in [ ] are variable names in the code.)
Basically, a Regex [pattern] is created from the [templateItem]. What I want to do is match that [pattern] against the [incomingData] so that, in the example below, it matches the data "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'" in the [incomingData].
Currently the code throws the exception because it compares my [pattern] against the whole [incomingData] and does not find a match. I would like to know how to modify this code so that the [pattern] matches the first occurrence of the pattern in the [incomingData].
Thanks
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace MessageTranslationExample
{
    class Program
    {
        static void Main(string[] args)
        {
            GetIncomingValues();
            Console.ReadKey();
        }

        static void GetIncomingValues()
        {
            //Init
            String incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'
UNZ+1+000001546'";
            String templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
            Regex reg;
            String pattern;
            String keyValue, keyName;
            List<String> keys;
            int index;
            Dictionary<String, String> incomingDataDictionary;
            String lastMatch = "";

            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            incomingDataDictionary = new Dictionary<String, String>();

            // Pattern Start Character
            pattern = "^";

            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem))
            {
                // Version 1
                //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                // Version 2 : Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

                // Version 3 : Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
                // End of version 2

                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                    // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
                    keys.Add(keyName);

                    //Version 5 (Sample 6): A value may be omitted so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);
                    lastMatch = keyName;
                }
            }
            pattern += "$";
            Console.WriteLine("Pattern: " + pattern + "\n");

            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern);
            if (!reg.IsMatch(incomingData))
            {
                throw new Exception("The templateItem could not be matched in the incoming String");
            }
        }
    }
}
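For readers following along, the template-to-pattern step above can also be sketched with `Regex.Escape` in place of the long `Replace` chain. This is only an illustration, not the asker's code: `BuildPattern` is a hypothetical helper name, it simply drops whitespace from the literal text instead of reproducing the `$$SPACE$$` handling exactly, and it does not rename duplicate keys the way "Version 4" does.

```csharp
using System;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

class TemplateToPatternSketch
{
    // Hypothetical helper: turns a template such as "UNH+{a}+{b}'" into a
    // regex string with one named group per {key}.
    static string BuildPattern(string template)
    {
        var sb = new StringBuilder("^");
        foreach (Match part in Regex.Matches(template, @"(?<text>[^{}]*)(\{(?<key>[^}]+)\})?"))
        {
            // Escape each non-whitespace literal character and allow
            // optional whitespace between consecutive literals.
            var escaped = part.Groups["text"].Value
                .Where(c => !char.IsWhiteSpace(c))
                .Select(c => Regex.Escape(c.ToString()));
            sb.Append(string.Join(@"\s*", escaped));

            // Append a named capture group only when a {key} was present.
            if (part.Groups["key"].Success)
                sb.AppendFormat("(?<{0}>.*)", part.Groups["key"].Value);
        }
        return sb.Append("$").ToString();
    }

    static void Main()
    {
        string templateItem = @"UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'";
        Console.WriteLine(BuildPattern(templateItem));
    }
}
```

For the template in this thread, the sketch should print the same pattern quoted in the question. `Regex.Escape` also covers metacharacters the manual chain misses, such as `|`, `^`, `$` and `{`.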
ASKER CERTIFIED SOLUTION
ASKER
Hi ddrudik.
Ok, adding (?=\r\n|$) to the end of my RegEx and adding the RegexOptions.Multiline property did the trick.
But I had to change the code I originally had:
FROM:
groups = reg.Match(input).Groups;
foreach (string key in keys)
{
    keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
    dct.Add(key, keyValue);
}
TO:
MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
THANK YOU YET AGAIN!!!!!
Oh, yes - Just one quick question...
I noticed that when I run this, I get the output:
[0] = UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
[unhCode1] = 00000154600001
[unhMessageType] = CONTRL
[unhShortCode] = D
[unhVersion] = 3
[unhControlBody] = UN
[unhType] = CONTRL
What is that first line with the [0]? And why is it there? Is the first element of a match's groups, at index 0, always the full match or something?
Thanks!
//... excerpt from above code function ...
pattern += @"(?=\r\n|$)";
Console.WriteLine("Pattern: " + pattern + "\n");

//Value Extractor : Uses the generated Regex to extract values from the input
reg = new Regex(pattern, RegexOptions.Multiline);
if (!reg.IsMatch(incomingData))
{
    throw new Exception("The Message IN Message Structure differs from that of the Message IN Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
}
MatchCollection mc = reg.Matches(incomingData);
if (mc.Count > 0)
{
    foreach (Match m in mc)
    {
        for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
        {
            Console.WriteLine("[" + reg.GetGroupNames()[gIdx] + "] = " + m.Groups[gIdx].Value);
        }
    }
}
else
{
    Console.WriteLine("Pattern did not match: ({0}).", templateItem.templateBody);
}
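To see why the change described above works, here is a small self-contained sketch. It assumes the pattern quoted in the question and a two-line sample with explicit `\r\n` line endings; the class and variable names are made up for illustration. Without `RegexOptions.Multiline`, `^` and `$` anchor to the whole input, so a pattern for a single line cannot match multi-line data. With `Multiline`, `^` matches at the start of each line, and the `(?=\r\n|$)` lookahead accepts either a Windows line ending or the end of the input.

```csharp
using System;
using System.Text.RegularExpressions;

class MultilineAnchorSketch
{
    static void Main()
    {
        // Explicit \r\n so the behaviour does not depend on the source file's line endings.
        string data = "UNH+00000154600001+CONTRL:D:3:UN+CONTRL'\r\nUNZ+1+000001546'";
        string body = @"U\s*N\s*H\s*\+(?<unhCode1>.*)\+(?<unhMessageType>.*):(?<unhShortCode>.*):(?<unhVersion>.*):(?<unhControlBody>.*)\+(?<unhType>.*)'";

        // Default options: ^ and $ refer to the whole input, so this fails.
        var wholeInput = new Regex("^" + body + "$");
        Console.WriteLine(wholeInput.IsMatch(data)); // False

        // Multiline: ^ matches at each line start; the lookahead allows \r\n or end of input.
        var perLine = new Regex("^" + body + @"(?=\r\n|$)", RegexOptions.Multiline);
        Match first = perLine.Match(data); // first occurrence only
        Console.WriteLine(first.Success);                    // True
        Console.WriteLine(first.Groups["unhCode1"].Value);   // 00000154600001
        Console.WriteLine(first.Groups["unhType"].Value);    // CONTRL
    }
}
```

`Regex.Match` returns only the first occurrence, which is what the original question asked for; `Regex.Matches` is needed only when every matching line should be processed.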
[0] is the default capture group and is equal to the entire match. If you had used unnamed capture groups such as "test(test)test...", the unnamed capture groups would be numbered [1], [2], [3], etc.
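A minimal sketch of that numbering, using a made-up pattern and input rather than the asker's data: group 0 is the whole match, the unnamed group is number 1, and the named group is reachable by name. Skipping the name "0" while iterating `GetGroupNames()` is one way to print only the captured values.

```csharp
using System;
using System.Text.RegularExpressions;

class GroupZeroSketch
{
    static void Main()
    {
        // One unnamed group and one named group.
        var reg = new Regex(@"(\d+)-(?<word>[a-z]+)");
        Match m = reg.Match("id 42-abc");

        Console.WriteLine(m.Groups[0].Value);      // 42-abc (the entire match)
        Console.WriteLine(m.Groups[1].Value);      // 42     (first unnamed group)
        Console.WriteLine(m.Groups["word"].Value); // abc    (named group)

        // Iterate by name and skip group "0" to list only the captures.
        foreach (string name in reg.GetGroupNames())
        {
            if (name == "0") continue;
            Console.WriteLine("[" + name + "] = " + m.Groups[name].Value);
        }
    }
}
```

Looking groups up by name also avoids relying on positions in the `GetGroupNames()` array lining up with group numbers.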
ASKER
Hahaha - WOW that was a fast reply Hahaha!
Cool - another lesson learnt! :)
Thanks again - time to continue :)
points awarded!!
Thanks for the question and the points.
What I want it to do is find the first occurrence of the pattern, in the incoming data, so that if I go:
groups = reg.Match(incomingData).Gr
I assume you are abbreviating your code for posting; as written it doesn't seem to be valid.