Solved

RegEx MultiLine issue with repeating data.

Posted on 2008-10-15
Last Modified: 2008-10-23
Hi DDrudik :P and All

Ok, the only way I can illustrate this is by attaching all the code, so I will reference everything by line-number (as displayed in Visual C# / text editor) so that you know what I'm referring to.
Apologies for the inconvenience.

GIVEN:

Lines 44 - 49:
* TemplateItems are added
* Note especially line 48 in this example for "UNT..."

Lines 55 - 63:
* Incoming data is added
* Note especially the FIRST occurrence of UNT data in lines 58 and 59
* Note also the SECOND occurrence of UNT data in lines 61 - 63

Line 178:
* My pattern is finalised

Line 183:
* A regular expression is created using the pattern
* The Multiline option is set

RESULT:
If you run the program and look at the output, you will notice that the UNT items fetched are as follows:

[untTotalCode] [A]
[untTotal] [000001546000AA]
[untTotalCode_1] [B]
[untTotal_1] [000001546000BB]
[untTotalCode_2] [4]
[untTotal_2] [00000154600004]
[untTotalCode_3] [5]
[untTotal_3] [00000154600005]
[untTotalCode_4] [6]
[untTotal_4] [00000154600006]

PROBLEM:

I need the Regex to match every consecutive instance of the UNT pattern and to stop as soon as it encounters something other than the matched pattern, i.e. the items in lines 58 and 59, but not those in lines 61 - 63. I understand why it behaves this way: it returns every match it finds for the template in line 48. How do I change this so that it only returns the matched items while the pattern doesn't change?

In other words, for my template, match all sequential matches, but stop matching when something OTHER than the pattern is encountered.

Thanks








using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace MessageTranslationExample
{
    public class TemplateItem
    {
        public Boolean isTemplateItemRequired { get; set; }
        public Boolean areExactFieldsRequired { get; set; }
        public Boolean isRepeatable { get; set; }
        public String templateBody { get; set; }
        public TemplateItem(Boolean IsTemplateItemRequired, Boolean AreExactFieldsRequired, Boolean IsRepeatable, String TemplateBody)
        {
            isTemplateItemRequired = IsTemplateItemRequired;
            areExactFieldsRequired = AreExactFieldsRequired;
            isRepeatable = IsRepeatable;
            templateBody = TemplateBody;
        }
    }

    class MessageTranslation
    {
        static String incomingData;
        static List<TemplateItem> templateItems;
        static Dictionary<String, String> incomingDictionary;
        static Boolean bDeleteMatchedDataFromIncoming;
        static String myPattern;

        static void Main(string[] args)
        {
            SetTemplateItems();
            SetIncomingData();
            TranslateMessage();
            DisplayDictionary(incomingDictionary);
            Console.ReadKey();
        }

        //Example Data Structure Template to use
        static void SetTemplateItems()
        {
            templateItems = new List<TemplateItem>();
            templateItems.Add(new TemplateItem(true, true, false, "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UCI+{uciNumber}+{uciCustomer}+{uciOrganisation}+{uciVersion}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UCM+{ucmNumber}+{ucmType}:{ucmShortCode}:{ucmAbbrev}:{ucmOrganisation}:{ucmIndex}+{ucmIndexCode}'"));
            templateItems.Add(new TemplateItem(true, true, true, "UNT+{untTotalCode}+{untTotal}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UNZ+{unzCode}+{unzId}'"));
        }

        //Example Incoming Data to use
        static void SetIncomingData()
        {
            incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+A+000001546000AA'
UNT+B+000001546000BB'
UNZ+1+000001546'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'";
        }

        //Translate the Incoming Message
        /* DEV HINTS:
        //  * If start index != 0 (thus not located in first item of incoming string) 
                for a template match and the templateItem isRequired, error, else if
                not required, move on to next templateItem without deleting data from incoming
            *  If repeatable, loop through the incoming data until a match is not found for the 
               templateItem.
         */
        /* DEV ISSUES:
       // 
        */
        static void TranslateMessage()
        {
            incomingDictionary = new Dictionary<string, string>();

            //Analyze each Template Item against the Incoming Data
            foreach (TemplateItem templateItem in templateItems)
            {
                //Init
                bDeleteMatchedDataFromIncoming = true;
                myPattern = "";

                //Generate RegEx Pattern
                SetIncomingValues(templateItem, incomingData);

                if (templateItem.isRepeatable)
                {
                    //Create Regex to match all back to back occurrences of the template pattern
                }
                else
                {
                    //Process against incoming data
                    if ("RegEx Match is found" == "RegEx Match found")
                    {

                    }
                    else
                    {
                        if (templateItem.isTemplateItemRequired)
                        {

                        }
                    }

                }
            }
            return;
        }

        //Generate Regular Expression Pattern
        static void SetIncomingValues(TemplateItem templateItem, String incomingData)
        {
            //Init
            Regex reg;
            String pattern;
            String keyValue, keyName;
            String processedKeyValue, processedKeyName;
            List<String> keys, processedKeys;
            int index;
            GroupCollection groups;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            processedKeys = new List<String>();

            // Pattern Start Character
            pattern = "^";

            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem.templateBody))
            {
                //Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

                //Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                {
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
                }
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");

                //Extract the key from the match
                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                    //Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
                    keys.Add(keyName);

                    //A value may be omitted so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);

                    //Set last match for error messages
                    lastMatch = keyName;
                }
            }

            //Allows pattern to look at new line for the same pattern
            pattern += @"(?=\r\n|$)";

            Console.WriteLine("Pattern: " + pattern + "\n");

            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern, RegexOptions.Multiline); //Multiline makes ^ and $ match at the start and end of every line

            //Match Validation
            if (!reg.IsMatch(incomingData))
            {
                //Only error if templateItem is required, otherwise match not required
                if (templateItem.isTemplateItemRequired)
                {
                    throw new Exception("The Message Data Structure differs from that of the Message Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
                }
            }

            //Build Key Values based on matches
            MatchCollection mc = reg.Matches(incomingData);
            if (mc.Count > 0)
            {
                foreach (Match m in mc)
                {
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        //Skip item at index 0 as it contains the full match
                        if (gIdx > 0)
                        {
                            processedKeyName = reg.GetGroupNames()[gIdx];
                            processedKeyValue = m.Groups[gIdx].Value;

                            //Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                            if (processedKeys.Contains(processedKeyName))
                            {
                                index = 1;
                                while (processedKeys.Contains(processedKeyName + "_" + index.ToString())) index++;
                                processedKeyName = processedKeyName + "_" + index.ToString();
                            }
                            processedKeys.Add(processedKeyName);

                            //Add to Dictionary
                            incomingDictionary[processedKeyName] = processedKeyValue;
                        }
                    }
                    // Only match first match unless templateItem is repeatable
                    if (!templateItem.isRepeatable)
                    {
                        break;
                    }
                }
            }
            else
            {
                throw new Exception("Pattern did not match: ( + " + templateItem.templateBody + ").");
            }

            return;
        }

        static void DisplayDictionary(Dictionary<String, String> dictionary)
        {
            Console.WriteLine("---- DICTIONARY DATA ----\n");
            foreach ( String key in dictionary.Keys)
            {
                Console.WriteLine("[" + key + "] [" + dictionary[key] + "]\r");
            }
            Console.WriteLine("\n\n");
        }
    }
}

Question by:djcheeky
 
LVL 27

Expert Comment

by:ddrudik
ID: 22724942
Is this something unique to UNT?

You could try to match the source on:
Regex re = new Regex(@".+(?=^(?!UNT))",RegexOptions.Multiline | RegexOptions.Singleline);

Which would result in:
    [0] => UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+A+000001546000AA'
UNT+B+000001546000BB'

Then applying the UNT regex to that result will find only the matches you seek.
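A minimal sketch of the two-step idea above, assuming `\r\n` line endings and assuming that the line which interrupts the first UNT run is also the last non-UNT line in the data (the greedy `.+` backs off from the end of the string, so it stops at the last line that does not start with UNT). The class and method names are hypothetical, and the `^UNT\+` check is a simplified stand-in for the generated UNT pattern:

```csharp
using System;
using System.Text.RegularExpressions;

static class TwoStepMatch
{
    // Step 1: keep everything before the last line that does not start with "UNT".
    public static string HeadBeforeBreak(string input)
    {
        var re = new Regex(@".+(?=^(?!UNT))",
            RegexOptions.Multiline | RegexOptions.Singleline);
        return re.Match(input).Value;
    }

    // Step 2: count the lines starting with "UNT+" in the given text only.
    public static int CountUntLines(string text)
    {
        return Regex.Matches(text, @"^UNT\+", RegexOptions.Multiline).Count;
    }
}
```

Calling `CountUntLines(HeadBeforeBreak(data))` then only sees the UNT lines that come before the interrupting line. If the data can contain further non-UNT lines after the second UNT group, this truncation would keep too much, so it is only a starting point.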
 

Author Comment

by:djcheeky
ID: 22728926
Basically, with the RegEx that is generated specifically for that templateItem below:
^U\s*N\s*T\s*\+(?<untTotalCode>.*)\+(?<untTotal>.*)'(?=\r\n|$)

The Regular expression goes through all the data:
(A)
 incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+A+000001546000AA'
UNT+B+000001546000BB'
UNZ+1+000001546'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'";

(B)
And matches these lines:
UNT+A+000001546000AA'
UNT+B+000001546000BB'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'

But if you look at (A), the line:
UNZ+1+000001546'

this actually breaks the pattern between:
UNT+A+000001546000AA'
UNT+B+000001546000BB'

AND

UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'

But the regular expression matches ALL UNT matches - I just want it to match until it encounters something different, in other words, just:
UNT+A+000001546000AA'
UNT+B+000001546000BB'

So it is matching all back-to-back similar data matches UNTIL something else is encountered, and not EVERY single match for that pattern in the whole string.

Thanks
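One way to express "stop at the first interruption" without changing the generated pattern is to walk the matches in order and stop when a match does not begin on the line directly after the previous one. This is only a sketch: `ContiguousRun` and `FirstRun` are hypothetical names, and the pattern passed in is assumed to match exactly one line, like the generated UNT pattern quoted above.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class ContiguousRun
{
    // Returns the matches that form the first unbroken run: each match must
    // begin exactly where the previous match plus its line break ended.
    public static List<Match> FirstRun(string input, string linePattern)
    {
        var run = new List<Match>();
        var regex = new Regex(linePattern, RegexOptions.Multiline);
        int expectedStart = -1;
        foreach (Match m in regex.Matches(input))
        {
            if (expectedStart >= 0 && m.Index != expectedStart)
                break; // some other line sits between this match and the previous one
            run.Add(m);
            // Step over the line break that follows the match, if any.
            int next = m.Index + m.Length;
            if (next < input.Length && input[next] == '\r') next++;
            if (next < input.Length && input[next] == '\n') next++;
            expectedStart = next;
        }
        return run;
    }
}
```

With the sample data, the run would contain only the `UNT+A...` and `UNT+B...` lines, because the `UNZ` line means the next UNT match no longer starts where the run expects it.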


 
 
LVL 27

Expert Comment

by:ddrudik
ID: 22730065
For the repeating pattern you will need to use the process I described in my comment above: first match the string line by line until the last UNT in the first group of UNT's has been consumed, then perform your match of all UNT's against that new string. Once a pattern has been applied to a string and the matches are returned, I don't know of another regex-only way of doing what you describe.
 

Author Comment

by:djcheeky
ID: 22730459
I'm not too sure how to do that, because your suggestion seems to hard-code "UNT" into the Regex, but it could be any element.
 
LVL 27

Expert Comment

by:ddrudik
ID: 22730837
If that's the case then you will need to loop through the regex pattern line-by-line, creating a MatchCollection with each. If the count of the MatchCollection is more than 1, use the first three characters (UNT, UMT, etc.) as a variable in the regex pattern shown above to isolate the part of the string you want to take the matches from, then create a MatchCollection for those matches. It's going to take a bit of looping, but if that's what you need then that's how I see it can be done. If you need an example of this from me, unfortunately I cannot provide one until later today.
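A rough sketch of using the leading characters as a variable, assuming each segment sits on its own line. `RepeatingBlock` and `FirstBlock` are hypothetical names; `Regex.Escape` is used so the tag can contain regex metacharacters.

```csharp
using System;
using System.Text.RegularExpressions;

static class RepeatingBlock
{
    // Returns the first block of consecutive lines that start with `tag`
    // (e.g. "UNT"), or an empty string when no line starts with it.
    public static string FirstBlock(string input, string tag)
    {
        string t = Regex.Escape(tag);
        // One line starting with the tag, followed by any number of further
        // lines that also start with it.
        var re = new Regex(
            "^" + t + @"[^\r\n]*(?:\r?\n" + t + @"[^\r\n]*)*",
            RegexOptions.Multiline);
        return re.Match(input).Value;
    }
}
```

The returned block can then be passed to the generated pattern in place of the full incoming data, so only the first group of repeated lines produces matches.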
 

Author Comment

by:djcheeky
ID: 22734256
Hi ddrudik - could you please provide an example? I have tried going through your posts but I don't seem to follow. Thanks
 
LVL 27

Expert Comment

by:ddrudik
ID: 22734268
In about 2 hours I will be able to.  Thanks.
 

Author Comment

by:djcheeky
ID: 22734308
No prob mate! Thanks a lot!
 
LVL 27

Expert Comment

by:ddrudik
ID: 22736490
The solution is not yet finalized; this may have to wait until the next day.
 
LVL 27

Expert Comment

by:ddrudik
ID: 22736863
See if this fits what you need. It works for your stated requirement, but I'm not sure how it fits with the option where the entire regex pattern has to match the entire string, if that's still something you were checking for.

The block I added starts with the line:
string subPattern...

Maybe that will give you an idea of how to use it.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace MessageTranslationExample
{
    public class TemplateItem
    {
        public Boolean isTemplateItemRequired { get; set; }
        public Boolean areExactFieldsRequired { get; set; }
        public Boolean isRepeatable { get; set; }
        public String templateBody { get; set; }
        public TemplateItem(Boolean IsTemplateItemRequired, Boolean AreExactFieldsRequired, Boolean IsRepeatable, String TemplateBody)
        {
            isTemplateItemRequired = IsTemplateItemRequired;
            areExactFieldsRequired = AreExactFieldsRequired;
            isRepeatable = IsRepeatable;
            templateBody = TemplateBody;
        }
    }

    class MessageTranslation
    {
        static String incomingData;
        static List<TemplateItem> templateItems;
        static Dictionary<String, String> incomingDictionary;
        static Boolean bDeleteMatchedDataFromIncoming;
        static String myPattern;

        static void Main(string[] args)
        {
            SetTemplateItems();
            SetIncomingData();
            TranslateMessage();
            DisplayDictionary(incomingDictionary);
            Console.ReadKey();
        }

        //Example Data Structure Template to use
        static void SetTemplateItems()
        {
            templateItems = new List<TemplateItem>();
            templateItems.Add(new TemplateItem(true, true, false, "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UCI+{uciNumber}+{uciCustomer}+{uciOrganisation}+{uciVersion}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UCM+{ucmNumber}+{ucmType}:{ucmShortCode}:{ucmAbbrev}:{ucmOrganisation}:{ucmIndex}+{ucmIndexCode}'"));
            templateItems.Add(new TemplateItem(true, true, true, "UNT+{untTotalCode}+{untTotal}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UNZ+{unzCode}+{unzId}'"));
        }

        //Example Incoming Data to use
        static void SetIncomingData()
        {
            incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+A+000001546000AA'
UNT+B+000001546000BB'
UNZ+1+000001546'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'";
        }

        //Translate the Incoming Message
        /* DEV HINTS:
        //  * If start index != 0 (thus not located in first item of incoming string) 
                for a template match and the templateItem isRequired, error, else if
                not required, move on to next templateItem without deleting data from incoming
            *  If repeatable, loop through the incoming data until a match is not found for the 
               templateItem.
         */
        /* DEV ISSUES:
       // 
        */
        static void TranslateMessage()
        {
            incomingDictionary = new Dictionary<string, string>();

            //Analyze each Template Item against the Incoming Data
            foreach (TemplateItem templateItem in templateItems)
            {
                //Init
                bDeleteMatchedDataFromIncoming = true;
                myPattern = "";

                //Generate RegEx Pattern
                SetIncomingValues(templateItem, incomingData);

                if (templateItem.isRepeatable)
                {
                    //Create Regex to match all back to back occurrences of the template pattern
                }
                else
                {
                    //Process against incoming data
                    if ("RegEx Match is found" == "RegEx Match found")
                    {

                    }
                    else
                    {
                        if (templateItem.isTemplateItemRequired)
                        {

                        }
                    }

                }
            }
            return;
        }

        //Generate Regular Expression Pattern
        static void SetIncomingValues(TemplateItem templateItem, String incomingData)
        {
            //Init
            Regex reg;
            String pattern;
            String keyValue, keyName;
            String processedKeyValue, processedKeyName;
            List<String> keys, processedKeys;
            int index;
            GroupCollection groups;
            String lastMatch = "";
            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
            keys = new List<String>();
            processedKeys = new List<String>();

            // Pattern Start Character
            pattern = "^";

            //For each RegEx Template Item match in the Template
            foreach (Match match in reg.Matches(templateItem.templateBody))
            {
                //Handle whitespaces in the ValueTemplate
                keyValue = "";
                foreach (char c in match.Groups["text"].Value)
                {
                    if (c != ' ' && c != '\t')
                        keyValue += c + "$$SPACE$$";
                    else
                        keyValue += c;
                }
                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

                //Remove the last white space matcher of the pattern
                if (keyValue.EndsWith("$$SPACE$$"))
                {
                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);
                }
                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");

                //Extract the key from the match
                if (match.Groups["key"].Value != "")
                {
                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

                    //Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                    if (keys.Contains(keyName))
                    {
                        index = 1;
                        while (keys.Contains(keyName + "_" + index.ToString())) index++;
                        keyName = keyName + "_" + index.ToString();
                    }
                    keys.Add(keyName);

                    //A value may be omitted so make its matcher optional
                    pattern += string.Format("(?<{0}>.*)", keyName);

                    //Set last match for error messages
                    lastMatch = keyName;
                }
            }

            //Allows pattern to look at new line for the same pattern
            pattern += @"(?=\r\n|$)";
            Console.WriteLine("Pattern: " + pattern + "\n");

            string subPattern = pattern.Substring(0, pattern.IndexOf(@"+") + 1);
            Console.WriteLine("subPattern: " + subPattern + "\n");
            Regex reSub = new Regex(subPattern + @".*?(?=\r\n(?!" + subPattern + @"))", RegexOptions.Multiline | RegexOptions.Singleline);
            Match mm = reSub.Match(incomingData);
            string newData = mm.Groups[0].Value;

            //Value Extractor : Uses the generated Regex to extract values from the input
            reg = new Regex(pattern, RegexOptions.Multiline); //Multiline makes ^ and $ match at the start and end of every line

            //Match Validation
            if (!reg.IsMatch(newData))
            {
                //Only error if templateItem is required, otherwise match not required
                if (templateItem.isTemplateItemRequired)
                {
                    throw new Exception("The Message Data Structure differs from that of the Message Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);
                }
            }

            //Build Key Values based on matches
            MatchCollection mc = reg.Matches(newData);
            if (mc.Count > 0)
            {
                foreach (Match m in mc)
                {
                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)
                    {
                        //Skip item at index 0 as it contains the full match
                        if (gIdx > 0)
                        {
                            processedKeyName = reg.GetGroupNames()[gIdx];
                            processedKeyValue = m.Groups[gIdx].Value;

                            //Find a valid key name for the result dictionary to avoid duplicates when repeating the template
                            if (processedKeys.Contains(processedKeyName))
                            {
                                index = 1;
                                while (processedKeys.Contains(processedKeyName + "_" + index.ToString())) index++;
                                processedKeyName = processedKeyName + "_" + index.ToString();
                            }
                            processedKeys.Add(processedKeyName);

                            //Add to Dictionary
                            incomingDictionary[processedKeyName] = processedKeyValue;
                        }
                    }
                    // Only match first match unless templateItem is repeatable
                    if (!templateItem.isRepeatable)
                    {
                        break;
                    }
                }
            }
            else
            {
                throw new Exception("Pattern did not match: ( + " + templateItem.templateBody + ").");
            }

            return;
        }

        static void DisplayDictionary(Dictionary<String, String> dictionary)
        {
            Console.WriteLine("---- DICTIONARY DATA ----\n");
            foreach ( String key in dictionary.Keys)
            {
                Console.WriteLine("[" + key + "] [" + dictionary[key] + "]\r");
            }
            Console.WriteLine("\n\n");
        }
    }
}

 

Author Comment

by:djcheeky
ID: 22738587
Hi ddrudik

Will get to this one over the weekend and get back to you - ta!

 

Author Comment

by:djcheeky
ID: 22739201
Hi ddrudik

Ok, so I ran the code and it does do what I want, but only for that particular template / incoming string.
This functionality actually has to cater for any message type, not just the EDIFACT example I gave. For example, the code snippet below uses XML data, which doesn't contain the + sign used in the previous example.

Thanks
        //Example Data Structure Template to use
        static void SetTemplateItems()
        {
            templateItems = new List<TemplateItem>();
            templateItems.Add(new TemplateItem(true, true, true, "<UNT untTotalCode={untTotalCode} untTotal={untTotal} />"));
        }

        //Example Incoming Data to use
        static void SetIncomingData()
        {
            incomingData = @"<UNT untTotalCode={25} untTotal={hello} />";
        }

 
LVL 27

Expert Comment

by:ddrudik
ID: 22744205
You would need to decide what makes a row a repeating row, then. I was using the first part of the regex pattern to do that, and I just happened to stop at the first index of "+". I suppose if you change the input source format then you would need to change the pattern. Will the patterns always be three letters with \s* after each of the three letters?
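One possible way to decide what identifies a repeating row without relying on "+" is to take the literal text of the template up to its first `{placeholder}`. This is an assumption about how the templates are written, not something the posted code does; `TemplatePrefix`, `LiteralPrefix` and `PrefixPattern` are hypothetical names, and the sketch skips the `\s*` tolerance that the posted pattern builder adds between characters.

```csharp
using System;
using System.Text.RegularExpressions;

static class TemplatePrefix
{
    // Literal text before the first '{' of the template, or the whole
    // template when it has no placeholder.
    public static string LiteralPrefix(string templateBody)
    {
        int brace = templateBody.IndexOf('{');
        return brace < 0 ? templateBody : templateBody.Substring(0, brace);
    }

    // Anchored, escaped pattern that recognises lines starting with that prefix.
    public static string PrefixPattern(string templateBody)
    {
        return "^" + Regex.Escape(LiteralPrefix(templateBody));
    }
}
```

For the EDIFACT template this yields `UNT+`, and for the XML-style template it yields `<UNT untTotalCode=`, so the same code path would cover both formats.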

 

Author Comment

by:djcheeky
ID: 22755623
Hi ddrudik.

The pattern may not always start with three characters with \s after the letters.
It will however:

* Start with an alphanumeric character or special character
* Contain text
* End with a space or special character

For example (these are possible starting characters before the ... ):

<Unbheader ...  // Starts with a '<' and ends with a ' '
UnbHeader ...  // Starts with a 'U' and ends with a ' '
UnbHeader+    // Starts with a 'U' and ends with a '+'
<UnbHeader+  // Starts with a '<' and ends with a '+'

So it can be any one of those four combinations. Note that the '<' and '+' characters used in the example could be any special characters. The common factor is that it will always start with an alphanumeric character or special character and end with a space (or \s) or special character.

Thanks
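The rules above could be sketched as a single regex that pulls the leading token off a template. The sketch assumes "special character" means any character that is neither alphanumeric nor whitespace, which is broader than any specific list; `LeadingToken` and `Prefix` are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

static class LeadingToken
{
    // Optional leading special character, a run of letters/digits, then one
    // terminating character that is whitespace or special (any non-alphanumeric).
    static readonly Regex TokenRegex =
        new Regex(@"^(?<tag>[^A-Za-z0-9\s]?[A-Za-z0-9]+)(?<end>[^A-Za-z0-9])");

    // Returns the leading token including its terminator, or "" if the
    // template does not follow the rules.
    public static string Prefix(string templateBody)
    {
        Match m = TokenRegex.Match(templateBody);
        return m.Success ? m.Value : "";
    }
}
```

The returned prefix would still need to be escaped (for example with `Regex.Escape`) before being placed in a pattern.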
 
LVL 27

Expert Comment

by:ddrudik
ID: 22756483
In order for a regex solution to be identified I would need to know what specifically you want to consider a "special character" other than " " and "+".
 

Author Comment

by:djcheeky
ID: 22756571
Sure, it could be:

+
-
<
>
{
}
=
`
'
/
\
 
LVL 27

Expert Comment

by:ddrudik
ID: 22757862
You might have UNT{A+000001546000AA' or UNT<A+000001546000AA' ?

The use of < or { as special characters will make it problematic to parse properly since {} is used throughout the pattern and < could be used at the front of the pattern.
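To illustrate the brace problem, here is a small sketch that runs the template grammar regex from the posted code on its own and lists the (text, key) pairs it produces. `TemplateTokens` and `Tokenize` are hypothetical names; only the regex itself comes from the code in this thread.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class TemplateTokens
{
    // Same template grammar as in the posted code: literal text, then an optional {key}.
    static readonly Regex Grammar = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?");

    // Returns one string[] { text, key } per non-empty match; key is "" when absent.
    public static List<string[]> Tokenize(string templateBody)
    {
        var tokens = new List<string[]>();
        foreach (Match m in Grammar.Matches(templateBody))
        {
            if (m.Length == 0) continue; // skip empty matches, e.g. at the end of the template
            tokens.Add(new[] { m.Groups["text"].Value, m.Groups["key"].Value });
        }
        return tokens;
    }
}
```

With a template such as `UNT{A+{x}'`, the first pair comes back with text `UNT` and key `A+{x`, so the literal `{` is swallowed into the key name. That is the ambiguity described above.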
 

Author Comment

by:djcheeky
ID: 22757953
No, but I could have:

<UNT>
<name>Blah</name>
</UNT>

OR

<person>
<name>{name}</name>
</person>

OR

UNT+Blah:AnotherBlah-Nothing'

OR

person_{myName}+{anything}*{nothing}'


So I guess the main characters are >, +, - and perhaps a few others, but if I see the code for those few I should be able to modify it for any others that arise :)

Thanks
 
LVL 27

Accepted Solution

by:ddrudik (earned 500 total points)
ID: 22758089
Here's an inclusion of the subpattern I was thinking of. If this fails with your other data, please show an extended example of that repeating data.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;

namespace MessageTranslationExample
{
    public class TemplateItem
    {
        public Boolean isTemplateItemRequired { get; set; }
        public Boolean areExactFieldsRequired { get; set; }
        public Boolean isRepeatable { get; set; }
        public String templateBody { get; set; }
        public TemplateItem(Boolean IsTemplateItemRequired, Boolean AreExactFieldsRequired, Boolean IsRepeatable, String TemplateBody)
        {
            isTemplateItemRequired = IsTemplateItemRequired;
            areExactFieldsRequired = AreExactFieldsRequired;
            isRepeatable = IsRepeatable;
            templateBody = TemplateBody;
        }
    }

    class MessageTranslation
    {
        static String incomingData;
        static List<TemplateItem> templateItems;
        static Dictionary<String, String> incomingDictionary;
        static Boolean bDeleteMatchedDataFromIncoming;
        static String myPattern;

        static void Main(string[] args)
        {
            SetTemplateItems();
            SetIncomingData();
            TranslateMessage();
            DisplayDictionary(incomingDictionary);
            Console.ReadKey();
        }

        //Example Data Structure Template to use
        static void SetTemplateItems()
        {
            templateItems = new List<TemplateItem>();
            templateItems.Add(new TemplateItem(true, true, false, "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UCI+{uciNumber}+{uciCustomer}+{uciOrganisation}+{uciVersion}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UCM+{ucmNumber}+{ucmType}:{ucmShortCode}:{ucmAbbrev}:{ucmOrganisation}:{ucmIndex}+{ucmIndexCode}'"));
            templateItems.Add(new TemplateItem(true, true, true, "UNT+{untTotalCode}+{untTotal}'"));
            templateItems.Add(new TemplateItem(true, true, false, "UNZ+{unzCode}+{unzId}'"));
        }

        //Example Incoming Data to use
        static void SetIncomingData()
        {
            incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'
UCI+00000000000443+ETRADEX+SARS+7'
UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'
UNT+A+000001546000AA'
UNT+B+000001546000BB'
UNZ+1+000001546'
UNT+4+00000154600004'
UNT+5+00000154600005'
UNT+6+00000154600006'";
        }

        //Translate the Incoming Message
        /* DEV HINTS:
        //  * If start index != 0 (thus not located in first item of incoming string) 
                for a template match and the templateItem isRequired, error, else if
                not required, move on to next templateItem without deleting data from incoming
            *  If repeatable, loop through the incoming data until a match is not found for the 
               templateItem.
         */
        /* DEV ISSUES:
       // 
        */
        static void TranslateMessage()

        {

            incomingDictionary = new Dictionary<string, string>();
 

            //Analyze each Template Item against the Incoming Data

            foreach (TemplateItem templateItem in templateItems)

            {

                //Init

                bDeleteMatchedDataFromIncoming = true;

                myPattern = "";
 

                //Generate RegEx Pattern

                SetIncomingValues(templateItem, incomingData);
 

                if (templateItem.isRepeatable)

                {

                    //Create Regex to match all back-to-back occurrences of the template pattern

                }

                else

                {

                    //Process against incoming data

                    if ("RegEx Match is found" == "RegEx Match found")

                    {
 

                    }

                    else

                    {

                        if (templateItem.isTemplateItemRequired)

                        {
 

                        }

                    }
 

                }

            }

            return;

        }
 

        //Generate Regular Expression Pattern

        static void SetIncomingValues(TemplateItem templateItem, String incomingData)

        {

            //Init

            Regex reg;

            String pattern;

            String keyValue, keyName;

            String processedKeyValue, processedKeyName;

            List<String> keys, processedKeys;

            int index;

            GroupCollection groups;

            String lastMatch = "";

            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar

            keys = new List<String>();

            processedKeys = new List<String>();
 

            // Pattern Start Character

            pattern = "^";
 

            //For each RegEx Template Item match in the Template

            foreach (Match match in reg.Matches(templateItem.templateBody))

            {

                //Handle whitespaces in the ValueTemplate

                keyValue = "";

                foreach (char c in match.Groups["text"].Value)

                {

                    if (c != ' ' && c != '\t')

                        keyValue += c + "$$SPACE$$";

                    else

                        keyValue += c;

                }

                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");
 

                //Remove the last white space matcher of the pattern

                if (keyValue.EndsWith("$$SPACE$$"))

                {

                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);

                }

                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
 

                //Extract the key from the match

                if (match.Groups["key"].Value != "")

                {

                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 

                    //Find a valid key name for the result dictionary to avoid duplicates when repeating the template

                    if (keys.Contains(keyName))

                    {

                        index = 1;

                        while (keys.Contains(keyName + "_" + index.ToString())) index++;

                        keyName = keyName + "_" + index.ToString();

                    }

                    keys.Add(keyName);
 

                    //A value may be omitted so make its matcher optional

                    pattern += string.Format("(?<{0}>.*)", keyName);
 

                    //Set last match for error messages

                    lastMatch = keyName;

                }

            }
 

            //Allows pattern to look at new line for the same pattern

            pattern += @"(?=\r\n|$)";

            Console.WriteLine("Pattern: " + pattern + "\n");
 

            Match mmm = Regex.Match(pattern,@".*?[>+ -]");

            string subPattern = mmm.Groups[0].Value;

            Console.WriteLine("subPattern: " + subPattern + "\n");

            Regex reSub = new Regex(subPattern + @".*?(?=\r\n(?!" + subPattern + @"))", RegexOptions.Multiline | RegexOptions.Singleline);

            Match mm = reSub.Match(incomingData);

            string newData = mm.Groups[0].Value;
 

            //Value Extractor : Uses the generated Regex to extract values from the input

            reg = new Regex(pattern, RegexOptions.Multiline); //Allows pattern matching to span one or multiple lines
 

            //Match Validation

            if (!reg.IsMatch(newData))

            {

                //Only error if templateItem is required, otherwise match not required

                if (templateItem.isTemplateItemRequired)

                {

                    throw new Exception("The Message Data Structure differs from that of the Message Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);

                }

            }
 

            //Build Key Values based on matches

            MatchCollection mc = reg.Matches(newData);

            if (mc.Count > 0)

            {

                foreach (Match m in mc)

                {

                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)

                    {

                        //Skip item at index 0 as it contains the full match

                        if (gIdx > 0)

                        {

                            processedKeyName = reg.GetGroupNames()[gIdx];

                            processedKeyValue = m.Groups[gIdx].Value;
 

                            //Find a valid key name for the result dictionary to avoid duplicates when repeating the template

                            if (processedKeys.Contains(processedKeyName))

                            {

                                index = 1;

                                while (processedKeys.Contains(processedKeyName + "_" + index.ToString())) index++;

                                processedKeyName = processedKeyName + "_" + index.ToString();

                            }

                            processedKeys.Add(processedKeyName);
 

                            //Add to Dictionary

                            incomingDictionary[processedKeyName] = processedKeyValue;

                        }

                    }

                    // Only match first match unless templateItem is repeatable

                    if (!templateItem.isRepeatable)

                    {

                        break;

                    }

                }

            }

            else

            {

                throw new Exception("Pattern did not match: ( + " + templateItem.templateBody + ").");

            }
 

            return;

        }
 

        static void DisplayDictionary(Dictionary<String, String> dictionary)

        {

            Console.WriteLine("---- DICTIONARY DATA ----\n");

            foreach (String key in dictionary.Keys)

            {

                Console.WriteLine("[" + key + "] [" + dictionary[key] + "]\r");

            }

            Console.WriteLine("\n\n");

        }

    }

}
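
For anyone skimming the listing, the change sits just after the generated pattern is printed: the code takes the start of the pattern up to the first >, +, space or - as a segment prefix, then uses that prefix to cut out the first unbroken run of matching lines before any values are extracted. A condensed sketch of that idea follows. The class name, the hard-coded prefix and the shortened sample data are illustrative assumptions, not part of the posted program.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FirstRunDemo
{
    // Returns the first unbroken run of lines that start with the given prefix regex.
    public static string FirstRun(string data, string prefix)
    {
        // Lazy match from the prefix up to a CRLF that is NOT followed by the prefix again.
        // Singleline lets '.' cross line breaks; Multiline lets '^' match after each '\n'.
        var reSub = new Regex(prefix + @".*?(?=\r\n(?!" + prefix + "))",
                              RegexOptions.Multiline | RegexOptions.Singleline);
        return reSub.Match(data).Value;
    }

    public static void Main()
    {
        string data = "UNT+A+1'\r\nUNT+B+2'\r\nUNZ+1+9'\r\nUNT+4+3'\r\nUNT+5+4'";
        // Prints only the two UNT lines that precede UNZ.
        Console.WriteLine(FirstRun(data, @"^UNT\+"));
    }
}
```

The per-line pattern is then applied to that isolated run rather than to the whole message, which is why the later UNT lines no longer appear in the dictionary.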

 

Author Comment

by:djcheeky
ID: 22765101
Hi ddrudik.

I have taken your solution and implemented it below, but I get some very strange behaviour. If you run the code, you will notice that the program keeps crashing on the last item in the incoming string / template.

In this case it is:
UNP+P+000001546000PP'

But if I remove that line from the incoming string, as well as the item from templateItems, it still does the same.

It just seems to be crashing on the last record and I don't know why.
Would you like me to post this in a separate issue?

Thanks.
using System;

using System.Collections.Generic;

using System.Linq;

using System.Text;

using System.Text.RegularExpressions;
 

namespace MessageTranslationExample

{

    public class TemplateItem

    {

        public Boolean isTemplateItemRequired { get; set; }

        public Boolean areExactFieldsRequired { get; set; }

        public Boolean isRepeatable { get; set; }

        public String templateBody { get; set; }

        public TemplateItem(Boolean IsTemplateItemRequired, Boolean AreExactFieldsRequired, Boolean IsRepeatable, String TemplateBody)

        {

            isTemplateItemRequired = IsTemplateItemRequired;

            areExactFieldsRequired = AreExactFieldsRequired;

            isRepeatable = IsRepeatable;

            templateBody = TemplateBody;

        }

    }

    

    class MessageTranslation

    {

        static String incomingData;

        static List<TemplateItem> templateItems;

        static Dictionary<String, String> incomingDictionary;

        static Boolean bDeleteMatchedDataFromIncoming;

      

        static void Main(string[] args)

        {

            SetTemplateItems();

            SetIncomingData();

            TranslateMessage();

            DisplayDictionary(incomingDictionary);    

            Console.ReadKey();

        }

                

        //Example Data Structure Template to use

        static void SetTemplateItems()

        {

            templateItems = new List<TemplateItem>();

            templateItems.Add(new TemplateItem(true, true, false, "UNH+{unhCode1}+{unhMessageType}:{unhShortCode}:{unhVersion}:{unhControlBody}+{unhType}'"));

           templateItems.Add(new TemplateItem(true, true, false, "UCI+{uciNumber}+{uciCustomer}+{uciOrganisation}+{uciVersion}'"));

           templateItems.Add(new TemplateItem(true, true, false, "UCM+{ucmNumber}+{ucmType}:{ucmShortCode}:{ucmAbbrev}:{ucmOrganisation}:{ucmIndex}+{ucmIndexCode}'"));

            templateItems.Add(new TemplateItem(true, true, true, "UNT+{untTotalCode}+{untTotal}'"));

            templateItems.Add(new TemplateItem(true, true, false, "UNZ+{unzCode}+{unzId}'"));

            templateItems.Add(new TemplateItem(true, true, true, "UNT+{untTotalCode}+{untTotal}'"));

            templateItems.Add(new TemplateItem(true, true, true, "UNP+{unpTotalCode}+{unpTotal}'"));

        }
 

        //Example Incoming Data to use

        static void SetIncomingData()

        {

            incomingData = @"UNH+00000154600001+CONTRL:D:3:UN+CONTRL'                 

UCI+00000000000443+ETRADEX+SARS+7'             

UCM+00000044300001+CUSDEC:D:96B:UN:ZZZ01+7'               

UNT+A+000001546000AA' 

UNT+B+000001546000BB'     

UNT+C+000001546000CC'  

UNZ+1+000001546' 

UNT+D+000001546000DD'     

UNT+E+000001546000EE'

UNP+P+000001546000PP'";

        }
 

        //Translate the Incoming Message

        /* DEV HINTS:

        //  * If start index != 0 (thus not located in first item of incoming string) 

                for a template match and the templateItem isRequired, error, else if

                not required, move on to next templateItem without deleting data from incoming

            *  If repeatable, loop through the incoming data until a match is not found for the 

               templateItem.

         */

        /* DEV ISSUES:

       // 

        */ 

        static void TranslateMessage()

        {

            incomingDictionary = new Dictionary<string, string>();

            

            //Analyze each Template Item against the Incoming Data

            foreach (TemplateItem templateItem in templateItems)

            {

                //Init

                bDeleteMatchedDataFromIncoming = true;
 

                Console.WriteLine("BEFORE: \n" + templateItem.templateBody);

                

                //Trim Preceding and Trailing Indentation and Whitespace

                templateItem.templateBody = templateItem.templateBody.Trim();

                Regex precedingWS = new Regex(@"\n\s+<");

                templateItem.templateBody = precedingWS.Replace(templateItem.templateBody, "\n<");

                Regex trailingWS = new Regex(@"\s+\n");

                templateItem.templateBody = trailingWS.Replace(templateItem.templateBody, "\n");
 

                Console.WriteLine("\n\nAFTER: \n" + templateItem.templateBody);

               

                

                //Generate RegEx Pattern

                SetMessageInDictionary(templateItem);

                                

                if (templateItem.isRepeatable)

                {

                    //Create Regex to match all back-to-back occurrences of the template pattern

                }

                else

                {

                    //Process against incoming data

                    if ("RegEx Match is found" == "RegEx Match found")

                    {
 

                    }

                    else

                    {

                        if (templateItem.isTemplateItemRequired)

                        {

                            

                        }

                    }

                                      

                }

            }

            return;

        }
 

        //Build the Message IN Dictionary keys and values

        static void SetMessageInDictionary(TemplateItem templateItem)

        {

            //Init

            Regex reg;

            String pattern;

            String keyValue, keyName;

            String processedKeyValue, processedKeyName;

            List<String> keys, processedKeys;

            int index;

            String lastMatch = "";

            reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar

            keys = new List<String>();

            processedKeys = new List<String>();

            

            // Pattern Start Character

            pattern = "^"; 

            

            //For each RegEx Template Item match in the Template

            foreach (Match match in reg.Matches(templateItem.templateBody))

            {

                //Handle whitespaces in the ValueTemplate

                keyValue = "";

                foreach (char c in match.Groups["text"].Value)

                {

                    if (c != ' ' && c != '\t')

                        keyValue += c + "$$SPACE$$";

                    else

                        keyValue += c;

                }

                keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");
 

                //Remove the last white space matcher of the pattern

                if (keyValue.EndsWith("$$SPACE$$"))

                {

                    keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);

                }

                pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");

                

                //Extract the key from the match

                if (match.Groups["key"].Value != "")

                {

                    keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");
 

                    //Find a valid key name for the result dictionary to avoid duplicates when repeating the template

                    if (keys.Contains(keyName))

                    {

                        index = 1;

                        while (keys.Contains(keyName + "_" + index.ToString())) index++;

                        keyName = keyName + "_" + index.ToString();

                    }

                     keys.Add(keyName);

                   

                    //A value may be omitted so make its matcher optional

                    pattern += string.Format("(?<{0}>.*)", keyName);

                    

                    //Set last match for error messages

                    lastMatch = keyName; 

                }

            }

           

            //Allows pattern to look at new line for the same pattern

            pattern += @"(?=\s+|$)";

            Console.WriteLine("Pattern: " + pattern + "\r");

            

            Match mmm = Regex.Match(pattern, @".*?[>+ -]");

            string subPattern = mmm.Groups[0].Value;

            Console.WriteLine("SubPattern: " + subPattern + "\r");

            Regex reSub = new Regex(subPattern + @".*?(?=\r\n(?!" + subPattern + @"))", RegexOptions.Multiline | RegexOptions.Singleline);

            Match mm = reSub.Match(incomingData);

            string newData = mm.Groups[0].Value;

            Console.WriteLine("NewData: " + newData + "\r");

            

            //Value Extractor : Uses the generated Regex to extract values from the input

            reg = new Regex(pattern, RegexOptions.Multiline); //Allows pattern matching to span one or multiple lines

            

            //Trim Rubbish from incoming Data

            incomingData = incomingData.Trim();

            Console.WriteLine("IncomingData before match:\n" + incomingData);

           

            

            

            //Match Validation

            if (!reg.IsMatch(newData))

            {

                //Only error if templateItem is required, otherwise match not required

                if (templateItem.isTemplateItemRequired)

                {

                    throw new Exception("The Message Data Structure differs from that of the Message Template Structure and thus a conversion can not be done between the two. Last Successful Match Key was: " + lastMatch);

                }

            }
 

            //Build Key Values based on matches

            MatchCollection mc = reg.Matches(newData);

            Console.WriteLine("Matches: " + mc.Count.ToString() + "\n");

            if (mc.Count > 0)

            {

                foreach (Match m in mc)

                {

                    for (int gIdx = 0; gIdx < m.Groups.Count; gIdx++)

                    {

                        //Skip item at index 0 as it contains the full match

                        if (gIdx > 0)

                        {

                            processedKeyName = reg.GetGroupNames()[gIdx];

                            processedKeyValue = m.Groups[gIdx].Value;

                            

                            //Find a valid key name for the result dictionary to avoid duplicates when repeating the template

                            if (processedKeys.Contains(processedKeyName))

                            {

                                index = 1;

                                if (templateItem.isRepeatable)

                                {

                                    while (processedKeys.Contains(processedKeyName + "_" + index.ToString() + "|R")) index++;

                                     {

                                        processedKeyName = processedKeyName + "_" + index.ToString() + "|R";

                                     }

                                }

                               else 

                                {

                                    while (processedKeys.Contains(processedKeyName + "_" + index.ToString())) index++;

                                     {

                                        processedKeyName = processedKeyName + "_" + index.ToString();

                                     }

                                }

                            }

                            processedKeys.Add(processedKeyName);

                           

                            //Add to Dictionary

                            incomingDictionary[processedKeyName] = processedKeyValue;

                        }

                     }

                    Console.WriteLine("\nRemoveDataMatchFromIncomingData(" + "0, " + (m.Length) + "\n");

                    incomingData.Trim();

                    RemoveDataMatchFromIncomingData(0, m.Length);

                    incomingData.Trim();
 

                    

                    //Remove Matched text from incoming data

                   

                    Console.WriteLine("Pattern to apply to data: " + pattern + "\n");
 

                    

                   

                }

            }

            else

            {

                throw new Exception("Pattern did not match: ( + " + templateItem.templateBody + ").");

            }
 

            return;

        }
 

        static void RemoveDataMatchFromIncomingData(int startPosition, int lengthOfMatch)

        {

            Console.WriteLine("\nIncomingData BEFORE: \r\n");

            Console.WriteLine(incomingData + "\n\n");

            incomingData = incomingData.Remove(startPosition, lengthOfMatch);

            incomingData = incomingData.Trim();

            Console.WriteLine("IncomingData AFTER: \r");

            Console.WriteLine(incomingData + "\n\n");

        }
 

        static void DisplayDictionary(Dictionary<String, String> dictionary)

        {

            Console.WriteLine("---- DICTIONARY DATA ----\n");

            foreach ( String key in dictionary.Keys)

            {

                Console.WriteLine("[" + key + "] [" + dictionary[key] + "]\r");

            }

            Console.WriteLine("\n\n");

        }

    }

}

 

Author Comment

by:djcheeky
ID: 22774046
Any ideas? I think what I am going to have to do is split this program so that each message type (e.g. XML, EDIFACT, etc.) has its own variable extraction function.

I will start posting separate questions for that in the meantime.

Thanks a mill!
Paolo
 
LVL 27

Expert Comment

by:ddrudik
ID: 22776831
It will take a bit to determine what was changed. A copy-and-paste of the code in 22758089 does not produce the exception about the data not matching when run against the data in that last example.
 

Author Comment

by:djcheeky
ID: 22777010
And what happens when you run the code in 22765101?

Thanks
 
LVL 27

Expert Comment

by:ddrudik
ID: 22777062
It throws an exception stating that the data doesn't match the pattern; that's an exception you throw manually in your own code.
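
One plausible reading of that exception, judging only from the code posted in 22765101: the regex that isolates the run ends in a lookahead requiring a \r\n after the run, and the last record in the data has nothing after it. The match then comes back empty, newData is "", and the required-item check throws. A minimal sketch of that behaviour, with one possible adjustment that also accepts end of input via \z, is below. The helper names are hypothetical, and the adjustment is an assumption rather than a tested fix for the full program.

```csharp
using System;
using System.Text.RegularExpressions;

public static class LastRecordDemo
{
    // Lookahead as posted: the run must be followed by CRLF.
    public static string RunAsPosted(string data, string prefix)
    {
        return Regex.Match(data, prefix + @".*?(?=\r\n(?!" + prefix + "))",
            RegexOptions.Multiline | RegexOptions.Singleline).Value;
    }

    // Assumed adjustment: the run may also end at the very end of the input (\z).
    public static string RunWithEndOfInput(string data, string prefix)
    {
        return Regex.Match(data, prefix + @".*?(?=\r\n(?!" + prefix + @")|\z)",
            RegexOptions.Multiline | RegexOptions.Singleline).Value;
    }

    public static void Main()
    {
        string lastRecord = "UNP+P+000001546000PP'";
        Console.WriteLine("[" + RunAsPosted(lastRecord, @"^UNP\+") + "]");        // prints []
        Console.WriteLine("[" + RunWithEndOfInput(lastRecord, @"^UNP\+") + "]");  // prints [UNP+P+000001546000PP']
    }
}
```

This would also explain why removing the UNP line and its template item changes nothing: whichever record ends up last has no trailing \r\n to satisfy the lookahead.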
 

Author Comment

by:djcheeky
ID: 22784209
Yeah - that's what I was trying to avoid - but I am going to close this question now because I am doing things using XML. Will open a new question again if anything similar is required.

Thanks for all your help!
 
LVL 27

Expert Comment

by:ddrudik
ID: 22785271
Thanks for the question and the points.