Solved

Extracting Keys and Values from two Strings.

Posted on 2008-09-30
Last Modified: 2012-05-05
Hi.

I am trying to compare two strings to each other and extract a Dictionary of keys and values based on the result of this comparison.

The first string is the template containing the key names (note that the keys can be called anything):
BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'

The second string is the actual message that contains the values for each key.
BGM+value1+valueB_value2-value600'

So I have a function that extracts the keys into a Dictionary, which works well (see code below).
But I also require the values to be populated into the dictionary too, and I don't know how to do this.

The result I have currently is (for illustrative purposes):
myDictionary
[key1][null]
[myKey2][null]
[theKey3][null]
[aKey4][null]

What I need is:
myDictionary
[key1][value1]
[myKey2][valueB]
[theKey3][value2]
[aKey4][value600]

Can someone please advise or help with this? Somehow the key string has to be compared against the value string, and the keys and values extracted into the Dictionary based on that comparison.
public Dictionary<String, String> GetMessageTypeVariables(String KeyTemplate, String ValueTemplate)
{
    Dictionary<String, String> keys = new Dictionary<String, String>();
    int startPos = KeyTemplate.IndexOf('{');

    while (startPos != -1)
    {
        int endPos = KeyTemplate.IndexOf('}', startPos + 1);
        if (endPos != -1)
        {
            string key = KeyTemplate.Substring(startPos + 1, endPos - startPos - 1);
            string value = null;     // What to do HERE????
            keys.Add(key, value);
            startPos = KeyTemplate.IndexOf('{', endPos + 1);
        }
        else
        {
            startPos = -1;
        }
    }

    return keys;
}

Question by:djcheeky
26 Comments
 
LVL 16

Expert Comment

by:CuteBug
ID: 22604458
In the code above you are just collecting the keys and inserting each one into the dictionary with a null value.

That is why you are not getting the desired result.

What I need to know now is how you will separate each of the values in the ValueTemplate. In the KeyTemplate each key is enclosed in { }.

What about the values? Are they separated by '+'?

If that is the case, you can do the following:



public static Dictionary<String, String> GetMessageTypeVariables(String KeyTemplate, String ValueTemplate)
{
    Dictionary<String, String> Result = new Dictionary<String, String>();

    // Collect the key names enclosed in { }.
    List<string> keys = new List<string>();
    int startPos = KeyTemplate.IndexOf('{');
    while (startPos != -1)
    {
        int endPos = KeyTemplate.IndexOf('}', startPos + 1);
        if (endPos != -1)
        {
            string key = KeyTemplate.Substring(startPos + 1, endPos - startPos - 1);
            keys.Add(key);
            startPos = KeyTemplate.IndexOf('{', endPos + 1);
        }
        else
        {
            startPos = -1;
        }
    }

    // Collect the values, assuming they are separated by '+'.
    // The text before the first '+' (e.g. "BGM") is skipped.
    List<string> values = new List<string>();
    startPos = ValueTemplate.IndexOf('+');
    while (startPos != -1)
    {
        int endPos = ValueTemplate.IndexOf('+', startPos + 1);
        if (endPos != -1)
        {
            values.Add(ValueTemplate.Substring(startPos + 1, endPos - startPos - 1));
            startPos = endPos;
        }
        else
        {
            values.Add(ValueTemplate.Substring(startPos + 1));
            startPos = -1;
        }
    }

    // Pair keys and values by position. If there are fewer values than keys,
    // stop at the last value instead of relying on an exception.
    int count = Math.Min(keys.Count, values.Count);
    for (int i = 0; i < count; i++)
    {
        Result.Add(keys[i], values[i]);
    }

    return Result;
}
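A compact variant of the same position-pairing idea, as a sketch only: it assumes (this is not stated in the thread) that every delimiter is a single character from a known set, here '+', '_', '-' and '\'', and that no value ever contains one of them. The template and the message are split on the same set, and wherever a template token has the form {name}, the message token at the same index becomes its value. The class and method names are hypothetical.

```csharp
using System;
using System.Collections.Generic;

public static class SplitPairing
{
    // Assumed delimiter set; adjust to the real message format.
    private static readonly char[] Delimiters = { '+', '_', '-', '\'' };

    // Pairs each {key} token of the template with the message token at the same index.
    public static Dictionary<string, string> Extract(string keyTemplate, string valueTemplate)
    {
        string[] templateTokens = keyTemplate.Split(Delimiters);
        string[] valueTokens = valueTemplate.Split(Delimiters);
        var result = new Dictionary<string, string>();

        int count = Math.Min(templateTokens.Length, valueTokens.Length);
        for (int i = 0; i < count; i++)
        {
            string token = templateTokens[i];
            // Only tokens of the form {name} are keys; anything else is literal text.
            if (token.Length > 2 && token[0] == '{' && token[token.Length - 1] == '}')
            {
                result[token.Substring(1, token.Length - 2)] = valueTokens[i];
            }
        }
        return result;
    }
}
```

With this approach, the literal text between two delimiters (such as " NAD" in the later EDIFACT-style sample) simply lines up with the same token in the message. It breaks as soon as a value contains one of the delimiter characters.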

 

Author Comment

by:djcheeky
ID: 22604548
Hi Cutebug - glad to see you again - was hoping you would see this :)

Ok - I originally added nulls deliberately as I did not know how to extract the values.

In the ValueTemplate, the delimiter could be anything, a character or a string, which is why I again deliberately used different characters, e.g. '+', '_', '-' etc.

What I imagine has to happen is that you almost have to work in reverse: place the KeyTemplate on top of the ValueTemplate, and whatever differs is assigned to the keys/variables in the dictionary.
BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'
BGM+value1+valueB_value2-value600'

So BGM+ is the same and is ignored, and the first difference (key/value pair) encountered is:
{key1}
value1

I don't even know if this is possible.
Reckon this is going to be a headscratcher :)
Thanks
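The "overlay" idea can be sketched directly by treating the template as alternating literal segments and keys: each literal segment must appear verbatim in the message, and the message text between two consecutive literal segments is the value of the key between them. This is a minimal sketch (the TemplateOverlay class is hypothetical), assuming that a value never contains the literal text that follows it in the template; it throws FormatException on a mismatch.

```csharp
using System;
using System.Collections.Generic;

public static class TemplateOverlay
{
    // Walks the template and the message side by side: literal template text must
    // appear verbatim in the message, and each {key} takes the message text up to
    // the start of the next literal segment.
    public static Dictionary<string, string> Extract(string keyTemplate, string valueTemplate)
    {
        var result = new Dictionary<string, string>();
        int t = 0; // current position in the template
        int v = 0; // current position in the message

        while (t < keyTemplate.Length)
        {
            // Literal text from t up to the next '{' (or to the end of the template).
            string literal = LiteralAt(keyTemplate, t);
            if (!valueTemplate.Substring(v).StartsWith(literal, StringComparison.Ordinal))
                throw new FormatException("Message does not match the template at position " + v);
            v += literal.Length;
            t += literal.Length;
            if (t >= keyTemplate.Length) break;

            // keyTemplate[t] is now '{'.
            int close = keyTemplate.IndexOf('}', t + 1);
            if (close == -1) throw new FormatException("Unclosed '{' in template");
            string key = keyTemplate.Substring(t + 1, close - t - 1);
            t = close + 1;

            // The value ends where the next literal segment starts. A key at the very end
            // of the template (or directly followed by another key) takes the rest of the message.
            string next = LiteralAt(keyTemplate, t);
            int end = next.Length == 0
                ? valueTemplate.Length
                : valueTemplate.IndexOf(next, v, StringComparison.Ordinal);
            if (end == -1) throw new FormatException("Expected \"" + next + "\" after the value of " + key);

            result[key] = valueTemplate.Substring(v, end - v);
            v = end;
        }
        return result;
    }

    private static string LiteralAt(string template, int start)
    {
        int open = template.IndexOf('{', start);
        return open == -1 ? template.Substring(start) : template.Substring(start, open - start);
    }
}
```

Because whole literal segments (not single characters) are matched, multi-character separators such as "' NAD+" are skipped correctly.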
 
LVL 2

Expert Comment

by:sandson
ID: 22604650
Hi djcheeky,

A better approach to your problem would be to use regular expressions to capture the key names and then build a pattern that captures the values. That way you're not limited in terms of template form or values.

Look at my code snippet; it works with the sample data you put in your question. First it takes your template and extracts the key names with a regular expression that fits your requirements (i.e. a string composed of some characters followed by a key name between { }, then again some characters, then a key name, and so on). Then it builds another regular expression representing your template (this part, in fact, translates your template into a valid .NET regular expression).
Then it uses the generated regexp to match the input message and extract the values.
Finally it builds a dictionary with each pair of key name and value.

That way you're not limited to putting '+' between each value as suggested by CuteBug, and you can easily change the template; the function will adapt to the changes (unless you change the grammar of the template, for example by removing '{' and '}' around key names. In that case you should rewrite the first regular expression "(?<text>[^{}]*)({(?<key>[^}]+)})?" to match your new grammar).

regards,
Abdel
    private Dictionary<string, string> Parse(string template, string input)
    {
      // Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using System.Windows.Forms;

      //The first string is the template containing the key names (the keys can be called anything):
      //BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'
      //The second string is the actual message that contains the values for each key.
      //BGM+value1+valueB_value2-value600'

      Regex reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?");
      string pattern;
      Dictionary<string, string> dct = new Dictionary<string, string>();
      MatchCollection matches;
      GroupCollection groups;
      string keyName, keyValue;

      if (!reg.IsMatch(template))
      {
        MessageBox.Show("Invalid Template");
        return dct;
      }

      pattern = "^";
      matches = reg.Matches(template);
      foreach (Match match in matches)
      {
        // Regex.Escape escapes every regex metacharacter (+ . * ? ( ) [ \ ^ $ | { # and whitespace) in the literal text.
        pattern += Regex.Escape(match.Groups["text"].Value);

        if (match.Groups["key"].Value != "")
        {
          // Key names become group names, so they must be valid identifiers (no escaping applies).
          pattern += string.Format("(?<{0}>.+)", match.Groups["key"].Value);
        }
      }
      pattern += "$";

      // Generated Pattern
      //^BGM\+(?<key1>.+)\+(?<myKey2>.+)_(?<theKey3>.+)-(?<aKey4>.+)'$
      //MessageBox.Show(pattern);

      reg = new Regex(pattern);
      if (!reg.IsMatch(input))
      {
        MessageBox.Show("Pattern doesn't match");
        return dct;
      }

      groups = reg.Match(input).Groups;
      foreach (Match match in matches)
      {
        keyName = match.Groups["key"].Value;
        if (!string.IsNullOrEmpty(keyName))
        {
          keyValue = groups[keyName].Value;
          dct.Add(keyName, keyValue);
        }
      }

      return dct;
    }
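For experimenting outside a WinForms application, the translation step can be isolated in a headless sketch (the TemplateRegex class is hypothetical and returns null instead of showing a MessageBox). It adds a lazy option to illustrate a design choice the template alone cannot settle: with the greedy .+ used above, a value that happens to contain the following delimiter pushes the extra text into the earlier key, while .+? gives it to the later key.

```csharp
using System.Collections.Generic;
using System.Text;
using System.Text.RegularExpressions;

public static class TemplateRegex
{
    // Template grammar from the answer above: literal text, optionally followed by {key}.
    private static readonly Regex TemplateGrammar = new Regex(@"(?<text>[^{}]*)(\{(?<key>[^}]+)\})?");

    // Translates a template into an anchored regex with one named group per key.
    // lazy = false uses ".+" (greedy), lazy = true uses ".+?".
    public static string BuildPattern(string template, bool lazy)
    {
        var sb = new StringBuilder("^");
        foreach (Match m in TemplateGrammar.Matches(template))
        {
            // Regex.Escape escapes every metacharacter in the literal text.
            sb.Append(Regex.Escape(m.Groups["text"].Value));
            if (m.Groups["key"].Success)
                sb.Append("(?<").Append(m.Groups["key"].Value).Append(lazy ? ">.+?)" : ">.+)");
        }
        return sb.Append('$').ToString();
    }

    // Returns the key/value pairs, or null when the message does not fit the template.
    public static Dictionary<string, string> Extract(string template, string input, bool lazy)
    {
        Match match = Regex.Match(input, BuildPattern(template, lazy));
        if (!match.Success) return null;

        var result = new Dictionary<string, string>();
        foreach (Match m in TemplateGrammar.Matches(template))
        {
            if (m.Groups["key"].Success)
                result[m.Groups["key"].Value] = match.Groups[m.Groups["key"].Value].Value;
        }
        return result;
    }
}
```

For the template "{a}+{b}" and the message "1+2+3", the greedy pattern yields a = "1+2", b = "3", while the lazy one yields a = "1", b = "2+3". For the samples in this thread both give the same result, because every delimiter occurs only once per value.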

 
LVL 16

Expert Comment

by:CuteBug
ID: 22604790
Hmmm...
A little head-scratching was required...
But it is not impossible.

Try the code below:
public static Dictionary<string, string> GetMessageTypeVariables(String KeyTemplate, String ValueTemplate)
{
    Dictionary<String, String> Result = new Dictionary<String, String>();

    int startPos = KeyTemplate.IndexOf('{');

    // The single character just before the first key (here '+') marks where the first value starts.
    // Assumes the template contains at least one key and does not start with '{'.
    char startDelim = KeyTemplate[startPos - 1];
    int startValPos = ValueTemplate.IndexOf(startDelim);

    while (startPos != -1)
    {
        int endPos = KeyTemplate.IndexOf('}', startPos + 1);
        if (endPos != -1)
        {
            string key = KeyTemplate.Substring(startPos + 1, endPos - startPos - 1);
            string val = null;
            int endValPos = -1;

            if (endPos + 1 >= KeyTemplate.Length)
            {
                // Last key at the very end of the template: take the rest of the message.
                val = ValueTemplate.Substring(startValPos + 1);
            }
            else
            {
                // The value ends at the next occurrence of the single character that follows the key.
                char endDelim = KeyTemplate[endPos + 1];
                endValPos = ValueTemplate.IndexOf(endDelim, startValPos + 1);
                val = ValueTemplate.Substring(startValPos + 1, endValPos - startValPos - 1);
            }

            Result.Add(key, val);
            startPos = KeyTemplate.IndexOf('{', endPos + 1);
            startValPos = endValPos;
        }
        else
        {
            startPos = -1;
        }
    }

    return Result;
}

 

Author Comment

by:djcheeky
ID: 22606148
Hey CuteBug

Ran the code, and got the following dictionary:
[key1][{key1}]
[myKey2][{myKey2}]
[theKey3][{theKey3}]
[aKey4][{aKey4}]

Note that the value in the dictionary is merely the key surrounded by {} braces.

Instead of:
[key1][{value1}]
[myKey2][{valueB}]
[theKey3][{value2}]
[aKey4][{value600}]

???
Thanks
 
LVL 16

Expert Comment

by:CuteBug
ID: 22610869
Hey, it is running perfectly here.

This is the code I used to call the method I posted above.

It gave the following result:

[key1][value1]
[myKey2][valueB]
[theKey3][value2]
[aKey4][value600]
using System;
using System.Collections.Generic;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            string KeyTemplate = "BGM+{key1}+{myKey2}_{theKey3}-{aKey4}";
            string ValueTemplate = "BGM+value1+valueB_value2-value600";

            // GetMessageTypeVariables is the static method from my previous comment,
            // pasted into this Program class.
            Dictionary<string, string> result = GetMessageTypeVariables(KeyTemplate, ValueTemplate);

            foreach (string key in result.Keys)
            {
                Console.WriteLine("[{0}][{1}]", key, result[key]);
            }
        }
    }
}

 
LVL 2

Expert Comment

by:sandson
ID: 22612095
Hi djcheeky,

Didn't you try my solution?

regards,
Abdel
 

Author Comment

by:djcheeky
ID: 22612291
>> CuteBug

Thanks, I found the issue: it was something silly on my side, and when I ran your code it did in fact work! :)
However, if I use an actual value to process rather than the example ones I gave above, it seems to do the job, but with a slight error.
E.g. if you run the code with:
string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
string ValueTemplate = "BGM+962+2537' NAD+AG+00047178'";

The Dictionary returned is:
[DocumentMessageName][962]
[DocumentMessageId][2537]
[PartyQualifier][NAD]
[PartyIdentificationDetails][AG+00047178]
 
As opposed to what it should be:
[DocumentMessageName][962]
[DocumentMessageId][2537]
[PartyQualifier][AG]
[PartyIdentificationDetails][00047178]

Thanks
 
 

Author Comment

by:djcheeky
ID: 22612295
Hi Abdel.

I will review your option as well and get back to you.
I am just giving Cutebug's solution priority at this moment as this issue ties together with an existing one we originally worked on.

Thanks.
 
LVL 2

Expert Comment

by:sandson
ID: 22612318
Ok, no problem, I just thought you hadn't seen it (I'm new to this forum, and with my 'Expert Limited Access' I don't know exactly what my limitations are).

Just to be sure my solution works, I ran it on your second sample above (in your response to CuteBug) and it returns the correct result as you expect.

Best regards,
Abdel
 

Author Comment

by:djcheeky
ID: 22612666
Hi Abdel.

Your solution worked great too. I just had one small hiccup that I had to debug, which was causing it to fail; I finally figured out that there was one space in the ValueTemplate that wasn't in the KeyTemplate.

So, on that note, I was wondering whether it would be possible to ignore whitespace in between elements, but not when it occurs inside the values.

In other words, given the following template:
string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}'NAD+{PartyQualifier}+{PartyIdentificationDetails}'";

The following ValueTemplate below will have spaces removed:
string ValueTemplate = "BGM+962+2537'      NAD+AG+00047178'";

Whereas the template below wouldn't as the spaces are in a value:
string ValueTemplate = "BGM+962+25 37'NAD+AG+000 47178'";

Is this possible??

Thanks
Paolo
 
LVL 2

Expert Comment

by:sandson
ID: 22612777
Hi djcheeky,

Here is a small variation of the method that handles spaces in the ValueTemplate.

The modification concerns the pattern translator: it now inserts whitespace matching into the generated regular expression. As I don't know exactly how your ValueTemplate is built, I made it generic enough to handle all possible cases by allowing a space/whitespace between every character of the string.

I tested it with all the sample pairs of KeyTemplate/ValueTemplate you put in this question and it works, so no regression has been introduced by this modification.

Look for the comment 'Version 2' to see what changed between the previous solution and this one.

Regards,
Abdel
    private Dictionary<string, string> Parse(string template, string input)
    {
      //The first string is the template containing the key names (the keys can be called anything):
      //BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'
      //The second string is the actual message that contains the values for each key.
      //BGM+value1+valueB_value2-value600'

      Regex reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?");
      string pattern;
      Dictionary<string, string> dct = new Dictionary<string, string>();
      MatchCollection matches;
      GroupCollection groups;
      string keyName, keyValue;

      if (!reg.IsMatch(template))
      {
        MessageBox.Show("Invalid Template");
        return dct;
      }

      pattern = "^";
      matches = reg.Matches(template);
      foreach (Match match in matches)
      {
        // Version 1
        //pattern += Regex.Escape(match.Groups["text"].Value);

        // Version 2 : Handle whitespaces in the ValueTemplate
        // Escape each literal character and allow optional whitespace (\s*) after it.
        foreach (char c in match.Groups["text"].Value)
        {
          pattern += Regex.Escape(c.ToString()) + "\\s*";
        }
        // End of version 2

        if (match.Groups["key"].Value != "")
        {
          pattern += string.Format("(?<{0}>.+)", match.Groups["key"].Value);
        }
      }

      pattern += "$";

      // Generated Pattern (sample 1)
      //^B\s*G\s*M\s*\+\s*(?<key1>.+)\+\s*(?<myKey2>.+)_\s*(?<theKey3>.+)-\s*(?<aKey4>.+)'\s*$
      //MessageBox.Show(pattern);

      reg = new Regex(pattern);
      if (!reg.IsMatch(input))
      {
        MessageBox.Show("Pattern doesn't match");
        return dct;
      }

      groups = reg.Match(input).Groups;
      foreach (Match match in matches)
      {
        keyName = match.Groups["key"].Value;
        if (!string.IsNullOrEmpty(keyName))
        {
          keyValue = groups[keyName].Value;
          dct.Add(keyName, keyValue);
        }
      }

      return dct;
    }
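The whitespace-tolerant translation can be checked in isolation. This sketch (the LooseLiteral.Build helper is hypothetical, not Abdel's code) escapes each non-blank literal character with Regex.Escape and appends \s*, dropping blanks in the template because \s* already covers them. Run against the two messages from the question, the blanks between segments are absorbed while the blank inside "25 37" stays in the value. Note that the \s* after the last literal character also absorbs any leading blanks of the value that follows.

```csharp
using System.Text;
using System.Text.RegularExpressions;

public static class LooseLiteral
{
    // Builds a regex fragment for one literal template segment in which optional
    // whitespace (\s*) is allowed after every non-blank character; blank characters
    // in the template itself are skipped because \s* already matches them.
    public static string Build(string literal)
    {
        var sb = new StringBuilder();
        foreach (char c in literal)
        {
            if (char.IsWhiteSpace(c)) continue;
            sb.Append(Regex.Escape(c.ToString())).Append(@"\s*");
        }
        return sb.ToString();
    }
}
```

A full pattern is assembled by alternating Build(literal) with value groups, e.g. "^" + LooseLiteral.Build("BGM+") + "(?<name>.+)" + LooseLiteral.Build("+") + … + "$".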

 
LVL 2

Expert Comment

by:sandson
ID: 22612827
Hi djcheeky,

Find below the same code as above, to which I added regions and comments to make it easier to read and understand.

Regards,
Abdel
    private Dictionary<string, string> Parse(string template, string input)
    {
      #region Sample Data

      #region Sample 1
      //The first string is the template containing the key names (the keys can be called anything):
      //BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'
      //The second string is the actual message that contains the values for each key.
      //BGM+value1+valueB_value2-value600'
      #endregion

      #region Sample 2
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //string ValueTemplate = "BGM+962+2537' NAD+AG+00047178'";
      #endregion

      #region Sample 3
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}'NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //The following ValueTemplate below will have spaces removed:
      //string ValueTemplate = "BGM+962+2537'      NAD+AG+00047178'";
      //Whereas the template below wouldn't as the spaces are in a value:
      //string ValueTemplate = "BGM+962+25 37'NAD+AG+000 47178'";
      #endregion

      #endregion

      #region Local Variables
      Regex reg;
      string pattern;
      Dictionary<string, string> dct;
      MatchCollection matches;
      GroupCollection groups;
      string keyName, keyValue;
      #endregion

      #region Initializations : Initialize local variables
      reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
      dct = new Dictionary<string, string>();               // Initialize the result dictionary to return an empty dictionary instead of null in case of error
      #endregion

      #region Preconditions : Check if the given template conforms to the KeyTemplate syntax
      if (!reg.IsMatch(template))
      {
        MessageBox.Show("Invalid Template");
        return dct;
      }
      #endregion

      #region Template Translator : Translates the KeyTemplate into a .NET Regular Expression that will match the ValueTemplate
      pattern = "^";
      matches = reg.Matches(template);

      foreach (Match match in matches)
      {
        // Version 1
        //pattern += Regex.Escape(match.Groups["text"].Value);

        // Version 2 : Handle whitespaces in the ValueTemplate
        // Escape each literal character and allow optional whitespace (\s*) after it.
        foreach (char c in match.Groups["text"].Value)
        {
          pattern += Regex.Escape(c.ToString()) + "\\s*";
        }
        // End of version 2

        if (match.Groups["key"].Value != "")
        {
          pattern += string.Format("(?<{0}>.+)", match.Groups["key"].Value);
        }
      }

      pattern += "$";

      // Generated Pattern (sample 1)
      //^B\s*G\s*M\s*\+\s*(?<key1>.+)\+\s*(?<myKey2>.+)_\s*(?<theKey3>.+)-\s*(?<aKey4>.+)'\s*$
      //MessageBox.Show(pattern);
      #endregion

      #region Value Extractor : Uses the generated Regex to extract values from the input
      reg = new Regex(pattern);
      if (!reg.IsMatch(input))
      {
        MessageBox.Show("Pattern doesn't match");
        return dct;
      }

      groups = reg.Match(input).Groups;
      foreach (Match match in matches)
      {
        keyName = match.Groups["key"].Value;
        if (!string.IsNullOrEmpty(keyName))
        {
          keyValue = groups[keyName].Value;
          dct.Add(keyName, keyValue);
        }
      }
      #endregion

      #region Return resulting dictionary
      return dct;
      #endregion
    }


 

Author Comment

by:djcheeky
ID: 22613401
Genius!! Watching this code in action is ALMOST tempting me to learn regular expressions hahaha. :)

It works almost perfectly. There is one small thing, though, and I would change it myself if I could, but after inspecting the Regex I'm afraid to say I just don't get it. Time to get a Regex book :) The issue really isn't too important, but if it could be done, that would be great.

Given the values:
string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
string ValueTemplate = "BGM+962+2537' NAD+AG+    0004 7178    '";

You will notice the extra spaces at the beginning and end of the number 0004 7178.
The middle space is preserved correctly.
The spaces at the beginning, after the +, are however removed.
And the spaces at the end are removed, except for one.

Ideally they should remain intact, as they form part of the variable value. Is this possible?
Also, is it a nightmare to learn Regex? I keep holding back on learning it, but the stuff I see people do is amazing! Is it very complex?

Ta and thanks for the great solution posted.
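The leading-space loss can be reproduced with just the tail of this sample. In a fragment like \+\s*(?<v>.+)' the \s* is greedy, so it consumes the blanks right after '+' before the value group starts; removing the \s* that sits directly in front of a value group keeps them. This is a minimal sketch (the LeadingSpaceDemo class is hypothetical) showing what the group captures in each case.

```csharp
using System.Text.RegularExpressions;

public static class LeadingSpaceDemo
{
    // The tail of the sample message: literal "+", the value, then the closing "'".
    public const string Message = "+    0004 7178    '";

    // Returns what the named group "v" captures for the given pattern fragment.
    public static string Capture(string pattern)
    {
        return Regex.Match(Message, pattern).Groups["v"].Value;
    }
}
```

Capture(@"\+\s*(?<v>.+)'") returns "0004 7178    " (leading blanks gone), while Capture(@"\+(?<v>.+)'") returns "    0004 7178    ". In both cases the greedy .+ keeps the trailing blanks, because it only gives back the final "'".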
 

Author Comment

by:djcheeky
ID: 22613513
Hi Abdel

I am having another issue when KeyTemplate items are repeated; for example, take the template:
string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' BGM+{DocumentMessageName}+{DocumentMessageId}' '";

And the values:
string ValueTemplate = "BGM+962+2537'BGM+123+456'";

Then a duplicate key error is generated, which is to be expected.
I will try to resolve this myself. I think adding an index to each key before adding it to the dictionary may resolve the issue,
e.g. DocumentMessageName_1, DocumentMessageName_02 etc.

Thanks

 
LVL 2

Expert Comment

by:sandson
ID: 22613972
Hi djcheeky,

Try the new code below. It handles your two latest issues: spaces at the beginning of the value, and repeated templates. Look for the "Version 3" and "Version 4" comments to see how I dealt with them.

In your last template (or value) there is an error. At the end of the template you put " '" (a space followed by a quote), but these two characters don't exist in the value string, so either remove them from the template string or add them to the value string so that the two match.

As always, I tested the code with the five samples you gave, and it works :)

regards,
Abdel


    private Dictionary<string, string> Parse(string template, string input)
    {
      #region Sample Data

      #region Sample 1
      //The first string is the template containing the key names (the keys can be called anything):
      //BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'
      //The second string is the actual message that contains the values for each key.
      //BGM+value1+valueB_value2-value600'
      #endregion

      #region Sample 2
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //string ValueTemplate = "BGM+962+2537' NAD+AG+00047178'";
      #endregion

      #region Sample 3
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}'NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //The following ValueTemplate below will have spaces removed:
      //string ValueTemplate = "BGM+962+2537'      NAD+AG+00047178'";
      //Whereas the template below wouldn't as the spaces are in a value:
      //string ValueTemplate = "BGM+962+25 37'NAD+AG+000 47178'";
      #endregion

      #region Sample 4
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //string ValueTemplate = "BGM+962+2537' NAD+AG+    0004 7178    '";
      #endregion

      #region Sample 5
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' BGM+{DocumentMessageName}+{DocumentMessageId}' '";
      //string ValueTemplate = "BGM+962+2537'BGM+123+456'";
      #endregion

      #endregion

      #region Local Variables
      Regex reg;
      string pattern;
      Dictionary<string, string> dct;
      List<string> keys;
      GroupCollection groups;
      string keyName, keyValue;
      int index;
      #endregion

      #region Initializations : Initialize local variables
      reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
      dct = new Dictionary<string, string>();               // Initialize the result dictionary to return an empty dictionary instead of null in case of error
      keys = new List<string>();                            // Key names in template order, made unique (Version 4)
      #endregion

      #region Preconditions : Check if the given template conforms to the KeyTemplate syntax
      if (!reg.IsMatch(template))
      {
        MessageBox.Show("Invalid Template");
        return dct;
      }
      #endregion

      #region Template Translator : Translates the KeyTemplate into a .NET Regular Expression that will match the ValueTemplate
      pattern = "^";

      foreach (Match match in reg.Matches(template))
      {
        // Version 1
        //pattern += Regex.Escape(match.Groups["text"].Value);

        // Version 2 : Handle whitespaces in the ValueTemplate
        // Put a whitespace marker after every non-blank literal character.
        keyValue = "";
        foreach (char c in match.Groups["text"].Value)
        {
          if (c != ' ' && c != '\t')
            keyValue += c + "$$SPACE$$";
          else
            keyValue += c;
        }

        keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

        // Version 3 : Remove the last white space matcher of the pattern,
        // so that leading spaces of the following value stay in the value
        if (keyValue.EndsWith("$$SPACE$$"))
          keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);

        // Escape the literal text, then turn each (escaped) marker into \s*
        pattern += Regex.Escape(keyValue).Replace(Regex.Escape("$$SPACE$$"), "\\s*");
        // End of version 2

        if (match.Groups["key"].Value != "")
        {
          keyName = match.Groups["key"].Value;

          // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
          if (keys.Contains(keyName))
          {
            index = 2;
            while (keys.Contains(keyName + "_" + index.ToString())) index++;
            keyName = keyName + "_" + index.ToString();
          }

          keys.Add(keyName);
          pattern += string.Format("(?<{0}>.+)", keyName);
        }
      }

      pattern += "$";

      // Generated Pattern (sample 1)
      //^B\s*G\s*M\s*\+(?<key1>.+)\+(?<myKey2>.+)_(?<theKey3>.+)-(?<aKey4>.+)'$
      //MessageBox.Show(pattern);
      #endregion

      #region Value Extractor : Uses the generated Regex to extract values from the input
      reg = new Regex(pattern);
      if (!reg.IsMatch(input))
      {
        MessageBox.Show("Pattern doesn't match");
        return dct;
      }

      groups = reg.Match(input).Groups;

      foreach (string key in keys)
      {
        keyValue = groups[key].Value;
        dct.Add(key, keyValue);
      }
      #endregion

      #region Return resulting dictionary
      return dct;
      #endregion
    }
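An alternative to renaming repeated keys (this is not what the Version 4 code does) relies on a .NET regex feature: the same group name may appear more than once in a pattern, and every successful capture is kept, in order, in Group.Captures, while Group.Value holds only the last one. This is a minimal sketch (the RepeatedKeys class is hypothetical) using the repeated-template sample.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class RepeatedKeys
{
    // Returns every value captured by groupName, in left-to-right order,
    // or an empty list when the pattern does not match.
    public static List<string> AllValues(string input, string pattern, string groupName)
    {
        var values = new List<string>();
        Match m = Regex.Match(input, pattern);
        if (!m.Success) return values;
        foreach (Capture c in m.Groups[groupName].Captures)
            values.Add(c.Value);
        return values;
    }
}
```

With the pattern ^BGM\+(?<Name>.+?)\+(?<Id>.+?)'BGM\+(?<Name>.+?)\+(?<Id>.+?)'$ and the message "BGM+962+2537'BGM+123+456'", the Name captures are "962" and "123" and the Id captures are "2537" and "456". The result then maps naturally to a Dictionary<string, List<string>> instead of suffixed key names.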

 
LVL 2

Expert Comment

by:sandson
ID: 22614421
djcheeky,

Regarding your question about learning Regexp: no, it's not that hard to learn. The biggest issue with Regexp is that each provider adds its own features on top of the base (this is why I said, in my answers here and in the comments of my code, that it generates a .NET Regexp).

You can learn the basics of Regexp by looking at Perl's regular expressions, as they are the standard, if I may say so.
.NET Regexp is compatible with Perl's regexp but adds some useful features to help capture elements of a string. For example, in your solution I used named groups.
In this regexp "(?<text>[^{}]*)({(?<key>[^}]+)})?" I set up 2 groups, "text" and "key", which I can then reference when looping through the matches in the template string to get the actual values.

I have a simple method when writing a regexp: I talk to myself and explain how the string I want to match is built.

For your first KeyTemplate (BGM+{key1}+{myKey2}_{theKey3}-{aKey4}') this gives:

Regards,
Abdel



1 - It starts with 0 or more characters but not a { or a }

2 - Then there is a {

3 - Then there are 1 or more characters but not a }

4 - Then there is a }

5 - But sometimes there is no { after the first step above

6 - The pattern from 1 to 5 may repeat
 

With this in mind, you simply translate into "Regexp language"
 

1 - [^{}]*

2 - {

3 - [^}]+

4 - }
 

Here we end with the Regex : [^{}]*{[^}]+}
 

5 - as 2,3 and 4 are optional then we group the three steps and make them optional (by putting them between ( and ) and adding ? to tell '0 or 1 occurence): ({[^}]+})?
 

Here we end with the Regex : [^{}]*({[^}]+})?
 

6 - As it may repeat, I won't force the Regex to match the whole string and by default it will restart from the beginning of the pattern (step 1) to try to find more occurences of the pattern. If I wanted to force it I may add "^" at the begining of the regexp and "$" at the end (as I did in the generated regexp) thus I tell the Regexp engine that it must start at the begining of the input string and when it reachs the end of the regexp it should be also the end of the input string.
 

Then having this written on paper, I talk again to myself and wonder which part of the input string (the KeyTemplate) I need to extract for future use: 
 

A - I need the text captured by step 1, which is static text that exists in the ValueTemplate and compounds the grammar of ValueTemplate

B - I need the text captured by step 3, which is the variable name of my values held in ValueTemplate
 

So I create two named groups to hold A and B:

- A will be put in the group "text"

- B will be put in the group "key"
 

I finally translate this into .NET regex syntax (note that this uses named groups, a feature that not every regex implementation supports, and some implementations use a different syntax for it):
 

A - (?<text>[^{}]*)

B - (?<key>[^}]+)
 

The complete regex becomes: (?<text>[^{}]*)({(?<key>[^}]+)})?
 

Then I call reg.Matches(KeyTemplate) to get all matches of this pattern in the string, loop through them, and for each one retrieve the values of A and B using match.Groups["text"].Value and match.Groups["key"].Value.
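Putting the named groups to work, the sketch below (again a C# 9+ top-level program; Tokenize is a hypothetical name) loops over the matches of the complete regex and reads the "text" and "key" groups for each one. A group that did not take part in a match returns an empty string from .Value, which is how the trailing "'" chunk ends up with no key.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Split a key template into (text, key) pairs using the named groups "text" and "key".
List<(string Text, string Key)> Tokenize(string template)
{
    var reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?");
    var parts = new List<(string Text, string Key)>();
    foreach (Match match in reg.Matches(template))
    {
        string text = match.Groups["text"].Value;
        // Value is "" when the optional {key} part did not take part in this match.
        string key = match.Groups["key"].Value;
        // The pattern can match an empty string, so skip the empty match at the end.
        if (text.Length == 0 && key.Length == 0)
            continue;
        parts.Add((text, key));
    }
    return parts;
}

foreach (var (t, k) in Tokenize("BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'"))
    Console.WriteLine("text=[" + t + "] key=[" + k + "]");
```

The static text from each pair is what gets escaped into the value pattern, and each key becomes a named capture group in it.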

 

Author Comment

by:djcheeky
ID: 22614434
Hi again :)

What I meant was that if I add spaces anywhere in a value, the code keeps only ONE space in the dictionary value instead of all of them. For example, with the code snippet posted below, the values in the dictionary are:

<A>
keys: [DocumentMessageName] [DocumentMessageId] [PartyQualifier] [PartyIdentificationDetails] values: [962] [2537] [AG] [ 000 471 78 ]

As opposed to:
<B>
keys: [DocumentMessageName] [DocumentMessageId] [PartyQualifier] [PartyIdentificationDetails] values: [962] [2537] [AG] [    0004     7178    ]

Notice that the first set of values <A> has only one space, as opposed to the desired result <B>.

I will check case 5 now.

Thanks


string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";

string ValueTemplate = "BGM+962+2537' NAD+AG+    0004     7178    '";
 

     

 
LVL 2

Expert Comment

by:sandson
ID: 22614648
This is strange. I ran the program with the sample above without changing anything in the last code I sent, and I get correct values (I surrounded each value with [ ] to be sure):

On which platform do you compile and run the program?
I used Visual Studio 2005 to compile it; .NET 3.5 SP1 and .NET 2.0 SP1 are installed on my computer, which runs Windows Server 2003.

regards,
Abdel
[DocumentMessageName] = [962]

[DocumentMessageId] = [2537]

[PartyQualifier] = [AG]

[PartyIdentificationDetails] = [    0004     7178    ]

 
LVL 16

Expert Comment

by:CuteBug
ID: 22614657
Hi djcheeky,
        Sorry for the late reply.

        I made a minor modification to my method and it now works in all the cases specified above (including preserving spaces). See the code below:
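public static Dictionary<string, string> GetMessageTypeVariables(String KeyTemplate, String ValueTemplate)
{
    // Requires: using System; using System.Collections.Generic;
    Dictionary<String, String> Result = new Dictionary<String, String>();
    int startPos = KeyTemplate.IndexOf('{');
    int startValPos = 0;

    while (startPos != -1)
    {
        // The character just before "{" is the delimiter that precedes this value
        // in ValueTemplate (assumes the template does not start with "{").
        char startDelim = KeyTemplate[startPos - 1];
        startValPos = ValueTemplate.IndexOf(startDelim, startValPos);

        int endPos = KeyTemplate.IndexOf('}', startPos + 1);
        if (endPos != -1)
        {
            string key = KeyTemplate.Substring(startPos + 1, endPos - startPos - 1);
            string val = null;
            int endValPos = -1;

            if (endPos + 1 >= KeyTemplate.Length)
            {
                // Last key with nothing after it: take the rest of ValueTemplate.
                val = ValueTemplate.Substring(startValPos + 1);
            }
            else
            {
                // The character just after "}" is the delimiter that ends this value.
                // If it is missing from ValueTemplate, IndexOf returns -1 and Substring throws.
                char endDelim = KeyTemplate[endPos + 1];
                endValPos = ValueTemplate.IndexOf(endDelim, startValPos + 1);
                val = ValueTemplate.Substring(startValPos + 1, endValPos - startValPos - 1);
            }

            Result.Add(key, val);
            startPos = KeyTemplate.IndexOf('{', endPos + 1);
            // Resume the search in ValueTemplate at the end delimiter just consumed.
            startValPos = endValPos;
        }
        else
        {
            startPos = -1;
        }
    }

    return Result;
}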

 
LVL 16

Expert Comment

by:CuteBug
ID: 22614707
The above method does not solve the multiple-key issue; I only just saw your comment about that. Give me some time and I will fix that too...
 
LVL 16

Accepted Solution

by:
CuteBug earned 150 total points
ID: 22614812
Hi djcheeky,
       The following method solves all the scenarios mentioned above (including multiple keys):
public static Dictionary<string, string> GetMessageTypeVariables(String KeyTemplate, String ValueTemplate)
{
    // Requires: using System; using System.Collections.Generic;
    Dictionary<String, String> Result = new Dictionary<String, String>();
    int startPos = KeyTemplate.IndexOf('{');
    int startValPos = 0;

    while (startPos != -1)
    {
        // The character just before "{" is the delimiter that precedes this value
        // in ValueTemplate (assumes the template does not start with "{").
        char startDelim = KeyTemplate[startPos - 1];
        startValPos = ValueTemplate.IndexOf(startDelim, startValPos);

        int endPos = KeyTemplate.IndexOf('}', startPos + 1);
        if (endPos != -1)
        {
            string key = KeyTemplate.Substring(startPos + 1, endPos - startPos - 1);
            string val = null;
            int endValPos = -1;

            if (endPos + 1 >= KeyTemplate.Length)
            {
                // Last key with nothing after it: take the rest of ValueTemplate.
                val = ValueTemplate.Substring(startValPos + 1);
            }
            else
            {
                // The character just after "}" is the delimiter that ends this value.
                char endDelim = KeyTemplate[endPos + 1];
                endValPos = ValueTemplate.IndexOf(endDelim, startValPos + 1);
                val = ValueTemplate.Substring(startValPos + 1, endValPos - startValPos - 1);
            }

            // Repeated key: append "_N", where N counts the key itself plus any
            // existing keys that contain key + "_", so the first repeat becomes key_1.
            if (Result.ContainsKey(key))
            {
                int count = 0;
                foreach (string k in Result.Keys)
                {
                    if ((k == key) || (k.Contains(key + "_")))
                    {
                        count++;
                    }
                }

                key += "_" + count.ToString();
            }

            Result.Add(key, val);
            startPos = KeyTemplate.IndexOf('{', endPos + 1);
            // Resume the search in ValueTemplate at the end delimiter just consumed.
            startValPos = endValPos;
        }
        else
        {
            startPos = -1;
        }
    }

    return Result;
}
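The duplicate-key renaming can also be written as a small helper that probes for the first free suffix, similar in spirit to the "Version 4" loop in Sandson's Parse method. This is a hypothetical sketch (a C# 9+ top-level program; the name UniqueKey is mine, not from either answer). Like the method above, it names the first repeat "_1"; because it checks the exact candidate key, an unrelated key that merely contains key + "_" does not shift the numbering.

```csharp
using System;
using System.Collections.Generic;

// Returns key if it is unused, otherwise the first free "key_N" (N = 1, 2, ...).
string UniqueKey(ICollection<string> used, string key)
{
    if (!used.Contains(key))
        return key;
    int n = 1;
    while (used.Contains(key + "_" + n))
        n++;
    return key + "_" + n;
}

// Values from Sample 5, where DocumentMessageId appears twice.
var result = new Dictionary<string, string>();
foreach (var (k, v) in new[] { ("DocumentMessageId", "2537"), ("DocumentMessageId", "456") })
    result.Add(UniqueKey(result.Keys, k), v);

Console.WriteLine(string.Join(", ", result.Keys));
```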

 

Author Comment

by:djcheeky
ID: 22622153
Hi all and thanks for the help. I had to get some sleep at some point :)

>>CuteBug
I am busy implementing your solution as well and will get back to you once complete :)

>>Abdel
Thanks - your comment about the platform actually sent me in the right direction.
The reason I wasn't seeing those bloody spaces (and they WERE there all along) was that I am writing my output to HTML (this is a web project built with Visual Web Developer), and browsers collapse each run of whitespace in HTML text into a single space when rendering it. After dumping the data to a file, I saw that it was actually working. :)
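If the values need to be checked in the browser itself, one option is to HTML-encode each value and turn every space into &nbsp; so that runs of spaces survive rendering; wrapping the output in <pre> (or using the CSS white-space: pre setting) is another. A sketch, assuming .NET 4 or later for System.Net.WebUtility (on older frameworks System.Web.HttpUtility.HtmlEncode plays the same role); ToVisibleHtml is a hypothetical name, and the program uses C# 9+ top-level statements.

```csharp
using System;
using System.Net;

// Encode a value for HTML output and replace each space with a non-breaking
// space entity so the browser does not collapse runs of spaces.
string ToVisibleHtml(string value) =>
    WebUtility.HtmlEncode(value).Replace(" ", "&nbsp;");

Console.WriteLine(ToVisibleHtml("[    0004     7178    ]"));
```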

BUT (I hate it - every time I try something, something else pops up)
If I use the following:
string KeyTemplate = "LOC+{PlaceLocationQualifier}+{PlaceLocationIdentification}:{CodeListQualifier}:{CodeListResponsibleAgency}'";
string ValueTemplate = "LOC+22+CTN:8:ZZZ'"; //This works
BUT
string ValueTemplate = "LOC+22+CTN::ZZZ'"; //This doesn't work due to the 8 not being there (as it is optional and was omitted)

Is this possible?
Thanks
Paolo



 
LVL 2

Assisted Solution

by:sandson
sandson earned 150 total points
ID: 22622203
Hi djcheeky,

Sure, it's possible with regular expressions. Take a look at the code below (look for the comment "Version 5").

Best regards,
    // Requires: using System.Collections.Generic; using System.Text.RegularExpressions; using System.Windows.Forms;
    private Dictionary<string, string> Parse(string template, string input)
    {
      #region Sample Data
      #region Sample 1
      //The first string is the template containing  the key names (Note the keys can be called anything:
      //BGM+{key1}+{myKey2}_{theKey3}-{aKey4}'

      //The second string is the actual message that contains the values for each key. 
      //BGM+value1+valueB_value2-value600' 
      #endregion

      #region Sample 2
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //string ValueTemplate = "BGM+962+2537' NAD+AG+00047178'";
      #endregion

      #region Sample 3
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}'NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //The following ValueTemplate will have its spaces removed:
      //string ValueTemplate = "BGM+962+2537'      NAD+AG+00047178'";
      //Whereas the ValueTemplate below wouldn't, as the spaces are in a value:
      //string ValueTemplate = "BGM+962+25 37'NAD+AG+000 47178'";
      #endregion

      #region Sample 4
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' NAD+{PartyQualifier}+{PartyIdentificationDetails}'";
      //string ValueTemplate = "BGM+962+2537' NAD+AG+    0004 7178    '";
      #endregion

      #region Sample 5
      //string KeyTemplate = "BGM+{DocumentMessageName}+{DocumentMessageId}' BGM+{DocumentMessageName}+{DocumentMessageId}' '";
      //string ValueTemplate = "BGM+962+2537'BGM+123+456'";
      #endregion

      #region Sample 6
      //string KeyTemplate = "LOC+{PlaceLocationQualifier}+{PlaceLocationIdentification}:{CodeListQualifier}:{CodeListResponsibleAgency}'";
      //string ValueTemplate = "LOC+22+CTN:8:ZZZ'"; //This works
      //BUT
      //string ValueTemplate = "LOC+22+CTN::ZZZ'"; //This doesn't work due to the 8 not being there (as it is optional and was omitted)
      #endregion
      #endregion

      #region Local Variables
      Regex reg;
      string pattern;
      Dictionary<string, string> dct;
      List<string> keys;
      GroupCollection groups;
      string keyName, keyValue;
      int index;
      #endregion

      #region Initializations : Initialize local variables
      reg = new Regex("(?<text>[^{}]*)({(?<key>[^}]+)})?"); // .NET Regular Expression matching KeyTemplate Grammar
      dct = new Dictionary<string, string>();               // Initialize the result dictionary to return an empty dictionary instead of null in case of error
      keys = new List<string>();
      #endregion

      #region Preconditions : Check if the given template conforms to the KeyTemplate syntax
      // Note: this regex can match an empty string, so IsMatch returns true for any template;
      // structural mismatches are caught by the Value Extractor below instead.
      if (!reg.IsMatch(template))
      {
        MessageBox.Show("Invalid Template");
        return dct;
      }
      #endregion

      #region Template Translator : Translates the KeyTemplate into a .NET Regular Expression that will match the KeyValue
      pattern = "^";

      foreach (Match match in reg.Matches(template))
      {
        // Version 1
        //pattern += match.Groups["text"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

        // Version 2 : Handle whitespaces in the ValueTemplate
        keyValue = "";
        foreach (char c in match.Groups["text"].Value)
        {
          if( c != ' ' && c != '\t' )
            keyValue += c + "$$SPACE$$";
          else
            keyValue += c;
        }

        keyValue = keyValue.Replace("$$SPACE$$ ", "$$SPACE$$");

        // Version 3 : Remove the last white space matcher of the pattern
        if( keyValue.EndsWith("$$SPACE$$") )
          keyValue = keyValue.Substring(0, keyValue.Length - "$$SPACE$$".Length);

        pattern += keyValue.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)").Replace("$$SPACE$$", "\\s*");
        // End of version 2

        if (match.Groups["key"].Value != "")
        {
          keyName = match.Groups["key"].Value.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)");

          // Version 4 : Find a valid key name for the result dictionary to avoid duplicates when repeating the template
          if (keys.Contains(keyName))
          {
            index = 2;
            while (keys.Contains(keyName + "_" + index.ToString())) index++;
            keyName = keyName + "_" + index.ToString();
          }

          keys.Add(keyName);
          //Version 5 (Sample 6): A value may be omitted so make its matcher optional
          pattern += string.Format("(?<{0}>.*)", keyName);
        }
      }

      pattern += "$";

      // Generated Pattern (Sample 1, with versions 2 to 5 applied)
      //^B\s*G\s*M\s*\+(?<key1>.*)\+(?<myKey2>.*)_(?<theKey3>.*)-(?<aKey4>.*)'$
      //MessageBox.Show(pattern);
      #endregion

      #region Value Extractor : Uses the generated Regex to extract values from the input
      reg = new Regex(pattern);
      if (!reg.IsMatch(input))
      {
        MessageBox.Show("Pattern doesn't match");
        return dct;
      }

      groups = reg.Match(input).Groups;

      foreach (string key in keys)
      {
        keyValue = groups[key.Replace("+", "\\+").Replace(".", "\\.").Replace("*", "\\*").Replace("?", "\\?").Replace("(", "\\(").Replace("[", "\\[").Replace("]", "\\]").Replace(")", "\\)")].Value;
        dct.Add(key, keyValue);
      }
      #endregion

      #region Return resulting dictionary
      return dct;
      #endregion
    }
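To see why Version 5 handles Sample 6, the two hand-written patterns below differ only in .+ versus .*; the group names q, id, list and agency are shortened for readability and are not what the generated pattern would use. With .+ every value needs at least one character, so the empty CodeListQualifier in "LOC+22+CTN::ZZZ'" makes the whole match fail; with .* that group simply matches an empty string. The sketch is a C# 9+ top-level program.

```csharp
using System;
using System.Text.RegularExpressions;

// Hand-written value patterns for the LOC template from Sample 6.
// strict uses .+ (each value needs at least one character); lenient uses .* (values may be empty).
const string strict = @"^LOC\+(?<q>.+)\+(?<id>.+):(?<list>.+):(?<agency>.+)'$";
const string lenient = @"^LOC\+(?<q>.*)\+(?<id>.*):(?<list>.*):(?<agency>.*)'$";

string input = "LOC+22+CTN::ZZZ'";   // CodeListQualifier omitted
Console.WriteLine(Regex.IsMatch(input, strict));     // False
Match m = Regex.Match(input, lenient);
Console.WriteLine(m.Success);                        // True
Console.WriteLine("[" + m.Groups["list"].Value + "]"); // []
```

One trade-off: .* is greedy, so if a value can itself contain a delimiter character, the engine may split the input differently than intended; the ^...$ anchors at least guarantee that the whole input was consumed.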

 

Author Comment

by:djcheeky
ID: 22622218
Hi Cutebug

Your solution works perfectly, including the issue with values not being supplied (i.e. blank dictionary values, which is great). The only difference is that Sandson's solution can actually detect a mismatch between the template structure and the value structure, which is very important.
Would you be able to do something like that with your version?

Thanks
Paolo
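One possible way to add that kind of check to the delimiter-based method (an assumption, not something proposed in the thread) is a round trip: put the extracted values back into the key template and compare the result with the original value string. This catches, for example, a wrong segment tag such as "XGM+" that the delimiter scan never looks at. The sketch assumes each key appears only once (no renamed "_1" duplicates) and that no value itself contains a "{key}" placeholder; RoundTrips is a hypothetical helper name, and the program uses C# 9+ top-level statements.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical round-trip check: rebuild the value string from the key template
// and the extracted values, then compare it with the original value string.
bool RoundTrips(string keyTemplate, IDictionary<string, string> values, string valueString)
{
    string rebuilt = keyTemplate;
    foreach (KeyValuePair<string, string> pair in values)
        rebuilt = rebuilt.Replace("{" + pair.Key + "}", pair.Value);
    return rebuilt == valueString;
}

// The delimiter-based method extracts these same values from both inputs below.
var extracted = new Dictionary<string, string>
{
    ["DocumentMessageName"] = "962",
    ["DocumentMessageId"] = "2537",
};
const string template = "BGM+{DocumentMessageName}+{DocumentMessageId}'";
Console.WriteLine(RoundTrips(template, extracted, "BGM+962+2537'")); // True
Console.WriteLine(RoundTrips(template, extracted, "XGM+962+2537'")); // False
```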
 

Author Comment

by:djcheeky
ID: 22622637
Hi Abdel
Your solution also works perfectly now.

>>Abdel & Cutebug
I reckon I have it where I need it to be at the moment, so thanks to both of you for the great help.

After handling the message IN template and values, I now have to take those values in the dictionary and apply them to a message OUT template, which shouldn't be too tough.
I really appreciate the quality of help I have received from you guys. Is there any way I can contact you for future assistance on this site? I am sure I will have more issues relating to this sort of development. Otherwise I'll just hope that you see my questions!! :)

I will award the points evenly now, but will first increase them to 300 and then split them.

Thanks again
(And CuteBug - if you do find a way to do that validation, please just reply to this thread, as I will most certainly take a look and implement it. I have actually implemented both versions and will be using both to study and learn at a later stage.)
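For the message OUT step, one option is Regex.Replace with a MatchEvaluator that looks each {key} up in the dictionary. A sketch under assumptions: the OUT template format and the FillTemplate name are made up for illustration, the program uses C# 9+ top-level statements, and placeholders with no matching key are deliberately left as-is so they stand out.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Replace each {key} in an OUT template with its value from the dictionary.
// Placeholders with no matching key are left unchanged so they are easy to spot.
string FillTemplate(string outTemplate, IDictionary<string, string> values) =>
    Regex.Replace(outTemplate, @"\{(?<key>[^}]+)\}",
        m => values.TryGetValue(m.Groups["key"].Value, out var v) ? v : m.Value);

var extracted = new Dictionary<string, string>
{
    ["DocumentMessageName"] = "962",
    ["DocumentMessageId"] = "2537",
};
// Hypothetical OUT template; {Missing} has no entry in the dictionary.
Console.WriteLine(FillTemplate("DOC+{DocumentMessageName}:{DocumentMessageId}+{Missing}'", extracted));
// DOC+962:2537+{Missing}'
```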
