
VS 2008 Looking for Regular Expressions

Posted on 2011-03-05
Last Modified: 2012-05-11
Hi, I have a list of strings with hundreds of elements. I want to find a rule that groups them by similarity. The strings follow these patterns:


Group 1: "ABC 20", "ABC 20 Dup". The substring "Dup" is appended at the end.
Group 2: "saliva 4", "saliva 4_2", "saliva 4_3", etc. An underscore plus a number is appended at the end.
Group 3: "sal_1", "sal_1b", "sal_1c", etc. A single letter is appended at the end.
Group 4: "NA222", "NA222b". A single letter is appended at the end.
Group 5: "1", "1_2", "10", "10_2", etc.

I want to put similar strings into the same dictionary entry. Thanks for any help with C# code.
Question by:zhshqzyc
13 Comments
 

Author Comment

by:zhshqzyc
ID: 35044518
Maybe my grouping rules are wrong. Handling the Group 5 case alone would be enough. I want to find similar strings and put them into a dictionary.

My expected result is:

 
Dictionary<string,List<string>> dict = new Dictionary<string,List<string>>();
dict["ABC 20"] = {"ABC 20","ABC 20 Dup"};
 dict["1"] = {"1","1_2"};


The question is how to avoid dict["1"]={"1","10"};
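
One way to keep "10" out of the "1" group is to look at the first character after the key: a candidate counts as a duplicate only if it starts with the key and the next character is not a digit. The sketch below is a minimal illustration of that idea, not code from this thread; the class name DuplicateCheck and the method IsDuplicateOf are hypothetical. It assumes duplicates are always formed by appending a suffix to the sample name.

```csharp
using System;

public static class DuplicateCheck
{
    // Hypothetical helper: true when candidate is key plus a non-empty
    // suffix whose first character is not a digit.
    public static bool IsDuplicateOf(string key, string candidate)
    {
        return candidate.Length > key.Length
            && candidate.StartsWith(key, StringComparison.Ordinal)
            && !char.IsDigit(candidate[key.Length]);
    }

    public static void Main()
    {
        Console.WriteLine(IsDuplicateOf("1", "1_2"));               // True
        Console.WriteLine(IsDuplicateOf("1", "10"));                // False
        Console.WriteLine(IsDuplicateOf("1", "10_2"));              // False
        Console.WriteLine(IsDuplicateOf("ABC 20", "ABC 20 Dup"));   // True
    }
}
```

Note that this test also accepts any non-digit continuation, so a key "JA" would claim a hypothetical sample "JAB"; whether that is acceptable depends on how the sample names are chosen.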


Expert Comment

by:Shahan Ayyub
ID: 35045760
Hi!

The code you posted above should look like this:

            Dictionary<string,List<string>> dict = new Dictionary<string,List<string>>();
            dict["ABC 20"] = new List<string> { "ABC 20", "ABC 20 Dup" };
            dict["1"] = new List<string> { "1", "1_2" };



The parts to note in this code are the new List<string> { ... } initializers:
            Dictionary<string,List<string>> dict = new Dictionary<string,List<string>>();
            dict["ABC 20"] = new List<string> { "ABC 20", "ABC 20 Dup" };
            dict["1"] = new List<string> { "1", "1_2" };

Now, can you elaborate a little? Suppose you have these values:
"saliva 4_2", "saliva 4_3"
under the key:
"saliva 4"

What do you want to happen next?
Please clarify what you mean by:
>>>I want to put them into a dictionary if they are similar
 

Author Comment

by:zhshqzyc
ID: 35047219
dict["ABC 20"] = new List<string> { "ABC 20", "ABC 20 Dup" };
dict["saliva 4"] = new List<string> { "saliva 4", "saliva 4_2", "saliva 4_3" };

Rule: a string is one of two kinds:
• digits only, or
• a mix of letters, digits, underscores or white space (not purely digits)

Case 1: digits only
  Add the string to the dictionary as a new key. Then search the entire string list:
  if another string consists of the key followed by a suffix that starts with a non-digit,
  add that string to the list referenced by the key (e.g. "10", "10_2", "10_b").
Case 2: a mix of letters, digits, underscores or white space (not purely digits)
  Add the string to the dictionary as a new key. Then search the entire string list:
  if another string starts with the key,
  add that string to the list referenced by the key (e.g. ("ABC 20", "ABC 20 Dup"), ("N2222", "N2222b"), ("Sal_1", "Sal_1b"), ("saliva 4", "saliva 4_2")).
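
The two cases can be folded into a single pass, sketched below under some assumptions: names are processed shortest first so that a sample id is seen before its duplicates, and a name joins an existing key when it starts with that key and the next character is not a digit. The class SampleGrouper and its Group method are hypothetical names, and this is an illustration of the rule rather than a tested solution for the full data set.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class SampleGrouper
{
    public static Dictionary<string, List<string>> Group(IEnumerable<string> names)
    {
        Dictionary<string, List<string>> groups = new Dictionary<string, List<string>>();

        // Shorter names first, so a sample id becomes a key before its duplicates arrive.
        foreach (string name in names.Distinct()
                                     .OrderBy(n => n.Length)
                                     .ThenBy(n => n, StringComparer.Ordinal))
        {
            string owner = null;
            foreach (string key in groups.Keys)
            {
                // Same sample: the name extends the key and the suffix does not start with a digit.
                if (name.Length > key.Length
                    && name.StartsWith(key, StringComparison.Ordinal)
                    && !char.IsDigit(name[key.Length]))
                {
                    owner = key;
                    break;
                }
            }

            if (owner == null)
                groups[name] = new List<string> { name };   // new sample id
            else
                groups[owner].Add(name);                     // duplicate of an existing sample
        }
        return groups;
    }

    public static void Main()
    {
        string[] names = { "10_2", "1", "ABC 20 Dup", "10", "1_2", "ABC 20" };
        foreach (KeyValuePair<string, List<string>> pair in Group(names))
        {
            Console.WriteLine(pair.Key + ": " + string.Join(", ", pair.Value.ToArray()));
        }
    }
}
```

For the six names in Main this yields three groups: "1" with "1_2", "10" with "10_2", and "ABC 20" with "ABC 20 Dup".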



Author Comment

by:zhshqzyc
ID: 35047330
We are doing biology experiments. Each bio-sample has a unique id, but the experiment may be repeated for a sample. For example, sample "1" was tested and then we had to repeat the experiment; the second time we used "1_1" or "1_b" as the control group name for the comparative analysis. Some sample names are stranger, such as "ABC 20", whose repeated experiment is marked "ABC 20 Dup". So the key should be an element of the list or array, which means you cannot split "ABC 20" to get "ABC". "saliva 4" is a sample name, and its duplicate experiment is named "saliva 4" plus something, such as "saliva 4_2", so you cannot split "saliva 4" to get "saliva".

Also, "1" and "10" are different samples and cannot be grouped together. "1" and "1_1" or "1_b" are the same sample used in different experiments. I think the hard part is when the sample id is a number: the duplicate is then the number plus a suffix that starts with a non-digit character. We must avoid putting "1" and "10_2" together, because the remainder "0_2" starts with a digit.


Expert Comment

by:Shahan Ayyub
ID: 35047862
Hi!

As I understand it:
In dict[key, value], the key refers to the original sample and the value refers to its duplicates. So what I need to do is group the duplicates under their samples, going from this:
key:                   Value:
dict["ABC"] = {"ABC"}
dict["ABC Dup"] = {"ABC Dup"}
dict["1"] = {"1"}
dict["1_1"] = {"1_1"}

to this:
dict["ABC"] = {"ABC", "ABC Dup"}
dict["1"] = {"1", "1_1"}

Here "ABC Dup" is a duplicate of "ABC" and "1_1" is a duplicate of "1".

If there is something wrong, please correct me.
 

Author Comment

by:zhshqzyc
ID: 35048222
Exactly.
 

Author Comment

by:zhshqzyc
ID: 35064693
My preliminary code is below. I need help.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
namespace Test
{
    class Program
    {
       
        static void Main(string[] args)
        {
            GetSimilarities();
        }

         static void GetSimilarities()
        {
             Dictionary<string, List<string>> dict1 = new Dictionary<string, List<string>>();

            for (int i = 0; i < header1.Length; i++)
            {
                string key = header1[i];
                bool result = key.All(Char.IsDigit);
                for (int j = 0; j < header1.Length; j++)
                {
                    string value = header1[j];
                    if (value.Length <= key.Length)
                        continue;
                    if (result == true)
                    {
                        string pattern = key + @"[^\d]+\w";
                        if (Regex.IsMatch(value, pattern))
                        {
                            if (key == "1")
                                Console.ReadLine();
                            if (!dict1.ContainsKey(key))
                            {
                                dict1[key] = new List<string>();
                                dict1[key].Add(key);
                                dict1[key].Add(value);
                            }
                        }
                    }
                    else
                    {
                        string pattern = key + @".+";
                        if (Regex.IsMatch(value, pattern))
                        {
                            if (!dict1.ContainsKey(key))
                            {
                                dict1[key] = new List<string>();
                                dict1[key].Add(key);
                                dict1[key].Add(value);  
                            }
                        }
                    }
                }

            }
        }

        static string[] header1 = new string[]
        { "XYZ 20", "XYZ 20 Dup", "Saliva 1_2", "Saliva 1", "Sal_Lb", "Sal_L", "Sal_2b", "Sal_2", "Sal_1b", "Sal_1", "KA_2", "KA", "JDT_2", "JDT", "JA_2", "JA", "8_2", "8_2b", "6688", "6688b", "test1a_2", "test1a", "1", "1_1", "2", "2_1", "3", "3_1", "10", "10_1" };
    }
    
}


When key ="1", the value is incorrect.
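
Two things in the posted code appear to explain the wrong value for key "1". The pattern key + @"[^\d]+\w" is not anchored, so it can match in the middle of another name: "Saliva 1_2" and "test1a_2" both contain a "1" followed by non-digits. And dict1[key].Add(value) sits inside the !dict1.ContainsKey(key) block, so only the first match is ever stored. The suffix also needs at least two characters, which skips "6688b". The sketch below contrasts the posted pattern with an anchored, escaped variant; the class PatternCheck and its method names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PatternCheck
{
    // The pattern as built in the posted code: no anchor, key not escaped.
    public static bool MatchesUnanchored(string key, string value)
    {
        return Regex.IsMatch(value, key + @"[^\d]+\w");
    }

    // Anchored variant: value must start with the literal key,
    // and the next character must be a non-digit.
    public static bool MatchesAnchored(string key, string value)
    {
        return Regex.IsMatch(value, "^" + Regex.Escape(key) + @"\D");
    }

    public static void Main()
    {
        Console.WriteLine(MatchesUnanchored("1", "Saliva 1_2"));   // True
        Console.WriteLine(MatchesAnchored("1", "Saliva 1_2"));     // False
        Console.WriteLine(MatchesAnchored("1", "1_1"));            // True
        Console.WriteLine(MatchesAnchored("1", "10_1"));           // False
        Console.WriteLine(MatchesUnanchored("6688", "6688b"));     // False
        Console.WriteLine(MatchesAnchored("6688", "6688b"));       // True
    }
}
```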

Expert Comment

by:Shahan Ayyub
ID: 35073527
Hi!

Sorry for the late response; I was busy. I haven't tested your method yet, but I came up with this one. Here is the full version:

     
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Text.RegularExpressions;
using System.Diagnostics;
using Microsoft.VisualBasic; // for Information.IsNumeric; add a project reference to Microsoft.VisualBasic

namespace ConsoleCS
{
    class Program
    {
        static string[] header1 = new string[] {
                                                 "XYZ 20", "XYZ 20 Dup", "Saliva 1_2", "Saliva 1", 
                                                 "Sal_Lb", "Sal_L", "Sal_2b", "Sal_2", "Sal_1b", "Sal_1",
                                                 "KA_2", "KA", "JDT_2", "JDT", "JA_2", "JA", "8_2", 
                                                 "8_2b", "6688", "6688b", "test1a_2", "test1a", "1", 
                                                 "1_1", "2", "2_1", "3", "3_1", "10", "10_1",
                                                 "XYZ 20 DUP_2"
                                               };
        static void Main(string[] args)
        {
            Dictionary<string, List<string>> dict = GroupSamples();
            PrintValues(dict);
        }

        public static void PrintValues(Dictionary<string, List<string>> dict)
        {
            foreach (string k in dict.Keys)
            {
                Console.WriteLine(k);
                foreach (string v in dict[k])
                {
                    Console.WriteLine("    " + v);
                }
            }
        }

        static Dictionary<string, List<string>> GroupSamples()
        {
            Dictionary<string, List<string>> dict = new Dictionary<string, List<string>>();
            // Sort so that each sample name is visited before its duplicates.
            Array.Sort(header1);
            for (int i = 0; i < header1.Length; i++)
            {
                string key = header1[i];
                // Escape the key so that any regex metacharacters in it are matched literally.
                string prefix = "(?i)^" + Regex.Escape(key);
                bool isNumeric = Information.IsNumeric(key);

                for (int j = i + 1; j < header1.Length; j++)
                {
                    string candidate = header1[j];
                    bool isDuplicate;
                    if (isNumeric)
                    {
                        // Numeric key: the suffix is "_<digits>" or letters, so "1" never matches "10".
                        isDuplicate = Regex.IsMatch(candidate, prefix + @"(_\d+|[a-z]+)$");
                    }
                    else
                    {
                        // Non-numeric key: suffix pieces such as "_a", "b" or "_2" ...
                        isDuplicate = Regex.IsMatch(candidate, prefix + @"(_[a-z]|[a-z]|_\d)+$");
                        // ... or, when the key has no underscore, a run of white space, letters and digits (" Dup").
                        if (!isDuplicate && !key.Contains("_"))
                        {
                            isDuplicate = Regex.IsMatch(candidate, prefix + @"[\sa-z\d]+$");
                        }
                    }

                    if (!isDuplicate)
                        continue;

                    // Each candidate is added once, even if it satisfies more than one pattern.
                    if (!dict.ContainsKey(key))
                    {
                        dict[key] = new List<string>();
                    }
                    dict[key].Add(candidate);
                }
            }
            return dict;
        }
    }
}

 

Author Comment

by:zhshqzyc
ID: 35074046
I already figured it out. Thank you very much.
 
LVL 19

Expert Comment

by:Shahan Ayyub
ID: 35074064
So is your problem solved?

Accepted Solution

by:
Shahan Ayyub earned 2000 total points
ID: 35074089
Did you test my solution?
 

Author Comment

by:zhshqzyc
ID: 35087506
Your code should work, but I would like a more general solution. Your pattern uses '_', but the separator may be some other character. For example, {"KA","KA*2"} instead of {"KA","KA_2"}.
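
One possible way to generalize the separator is to pass the allowed separator characters in and build the pattern from them, escaping each one so that characters like '*' are treated literally. The sketch below is an assumption-laden illustration, not the accepted solution: the class SeparatorPattern, the method IsDuplicate and the separators parameter are all hypothetical. The suffix it accepts is an optional run of letters followed by any number of separator-plus-alphanumeric pieces, and it must not be empty.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class SeparatorPattern
{
    // separators: every character that may introduce a duplicate suffix, e.g. "_* ".
    public static bool IsDuplicate(string key, string candidate, string separators)
    {
        if (string.IsNullOrEmpty(separators))
            throw new ArgumentException("At least one separator is required.", "separators");

        // Escape each separator on its own so that '*', '.', ' ' etc. are literal.
        string sep = "(?:" + string.Join("|",
            separators.Select(c => Regex.Escape(c.ToString())).ToArray()) + ")";

        // key, then a non-empty suffix: optional letters, then separator + letters/digits pieces.
        string pattern = "^" + Regex.Escape(key) + "(?=.)[a-z]*(?:" + sep + @"[a-z\d]+)*$";
        return Regex.IsMatch(candidate, pattern, RegexOptions.IgnoreCase);
    }

    public static void Main()
    {
        Console.WriteLine(IsDuplicate("KA", "KA*2", "_* "));              // True
        Console.WriteLine(IsDuplicate("KA", "KA_2", "_* "));              // True
        Console.WriteLine(IsDuplicate("1", "10", "_* "));                 // False
        Console.WriteLine(IsDuplicate("XYZ 20", "XYZ 20 Dup", "_* "));    // True
    }
}
```

Because a digit directly after the key is never accepted, "10" and "10_2" still stay out of the "1" group whatever separators are configured.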
 

Author Comment

by:zhshqzyc
ID: 35098015
Sorry, I clicked that by mistake.