• Status: Solved

Use LINQ or Similar to Compare Array with Long String

Hi Experts,

Your thoughts and opinions please.
I am looking for a fast way to do something, and I am sure LINQ will have a great quick function I can't find to do it.

I have a List/array [kitchen, breakfast, turkey sandwich]
And several long strings [I went to the kitchen and had breakfast], [I had a turkey sandwich for breakfast]

I would like to loop through the long strings and get the intersection of each with the array. I was thinking LINQ would make this easy.

I did try this (where array is ListTags and long string is PageContent)
PageContent = PageContent + Environment.NewLine + dp.vchPageContent.ToString();
                    
                    // intCompanyID = t.intCompanyID != null ? (int)t.intCompanyID : 0 };


                    string[] parts = PageContent.ToString().Split(' ');

                    List<string> ListPageStrings = new List<string>(parts);

                    List<string> ListTags = (from dtg in db.DT_tblTags
                                             join drt in db.DT_tblRelDocuments_Tags on dtg.intTagID equals drt.intTagID
                                             select dtg.vchTag).ToList();

                    IEnumerable<string> LIntersection = ListTags.Intersect(ListPageStrings);

                    TagOperations to = new TagOperations();
                    TagElements te;

                    foreach (string tag in LIntersection)
                    {
                        //Add the tag to the Document Record
                        te = new TagElements();
                        te.intCompanyID = doc.intCompanyID;
                        te.intDocumentID = doc.intDocumentID;
                        te.intUserID = doc.intUserID;
                        te.vchTag = tag.ToString();

                        to.AddTagToDoc(te);
                    }

but I was using Split on a space to get the two lists, which didn't work when array members have more than one word.

I also thought about looping through each member of the array and testing whether it is in the long string using .Contains, but I feel that will take a long time (there are 1050 array members in my DB).

So I am looking for the quickest way to find the intersection of the long string and the List/array.

Thoughts? Ideas?
Asked by: Craig Lambie
1 Solution
 
Eyal Commented:
        public static List<Team> GetTeams(DBContext ctx, IEnumerable<string> teamsList)
        {
            // Keep only the teams whose ID (as text) appears in teamsList.
            return (from t in ctx.Teams
                    where teamsList.Contains(t.ID.ToString())
                    select t).ToList();
        }
 
Craig Lambie (Author) Commented:
Hi Eyal,

Thanks for your reply; not sure if I have got it right

Here is your code changed to fit my circumstances above
        public static List<DT_tblTag> MatchTags(DocTagDataContext ctx, IEnumerable<string> PageContent)
        {
            return (from t in ctx.DT_tblTags
                    where PageContent.Contains(t.vchTag.ToString())
                    select t).ToList();  
        }
        static IEnumerable<string> makeStringEnum(string[] arr)
        {
            foreach (string str in arr)
            {
                yield return str;
            }
        }

You can see I have had to make the string an IEnumerable to go into your function, but I am not getting anything returned. Note that I am inputting the whole string, as I want one "long string" to be matched against all the possible records in the list.

In your example I would take the string "teamsList", say "blue, red, green yellow, orange", and convert it to an IEnumerable with
string[] stringArr = new string[] { teamsList};

Then send to your function
List<Team> MatchedTeams = GetTeams(ctx, stringArr);

Is that right?
 
Eyal Commented:
You have to split the string into items in the array.
 
Craig Lambie (Author) Commented:
Eyal - my question was how to compare a long string with an array, now you are telling me to split the long string - so I am back to the beginning?
Or do you mean something else?
 
Craig Lambie (Author) Commented:
Thanks :)
 
Bob Learned Commented:
I would like to understand the problem space, and what happened with your original code.  You are using string.Split with the long string to get an array of strings to compare with another list of strings.  I would have thought that an Intersect, or join would have created the condition that you required.
 
Craig Lambie (Author) Commented:
Hi TheLearnedOne,

Well actually no, as the strings in the list (in this case they are called Tags) can contain spaces, so I want to compare each Tag as a whole with the Long String.
This is what I am doing, but I think it is severely inefficient.
 public List<DT_tblTag> FindMatchingTags(string PageContent)
        {
            if (ListOfTags.Count() == 0)
            {
                ListOfTags = (from t in db.DT_tblTags
                           select t).ToList();

             } // end if

            List<DT_tblTag> IntersectionTags = new List<DT_tblTag>();

            foreach (DT_tblTag t in ListOfTags)
            {
                if (PageContent.Contains(t.vchTag))
                {
                    IntersectionTags.Add(t);
                }//end if
            } //end foreach

            return IntersectionTags;
        } //end FindMatchingTags

What do you think?
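
For comparison, the same per-tag check can be written as a single LINQ query. This is only a sketch, not code from the application above: the class name TagMatcher and the use of plain strings instead of DT_tblTag are made up for illustration, and the case-insensitive comparison is an assumption about what is wanted. It still scans the long string once per tag, so it is tidier rather than faster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper for illustration; not part of the original project.
public static class TagMatcher
{
    // Returns every tag that occurs somewhere in pageContent.
    // Nothing is split on spaces, so multi-word tags work.
    public static List<string> FindMatchingTags(IEnumerable<string> tags, string pageContent)
    {
        return tags
            // Assumption: matching should ignore case.
            .Where(tag => pageContent.IndexOf(tag, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich" };
        var matches = FindMatchingTags(tags, "I had a turkey sandwich for breakfast");
        Console.WriteLine(string.Join(", ", matches)); // breakfast, turkey sandwich
    }
}
```

Like the loop, this is a substring test, so a tag such as "kitchen" would also be reported for text containing "kitchenette".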
 
MikeToole Commented:
You can use a regular expression to get the matches.
The following expression matches the string contained by the variable 'tag':

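            Regex r = new Regex(@"\b" + Regex.Escape(tag) + @"\b");

It's wrapped by \b to force a word boundary match before and after the tag text; Regex.Escape stops any special characters in the tag from being read as pattern syntax.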

To get the index position of all matches in the string 'input':
            var result = from Match m in r.Matches(input) select m.Index;

If you're only interested in the presence of the tag, not the position, the equivalent of 'contains' is:

            r.IsMatch(input)  
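
To show what the word boundary buys over a plain Contains, here is a small sketch. The class and method names (WordBoundaryDemo, ContainsWholeTag) and the sample sentence are invented for the illustration; only Regex.IsMatch, Regex.Escape and string.Contains are real framework calls.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical demo class; names are for illustration only.
public static class WordBoundaryDemo
{
    // True when tag occurs in input as a whole word or phrase.
    public static bool ContainsWholeTag(string input, string tag)
    {
        // Regex.Escape guards against tags that contain regex metacharacters.
        return Regex.IsMatch(input, @"\b" + Regex.Escape(tag) + @"\b");
    }

    public static void Main()
    {
        string input = "We remodelled the kitchenette";
        Console.WriteLine(input.Contains("kitchen"));          // True  (substring hit)
        Console.WriteLine(ContainsWholeTag(input, "kitchen")); // False (not a whole word)
    }
}
```

One caveat: \b only matches between a word character and a non-word character, so a tag that starts or ends with punctuation may not match the way you expect.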
 
Craig Lambie (Author) Commented:
Hi MikeToole,
Thanks for that.
I don't quite follow. Is this a more efficient way to look for each Tag in the long string?
Or am I missing something?

I really want to get the intersection of
LongString - intersection - List<Tags>

Leaving me with a list of Tags that match words in the LongString

If so, then is r.IsMatch simply more efficient than LINQ .contains?
 
Bob Learned Commented:
In your example, you have these long strings:

   I went to the kitchen and had breakfast
   I had a turkey sandwich for breakfast

which are just a string array of these elements:

   I
   went
   to
   the
   kitchen
   and
   had
   breakfast

   I
   had
   a
   turkey
   sandwich
   for
   breakfast

and these tags:

   kitchen
   breakfast
   turkey sandwich

and you want to know if the long strings contain the tags?  Regular expressions are powerful, but they are not very efficient for long strings.
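
One regex-free way to work from the split words and still catch multi-word tags is to rebuild short runs of consecutive words and look each run up in a hash set. Nobody in this thread proposed this; it is a sketch under assumptions: the name PhraseMatcher is made up, words inside a tag are assumed to be separated by single spaces, and punctuation in the text is simply discarded (so "turkey, sandwich" would also count as a hit).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper; not from the thread.
public static class PhraseMatcher
{
    private static readonly char[] Separators =
        { ' ', '\t', '\r', '\n', ',', '.', ';', ':', '!', '?' };

    public static List<string> FindTags(IEnumerable<string> tags, string text)
    {
        var tagList = tags.ToList();
        var tagSet = new HashSet<string>(tagList, StringComparer.OrdinalIgnoreCase);
        if (tagSet.Count == 0) return new List<string>();

        // The longest tag (in words) bounds how many words we ever join.
        int maxWords = tagSet.Max(t => t.Split(' ').Length);
        string[] words = text.Split(Separators, StringSplitOptions.RemoveEmptyEntries);

        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        for (int start = 0; start < words.Length; start++)
        {
            for (int len = 1; len <= maxWords && start + len <= words.Length; len++)
            {
                // Rebuild the phrase made of words[start .. start + len - 1].
                string phrase = string.Join(" ", words, start, len);
                if (tagSet.Contains(phrase)) found.Add(phrase);
            }
        }

        // Report matches in the order the tags were supplied.
        return tagList.Where(t => found.Contains(t))
                      .Distinct(StringComparer.OrdinalIgnoreCase)
                      .ToList();
    }

    public static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich" };
        var hits = FindTags(tags, "I had a turkey sandwich for breakfast");
        Console.WriteLine(string.Join(", ", hits)); // breakfast, turkey sandwich
    }
}
```

The work here grows with the number of words in the text times the longest tag length, not with the number of tags, which may matter when there are around a thousand tags. Whether it is actually faster for a given page would need measuring.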
 
Craig Lambie (Author) Commented:
Hi TheLearnedOne,

So what do you suggest then?  Or is the way I am doing it Best Practice?
 
Bob Learned Commented:
If I have the assessment correct, please tell me what the resultant output would be from your example?
 
Craig Lambie (Author) Commented:

Right, so the resultant output from string one would be an array / list with
breakfast
kitchen

and string two would be an array / list with
breakfast
turkey sandwich

Hence why I can't split the long strings using a space, as some of the tags contain spaces.

Thanks :)
 
MikeToole Commented:
The attached function returns all matched Tags from a Part string.
It first wraps each tag in word boundary markers, then constructs a RegEx expression by joining each tag to the next with an Or symbol (|)
The Linq statement selects the string representation of each Match and returns a List<string>

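Sample use:
?CSharp.RegEx.GetMatchedTags({"One","Three", "Five Four", "Five Six"}, "One Two Three Four Five Six Seven")
Count = 3
    (0): "One"
    (1): "Three"
    (2): "Five Six"

        // Needs System.Linq, System.Collections.Generic and System.Text.RegularExpressions.
        public static List<string> GetMatchedTags(string[] tags, string part)
        {
            // Wrap each tag in word-boundary markers; Regex.Escape keeps any
            // regex metacharacters in a tag from being treated as pattern syntax.
            tags = tags.Select(t => @"\b" + Regex.Escape(t) + @"\b").ToArray();
            // Join the alternatives with | so one pass over the text finds every tag.
            var rx = new Regex(string.Join("|", tags));
            // Return the matched text of each hit.
            return rx.Matches(part).OfType<Match>().Select(m => m.ToString()).ToList();
        }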
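
Applied to the two sentences from the question, the same idea might look like the sketch below. The class name TagRegexDemo is made up, and two details are additions that go beyond the function above and may or may not be wanted: RegexOptions.IgnoreCase, and Distinct so a tag that appears twice in the text is listed once.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical demo wrapper around the alternation approach.
public static class TagRegexDemo
{
    public static List<string> GetMatchedTags(string[] tags, string part)
    {
        // One pattern of the form \btag1\b|\btag2\b|...
        string pattern = string.Join("|",
            tags.Select(t => @"\b" + Regex.Escape(t) + @"\b"));
        var rx = new Regex(pattern, RegexOptions.IgnoreCase);

        return rx.Matches(part).OfType<Match>()
                 .Select(m => m.Value)
                 .Distinct(StringComparer.OrdinalIgnoreCase) // list each tag once
                 .ToList();
    }

    public static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich" };
        var pages = new[]
        {
            "I went to the kitchen and had breakfast",
            "I had a turkey sandwich for breakfast"
        };
        foreach (string page in pages)
        {
            // Prints the tags found in each page, separated by commas.
            Console.WriteLine(string.Join(", ", GetMatchedTags(tags, page)));
        }
    }
}
```

One thing to watch: alternatives are tried left to right, so if one tag is a prefix of another (say "turkey" and "turkey sandwich"), the shorter one wins when it is listed first and the longer one is never reported at that position. Sorting the tags longest first before joining avoids that.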

 
MikeToole Commented:
... The 'sample use' in my last post was from the Immediate Window - ignore the "CSharp.RegEx.", that was just the namespace I happened to put the code in.
 
Craig Lambie (Author) Commented:
Sorry I haven't needed this again yet, and I haven't tested, but looks like it will work, so accepting to stop question getting lost.