Craig Lambie asked:

Use LINQ or Similar to Compare Array with Long String

Hi Experts,

Your thoughts and opinions please.
I am looking for a fast way to do something, and I am sure LINQ has a quick function for it that I can't find.

I have a List/array: [kitchen, breakfast, turkey sandwich]
And several long strings: [I went to the kitchen and had breakfast], [I had a turkey sandwich for breakfast]

I would like to loop through the long strings and, for each one, get the members of the array that appear in it. I was thinking LINQ would make this easy.

I did try this (where the array is ListTags and the long string is PageContent):

    PageContent = PageContent + Environment.NewLine + dp.vchPageContent.ToString();

    // Split the page content on spaces so it can be intersected with the tag list.
    string[] parts = PageContent.Split(' ');
    List<string> ListPageStrings = new List<string>(parts);

    List<string> ListTags = (from dtg in db.DT_tblTags
                             join drt in db.DT_tblRelDocuments_Tags on dtg.intTagID equals drt.intTagID
                             select dtg.vchTag).ToList();

    IEnumerable<string> LIntersection = ListTags.Intersect(ListPageStrings);

    TagOperations to = new TagOperations();
    TagElements te;

    foreach (string tag in LIntersection)
    {
        // Add the tag to the Document Record
        te = new TagElements();
        te.intCompanyID = doc.intCompanyID;
        te.intDocumentID = doc.intDocumentID;
        te.intUserID = doc.intUserID;
        te.vchTag = tag;

        to.AddTagToDoc(te);
    }

but I was using Split on a space to get two lists, which didn't work when array members had more than one word.

I also thought about looping through each member of the array and testing whether it is in the long string using .Contains, but I feel that will take a long time (there are 1050 array members in my Db).

So I am looking for the quickest way to find the intersection of the long string and the List/array.

Thoughts? Ideas?

Eyal

    public static List<Team> GetTeams(DBContext ctx, IEnumerable<string> teamsList)
    {
        return (from t in ctx.Teams
                where teamsList.Contains(t.ID.ToString())
                select t).ToList();
    }

Craig Lambie (ASKER)

Hi Eyal,

Thanks for your reply; I am not sure I have got it right.

Here is your code changed to fit my circumstances above:
        public static List<DT_tblTag> MatchTags(DocTagDataContext ctx, IEnumerable<string> PageContent)
        {
            return (from t in ctx.DT_tblTags
                    where PageContent.Contains(t.vchTag.ToString())
                    select t).ToList();  
        }
        static IEnumerable<string> makeStringEnum(string[] arr)
        {
            foreach (string str in arr)
            {
                yield return str;
            }
        }

You can see I have had to make the string an IEnumerable to go into your function, but I am not getting anything back. Note that I am passing in the whole string, as I want one "long string" to match all the possible records in the list.

In your example I would send "teamsList" as a single string, say "blue, red, green yellow, orange", and convert it to an IEnumerable with
string[] stringArr = new string[] { teamsList };

Then send it to your function:
List<Team> MatchedTeams = GetTeams(ctx, stringArr);

Is that right?

Eyal

You have to split the string into items in the array.

Craig Lambie (ASKER)

Eyal, my question was how to compare a long string with an array. Now you are telling me to split the long string, so am I back at the beginning?
Or do you mean something else?
Thanks :)

Bob Learned

I would like to understand the problem space, and what happened with your original code. You are using string.Split on the long string to get an array of strings to compare with another list of strings. I would have thought that an Intersect or a join would have produced the result you require.

Craig Lambie (ASKER)

Hi TheLearnedOne,

Well, actually no: the strings in the list (in this case they are called Tags) can contain spaces, so I want to compare each Tag itself with the long string.
This is what I am doing, but I think it is severely inefficient.
 public List<DT_tblTag> FindMatchingTags(string PageContent)
        {
            if (ListOfTags.Count() == 0)
            {
                ListOfTags = (from t in db.DT_tblTags
                           select t).ToList();

             } // end if

            List<DT_tblTag> IntersectionTags = new List<DT_tblTag>();

            foreach (DT_tblTag t in ListOfTags)
            {
                if (PageContent.Contains(t.vchTag))
                {
                    IntersectionTags.Add(t);
                }//end if
            } //end foreach

            return IntersectionTags;
        } //end FindMatchingTags

What do you think?
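
The loop above can also be written as a single LINQ query. The following is a minimal sketch, assuming the tags have already been loaded into memory as plain strings; the class name `TagMatcher` and the string-based signature are hypothetical stand-ins for the `DT_tblTag` version in the post. It uses `IndexOf` with `StringComparison.OrdinalIgnoreCase` so that matching ignores case, which is an assumption about what is wanted (`string.Contains` on its own is case-sensitive).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagMatcher
{
    // Returns every tag that occurs anywhere in pageContent (substring match,
    // ignoring case). Tags keep the order they have in the input sequence.
    public static List<string> FindMatchingTags(string pageContent, IEnumerable<string> tags)
    {
        return tags
            .Where(tag => pageContent.IndexOf(tag, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }

    public static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich" };
        var matches = FindMatchingTags("I had a turkey sandwich for breakfast", tags);
        Console.WriteLine(string.Join(", ", matches)); // breakfast, turkey sandwich
    }
}
```

This does the same amount of work as the foreach loop (one substring search per tag); it is only a more compact way to write it. Because it is a plain substring test, a tag such as "kitchen" would also match inside "kitchenette".
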

MikeToole

You can use a regular expression to get the matches.
The following expression matches the string contained in the variable 'tag':

            Regex r = new Regex(@"\b" + Regex.Escape(tag) + @"\b");

It's wrapped in \b to force a word-boundary match before and after the tag text; Regex.Escape keeps any special characters in the tag from being treated as pattern syntax.

To get the index position of all matches in the string 'input':
            var result = from Match m in r.Matches(input) select m.Index;

If you're only interested in the presence of the tag, not the position, the equivalent of 'contains' is:

            r.IsMatch(input)  
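
Putting the pieces above together, the word-boundary test can be applied to every tag with a LINQ `Where`. This is a minimal sketch rather than a tested solution: the class name `RegexTagMatcher` and method name `FindWholeWordTags` are hypothetical, and `RegexOptions.IgnoreCase` is an assumption about the desired matching.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RegexTagMatcher
{
    // Returns the tags that occur in input as whole words or phrases.
    // Regex.Escape guards against tags that contain regex metacharacters.
    public static List<string> FindWholeWordTags(string input, IEnumerable<string> tags)
    {
        return tags
            .Where(tag => Regex.IsMatch(
                input,
                @"\b" + Regex.Escape(tag) + @"\b",
                RegexOptions.IgnoreCase))
            .ToList();
    }

    public static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich" };
        var matches = FindWholeWordTags("I went to the kitchenette and had breakfast", tags);
        Console.WriteLine(string.Join(", ", matches)); // breakfast
    }
}
```

Unlike a plain substring test, "kitchen" does not match inside "kitchenette" here. One caveat: \b only behaves as expected when a tag starts and ends with a word character, so a tag such as "C#" followed by a space would not match with this pattern.
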

Craig Lambie (ASKER)

Hi MikeToole,
Thanks for that.
I don't fully understand. Is this a more efficient way to look for each Tag in the long string?
Or am I missing something?

I really want to get the intersection of
LongString - intersection - List<Tags>

leaving me with a list of the Tags that appear in the LongString.

If so, is r.IsMatch simply more efficient than .Contains?

Bob Learned

In your example, you have these long strings:

   I went to the kitchen and had breakfast
   I had a turkey sandwich for breakfast

which are just a string array of these elements:

   I
   went
   to
   the
   kitchen
   and
   had
   breakfast

   I
   had
   a
   turkey
   sandwich
   for
   breakfast

and these tags:

   kitchen
   breakfast
   turkey sandwich

and you want to know if the long strings contain the tags?  Regular expressions are powerful, but they are not very efficient for long strings.

Craig Lambie (ASKER)

Hi TheLearnedOne,

So what do you suggest then? Or is the way I am doing it best practice?

Bob Learned

If I have understood the problem correctly, please tell me what the resulting output would be for your example.

Craig Lambie (ASKER)

Right, so the resulting output for string one would be an array/list with
breakfast
kitchen

and for string two an array/list with
breakfast
turkey sandwich

That is why I can't split the long strings on a space: some of the tags contain spaces.

Thanks :)

ASKER CERTIFIED SOLUTION
MikeToole

... The 'sample use' in my last post was from the Immediate Window. Ignore the "CSharp.RegEx." prefix; that was just the namespace I happened to put the code in.

Craig Lambie (ASKER)

Sorry, I haven't needed this again yet and I haven't tested it, but it looks like it will work, so I am accepting it to stop the question getting lost.