Use LINQ or Similar to Compare Array with Long String

Hi Experts,

Your thoughts and opinions please.
I am looking for a fast way to do something, and I am sure LINQ will have a great quick function I can't find to do it.

I have a List/array: [kitchen, breakfast, turkey sandwich]
and several long strings: [I went to the kitchen and had breakfast], [I had a turkey sandwich for breakfast]

I would like to loop through the long strings and, for each one, get the array members that occur in it (the intersection). I was thinking this would be easy with LINQ.

I did try this (where the array is ListTags and the long string is PageContent):
PageContent = PageContent + Environment.NewLine + dp.vchPageContent.ToString();
                    
                    // intCompanyID = t.intCompanyID != null ? (int)t.intCompanyID : 0 };


                    string[] parts = PageContent.ToString().Split(' ');

                    List<string> ListPageStrings = new List<string>(parts);

                    List<string> ListTags = (from dtg in db.DT_tblTags
                                             join drt in db.DT_tblRelDocuments_Tags on dtg.intTagID equals drt.intTagID
                                             select dtg.vchTag).ToList();

                    IEnumerable<string> LIntersection = ListTags.Intersect(ListPageStrings);

                    TagOperations to = new TagOperations();
                    TagElements te;

                    foreach (string tag in LIntersection)
                    {
                        //Add the tag to the Document Record
                        te = new TagElements();
                        te.intCompanyID = doc.intCompanyID;
                        te.intDocumentID = doc.intDocumentID;
                        te.intUserID = doc.intUserID;
                        te.vchTag = tag.ToString();

                        to.AddTagToDoc(te);
                    }

but I was using Split on a space to get two lists, which didn't work when array members had more than one word.

I also thought about looping through each member of the array and testing whether it is in the long string using .Contains, but I feel that will take a long time (there are 1050 array members in my Db).

So I am looking for the quickest way to find the intersection of the long string and the List/array.

Thoughts? Ideas?
Craig Lambie Asked:

Eyal Commented:
 public static List<Team> GetTeams(DBContext ctx, IEnumerable<string> teamsList)
        {
                return (from t in ctx.Teams
                        where teamsList.Contains(t.ID.ToString())
                        select t).ToList();
        }
Craig Lambie (Author) Commented:
Hi Eyal,

Thanks for your reply; I am not sure I have got it right.

Here is your code changed to fit my circumstances above:
        public static List<DT_tblTag> MatchTags(DocTagDataContext ctx, IEnumerable<string> PageContent)
        {
            return (from t in ctx.DT_tblTags
                    where PageContent.Contains(t.vchTag.ToString())
                    select t).ToList();  
        }
        static IEnumerable<string> makeStringEnum(string[] arr)
        {
            foreach (string str in arr)
            {
                yield return str;
            }
        }

You can see I have had to make the string an IEnumerable to go into your function, but I am not getting anything returned. Note I am passing in the whole string, as I want one "long string" to match all the possible records in the list.

In your example, would I take a string "teamsList" of, say, "blue, red, green yellow, orange", convert it to an IEnumerable with
string[] stringArr = new string[] { teamsList };

and then send it to your function
List<Team> MatchedTeams = GetTeams(ctx, stringArr);

Is that right?
Eyal Commented:
You have to split the string into the items of the array.
Craig Lambie (Author) Commented:
Eyal - my question was how to compare a long string with an array, now you are telling me to split the long string - so I am back to the beginning?
Or do you mean something else?
Craig Lambie (Author) Commented:
Thanks :)
Bob Learned Commented:
I would like to understand the problem space, and what happened with your original code.  You are using string.Split with the long string to get an array of strings to compare with another list of strings.  I would have thought that an Intersect, or join would have created the condition that you required.
Craig Lambie (Author) Commented:
Hi TheLearnedOne,

Well, actually no, as the strings in the list (in this case they are called Tags) can contain spaces, so I want to compare each Tag itself with the long string.
This is what I am doing, but I think it is severely inefficient.
 public List<DT_tblTag> FindMatchingTags(string PageContent)
        {
            if (ListOfTags.Count() == 0)
            {
                ListOfTags = (from t in db.DT_tblTags
                           select t).ToList();

             } // end if

            List<DT_tblTag> IntersectionTags = new List<DT_tblTag>();

            foreach (DT_tblTag t in ListOfTags)
            {
                if (PageContent.Contains(t.vchTag))
                {
                    IntersectionTags.Add(t);
                }//end if
            } //end foreach

            return IntersectionTags;
        } //end FindMatchingTags

What do you think?
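
For comparison, the loop above can also be written as a single LINQ Where. This is only a minimal sketch, assuming the tags are plain strings rather than DT_tblTag rows; the class name TagMatcher is hypothetical. It does the same work as the foreach (one Contains call per tag), so it is shorter rather than faster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TagMatcher
{
    // Returns every tag that occurs as a substring of pageContent.
    // string.Contains(string) is an ordinal, case-sensitive comparison.
    public static List<string> FindMatchingTags(IEnumerable<string> tags, string pageContent)
    {
        return tags
            .Where(tag => pageContent.Contains(tag))
            .ToList();
    }
}
```

Because this is a substring test, a tag such as "kit" would also match inside "kitchen"; whole-word matching needs a different check.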
MikeToole Commented:
You can use a regular expression to get the matches.
The following expression matches the string contained in the variable 'tag':

            Regex r = new Regex(@"\b" + Regex.Escape(tag) + @"\b");

It's wrapped in \b to force a word boundary match before and after the tag text, and Regex.Escape stops any special characters in the tag from being read as pattern syntax.

To get the index position of all matches in the string 'input':
            var result = from Match m in r.Matches(input) select m.Index;

If you're only interested in the presence of the tag, not the position, the equivalent of 'contains' is:

            r.IsMatch(input)  
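
Applied to a whole list of tags, the IsMatch check might look like the sketch below. The class and method names (WholeWordTags, ContainsWholeWord, FindMatchingTags) are hypothetical, and the tags are assumed to begin and end with a letter, digit or underscore, since \b only matches between a word character and a non-word character.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class WholeWordTags
{
    // True when tag occurs in input as a whole word or phrase.
    // Regex.Escape keeps characters such as '.' or '+' in a tag literal.
    public static bool ContainsWholeWord(string input, string tag)
    {
        string pattern = @"\b" + Regex.Escape(tag) + @"\b";
        return Regex.IsMatch(input, pattern);
    }

    // Keeps only the tags that pass the whole-word test, in their original order.
    public static List<string> FindMatchingTags(IEnumerable<string> tags, string input)
    {
        return tags.Where(tag => ContainsWholeWord(input, tag)).ToList();
    }
}
```

Unlike a plain Contains, this does not match "kit" inside "kitchen", while a multi-word tag such as "turkey sandwich" is still found.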
Craig Lambie (Author) Commented:
Hi MikeToole,
Thanks for that.
I don't quite understand. Is this a way to look for each Tag in the long string more efficiently?
Or am I missing something?

I really want to get the intersection of
LongString - intersection - List<Tags>

Leaving me with a list of Tags that match words in the LongString

If so, then is r.IsMatch simply more efficient than LINQ .contains?
Bob Learned Commented:
In your example, you have these long strings:

   I went to the kitchen and had breakfast
   I had a turkey sandwich for breakfast

which are just a string array of these elements:

   I
   went
   to
   the
   kitchen
   and
   had
   breakfast

   I
   had
   a
   turkey
   sandwich
   for
   breakfast

and these tags:

   kitchen
   breakfast
   turkey sandwich

and you want to know if the long strings contain the tags?  Regular expressions are powerful, but they are not very efficient for long strings.
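
One way to keep the split-into-words idea and still handle tags with spaces, without regular expressions, is to compare word sequences (n-grams) of the long string against a HashSet of tags. The following is only a sketch under stated assumptions: the class name PhraseTags is hypothetical, words are separated by whitespace only (so "breakfast." with trailing punctuation would not match), tags use single spaces between words, and matching ignores case.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PhraseTags
{
    public static List<string> FindMatchingTags(IEnumerable<string> tags, string text)
    {
        char[] separators = { ' ', '\t', '\r', '\n' };
        var tagSet = new HashSet<string>(tags, StringComparer.OrdinalIgnoreCase);
        if (tagSet.Count == 0) return new List<string>();

        // The longest tag, measured in words, bounds the phrase length to try.
        int maxWords = tagSet.Max(
            t => t.Split(separators, StringSplitOptions.RemoveEmptyEntries).Length);

        string[] words = text.Split(separators, StringSplitOptions.RemoveEmptyEntries);
        var found = new HashSet<string>(StringComparer.OrdinalIgnoreCase);

        // Try every run of 1..maxWords consecutive words against the tag set.
        for (int start = 0; start < words.Length; start++)
        {
            for (int len = 1; len <= maxWords && start + len <= words.Length; len++)
            {
                string phrase = string.Join(" ", words, start, len);
                if (tagSet.Contains(phrase)) found.Add(phrase);
            }
        }
        return found.ToList();
    }
}
```

Each lookup is a hash probe, so the work grows with the number of words times the longest tag length rather than with the number of tags. The returned strings use the casing found in the text, and their order is not guaranteed.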
Craig Lambie (Author) Commented:
Hi TheLearnedOne,

So what do you suggest then?  Or is the way I am doing it Best Practice?
Bob Learned Commented:
If I have the assessment correct, please tell me what the resultant output would be from your example?
Craig Lambie (Author) Commented:

Right, so the resultant output from string one would be an array / list with
breakfast
kitchen

and string two would be an array /list with
breakfast
turkey sandwich

Hence why I can't split the long strings using a space, as some of the tags contain spaces.

Thanks :)
MikeToole Commented:
The attached function returns all matched Tags from a Part string.
It first wraps each tag in word boundary markers, then constructs a RegEx expression by joining each tag to the next with an Or symbol (|).
The Linq statement selects the string representation of each Match and returns a List<string>.

Sample use:
?CSharp.RegEx.GetMatchedTags({"One","Three", "Five Four", "Five Six"}, "One Two Three Four Five Six Seven")
Count = 3
    (0): "One"
    (1): "Three"
    (2): "Five Six"
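
public static List<string> GetMatchedTags(string[] tags, string part)
        {
            // Escape each tag, then wrap it in word-boundary markers.
            tags = tags.Select(t => @"\b" + Regex.Escape(t) + @"\b").ToArray();
            // Join the tags with | so a single pass finds any of them.
            var rx = new Regex(string.Join("|", tags));
            return rx.Matches(part).OfType<Match>().Select(m => m.ToString()).ToList();
        }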
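
A possible variant of the function above, offered as a sketch rather than as part of the accepted answer: it escapes each tag, ignores case, tries longer tags first and removes duplicate matches. The class name CombinedTagRegex is hypothetical. Ordering matters because the regex engine takes the first alternative that matches at a position, so with the tags "turkey" and "turkey sandwich" the shorter one would otherwise win and the longer one would never be reported.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class CombinedTagRegex
{
    public static List<string> GetMatchedTags(IEnumerable<string> tags, string part)
    {
        // Longest tags first, each escaped and wrapped in word boundaries.
        var alternatives = tags
            .Where(t => !string.IsNullOrEmpty(t))
            .OrderByDescending(t => t.Length)
            .Select(t => @"\b" + Regex.Escape(t) + @"\b")
            .ToList();
        if (alternatives.Count == 0) return new List<string>();

        var rx = new Regex(string.Join("|", alternatives), RegexOptions.IgnoreCase);

        // One pass over the text; repeated hits of the same tag are collapsed.
        return rx.Matches(part)
                 .Cast<Match>()
                 .Select(m => m.Value)
                 .Distinct(StringComparer.OrdinalIgnoreCase)
                 .ToList();
    }
}
```

Matches never overlap, so when one tag is contained in a longer tag found at the same place, only the longer one is returned for that occurrence.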

MikeToole Commented:
... The 'sample use' in my last post was from the Immediate Window - ignore the "CSharp.RegEx.", that was just the namespace I happened to put the code in.
Craig Lambie (Author) Commented:
Sorry, I haven't needed this again yet and I haven't tested it, but it looks like it will work, so I am accepting it to stop the question getting lost.