Craig Lambie
asked on
Use LINQ or Similar to Compare Array with Long String
Hi Experts,
Your thoughts and opinions please.
I am looking for a fast way to do something, and I am sure LINQ has a quick function for it that I can't find.
I have a List/array [kitchen,breakfast,turkey sandwich]
And several long strings [I went to the kitchen and had breakfast],[I had a turkey sandwich for breakfast]
I would like to loop through the long strings and get the intersection of each with the array; I was thinking using LINQ would be easy.
I did try this (where the array is ListTags and the long string is PageContent):
PageContent = PageContent + Environment.NewLine + dp.vchPageContent.ToString();
// intCompanyID = t.intCompanyID != null ? (int)t.intCompanyID : 0 };
string[] parts = PageContent.ToString().Split(' ');
List<string> ListPageStrings = new List<string>(parts);
List<string> ListTags = (from dtg in db.DT_tblTags
                         join drt in db.DT_tblRelDocuments_Tags on dtg.intTagID equals drt.intTagID
                         select dtg.vchTag).ToList();
IEnumerable<string> LIntersection = ListTags.Intersect(ListPageStrings);
TagOperations to = new TagOperations();
TagElements te;
foreach (string tag in LIntersection)
{
    //Add the tag to the Document Record
    te = new TagElements();
    te.intCompanyID = doc.intCompanyID;
    te.intDocumentID = doc.intDocumentID;
    te.intUserID = doc.intUserID;
    te.vchTag = tag.ToString();
    to.AddTagToDoc(te);
}
but I was using Split on a space to get two lists, which didn't work when I had array members with more than one word.
I also thought about looping through each member of the array and testing whether it was in the long string using .Contains, but I feel that will take a long time (there are 1050 array members in my Db).
So I am looking for the quickest way to find the intersection of the long string and the List/array.
Thoughts? Ideas?
ASKER
Hi Eyal,
Thanks for your reply; I am not sure if I have got it right.
Here is your code changed to fit my circumstances above:
public static List<DT_tblTag> MatchTags(DocTagDataContext ctx, IEnumerable<string> PageContent)
{
    return (from t in ctx.DT_tblTags
            where PageContent.Contains(t.vchTag.ToString())
            select t).ToList();
}
static IEnumerable<string> makeStringEnum(string[] arr)
{
    foreach (string str in arr)
    {
        yield return str;
    }
}
You can see I have had to make the string an IEnumerable to go into your function, but I am not getting any results back. Note I am inputting the whole string (as I want a "long string" to match all the possible records in the list).
In your example I send a string "teamsList" of, say, "blue, red, green yellow, orange", and convert it to an IEnumerable with
string[] stringArr = new string[] { teamsList};
Then send to your function
List<Team> MatchedTeams = GetTeams(ctx, stringArr);
Is that right?
You have to split the string into items in the array.
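As a rough illustration of why wrapping the whole string in a one-element array returns nothing: Contains on a collection tests whether some element equals the value, while Contains on a string tests for a substring. (In LINQ to SQL, Contains on a local collection is, as far as I know, translated to a SQL IN clause, which is also an equality test.) The sketch below is in-memory only and reuses the "teamsList" sample from the post above; the class name is made up for illustration.

```csharp
using System;
using System.Linq;

class ContainsDemo
{
    static void Main()
    {
        string teamsList = "blue, red, green yellow, orange";

        // One element holding the whole string: no element equals "red".
        string[] stringArr = new string[] { teamsList };
        Console.WriteLine(stringArr.Contains("red")); // prints "False"

        // Split on the separator first: now "red" is an element in its own right.
        string[] items = teamsList.Split(new[] { ", " }, StringSplitOptions.RemoveEmptyEntries);
        Console.WriteLine(items.Contains("red"));     // prints "True"

        // string.Contains is a substring test, which is a different operation.
        Console.WriteLine(teamsList.Contains("red")); // prints "True"
    }
}
```

Splitting on ", " keeps "green yellow" together as one item, which a split on a single space would not.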
ASKER
Eyal - my question was how to compare a long string with an array, now you are telling me to split the long string - so I am back to the beginning?
Or do you mean something else?
ASKER
Thanks :)
I would like to understand the problem space, and what happened with your original code. You are using string.Split with the long string to get an array of strings to compare with another list of strings. I would have thought that an Intersect, or join would have created the condition that you required.
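To make the Split/Intersect behaviour concrete, here is a minimal sketch using the sample tags and one of the sample strings from the question (the class name is made up for illustration). Intersect compares whole elements, so a two-word tag can never equal a single word produced by Split.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

class SplitIntersectDemo
{
    static void Main()
    {
        var tags = new List<string> { "kitchen", "breakfast", "turkey sandwich" };
        string content = "I had a turkey sandwich for breakfast";

        // Splitting on a space yields single words only.
        string[] words = content.Split(' ');

        // Intersect keeps the tags that are equal to one of the words.
        List<string> found = tags.Intersect(words).ToList();

        Console.WriteLine(string.Join(", ", found)); // prints "breakfast"
    }
}
```

"turkey sandwich" is missing from the result even though it appears in the text, which is the case the question is about.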
ASKER
Hi TheLearnedOne,
Well, actually no, as the list of strings (in this case they are called Tags) could have spaces in it, so I want to compare the Tag itself with the long string.
This is what I am doing, but I think it is severely inefficient.
public List<DT_tblTag> FindMatchingTags(string PageContent)
{
    if (ListOfTags.Count() == 0)
    {
        ListOfTags = (from t in db.DT_tblTags
                      select t).ToList();
    } // end if
    List<DT_tblTag> IntersectionTags = new List<DT_tblTag>();
    foreach (DT_tblTag t in ListOfTags)
    {
        if (PageContent.Contains(t.vchTag))
        {
            IntersectionTags.Add(t);
        }//end if
    } //end foreach
    return IntersectionTags;
} //end FindMatchingTags
What do you think?
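For comparison, the loop above can be written as a single LINQ query. This is only a sketch: it assumes plain strings instead of DT_tblTag rows and a case-insensitive comparison, neither of which is in the original code, and TagMatcher is a hypothetical name. It still does one substring scan per tag, so it is tidier rather than fundamentally faster.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class TagMatcher
{
    // Hypothetical helper: returns the tags that occur as substrings of pageContent.
    public static List<string> FindMatchingTags(IEnumerable<string> tags, string pageContent)
    {
        return tags
            .Where(tag => pageContent.IndexOf(tag, StringComparison.OrdinalIgnoreCase) >= 0)
            .ToList();
    }
}

class Program
{
    static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich" };
        var matches = TagMatcher.FindMatchingTags(tags, "I had a Turkey sandwich for breakfast");

        Console.WriteLine(string.Join(", ", matches)); // prints "breakfast, turkey sandwich"
    }
}
```

Because the test is a plain substring search, a tag such as "fast" would also match inside "breakfast".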
You can use a regular expression to get the matches.
The following expression matches the string contained by the variable 'tag':
Regex r = new Regex(@"\b" + tag + @"\b");
It's wrapped by \b to force a word boundary match before and after the tag text.
To get the index position of all matches in the string 'input':
var result = from Match m in r.Matches(input) select m.Index;
If you're only interested in the presence of the tag, not the position, the equivalent of 'contains' is:
r.IsMatch(input)
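Applied to a whole list of tags, the idea might look like the sketch below. The Regex.Escape call and RegexOptions.IgnoreCase are my assumptions, not part of the post above: escaping stops a tag containing characters such as "+" or "." from being read as pattern syntax. The extra tag "fast" is added only to show the effect of the word boundaries.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class RegexTagDemo
{
    static void Main()
    {
        var tags = new[] { "kitchen", "breakfast", "turkey sandwich", "fast" };
        string input = "I went to the kitchen and had breakfast";

        // Keep each tag that appears in the input as a whole word or phrase.
        List<string> matches = tags
            .Where(tag => Regex.IsMatch(input, @"\b" + Regex.Escape(tag) + @"\b", RegexOptions.IgnoreCase))
            .ToList();

        Console.WriteLine(string.Join(", ", matches)); // prints "kitchen, breakfast"
    }
}
```

"fast" is rejected because it only occurs inside "breakfast", where there is no word boundary in front of it; a plain substring test would have accepted it.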
ASKER
Hi MikeToole,
Thanks for that.
I don't fully understand. Is this a more efficient way to look for each Tag in the long string?
Or am I missing something?
I really want to get the intersection of
LongString - intersection - List<Tags>
leaving me with a list of Tags that match words in the LongString.
If so, is r.IsMatch simply more efficient than LINQ .Contains?
In your example, you have these long strings:
I went to the kitchen and had breakfast
I had a turkey sandwich for breakfast
which are just a string array of these elements:
I
went
to
the
kitchen
and
had
breakfast
I
had
a
turkey
sandwich
for
breakfast
and these tags:
kitchen
breakfast
turkey sandwich
and you want to know if the long strings contain the tags? Regular expressions are powerful, but they are not very efficient for long strings.
ASKER
Hi TheLearnedOne,
So what do you suggest then? Or is the way I am doing it Best Practice?
If I have the assessment correct, please tell me what the resultant output would be from your example?
ASKER
Right, so the resultant output from string one would be an array / list with
breakfast
kitchen
and string two would be an array /list with
breakfast
turkey sandwich
Hence why I can't split the long strings using a space, as some of the tags contain spaces.
Thanks :)
ASKER CERTIFIED SOLUTION
... The 'sample use' in my last post was from the Immediate Window - ignore the "CSharp.RegEx.", that was just the namespace I happened to put the code in.
ASKER
Sorry I haven't needed this again yet, and I haven't tested, but looks like it will work, so accepting to stop question getting lost.
{
return(from t in ctx.Teams
where teamsList.Contains(t.ID.To
select t).ToList()
}