latsubs
asked on
How-to find Dictionary items by partial key matching?
I am developing an application which operates with items of some sort (assume that each Item is represented by class MyItem). Now i have to add to my app the following functionality: the ability to find item by keyword(s). I have made conclusion to implement internally such functionality by using multi keyed dictionary where value of each dictionary item is MyItem class. when using great C5 collections library this looks like
C5.HashDictionary<C5.HashB ag<string> , MyItem>
I can rapidly find item by specifying the exact bag of item's keywords. But the problem is that i don't know how-to find all items in dictionary by specifying non-exact ba of keywords. For example if item's bag of keys is "key1, key2" i want to find it by specifying only "key1". Is this possible?
C5.HashDictionary<C5.HashB
I can rapidly find item by specifying the exact bag of item's keywords. But the problem is that i don't know how-to find all items in dictionary by specifying non-exact ba of keywords. For example if item's bag of keys is "key1, key2" i want to find it by specifying only "key1". Is this possible?
Iterate through all the items in the dictionary and find all keys that contains the partial key using string.Contains() function
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
now i'm starting to think that it wasn't brigth idea to implement searching using multikeyed dictionary. i'm considering switching to generic List and implementing keyword also as generic string List in MyItem class. then i can easily get needed items by using anonymous delegate:
private class Item
{
public Item()
{
keys = new System.Collections.Generic .List<stri ng>();
name = String.Empty;
}
public Item(string[] keys, string value) : this()
{
this.keys.AddRange(keys);
this.name = value;
}
public System.Collections.Generic .List<stri ng> keys;
public string name;
}
private void Test()
{
System.Collections.Generic .List<stri ng> keysToSearch = new System.Collections.Generic .List<stri ng>();
keysToSearch.Add("key1");
keysToSearch.Add("key3");
System.Collections.Generic .List<Item > items = new System.Collections.Generic .List<Item >();
items.Add(new Item(new string[] { "key1", "key2" }, "item1"));
items.Add(new Item(new string[] { "key1", "key3" }, "item2"));
items.Add(new Item(new string[] { "key2", "key3" }, "item3"));
System.Collections.Generic .List<Item > foundItems = items.FindAll(delegate(Ite m item)
{
bool contains = false;
foreach (string keyToSearch in keysToSearch)
{
contains = item.keys.Contains(keyToSe arch);
if (!contains)
{
return false;
}
}
return contains;
});
foreach (Item foundItem in foundItems)
{
System.Diagnostics.Trace.W riteLine(f oundItem.n ame);
}
}
the next thing to implement would be hashing.
private class Item
{
public Item()
{
keys = new System.Collections.Generic
name = String.Empty;
}
public Item(string[] keys, string value) : this()
{
this.keys.AddRange(keys);
this.name = value;
}
public System.Collections.Generic
public string name;
}
private void Test()
{
System.Collections.Generic
keysToSearch.Add("key1");
keysToSearch.Add("key3");
System.Collections.Generic
items.Add(new Item(new string[] { "key1", "key2" }, "item1"));
items.Add(new Item(new string[] { "key1", "key3" }, "item2"));
items.Add(new Item(new string[] { "key2", "key3" }, "item3"));
System.Collections.Generic
{
bool contains = false;
foreach (string keyToSearch in keysToSearch)
{
contains = item.keys.Contains(keyToSe
if (!contains)
{
return false;
}
}
return contains;
});
foreach (Item foundItem in foundItems)
{
System.Diagnostics.Trace.W
}
}
the next thing to implement would be hashing.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
If your above code is ok for you performance wise (O(n)) then just use IEnumerable<T> Filter(Fun<T,bool> predicate) defined on DictionaryBase in c5 to do the same thing with a predicate
Thanks Greg.
Jim
Jim
ASKER
currently the performance is ok. i have made test with 150k items - searches are made instantly.
but if i need more performance in future what should i do?
but if i need more performance in future what should i do?
use a custom (or modify their binary tree o support, ah the wonder of open source) ...
When doing a non-kyed or predicate search (i.e. I have no idea) I am forced to do a linear walk (all items in a hash table or all items in a tree) ...
If however my field being searched is in the form XXX* and its my KEY I can use a binary tree to find the root (if any) that partially matches XXX nodes below this node will match XXX*. I can find this node as a binary search O(log n) and can search nodes below ...
If you want I can go through a partial match seach (or probably write you a method for the sorteddictionary very quickly) but its pretty straight forward to implement ... consider the following tree (in ascii notation)
mom
dad
cat
aardvark
bob
dog
pink
open
young
if we search for d*... we visit mom, then left->dad which meets d* ... all d* nodes will be under dad. This is true at any level and with *lots* of nodes can save us alot of time (lots: read millions+ or a slow processor :))
Cheers,
Greg
When doing a non-kyed or predicate search (i.e. I have no idea) I am forced to do a linear walk (all items in a hash table or all items in a tree) ...
If however my field being searched is in the form XXX* and its my KEY I can use a binary tree to find the root (if any) that partially matches XXX nodes below this node will match XXX*. I can find this node as a binary search O(log n) and can search nodes below ...
If you want I can go through a partial match seach (or probably write you a method for the sorteddictionary very quickly) but its pretty straight forward to implement ... consider the following tree (in ascii notation)
mom
dad
cat
aardvark
bob
dog
pink
open
young
if we search for d*... we visit mom, then left->dad which meets d* ... all d* nodes will be under dad. This is true at any level and with *lots* of nodes can save us alot of time (lots: read millions+ or a slow processor :))
Cheers,
Greg