Link to home
Create AccountLog in
Avatar of dleone617
dleone617

asked on

Need a better way to wildcard search a dictionary

C#.
I currently have a dictionary  defined as Dictionary<string, List<string>>
Basically the Key is a string and there can be multiple values for each key.
I have a need to search a key for only a partial key. For example if I have the keys  "A123", "A1239", "A12345", "A12378", "A12389", "A12396",  "A123345", "A124" I need a way to pull all values from any keys that begin with "A123***" and with a Key Length that <= 6.
Therefore it would return the values from "A123", "A1239", "A12345", "A12378", "A12389"

Now this seems pretty easy for a very short dictionary however my dictionary has over 300,000 Keys, and I may need to match over a million pieces of data to those keys.

I have tried using a HashTable, Dictionary, TST (Ternary Search Tree Implementation for C# by Jonathan de Halleux),  and .Mdb table with the dictionary and every implentation I have made is very very slow.

Maybe using a dictionary is not the answer or maybe a better approach of searching might be faster. Any suggestions with Code and examples would be appreciated.
Avatar of jandromeda
jandromeda
Flag of Sri Lanka image

What is your data source? I mean how you get the dictionary populated?
Avatar of dleone617
dleone617

ASKER

Currently the dictionary is populated through a text file. The Key is calculated based on the Text in the line read from the text file.
Loadding the dictionary in a Hash or Dictionary is reasonably fast and I have no problems with the length of time to load at this point.
Just thought I would also let you know the Key can be used as a primary key. I was thinking maybe creating a memory DataTable with two fields Key, Value.  The Key would be the Primary key. I think I can then filter it using a "like "A123%" And Length(Key) <= 6".. but I'm not sure. So that is why I asked for Advice and code to figure out the best way of accomplishing this.
It might be better if you split the data into a set of dictionaries while populating. This will reduce size of each dictionary. You may have a principle to split it for the most efficacious search. For example, a dictionary contains only key starts with 1 and length = 1, another contains key only starts with 1 and length = 2 and so on ... just an idea.
I wrote a small program to compare the time to extract the data from a Dictionary and a DataTable, and Dictionary is faster than the DataTable. So I think the dictionary is the best approach.
ASKER CERTIFIED SOLUTION
Avatar of Gururaj Badam
Gururaj Badam
Flag of India image

Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account
SOLUTION
Link to home
membership
Create an account to see this answer
Signing up is free. No credit card required.
Create Account