Link to home
Start Free TrialLog in
Avatar of shankarkrupa
shankarkrupa

asked on

Fastest way to search - 2D array!

Hi all...

I have a 2D array.

arr(1,0)="emp1"
arr(1,1)=20
arr(1,2)="dept"
arr(2,0)="emp2"
arr(2,1)=34
arr(2,2)="pur"
arr(3,0)="emp3"
arr(3,1)=32
arr(3,2)="sal"

and so on...something like that.  So when I get a string argument, I have to return "emp3" or "32" (either one...as specified).

Public Function GetArrayEquivalent(ByVal strString As String) As String
Dim strTmp As String
Dim lngStart As Long
Dim lngEnd As Long

lngEnd = 1000 'Number of array items/rows

strTmp = strString

For lngStart = 1 To lngEnd
    If strString  = arr(lngStart, 1) Then strTmp = arr(lngStart, 0): Exit For
Next
GetArrayEquivalent = strTmp
End Function

That method is okay.  But when I use it for searching in bulk, the program becomes very slow.  Of course there is the possibility of DoEvents, but still I feel I am not doing it correctly.  Am I correct?

So is there any way to search an array speedily?

(If I use Collections also, maybe the same thing happens internally...like the same search takes place comparing every array item till it finds the particular item?.)

krupa
Avatar of ___XXX_X_XXX___
___XXX_X_XXX___

Do...Loop sometimes is faster than For...Next

Fastest way is to use precompiled assembler code to search array for you, but it is hard to write such code, especially for searching strings when you search multilanguage strings (not only Latin characters).
SOLUTION
Avatar of Mike Tomlinson
Mike Tomlinson
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
VB.Collection includes fast binary search and you can use it like this:
     Dim colIndices As collection
     ' build a collection
     Set colIndices = New collection
     For i=1 to ubound(arr,1)
         colIndices.Add i, "k" & arr(i,0)
     Next

It will hold the right indexes, e.g. colIndices.Item("kemp1") will return 1, so you can do this:
     myidx = colIndices.Item("kemp1")
     MsgBox arr(myidx,1) & vbCrLf & arr(myidx,2)

"k" is used to make sure key is a string, when adding to collection.

You can also use other, more complicated methods, e.g. "Speed up searches with hash tables"
http://www.vb2themax.com/Item.asp?PageID=TipBank&ID=412

Also, 2D arrays in VB are a bit slow, I suggest using three 1D arrays instead (arrNames, arrAge, arrSomething), each with its best data type, if possible.
For author:
Remember that you CAN'T add two items in collection with same keys. So you can't add this to your collection:

colCollection.Add "Some number","Key1"
colCollection.Add "Some number","Key2"
colCollection.Add "Some number","Key3"
colCollection.Add "Some number","Key2" ' An error will occur here because the key is already associated with item in collection.

Does you 2D array contains same values ? Like
arr(1,0)="emp1"
arr(1,1)=20
arr(1,2)="dept"
arr(2,0)="emp2"
arr(2,1)=34
arr(2,2)="pur"
arr(3,0)="emp3"
arr(3,1)=32
arr(3,2)="pur" ' The same value as arr(2,2), so you cannot use "pur" as key for your collection twice.


Avatar of shankarkrupa

ASKER

___XXX_X_XXX___:

Thanks for the suggestion. But machine code...me...? I am out. :-(

I tested using the Do/While..but I could not see a remarkable difference.

By the way, of course you are correct. I am searching Unicode characters (non-English). :-)

Idle_Mind:

I have not made the entire array contents into a collection object still.  I will feed all the entries first.

Actually, the search will not be based on just the third item...it can be anything.  Sometimes it may to search by giving the "emp1" string and search for the other two items, sometimes it may be "20" and search for the first and third items, sometimes it will be "dept" to search for the first or second items.

So does it mean I have to make three copies of the content and look in the appropriate collection item?

I fear it will affect the speed...

Not necessarily that I should have an array/collection sort of things. It can be anything.  Either a speedier search subroutine, or some other format/method to save the data and retrieve?

What is the best method?

krupa
You can use 3 collections, or, to search anywhere, add other items to the same collection:
     For i=1 to ubound(arr,1)
         colIndices.Add i, "k" & arr(i,1)
     Next
     For i=1 to ubound(arr,1)
         colIndices.Add i, "k" & arr(i,2)
     Next
or add the same item three times into one collection and use each value as the key:

Set f = New Class1
f.name = "emp1"
f.id = 20
f.key = "dept"
data.Add f, f.name
data.Add f, f.id
data.Add f, f.key

Even though the object has been added three times, they all reference the same object.  If you make a change however, then you would have to remove all three references using the old values and then add them back in using the new values since a collection items key cannot be changed.

As mentioned before, you can only have one item per collection with a specific key.  So this approach will only work if there are no duplicate values across your entire data set.

Idle_Mind
you would have to change

    data.Add f, f.id

to

    data.Add f, Cstr(f.id)

because a key must be a string value
Why not store the information in a database and use its built in capabilities to find the data?
ameba:
---------

Your code apart, the link you gave also threw some knowledge in my mind.  Thanks.

But will this be speedier than the array method?

I am working on to convert all the 2D array items to three separate arrays, though.


___XXX_X_XXX___:
----------------------------

Of course no array item has the same value...everything is unique.  The example I gave (the employee, department, and age) is just for an example.  In reality, I am dealing with an unicode editor for my language.  We already have an 8-bit encoding standard.

While moving to unicode, we need to take care of all the three things - the Roman script, the-now-old 8-bit encoding, and the unicode.  I was afraid it may complicate  things further actually...that is why I translated the real thing to this dummy employee table.  I am very sorry if I did anything wrong, honestly I did not intend to present a false data.

Idle_Mind:
-------------

Having a unique key is not a problem for this program.  Because, as I said, it involves characters (of a spoken/written language, not the computer language :-)) so each letter is unique in itself.  All I am worried is the speed.  Really even arrays do not fit in the picture when it comes to large data transfer. :-((

Well, I should try it with collections + 1D array first, as ameba suggested.

BlueDevilFan:
-----------------

Oh no...accessing things from database will be even slower.  I am looking for someway to speed up things even quicker...quicker even than arrays or doing it better with arrays/collections itself.

krupa
Hello all...

I am facing a very strange problem.  When I tried to load the bunch of data to a Collection object/variable, I get an error that the key has already been defined.

I am pretty sure that the ascii values of the letters I loaded are different, but it says that it has already been added and identifies a particular set of letters with another one.

Seems I have to satisfy myself with just arrays...:-((

krupa
Then you cannot use collection for your binary search - its key is not case-sensitive.
You can use mentioned hash search, or sort your array and use binary search like
"BinarySearch - Fast search in a sorted array"
http://www.vb2themax.com/Item.asp?PageID=CodeBank&ID=17

Linear search can be fast sometimes, e.g. if you are comparing LONGs, 1000 comparisons will be done in less than 1 ms - there's no point in adding to complexity, to reduce number of comparisons to 11-12, if you are happy with the speed.
When I try to add the data to the collection, I get the error that the key has already been added.  Apparently the key is not the same one.

The ascii value is different, apparently.

It is a big load of data, however I am posting some of it here.  When I try to load it to the collection, I get the error.

arr(1, 0) = "«"
arr(2, 0) = "¬ "
arr(3, 0) = "þ"
arr(4, 0) = "®"
arr(5, 0) = "¯"
arr(6, 0) = "°"
arr(7, 0) = "±"
arr(8, 0) = "²"
arr(9, 0) = "³"
arr(10, 0) = "´"
arr(11, 0) = "µ"
arr(12, 0) = "¶"
arr(13, 0) = "·"
arr(14, 0) = "ì"
arr(15, 0) = "í"
arr(16, 0) = "î"
arr(17, 0) = "ï"
arr(18, 0) = "ð"
arr(19, 0) = "ñ"
arr(20, 0) = "ò"
arr(21, 0) = "ó"
arr(22, 0) = "ô"
arr(23, 0) = "õ"
arr(24, 0) = "ö"
arr(25, 0) = "÷"
arr(26, 0) = "ø"
arr(27, 0) = "ù"
arr(28, 0) = "ú"
arr(29, 0) = "û"
arr(30, 0) = "ü"
arr(31, 0) = "ý"
arr(32, 0) = "Š"
arr(33, 0) = "‹"
arr(34, 0) = "ˆ"
arr(35, 0) = "Œ"
arr(36, 0) = "‰"
arr(37, 0) = "‚"
arr(38, 0) = "¸"
arr(39, 0) = "¸¡"
arr(40, 0) = "¸¢"
arr(41, 0) = "¸£"
arr(42, 0) = "Ì"
arr(43, 0) = "Ü"
arr(44, 0) = "¦¸"
arr(45, 0) = "§¸"
arr(46, 0) = "¨¸"
arr(47, 0) = "¦¸¡"
arr(48, 0) = "§¸¡"
arr(49, 0) = "¦¸ª"
arr(50, 0) = "¹"
arr(51, 0) = "¹¡"
arr(52, 0) = "¹¢"
arr(53, 0) = "¹£"
arr(54, 0) = "™"
arr(55, 0) = "›"
arr(56, 0) = "¦¹"
arr(57, 0) = "§¹"
arr(58, 0) = "¨¹"
arr(59, 0) = "¦¹¡"
arr(60, 0) = "§¹¡"
arr(61, 0) = "¦¹ª"
arr(62, 0) = "º"
arr(63, 0) = "º¡"
arr(64, 0) = "º¢"
arr(65, 0) = "º£"
arr(66, 0) = "Í"
arr(67, 0) = "Ý"
arr(68, 0) = "¦º"
arr(69, 0) = "§º"
arr(70, 0) = "¨º"
arr(71, 0) = "¦º¡"
arr(72, 0) = "§º¡"
arr(73, 0) = "¦ºª"
arr(74, 0) = "»"
arr(75, 0) = "»¡"
arr(76, 0) = "Ȣ"
arr(77, 0) = "ȣ"
arr(78, 0) = "š"
arr(79, 0) = "œ"
arr(80, 0) = "¦»"
arr(81, 0) = "§»"
arr(82, 0) = "¨»"
arr(83, 0) = "¦»¡"
arr(84, 0) = "§»¡"
arr(85, 0) = "¦»ª"

krupa
As I said, in that case you cannot use collection.  arr(14,0) and arr(42,0) are the same for the collection.
You'll have to use different method.
Vb2themax site changed and the links are not working any more.
Not sure if finding other examples will help you, or you already decided you'll use just array.
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hello Ameba...

With this thread, I learned new things....it is nice.

Of course I am not an enemy to Hashtable/collections to decide not to use it. :-)
 
(just kidding)...

But there is not a remarkable performance speed when it comes to bulk search & replace....like processing 4000 or so characters.

However, I am thankful for all these offered help.

And I want to give points (split?) to both Ameba & Idle_Mind.  But I fear one of you will reject it if I post another linke for the points.  Promise (anyone of you) that you will accept if I post like "Points for <you>"...


Krupa
You can split the points without creating a new question.

Take a look here to find out how: https://www.experts-exchange.com/Programming/Programming_Languages/Visual_Basic/help.jsp#hi69

Idle_Mind
You can use Split feature - make sure grade is A, if you think it is not, you can ask for clarification.
Thanks!