asked on

MoveNext()

Anybody know if it's any faster to MoveNext() on an IEnum when the collection object it refrences is an Array vs. HashTable ?

Thanks.

eltic

While the HashTable is optimized for searching entries by their hash code,
it seems, that the HashTable is organized as a tree or something internally.
The memory structure of that hash table would be a pointer to childnode
structure which holds a reference to the table entry, while an array
(not ArrayList) is organized as an linear appearance of references
which can be processed in-line by the memory hardware. (New DDR2-RAMs
are only faster than DDR when you access the memory lineary).

IMHO, since you don't have hundres or thousands of elements in your
hashtable or your array, you will not realize any differences. If you have so,
I would recommend using an array, even it can't grow sequentially, but
it's faster if you want to iterate through all elements.

Rather than iterate through all elements, you should organize your data reasonable,
so you can apply search and sorting algorithms on it .

gregoryyoung

"While the HashTable is optimized for searching entries by their hash code,
it seems, that the HashTable is organized as a tree or something internally."

I am sorry but this line is just too much :) Why would a hash table use a tree internally? My guess would be that its implemented as a hash table internally

A hashtable uses an array internally .. but it does have slight overhead over an array for iteration as there are holes in the array ... Another option you might want to look at would be the possiblity of implementing a hashlist class A quick look with reflector shows the difference ..

conceptually ..

Hashtable has an array internally of size 100 ..

you insert an item that gets a hash of 20 .. it enters into bucket 20 a reference to the item.. (actually a holder struct called "Buckt" with the key and the value)
you insert an item that gets a hash of 115 .. it enters into bucket 20 a reference to the item..

The place where things get odd is in how the hashtable handles collisions (either append to the end of the list or move to next bucket etc)
you insert an item that gets a hash of 20 .. it enters into bucket 21 a reference to the item..

Now when it comes time to iterate .. it iterates through 100 items giving back the ones that are set (note I have a VERY low fill ratio here)

Array ...
public bool MoveNext()
{
if (this._complete)
{
this.index = this.endIndex;
return false;
}
this.index++;
this.IncArray(); //dont worry about this, it only does stuff for multi-dimensioned arrays
return !this._complete;
}

HashTable ...
public virtual bool MoveNext()
{
if (this.version != this.hashtable.version)
{
throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EnumFailedVersion"));
}
while (true)
{
if (this.bucket <= 0)
{
this.current = false;
return false;
}
this.bucket--;
object obj1 = this.hashtable.buckets[this.bucket].key;
if ((obj1 != null) && (obj1 != this.hashtable.buckets))
{
this.currentKey = obj1;
this.currentValue = this.hashtable.buckets[this.bucket].val;
this.current = true;
return true;
}
}
}

The differences between these two should not be huge in terms of performance providing you maintain a reasonable hash fill ratio (which the hash does automatically O(size) vs O(n)), the big difference occurs when you want to look up a value in the data structure where the hashtable is O(1) and the array is O(n).

As I mentioned earlier there are other data structures which can help alleviate such things .. Hash List is a great example.

Cheers,

Greg Young

lblinc

ASKER

eltic ~ ok, thanks.. on similar topic, can you explain the difference (possibly in performance?) between using IEnumertor as in each of these 2 scenarios below?

1)

IEnumerator itr = someHashTable.Values.GetEnumerator();
while (itr.MoveNext())
{
Symbol sym= (Symbol) itr.Current;
sym.qUpdates.Enqueue("End");
}

2)

IEnumerator itr = someHashTable.GetEnumerator();
while (itr.MoveNext())
{
Symbol sym= (Symbol) itr.Current;
sym.qUpdates.Enqueue("End");
}

gregoryyoung

Values is an ICollection exposed by the hashtable ... it uses the enumerator off of the collection (which one would think is to use a list based enumerator .. it does not).

The Values property will create a new Values collection object pointing back to the hashtable (in other words it only encapsulates the hashtable it does not copy the data, it makes your hash table _look_ like a collection but all data is kept within the hashtable itself).

HashTable.ValueCollection.GetEnumerator ..

public virtual IEnumerator GetEnumerator()
{
return new Hashtable.HashtableEnumerator(this._hashtable, 2);
}

When you call to get an enumerator off of the values collection it returns you the same enumerator you would have received for the original hashtable. As such Values would actually be slightly slower (an additional object creation) but they should be within any reasonable standard equivalent in performance (you increase you constant slightly but O(size) remains the same).

Another important difference is that the hashtable enumerator will return you keys and values (resulting in the creation of a DictionaryEntry object for each item) where as the values enumerator will only return values.

Cheers,

Greg Young

eltic

gregoryyoung is absolutly right, it just returns the same Enumerator.
(And, by the way he's also right, that building a hashtable as a tree
makes no sense :)

ASKER CERTIFIED SOLUTION

gregoryyoung

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

gregoryyoung

heh@getting a B for the answer.

lblinc

ASKER

gregoryyoung, sorry about that.. it was a quick once-over on this post to close it out.. your answer was definitely very thorough.. i will have it changed to an 'A' for you.

gregoryyoung

np I was just curious what more I could have possibly said :)