• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 623
  • Last Modified:

MoveNext()

Anybody know if it's any faster to MoveNext() on an IEnum when the collection object it refrences  is  an  Array  vs.  HashTable ?


Thanks.  
0
lblinc
Asked:
lblinc
  • 5
  • 2
  • 2
1 Solution
 
elticCommented:
While the HashTable is optimized for searching entries by their hash code,
it seems, that the HashTable is organized as a tree or something internally.
The memory structure of that hash table would be a pointer to childnode
structure which holds a reference to the table entry, while an array
(not ArrayList) is organized as an linear appearance of references
which can be processed in-line by the memory hardware. (New DDR2-RAMs
are only faster than DDR when you access the memory lineary).

IMHO, since you don't have hundres or thousands of elements in your
hashtable or your array, you will not realize any differences. If you have so,
I would recommend using an array, even it can't grow sequentially, but
it's faster if you want to iterate through all elements.

Rather than iterate through all elements, you should organize your data reasonable,
so you can apply search and sorting algorithms on it .
0
 
gregoryyoungCommented:
"While the HashTable is optimized for searching entries by their hash code,
it seems, that the HashTable is organized as a tree or something internally."

I am sorry but this line is just too much :) Why would a hash table use a tree internally? My guess would be that its implemented as a hash table internally


A hashtable uses an array internally .. but it does have slight overhead over an array for iteration as there are holes in the array ... Another option you might want to look at would be the possiblity of implementing a hashlist class A quick look with reflector shows the difference ..

conceptually ..

Hashtable has an array internally of size 100 ..

you insert an item that gets a hash of 20 .. it enters into bucket 20 a reference to the item.. (actually a holder struct called "Buckt" with the key and the value)
you insert an item that gets a hash of 115 .. it enters into bucket 20 a reference to the item..

The place where things get odd is in how the hashtable handles collisions (either append to the end of the list or move to next bucket etc)
you insert an item that gets a hash of 20 .. it enters into bucket 21 a reference to the item..

Now when it comes time to iterate .. it iterates through 100 items giving back the ones that are set (note I have a VERY low fill ratio here)

Array ...
public bool MoveNext()
{
      if (this._complete)
      {
            this.index = this.endIndex;
            return false;
      }
      this.index++;
      this.IncArray(); //dont worry about this, it only does stuff for multi-dimensioned arrays
      return !this._complete;
}




HashTable ...
public virtual bool MoveNext()
{
      if (this.version != this.hashtable.version)
      {
            throw new InvalidOperationException(Environment.GetResourceString("InvalidOperation_EnumFailedVersion"));
      }
      while (true)
      {
            if (this.bucket <= 0)
            {
                  this.current = false;
                  return false;
            }
            this.bucket--;
            object obj1 = this.hashtable.buckets[this.bucket].key;
            if ((obj1 != null) && (obj1 != this.hashtable.buckets))
            {
                  this.currentKey = obj1;
                  this.currentValue = this.hashtable.buckets[this.bucket].val;
                  this.current = true;
                  return true;
            }
      }
}

The differences between these two should not be huge in terms of performance providing you maintain a reasonable hash fill ratio (which the hash does automatically O(size) vs O(n)), the big difference occurs when you want to look up a value in the data structure where the hashtable is O(1) and the array is O(n).

As I mentioned earlier there are other data structures which can help alleviate such things .. Hash List is a great example.

Cheers,

Greg Young
0
 
lblincAuthor Commented:
eltic  ~  ok, thanks..    on similar topic, can you explain the difference (possibly in performance?) between using IEnumertor as in each of these 2 scenarios below?  

1)

  IEnumerator itr = someHashTable.Values.GetEnumerator();
      while (itr.MoveNext())
      {
             Symbol sym= (Symbol) itr.Current;
             sym.qUpdates.Enqueue("End");
       }

2)

   IEnumerator itr = someHashTable.GetEnumerator();
       while (itr.MoveNext())
       {
             Symbol sym= (Symbol) itr.Current;
             sym.qUpdates.Enqueue("End");
       }

0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
gregoryyoungCommented:
Values is an ICollection exposed by the hashtable ... it uses the enumerator off of the collection (which one would think is to use a list based enumerator .. it does not).

The Values property will create a new Values collection object pointing back to the hashtable (in other words it only encapsulates the hashtable it does not copy the data, it makes your hash table _look_ like a collection but all data is kept within the hashtable itself).

HashTable.ValueCollection.GetEnumerator ..

public virtual IEnumerator GetEnumerator()
{
      return new Hashtable.HashtableEnumerator(this._hashtable, 2);
}

When you call to get an enumerator off of the values collection it returns you the same enumerator you would have received for the original hashtable. As such Values would actually be slightly slower (an additional object creation) but they should be within any reasonable standard equivalent in performance (you increase you constant slightly but O(size) remains the same).

Another important difference is that the hashtable enumerator will return you keys and values (resulting in the creation of a DictionaryEntry object for each item) where as the values enumerator will only return values.

Cheers,

Greg Young
0
 
elticCommented:
gregoryyoung is absolutly right, it just returns the same Enumerator.
(And, by the way he's also right, that building a hashtable as a tree
makes no sense :)
0
 
gregoryyoungCommented:
The basic breakdown in difference can be summarized by

O(n) vs O(size)

because hashtables generally run with a fill ratio of < 50% this means that performance will be cut in 1/2 or more in many cases.

If you decide to implement this on your own the addition of two fields to the bucket class can make this alot faster .. as I have mentioned earlier.

Here is some quick pseudo code (not at all tested etc) to just show the general process you would want to follow. Note I am assuming that bucket here is a class even though it is a struct in reality ..

bucket
   object value
   object key

keep

bucket
   object value
   object key
   bucket previous
   bucket next

then in your class keep a reference to the last inserted item ..
   bucket lastinserted

so on my first insert ..
public void Insert(object _Key, object _Value) {
    //insertintohash
    if(lastinserted != null) {
        lastinserted.Next = bucketIjustinserted;
    }
    lastinserted = bucketijustinserted;
    bucketijustinserted.next = null;
}

then on Remove
public void Remove(object _Key) {
    //remove from hash
    tmp = itemijustremoved.Previous
    if(itemijustremoved.Previous != null) {
        itemijustremoved.Next = itemijustremoved.Next;
    }
    if(itemijustremoved.Next != null) {
        itemijustremoved.Next.Previous = tmp;
    }
    if(itemijustremoved == lastinserted) {
          if(itemijustremoved.Previous != null) {
               lastinserted = itemijustremoved;
          } else {
               lastinserted = null;
          }
    }
}

Since order is not of much importance to us for the iterator we just store the reference to the last item we inserted .. we can then for an iterator do ..

bucket tmp = lastinserted
do {
} while tmp.Previous != null;

which will give us a O(n) iteration in our hashtable. there are many similar implementations to this.

Again this is not tested code (its not even real code) .. I may have missed an if etc but the general concept is correct. If you remember back to Data Structures course, what we have done is implemented is a double linked list inside of a hash table.
0
 
gregoryyoungCommented:
heh@getting a B for the answer.
0
 
lblincAuthor Commented:
gregoryyoung,  sorry about that..  it was a quick once-over on this post to close it out..  your answer was definitely very thorough..    i will have it changed to an 'A' for you.      
0
 
gregoryyoungCommented:
np I was just curious what more I could have possibly said :)
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

  • 5
  • 2
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now