Link to home
Start Free TrialLog in
Avatar of sector
sector

asked on

Best way of doing it.

If i have a list of integers that are not sequenced but arranged from the smaller number to the biggest one ( up to 250 number ) and i want to find a number to see if it is in the list.
What is the better container to use here ( vecotr ,array , hashtable , map etc.. ) and why ?
Avatar of Iexpert
Iexpert

hashtable since all 250 numbers not present and
gives fast access
ASKER CERTIFIED SOLUTION
Avatar of nietod
nietod

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of sector

ASKER

Why is a hashtable faster than just putting the numbers into an array like this :
int[] local = { 1,250,450,22 };

Avatar of sector

ASKER

but i get the data already sorted.
Does that change the picture ?
>> Why is a hashtable faster than just putting
>> the numbers into an array like this
When hashtables perform well, they are extremely fast at finding data.  Often much faster than a binary search.  In many cases, the data can be found in a single operation.  However, if you also need to access the data sequentualy, that is nearly impossible to do with a hash table, in that case a set would be better.   If you just need to search the data a hashtable is probably the best option, except there is not one included in STL, you would have to write it yourself.

>> i get the data already sorted.
>> Does that change the picture ?
Not really.   vector<> cannot be searched quickly.  (If the data is sorted, it would be possible to write a binary search for a vector<>, but there is no built in search for this, so you have to write your own or use the standard linear search, which is slow.)  The same is true of the list<> container.  Only map<> and set<> can be searched quickly and map<> needs that "extra value" so set<> is the right choice.
there is a binary_search algorithm in the STL, which can be used with the standard vector as long as it is sorted.  Access to the vector is fast, it has random access iterators, so it can use the indexing operator in O(1) time.

The following code works fine:

#include <algorithm>
#include <vector>

using namespace std;

void main()
{
    vector<int> vi;
    vi.push_back(7);
    vi.push_back(4);
    vi.push_back(5);
    vi.push_back(9);
    vi.push_back(2);
    std::stable_sort(vi.begin(), vi.end());

    bool found = std::binary_search(vi.begin(), vi.end(), 5);
}
I didn't realize that STL has a binary search (of sorts--its not guaranteed to be binary).  However that still is more work than the set container.
A set is better as long as the data contains no duplicates (or it doesn't matter if the duplicates are removed), but you can use a multiset if that is not the case.

The binary_search will be binary on a sorted vector since random access iterators are used.
It uses forward iterators:

template<class ForwardIterator, class T>
bool binary_search(ForwardIterator first, ForwardIterator last, const T& value);

template<class ForwardIterator, class T, class Compare>
bool binary_search(ForwardIterator first, ForwardIterator last, const T& value, Compare comp);

Although the code shows that lower_bound() is used, which has a random access iterator version....
The docs I have say that binary_search performs in O(logn) time if you have random access iterators and O(n) otherwise.  As you say, the algorithm can make use of the fact that there are random access iterators.
My docs only say it uses at most log(n) + 2 *comparisions* Seems there are some differences?
SGI docs also say log(n) + 2 compares, a logaritmic number of steps for random access iterators and lineair number of steps otherwise.
Anyway, with integers that would qualify for a O(N) time when using non-random access iterators.

SGI docs also metion the existence of a 'Hash Associative Container'
I think you are correct in what you say about excecution times, what I said was a simplification.

The dinkumware version of STL (which is what you get with VC++, although I don't think the version with the compiler is up to date) also contains hash implementations of sets and maps.
The trouble with hash table versions is of course that you have to define a suitable hashing algorithm that will ensure that items in the table are evenly distributed, which usually requires knowledge of the number of hash buckets in the table etc.

A bad hashing algorithm can make the hash table very inefficient.

The worst hashing algorithm I have seen is:

return 1;

:)
SGI do have hash_[multi]map and hash_[multi]set containers, but these are not defined in the C++ standard. They're included with the GNU C++ compiler libs, and I guess you can get them from SGI and integrate them for use with any compiler you fancy.

For what you want, the hash_set would be the best choice, offering faster access than a plain set.

As for the hashing algorithm, it is possible to override the default one if your needs be. Look at
     http://www.sgi.com/Technology/STL/HashFunction.html
to see what SGi say about the whole thing. Go to the index of that directory for loads of good STL resources.

Leon