Link to home
Start Free TrialLog in
Avatar of gregasm
gregasm

asked on

string array challenge

I have a string array containing repeating values.

I would like to process the array and leave only distinct string values.

What's an algorithm or built in method of the string class that can do this for me?

SOLUTION
Avatar of DonRameshSachin
DonRameshSachin
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
So the latest Array will have only the distinct values

Don
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Avatar of David H.H.Lee
David H.H.Lee
Flag of Malaysia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of gregasm
gregasm

ASKER

Thank you all for the responses, those are all great solutions! Thanks, I particularly like the arraylist solutions, although the hashtable implementations are also very elegant.
OK, gregasm, now that you have all these great solutions, why don't you do an exhaustive comparison of space and time performance between each of the implementation and report back to us!  ( Wink, wink)  :-)

[ Seriously, that would be useful, if someone with lots of spare time wants to take it on, and write an article. ]

gregasm, I'm surprised by the number of good solutions you got.  They're all valid approaches, each with it's own strengths and weaknesses.  I like them all.
Hi,
 I too agree with farsight.

Avatar of gregasm

ASKER

I am suprised by the quality of the solutions presented here too! Experts on Experts exchange rock!
Avatar of gregasm

ASKER

err... that is to say, YOU GUYS ROCK! =]]
farsight, you've got me going! I am off to do these comparisons. Will post the results in a day or two. Watch this thread!

Off the top of my head though, I would state this: iboutchkine's approach is superior to my 11 o'clock-at-night, bleary-eyed solution (how's that for an excuse? :-) ); mine involves throwing exceptions, which is a BAD idea - more duplicates, more exceptions - to be avoided.

As for the toss off between Hashtable and Arraylist approach, I guess the Hashtable might be faster since the duplicate check is only O(1) in Hashtable but O(n) in Arraylists. So larger the source array, slower the performance of the Arraylists. But again, I have not factored in the cost of creating both the hashtables and arraylists.

As I said, I will do a more comprehensive test on these and post the results.

Cheers!
Karun.
> farsight, you've got me going!
Great KarunSK!  I'm glad you took the reins and are off running.
KarunSK,
 its great, carry on, waiting for the result
Don
Sorry guys that 2 days became 2 weeks, but you would understand that such is a developer's life!

OK. The results are in, and you will not believe it!

I tested four approaches:
1. I combined Don and farsight's approaches, since they were similar;
2. iboutchkine's solution involving hashtables and Contains() check
3. x_com's using arraylist and Contains() check
4. My solution using hashtables and exception.

I just put each approach in a function and called it in a loop that ran 100,000 times. Result:

Approach 1: took about 28 million ticks
Approach 2: took about 35 million ticks
Approach 3: took about 28 million ticks
Approach 4: took about 27 million ticks!!!

Isn't this strange? What is surprising is that the results are consistent! Everything I learnt about exceptions being costly flies in the face of these results. Mind too, that the exceptions would be raised almost always, after the set gets filled up.

I feel like starting a new thread to discuss this. What say?

Karun.
I'm surprised at how close they all are.  Even the "outlier" is only 130% of the fastest one.  I was expecting a lot more variation.

> I feel like starting a new thread to discuss this. What say?
Sure, do that.   And post the code so we can look for anomolies.
And please, leave a link to the thread here, also.

One possible explanation is that the other approaches are also using exceptions, hidden internally.
It might be interesting to test various alterations to the initial data, too.
  Change number of items in the array.
  Change the length of the strings in the array.
  Change the "ordered-ness" or the strings in the array.
  Change the number of loops.
  Running "empty" loops, to see how much overhead the loops introduce.
  etc.
Regarding the hashtable method:

As an alternative you can use the sortedlist object - and the returned array will then be sorted.