Link to home
Start Free TrialLog in
Avatar of guyneo
guyneo

asked on

Add dynamically to an array and compare

Hi Experts,
I need advice on what is the best and fastest way to do this. I have a huge text file. has millions of rows. I maintain two lists A and B. I need to read each line and compare if the  values  in   ColumnA exists in list A . if it is not in the list. I add to it. If it is already there I skip adding to the list and proceed to the next line. I do the same with column B. Once the length of the listA or list B reaches a threshold say 500. I output the lists and reinitialize them and keep going.
What datastructure is best suited for this?  arrays or collections or?  I  like to process the whole log file as quickly as possible.
Any help is very much appreciated.
SOLUTION
Avatar of jurobot
jurobot

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Avatar of Gurvinder Pal Singh
Gurvinder Pal Singh
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of guyneo
guyneo

ASKER

Thanks jurobot and gurvinder for your input.
@jurobot. HashSet offers constant time but is it on the whole faster than other options? say ArrayList.
The operations I would do mostly are contains and add?
@gurvinder The automatic removal of duplicates in HashSet seems like a nice feature. However for my current task I need to know when there is duplicate. I didnot explain all my requirements for the sake of keeping small and clear. I did implement it in ArrayLists, it worked fine. definitely usable. Ofcourse looking for a better solution keeps the learning going :)
hi guyneo,

I guess that time complexity for all array-like implementations is linear running time for operation 'contains'. Constant time for operation 'add' (i.e. adding n elements requires O(n) time).

Set implementations using hash indexing offers constant time for operations like 'add' and 'contains'.

So from my point of view the answer for you is chooding between array-like implementation (ArrayList is array-like implementation) and hash indexing implementation.

cheers
JS
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of guyneo

ASKER

Thanks for all your help Guys. I got help in time. For sake of learning, I tried various approaches. For my particular task,dataset, it didnot make too much of a difference so as to prefer one over the other. Ofcourse other people might find it different. If so please do  share your experiences.