Add dynamically to an array and compare

Hi Experts,
I need advice on the best and fastest way to do this. I have a huge text file with millions of rows, and I maintain two lists, A and B. I need to read each line and check whether the value in column A exists in list A. If it is not in the list, I add it; if it is already there, I skip it and proceed to the next line. I do the same with column B and list B. Once the length of list A or list B reaches a threshold, say 500, I output the lists, reinitialize them, and keep going.
Which data structure is best suited for this: arrays, collections, or something else? I would like to process the whole log file as quickly as possible.
Any help is very much appreciated.
guyneo Asked:
Gurvinder Pal Singh Commented:
Use java.util.Set. A Set never stores duplicates, so adding an item that is already present has no effect and you do not need to do the comparison yourself. All you need to do is:
Set<String> set = new HashSet<>();
set.add(item);
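
To tie this back to the question, here is a minimal sketch of the read, deduplicate, and flush loop using two HashSets. The class name TwoColumnBatcher, the comma-separated sample lines, and the outputA/outputB lists (a stand-in for "output the lists") are assumptions for illustration, not part of the original task.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class TwoColumnBatcher {
    private final int threshold;               // e.g. 500 in the question
    private Set<String> setA = new HashSet<>();
    private Set<String> setB = new HashSet<>();
    // Hypothetical stand-in for "output the lists": batches are only recorded here.
    final List<Set<String>> outputA = new ArrayList<>();
    final List<Set<String>> outputB = new ArrayList<>();

    TwoColumnBatcher(int threshold) {
        this.threshold = threshold;
    }

    // Feed the two column values of one line.
    void accept(String colA, String colB) {
        setA.add(colA);   // has no effect if the value is already present
        setB.add(colB);
        if (setA.size() >= threshold || setB.size() >= threshold) {
            flush();
        }
    }

    // Output both sets and start over with empty ones.
    void flush() {
        outputA.add(setA);
        outputB.add(setB);
        setA = new HashSet<>();
        setB = new HashSet<>();
    }

    public static void main(String[] args) {
        TwoColumnBatcher batcher = new TwoColumnBatcher(2); // tiny threshold for the demo
        String[] lines = {"x,1", "x,1", "y,1", "z,2"};      // assumed line format
        for (String line : lines) {
            String[] cols = line.split(",");
            batcher.accept(cols[0], cols[1]);
        }
        batcher.flush(); // emit whatever is left at end of input
        System.out.println(batcher.outputA + " " + batcher.outputB);
    }
}
```

For the real file you would presumably call accept once per line read from a BufferedReader and replace the recording lists with whatever "output" means in your case.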

Another approach could be to keep appending the items to a string to build a comma-separated string. Once 500 elements have been processed, split the string into an array of strings and add that array to a Set; you can then check whether the number of items in the resulting Set has reached 500.

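// uses java.util.Arrays, java.util.HashSet and java.util.Set
String[] array = {"Happy", "New", "Year", "2006"};
Set<String> set = new HashSet<>();
set.addAll(Arrays.asList(array));      // duplicates are dropped here
array = set.toArray(new String[0]);    // typed overload returns String[]
The array now holds unique elements only, and you can continue until the 500 mark is reached.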


jurobot Commented:
Hi,
Use collections, for example HashSet, which offers constant-time performance for the basic operations (add, remove, contains and size).

Note: once the length of listA or listB reaches the threshold and the list is reset, you will "forget" which values have already been processed. Think about whether that matters for your task.
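
A small sketch of that "forgetting" effect, assuming the set is simply cleared when it reaches the threshold. The class and method names here are made up for illustration.

```java
import java.util.HashSet;
import java.util.Set;

class ResetForgets {
    // Counts how often add() reports a value as new when the set is
    // cleared every time it reaches the threshold.
    static int countAccepted(String[] values, int threshold) {
        Set<String> seen = new HashSet<>();
        int accepted = 0;
        for (String v : values) {
            if (seen.add(v)) {
                accepted++;
            }
            if (seen.size() >= threshold) {
                seen.clear(); // the batch is "output" and forgotten
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        String[] values = {"a", "b", "a", "b"};
        System.out.println(countAccepted(values, 2));   // prints 4: "a" and "b" are each accepted twice
        System.out.println(countAccepted(values, 100)); // prints 2: no reset happens, so repeats are rejected
    }
}
```

If a value must be unique across the whole file rather than within one batch, a separate set that is never cleared would be needed, at the cost of memory.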

cheers
JS
guyneo Author Commented:
Thanks jurobot and gurvinder for your input.
@jurobot: HashSet offers constant time, but is it faster overall than other options, say ArrayList? The operations I would use most are contains and add.
@gurvinder: The automatic removal of duplicates in HashSet seems like a nice feature. However, for my current task I need to know when there is a duplicate. I did not explain all my requirements, for the sake of keeping the question small and clear. I did implement it with ArrayLists and it worked fine; it is definitely usable. Of course, looking for a better solution keeps the learning going :)
jurobot Commented:
hi guyneo,

As far as I know, for array-like implementations such as ArrayList, 'contains' runs in linear time, because it may have to scan every element. 'add' runs in amortized constant time (i.e. adding n elements requires O(n) time).

Set implementations that use hash indexing offer constant time for operations like 'add' and 'contains'.

So from my point of view, you are choosing between an array-like implementation (ArrayList is one) and a hash-indexing implementation.
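
To make the difference concrete, here is a rough sketch of the same deduplication written both ways. The class and method names are made up for illustration; no timings are claimed.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsCost {
    // ArrayList: contains() scans the list, so each check is O(n)
    // in the current list size.
    static int uniqueWithList(String[] values) {
        List<String> list = new ArrayList<>();
        for (String v : values) {
            if (!list.contains(v)) {
                list.add(v);
            }
        }
        return list.size();
    }

    // HashSet: add() performs the membership check itself,
    // in expected constant time.
    static int uniqueWithSet(String[] values) {
        Set<String> set = new HashSet<>();
        for (String v : values) {
            set.add(v);
        }
        return set.size();
    }

    public static void main(String[] args) {
        String[] values = {"a", "b", "a", "c", "b"};
        System.out.println(uniqueWithList(values)); // prints 3
        System.out.println(uniqueWithSet(values));  // prints 3
    }
}
```

Both give the same result; only the cost per check differs. Since the lists in the question are reset at around 500 entries, the linear scan is bounded, so the gap may turn out to be small in practice.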

cheers
JS
__geof__ Commented:
HashSet.add returns false if the item already exists in the set, so I would use the solution from gurvinder372 and check the return value.
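
A minimal sketch of checking that return value. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateCheck {
    // Returns the values that were already in the set when they were added.
    static List<String> duplicates(String[] values) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String v : values) {
            if (!seen.add(v)) {   // add() returns false when v is already present
                dups.add(v);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        System.out.println(duplicates(new String[] {"Happy", "New", "Happy"})); // prints [Happy]
    }
}
```

This does the membership check and the insert in a single call, so there is no need for a separate contains before add.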
hazgoduk Commented:
I always use HashMap, or LinkedHashMap if you need to retain insertion order, for this.

LinkedHashMap<String, String> a = new LinkedHashMap<>();

a.put("data", "");

If you need to know whether a value is a duplicate, check a.containsKey("data") before the put. If it doesn't matter, just put; the map won't hold duplicate keys.
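
A short sketch of that pattern, keeping the empty string as a dummy value. The class and method names are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OrderedKeys {
    // Returns the distinct values in the order they were first seen.
    static List<String> distinctInOrder(String[] values) {
        Map<String, String> a = new LinkedHashMap<>();
        for (String v : values) {
            if (a.containsKey(v)) {
                continue;       // duplicate: this is where you would react to it
            }
            a.put(v, "");       // the value is unused, only the key matters
        }
        return new ArrayList<>(a.keySet());
    }

    public static void main(String[] args) {
        System.out.println(distinctInOrder(new String[] {"New", "Happy", "New", "Year"}));
        // prints [New, Happy, Year]
    }
}
```

Since the map values are not used here, java.util.LinkedHashSet would probably do the same job with a slightly simpler API.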
guyneo Author Commented:
Thanks for all your help, guys. I got help in time. For the sake of learning, I tried various approaches. For my particular task and dataset, the difference was not large enough to prefer one over the other. Of course, other people might find otherwise; if so, please do share your experiences.
Question has a verified solution.
