Go Premium for a chance to win a PS4. Enter to Win

x
?
Solved

Add dynamically to an array and compare

Posted on 2010-08-20
7
Medium Priority
?
343 Views
Last Modified: 2013-11-23
Hi Experts,
I need advice on what is the best and fastest way to do this. I have a huge text file. has millions of rows. I maintain two lists A and B. I need to read each line and compare if the  values  in   ColumnA exists in list A . if it is not in the list. I add to it. If it is already there I skip adding to the list and proceed to the next line. I do the same with column B. Once the length of the listA or list B reaches a threshold say 500. I output the lists and reinitialize them and keep going.
What datastructure is best suited for this?  arrays or collections or?  I  like to process the whole log file as quickly as possible.
Any help is very much appreciated.
0
Comment
Question by:guyneo
7 Comments
 
LVL 2

Assisted Solution

by:jurobot
jurobot earned 700 total points
ID: 33487812
hi,
use collections, for exmaple HashSet that offers constant time performance for the basic operations (add, remove, contains and size)...

Note: once the length of the listA or listB reaches the treshold, you will "forget" what values have been already processed. Think about it - if it has meaning...

cheers
JS
0
 
LVL 40

Accepted Solution

by:
Gurvinder Pal Singh earned 700 total points
ID: 33487850
Use java.util.Set, since if you add any item in the Set, duplicates are removed automatically, so no comparison is required to be done, all you need to do is
Set set = new HashSet();
set.add(item);

another approach could be to keep appending the items to an string to make a comma separated string, and when the 500 elements are processed, split the string to make an array of strings and then add that array to Set, then you can count if the number of items in resulting Set has reached 500 or not,

String[] array = {"Happy", "New", "Year", "2006"};
set.addAll( Arrays.asList(array);
array = set.toArray();
this array is now having unique elements only, then you can continue untill the 500 mark is reached.


0
 
LVL 1

Author Comment

by:guyneo
ID: 33495417
Thanks jurobot and gurvinder for your input.
@jurobot. HashSet offers constant time but is it on the whole faster than other options? say ArrayList.
The operations I would do mostly are contains and add?
@gurvinder The automatic removal of duplicates in HashSet seems like a nice feature. However for my current task I need to know when there is duplicate. I didnot explain all my requirements for the sake of keeping small and clear. I did implement it in ArrayLists, it worked fine. definitely usable. Ofcourse looking for a better solution keeps the learning going :)
0
Concerto's Cloud Advisory Services

Want to avoid the missteps to gaining all the benefits of the cloud? Learn more about the different assessment options from our Cloud Advisory team.

 
LVL 2

Expert Comment

by:jurobot
ID: 33499265
hi guyneo,

I guess that time complexity for all array-like implementations is linear running time for operation 'contains'. Constant time for operation 'add' (i.e. adding n elements requires O(n) time).

Set implementations using hash indexing offers constant time for operations like 'add' and 'contains'.

So from my point of view the answer for you is chooding between array-like implementation (ArrayList is array-like implementation) and hash indexing implementation.

cheers
JS
0
 
LVL 6

Assisted Solution

by:__geof__
__geof__ earned 200 total points
ID: 33509322
Hashset returns false if the item already exist in the set so I would use the solution from gurvinder372 and check the return value.
0
 
LVL 3

Assisted Solution

by:hazgoduk
hazgoduk earned 400 total points
ID: 33566373
I always use HashMap or LinkedHashMap (if you need to retain order) for this.

LinkedHashMap<String, String> a = new LinkedHashMap();

a.put("data", "");

You can either check if a.containsKey("") if you need to know if it's a duplicate or if it doesn't matter just put and it won't have duplicates.
0
 
LVL 1

Author Closing Comment

by:guyneo
ID: 33762009
Thanks for all your help Guys. I got help in time. For sake of learning, I tried various approaches. For my particular task,dataset, it didnot make too much of a difference so as to prefer one over the other. Ofcourse other people might find it different. If so please do  share your experiences.
0

Featured Post

What does it mean to be "Always On"?

Is your cloud always on? With an Always On cloud you won't have to worry about downtime for maintenance or software application code updates, ensuring that your bottom line isn't affected.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Are you developing a Java application and want to create Excel Spreadsheets? You have come to the right place, this article will describe how you can create Excel Spreadsheets from a Java Application. For the purposes of this article, I will be u…
Introduction This article is the last of three articles that explain why and how the Experts Exchange QA Team does test automation for our web site. This article covers our test design approach and then goes through a simple test case example, how …
Viewers will learn about arithmetic and Boolean expressions in Java and the logical operators used to create Boolean expressions. We will cover the symbols used for arithmetic expressions and define each logical operator and how to use them in Boole…
How to fix incompatible JVM issue while installing Eclipse While installing Eclipse in windows, got one error like above and unable to proceed with the installation. This video describes how to successfully install Eclipse. How to solve incompa…
Suggested Courses

971 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question