Solved

Add dynamically to an array and compare

Posted on 2010-08-20
7
333 Views
Last Modified: 2013-11-23
Hi Experts,
I need advice on what is the best and fastest way to do this. I have a huge text file. has millions of rows. I maintain two lists A and B. I need to read each line and compare if the  values  in   ColumnA exists in list A . if it is not in the list. I add to it. If it is already there I skip adding to the list and proceed to the next line. I do the same with column B. Once the length of the listA or list B reaches a threshold say 500. I output the lists and reinitialize them and keep going.
What datastructure is best suited for this?  arrays or collections or?  I  like to process the whole log file as quickly as possible.
Any help is very much appreciated.
0
Comment
Question by:guyneo
7 Comments
 
LVL 2

Assisted Solution

by:jurobot
jurobot earned 175 total points
ID: 33487812
hi,
use collections, for exmaple HashSet that offers constant time performance for the basic operations (add, remove, contains and size)...

Note: once the length of the listA or listB reaches the treshold, you will "forget" what values have been already processed. Think about it - if it has meaning...

cheers
JS
0
 
LVL 40

Accepted Solution

by:
gurvinder372 earned 175 total points
ID: 33487850
Use java.util.Set, since if you add any item in the Set, duplicates are removed automatically, so no comparison is required to be done, all you need to do is
Set set = new HashSet();
set.add(item);

another approach could be to keep appending the items to an string to make a comma separated string, and when the 500 elements are processed, split the string to make an array of strings and then add that array to Set, then you can count if the number of items in resulting Set has reached 500 or not,

String[] array = {"Happy", "New", "Year", "2006"};
set.addAll( Arrays.asList(array);
array = set.toArray();
this array is now having unique elements only, then you can continue untill the 500 mark is reached.


0
 
LVL 1

Author Comment

by:guyneo
ID: 33495417
Thanks jurobot and gurvinder for your input.
@jurobot. HashSet offers constant time but is it on the whole faster than other options? say ArrayList.
The operations I would do mostly are contains and add?
@gurvinder The automatic removal of duplicates in HashSet seems like a nice feature. However for my current task I need to know when there is duplicate. I didnot explain all my requirements for the sake of keeping small and clear. I did implement it in ArrayLists, it worked fine. definitely usable. Ofcourse looking for a better solution keeps the learning going :)
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 2

Expert Comment

by:jurobot
ID: 33499265
hi guyneo,

I guess that time complexity for all array-like implementations is linear running time for operation 'contains'. Constant time for operation 'add' (i.e. adding n elements requires O(n) time).

Set implementations using hash indexing offers constant time for operations like 'add' and 'contains'.

So from my point of view the answer for you is chooding between array-like implementation (ArrayList is array-like implementation) and hash indexing implementation.

cheers
JS
0
 
LVL 6

Assisted Solution

by:__geof__
__geof__ earned 50 total points
ID: 33509322
Hashset returns false if the item already exist in the set so I would use the solution from gurvinder372 and check the return value.
0
 
LVL 3

Assisted Solution

by:hazgoduk
hazgoduk earned 100 total points
ID: 33566373
I always use HashMap or LinkedHashMap (if you need to retain order) for this.

LinkedHashMap<String, String> a = new LinkedHashMap();

a.put("data", "");

You can either check if a.containsKey("") if you need to know if it's a duplicate or if it doesn't matter just put and it won't have duplicates.
0
 
LVL 1

Author Closing Comment

by:guyneo
ID: 33762009
Thanks for all your help Guys. I got help in time. For sake of learning, I tried various approaches. For my particular task,dataset, it didnot make too much of a difference so as to prefer one over the other. Ofcourse other people might find it different. If so please do  share your experiences.
0

Featured Post

Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

Join & Write a Comment

An old method to applying the Singleton pattern in your Java code is to check if a static instance, defined in the same class that needs to be instantiated once and only once, is null and then create a new instance; otherwise, the pre-existing insta…
Are you developing a Java application and want to create Excel Spreadsheets? You have come to the right place, this article will describe how you can create Excel Spreadsheets from a Java Application. For the purposes of this article, I will be u…
Video by: Michael
Viewers learn about how to reduce the potential repetitiveness of coding in main by developing methods to perform specific tasks for their program. Additionally, objects are introduced for the purpose of learning how to call methods in Java. Define …
Viewers learn about the scanner class in this video and are introduced to receiving user input for their programs. Additionally, objects, conditional statements, and loops are used to help reinforce the concepts. Introduce Scanner class: Importing…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now