Solved

Add dynamically to an array and compare

Posted on 2010-08-20
7
335 Views
Last Modified: 2013-11-23
Hi Experts,
I need advice on what is the best and fastest way to do this. I have a huge text file. has millions of rows. I maintain two lists A and B. I need to read each line and compare if the  values  in   ColumnA exists in list A . if it is not in the list. I add to it. If it is already there I skip adding to the list and proceed to the next line. I do the same with column B. Once the length of the listA or list B reaches a threshold say 500. I output the lists and reinitialize them and keep going.
What datastructure is best suited for this?  arrays or collections or?  I  like to process the whole log file as quickly as possible.
Any help is very much appreciated.
0
Comment
Question by:guyneo
7 Comments
 
LVL 2

Assisted Solution

by:jurobot
jurobot earned 175 total points
ID: 33487812
hi,
use collections, for exmaple HashSet that offers constant time performance for the basic operations (add, remove, contains and size)...

Note: once the length of the listA or listB reaches the treshold, you will "forget" what values have been already processed. Think about it - if it has meaning...

cheers
JS
0
 
LVL 40

Accepted Solution

by:
gurvinder372 earned 175 total points
ID: 33487850
Use java.util.Set, since if you add any item in the Set, duplicates are removed automatically, so no comparison is required to be done, all you need to do is
Set set = new HashSet();
set.add(item);

another approach could be to keep appending the items to an string to make a comma separated string, and when the 500 elements are processed, split the string to make an array of strings and then add that array to Set, then you can count if the number of items in resulting Set has reached 500 or not,

String[] array = {"Happy", "New", "Year", "2006"};
set.addAll( Arrays.asList(array);
array = set.toArray();
this array is now having unique elements only, then you can continue untill the 500 mark is reached.


0
 
LVL 1

Author Comment

by:guyneo
ID: 33495417
Thanks jurobot and gurvinder for your input.
@jurobot. HashSet offers constant time but is it on the whole faster than other options? say ArrayList.
The operations I would do mostly are contains and add?
@gurvinder The automatic removal of duplicates in HashSet seems like a nice feature. However for my current task I need to know when there is duplicate. I didnot explain all my requirements for the sake of keeping small and clear. I did implement it in ArrayLists, it worked fine. definitely usable. Ofcourse looking for a better solution keeps the learning going :)
0
NAS Cloud Backup Strategies

This article explains backup scenarios when using network storage. We review the so-called “3-2-1 strategy” and summarize the methods you can use to send NAS data to the cloud

 
LVL 2

Expert Comment

by:jurobot
ID: 33499265
hi guyneo,

I guess that time complexity for all array-like implementations is linear running time for operation 'contains'. Constant time for operation 'add' (i.e. adding n elements requires O(n) time).

Set implementations using hash indexing offers constant time for operations like 'add' and 'contains'.

So from my point of view the answer for you is chooding between array-like implementation (ArrayList is array-like implementation) and hash indexing implementation.

cheers
JS
0
 
LVL 6

Assisted Solution

by:__geof__
__geof__ earned 50 total points
ID: 33509322
Hashset returns false if the item already exist in the set so I would use the solution from gurvinder372 and check the return value.
0
 
LVL 3

Assisted Solution

by:hazgoduk
hazgoduk earned 100 total points
ID: 33566373
I always use HashMap or LinkedHashMap (if you need to retain order) for this.

LinkedHashMap<String, String> a = new LinkedHashMap();

a.put("data", "");

You can either check if a.containsKey("") if you need to know if it's a duplicate or if it doesn't matter just put and it won't have duplicates.
0
 
LVL 1

Author Closing Comment

by:guyneo
ID: 33762009
Thanks for all your help Guys. I got help in time. For sake of learning, I tried various approaches. For my particular task,dataset, it didnot make too much of a difference so as to prefer one over the other. Ofcourse other people might find it different. If so please do  share your experiences.
0

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
maven project error 5 56
Way to decrease size of apk file 9 71
hibernate example issues from command prompt 10 42
sql import cannot be resolved jsp 3 24
For customizing the look of your lightweight component and making it look opaque like it was made of plastic.  This tip assumes your component to be of rectangular shape and completely opaque.   (CODE)
For beginner Java programmers or at least those new to the Eclipse IDE, the following tutorial will show some (four) ways in which you can import your Java projects to your Eclipse workbench. Introduction While learning Java can be done with…
Viewers learn about the third conditional statement “else if” and use it in an example program. Then additional information about conditional statements is provided, covering the topic thoroughly. Viewers learn about the third conditional statement …
Video by: Michael
Viewers learn about how to reduce the potential repetitiveness of coding in main by developing methods to perform specific tasks for their program. Additionally, objects are introduced for the purpose of learning how to call methods in Java. Define …

832 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question