Solved

Add dynamically to an array and compare

Posted on 2010-08-20
Last Modified: 2013-11-23
Hi Experts,
I need advice on the best and fastest way to do this. I have a huge text file with millions of rows, and I maintain two lists, A and B. For each line I check whether the value in column A already exists in list A. If it is not in the list, I add it; if it is already there, I skip it and proceed to the next line. I do the same with column B and list B. Once the length of list A or list B reaches a threshold, say 500, I output the lists, reinitialize them, and keep going.
What data structure is best suited for this: arrays, collections, or something else? I would like to process the whole log file as quickly as possible.
Any help is very much appreciated.
Question by:guyneo
7 Comments
 
LVL 2

Assisted Solution

by:jurobot
jurobot earned 175 total points
ID: 33487812
hi,
Use collections, for example HashSet, which offers constant-time performance for the basic operations (add, remove, contains and size).

Note: once the length of listA or listB reaches the threshold and you reinitialize it, you will "forget" which values have already been processed. Think about whether that matters for your task.

cheers
JS
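The threshold behaviour described in the question can be sketched as follows. This is a minimal illustration, not the asker's actual code: the class name `BatchingSet`, the method names, and keeping flushed batches in memory (as a stand-in for "output the lists") are all assumptions.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical helper: collects unique values and hands them off in batches.
class BatchingSet {
    private final int threshold;
    private final Set<String> current = new HashSet<>();
    private final List<Set<String>> flushed = new ArrayList<>();

    BatchingSet(int threshold) {
        this.threshold = threshold;
    }

    // Returns true if the value was new to the current batch.
    boolean offer(String value) {
        boolean added = current.add(value);
        if (current.size() >= threshold) {
            flushed.add(new HashSet<>(current)); // stand-in for "output the list"
            current.clear();                     // reinitialize and keep going
        }
        return added;
    }

    List<Set<String>> flushedBatches() {
        return flushed;
    }

    int pending() {
        return current.size();
    }

    public static void main(String[] args) {
        BatchingSet a = new BatchingSet(3);
        for (String v : new String[] {"x", "y", "x", "z", "x"}) {
            a.offer(v);
        }
        // Prints: 1 batch(es) flushed, 1 value(s) pending
        System.out.println(a.flushedBatches().size() + " batch(es) flushed, "
                + a.pending() + " value(s) pending");
    }
}
```

Note that the final "x" is accepted as new even though it was seen earlier: after the flush the set no longer remembers it, which is the "forgetting" effect mentioned in the note above.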
 
LVL 40

Accepted Solution

by:
gurvinder372 earned 175 total points
ID: 33487850
Use java.util.Set: a Set never stores duplicates, so adding an item that is already present leaves the set unchanged and you do not need to do the comparison yourself. All you need to do is
Set<String> set = new HashSet<>();
set.add(item);

Another approach is to keep appending the items to a comma-separated string and, once 500 elements have been processed, split the string into an array of strings and add that array to the Set. You can then check whether the number of items in the resulting Set has reached 500.

String[] array = {"Happy", "New", "Year", "2006"};
set.addAll(Arrays.asList(array));
array = set.toArray(new String[0]);
The array now holds unique elements only, and you can continue until the 500 mark is reached.
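As a self-contained variant of the array round trip above, here is a small sketch. The class and method names are made up for illustration, and it swaps in LinkedHashSet (rather than the HashSet used in the answer) purely so that the output order is predictable.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;
import java.util.Set;

class DedupeArray {
    // Returns the distinct elements of the input, in first-seen order.
    static String[] unique(String[] input) {
        Set<String> set = new LinkedHashSet<>(Arrays.asList(input));
        // The typed overload returns String[]; the no-arg toArray() returns Object[].
        return set.toArray(new String[0]);
    }

    public static void main(String[] args) {
        String[] array = {"Happy", "New", "Happy", "Year"};
        // Prints: [Happy, New, Year]
        System.out.println(Arrays.toString(unique(array)));
    }
}
```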


 
LVL 1

Author Comment

by:guyneo
ID: 33495417
Thanks jurobot and gurvinder for your input.
@jurobot: HashSet offers constant time, but is it faster on the whole than other options, say ArrayList?
The operations I would mostly use are contains and add.
@gurvinder: The automatic removal of duplicates in HashSet seems like a nice feature. However, for my current task I need to know when there is a duplicate. I did not explain all my requirements, to keep the question short and clear. I did implement it with ArrayLists and it worked fine, definitely usable. Of course, looking for a better solution keeps the learning going :)

 
LVL 2

Expert Comment

by:jurobot
ID: 33499265
hi guyneo,

I guess that for all array-like implementations, 'contains' runs in linear time and 'add' runs in (amortized) constant time, i.e. adding n elements requires O(n) time.

Set implementations that use hash indexing offer constant time for operations like 'add' and 'contains'.

So from my point of view, you are choosing between an array-like implementation (ArrayList is one) and a hash-indexing implementation.

cheers
JS
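To make the comparison concrete, here is a small sketch of the same "add if absent" loop written both ways. The class and method names are hypothetical. Both versions give the same answer; the difference is that the list version scans the whole list on every `contains` call, while the set version does an expected constant-time hash lookup. With a list capped at around 500 entries, as in the question, each scan stays short, so the practical gap may be small.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class ContainsComparison {
    // Array-like approach: contains() scans the list, so each call is O(n).
    static int countUniqueWithList(List<String> values) {
        List<String> seen = new ArrayList<>();
        for (String v : values) {
            if (!seen.contains(v)) {
                seen.add(v);
            }
        }
        return seen.size();
    }

    // Hash-based approach: add() performs the membership check itself.
    static int countUniqueWithSet(List<String> values) {
        Set<String> seen = new HashSet<>();
        for (String v : values) {
            seen.add(v);
        }
        return seen.size();
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList("a", "b", "a", "c", "b");
        System.out.println(countUniqueWithList(sample)); // 3
        System.out.println(countUniqueWithSet(sample));  // 3
    }
}
```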
 
LVL 6

Assisted Solution

by:__geof__
__geof__ earned 50 total points
ID: 33509322
HashSet.add returns false if the item already exists in the set, so I would use the solution from gurvinder372 and check the return value.
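A minimal sketch of that idea, assuming the goal is simply to collect every value that shows up again (the class and method names are illustrative only):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class DuplicateCheck {
    // Returns each value that was already in the set when it was encountered.
    static List<String> duplicates(List<String> values) {
        Set<String> seen = new HashSet<>();
        List<String> dups = new ArrayList<>();
        for (String v : values) {
            // Set.add returns false when the element is already present.
            if (!seen.add(v)) {
                dups.add(v);
            }
        }
        return dups;
    }

    public static void main(String[] args) {
        // Prints: [a, b, a]
        System.out.println(duplicates(Arrays.asList("a", "b", "a", "c", "b", "a")));
    }
}
```

This does the membership test and the insert in a single call, so no separate `contains` is needed.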
 
LVL 3

Assisted Solution

by:hazgoduk
hazgoduk earned 100 total points
ID: 33566373
I always use HashMap, or LinkedHashMap if you need to retain insertion order, for this.

LinkedHashMap<String, String> a = new LinkedHashMap<>();

a.put("data", "");

If you need to know whether a value is a duplicate, check a.containsKey("data") before the put; if it doesn't matter, just put and the map won't contain duplicate keys.
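One case where a map earns its keep over a set is when the value slot carries information, for example how often each key was seen. The sketch below is an assumption about a possible use, not something asked for in the thread; the names are illustrative.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class OccurrenceCounter {
    // Counts occurrences per value, keeping keys in first-seen order.
    static Map<String, Integer> count(List<String> values) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String v : values) {
            // merge stores 1 for a new key, otherwise adds 1 to the existing count.
            counts.merge(v, 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Prints: {b=2, a=1}
        System.out.println(count(Arrays.asList("b", "a", "b")));
    }
}
```

Any key whose count is greater than 1 was a duplicate. If only uniqueness and order matter, a LinkedHashSet avoids the unused value slot.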
 
LVL 1

Author Closing Comment

by:guyneo
ID: 33762009
Thanks for all your help, guys. I got help in time. For the sake of learning I tried various approaches. For my particular task and dataset, it did not make enough of a difference to prefer one over the other. Of course other people might find it different; if so, please do share your experiences.