Comparing set of records for a substring match to remove duplicate records

kalyangkm
Hi Folks,

I will have a list of records like this

1111,abdc,D23
2222,ejdk,D67
3333,eeld,D22
1111,rrrr,Z33
2222,ertu,T22

and I need to drop the duplicate records, discarding the one that comes first and keeping the one that comes later, so the output should be

1111,rrrr,Z33
2222,ertu,T22
3333,eeld,D22

My approach is to compare a substring (the first 4 characters) of each record, treating the entire record as a string (e.g. 1111,abdc,D23). But I can't see how to ignore the first record and return only the second, duplicate record. Sample code would also be appreciated.
IT Business Systems Analyst / Software Developer
Top Expert 2015
Commented:
I might do it using code like this...
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RemoveDuplicate {
    
    public static void main(String[] args) {
        List<String> input = new ArrayList<String>();
        
        input.add("1111,abdc,D23");
        input.add("2222,ejdk,D67");
        input.add("3333,eeld,D22");
        input.add("1111,rrrr,Z33");
        input.add("2222,ertu,T22");
        
        
        
        
        Map<String, String> recordMap = new LinkedHashMap<String, String>();
        
        for (String entry : input) {
            String[] fields = entry.split(",", 2);
            recordMap.put(fields[0], fields[1]);
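        }
        
        
        // Join each key and its remaining fields back together and print
        // them, in the order the keys were first inserted.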
        List<String> output = new ArrayList<String>();
        for (Map.Entry<String, String> entry : recordMap.entrySet()) {
            output.add(entry.getKey() + "," + entry.getValue());
        }
        for (String entry : output) {
            System.out.println(entry);
        }
    }
    
}


Note that you may be looking to do further processing with each record broken into its fields and so the first part and the last part will probably be different.

The guts of why this works is lines 20-25 and the use of the LinkedHashMap. A Map only allows ONE entry for a given key, so when you "put" a new entry whose key matches a previous one, the previous value gets replaced. I used a LinkedHashMap so that it keeps the keys in the order they were first seen in the input list. A plain HashMap would have given you some (basically) random order. You could use a TreeMap to ensure that the result is always in the natural order of the keys, i.e. in this case, alphabetical String order.
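
A rough sketch of that ordering difference, in case it helps. The class name MapOrderDemo and the helper dedupeKeys are made up for illustration, and the sample records are deliberately reordered (3333 first) because with the original input the insertion order and the sorted order happen to be the same.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {

    // Hypothetical helper: fills the given map so that a later record with
    // the same key overwrites the earlier one, then returns the keys in
    // whatever iteration order that map implementation provides.
    static List<String> dedupeKeys(Map<String, String> map, List<String> records) {
        for (String record : records) {
            String[] fields = record.split(",", 2);
            map.put(fields[0], fields[1]);
        }
        return new ArrayList<String>(map.keySet());
    }

    public static void main(String[] args) {
        // Reordered sample data (an assumption, not the original input order)
        List<String> records = Arrays.asList(
                "3333,eeld,D22", "1111,abdc,D23", "2222,ejdk,D67", "1111,rrrr,Z33");

        // Order in which each key was first inserted
        System.out.println("LinkedHashMap: "
                + dedupeKeys(new LinkedHashMap<String, String>(), records));

        // Natural (alphabetical) String order of the keys
        System.out.println("TreeMap:       "
                + dedupeKeys(new TreeMap<String, String>(), records));

        // No ordering guarantee at all
        System.out.println("HashMap:       "
                + dedupeKeys(new HashMap<String, String>(), records));
    }
}
```

Note that putting "1111" a second time replaces its value but does not move the key within a LinkedHashMap; it stays where it was first inserted.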

Author

Commented:
Also, can you elaborate on what exactly you mean by this?

"Note that you may be looking to do further processing with each record broken into its fields and so the first part and the last part will probably be different."

Author

Commented:
Cool, I didn't realize that a LinkedHashMap would replace the existing entry with the one being inserted if the keys match. Thank you. Could you please elaborate on my question above?
mccarl, IT Business Systems Analyst / Software Developer
Top Expert 2015

Commented:
"Also can you elaborate on what exactly you mean by this"
All I meant was that the final desired output is probably not to write those lines to the Java console. I am guessing that you may have further processing that you would like to do, and that it would be useful to keep the "key" separate from the rest of the fields, or that you want to break up the "entire" record into fields in the first place.

If that is the case, then you probably DON'T need lines 30-36 of the code that I posted. All they do is join the fields back together and print them to the console. Instead, you could take the map entries and directly do the further processing required, etc.
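
As a sketch of what that could look like: the version below keeps the key separate and stores the remaining fields already split into an array, so later processing can use them directly. The class name KeepFieldsSeparate, the helper latestByKey, and the labels "name" and "code" are all made up for illustration, and it assumes every record has exactly three comma-separated fields.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class KeepFieldsSeparate {

    // Hypothetical helper: keeps the last record seen for each key, with
    // the remaining fields stored as an array instead of a joined String.
    static Map<String, String[]> latestByKey(List<String> records) {
        Map<String, String[]> byKey = new LinkedHashMap<String, String[]>();
        for (String record : records) {
            String[] fields = record.split(",");
            // fields[0] is the key; everything after it becomes the value
            byKey.put(fields[0], Arrays.copyOfRange(fields, 1, fields.length));
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1111,abdc,D23", "2222,ejdk,D67", "3333,eeld,D22",
                "1111,rrrr,Z33", "2222,ertu,T22");

        for (Map.Entry<String, String[]> entry : latestByKey(input).entrySet()) {
            String key = entry.getKey();
            String[] rest = entry.getValue();
            // Placeholder for the real processing; assumes two remaining fields
            System.out.println("key=" + key + " name=" + rest[0] + " code=" + rest[1]);
        }
    }
}
```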

Thank You.
You're welcome!

Author

Commented:
Thanks for the feedback.
