Comparing set of records for a substring match to remove duplicate records

Hi Folks,

I will have a list of records like this

1111,abdc,D23
2222,ejdk,D67
3333,eeld,D22
1111,rrrr,Z33
2222,ertu,T22

and I need to drop the duplicate records, keeping the one that comes later rather than the one that comes first, so the output should be

1111,rrrr,Z33
2222,ertu,T22
3333,eeld,D22

My approach is to see if there is any way to compare a substring (the first 4 characters) of each string (in this case treating the entire record as a string, e.g. 1111,abdc,D23). But I am not able to visualize how I can ignore the first record and return only the second, duplicate record. Sample code would also be appreciated.
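A minimal sketch of the substring idea above, assuming every record starts with a fixed 4-character key (the class and method names are illustrative only). Instead of comparing records pairwise, the prefix is used as a map key, so a later record simply overwrites an earlier one with the same prefix:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SubstringDedupe {

    // Keeps the LAST record for each prefix of length keyLength.
    // LinkedHashMap keeps each prefix at the position where it was first seen.
    public static List<String> keepLastByPrefix(List<String> records, int keyLength) {
        Map<String, String> byKey = new LinkedHashMap<>();
        for (String record : records) {
            // Assumption: every record is at least keyLength characters long.
            String key = record.substring(0, keyLength);
            byKey.put(key, record); // a later record overwrites an earlier one
        }
        return new ArrayList<>(byKey.values());
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1111,abdc,D23", "2222,ejdk,D67", "3333,eeld,D22",
                "1111,rrrr,Z33", "2222,ertu,T22");
        for (String record : keepLastByPrefix(input, 4)) {
            System.out.println(record);
        }
    }
}
```

With the sample data this yields the three records listed as the desired output, in the same order.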
Asked by: kalyangkm

mccarl (IT Business Systems Analyst / Software Developer) commented:
I might do it using code like this...
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class RemoveDuplicate {
    
    public static void main(String[] args) {
        List<String> input = new ArrayList<String>();
        
        input.add("1111,abdc,D23");
        input.add("2222,ejdk,D67");
        input.add("3333,eeld,D22");
        input.add("1111,rrrr,Z33");
        input.add("2222,ertu,T22");
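        
        // Keep only the LAST record for each key: put() replaces the value stored
        // for a key that is already present, and LinkedHashMap keeps each key in
        // the position where it was first inserted.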
        Map<String, String> recordMap = new LinkedHashMap<String, String>();
        
        for (String entry : input) {
            String[] fields = entry.split(",", 2);
            recordMap.put(fields[0], fields[1]);
        }
        
        // Join each key back onto the rest of its record, then print the results.
        List<String> output = new ArrayList<String>();
        for (Map.Entry<String, String> entry : recordMap.entrySet()) {
            output.add(entry.getKey() + "," + entry.getValue());
        }
        for (String entry : output) {
            System.out.println(entry);
        }
    }
    
}

Note that you may be looking to do further processing with each record broken into its fields and so the first part and the last part will probably be different.
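For example, the first part (the hard-coded input list) would probably be replaced by reading the records from a file. A minimal sketch, assuming one record per line; the class and method names are hypothetical, and a StringReader stands in for a FileReader so the example is self-contained:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class RecordReader {

    // Hypothetical helper: reads one record per line from any Reader,
    // trimming whitespace and skipping blank lines. Closes the Reader when done.
    public static List<String> readRecords(Reader source) throws IOException {
        List<String> records = new ArrayList<>();
        try (BufferedReader reader = new BufferedReader(source)) {
            String line;
            while ((line = reader.readLine()) != null) {
                line = line.trim();
                if (!line.isEmpty()) {
                    records.add(line);
                }
            }
        }
        return records;
    }

    public static void main(String[] args) throws IOException {
        // In a real program this would be e.g. new FileReader("records.txt") (hypothetical file name).
        String data = "1111,abdc,D23\n2222,ejdk,D67\n\n3333,eeld,D22\n";
        for (String record : readRecords(new StringReader(data))) {
            System.out.println(record);
        }
    }
}
```

The returned list can then be fed to the same LinkedHashMap loop as the hard-coded input.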

The guts of why this works is the loop that fills recordMap, together with the use of the LinkedHashMap. A Map allows only ONE entry per key, so when you "put" a new entry whose key matches an earlier one, the earlier value gets replaced. I used a LinkedHashMap because it keeps the keys in the order they were first inserted from the input list (re-inserting an existing key does not move it). A plain HashMap would have given you an essentially arbitrary order. You could use a TreeMap to ensure that the result is always in the natural order of the keys, i.e. in this case, alphabetical String order.
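To see the ordering difference, here is a minimal sketch that fills a LinkedHashMap and a TreeMap from the same records (the class and method names are illustrative, and the input is reordered so the two orders differ). It assumes every record contains at least one comma, as in the original code:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class MapOrderDemo {

    // Fills the given map with key -> rest of record; later records overwrite earlier ones.
    public static Map<String, String> fill(Map<String, String> map, List<String> records) {
        for (String record : records) {
            String[] fields = record.split(",", 2); // [key, everything after the first comma]
            map.put(fields[0], fields[1]);
        }
        return map;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "3333,eeld,D22", "1111,abdc,D23", "2222,ejdk,D67", "1111,rrrr,Z33");

        // Order of first appearance: {3333=eeld,D22, 1111=rrrr,Z33, 2222=ejdk,D67}
        System.out.println(fill(new LinkedHashMap<>(), input));
        // Natural (sorted) key order: {1111=rrrr,Z33, 2222=ejdk,D67, 3333=eeld,D22}
        System.out.println(fill(new TreeMap<>(), input));
    }
}
```

In both maps key 1111 ends up with the later value (rrrr,Z33); only the iteration order differs.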

kalyangkm (Author) commented:
Also, can you elaborate on what exactly you mean by this:

"Note that you may be looking to do further processing with each record broken into its fields and so the first part and the last part will probably be different."
kalyangkm (Author) commented:
Cool, I didn't realize that LinkedHashMap would replace the existing entry with the one being inserted when the key matches. Thank you. Could you please elaborate on my question above?
mccarl (IT Business Systems Analyst / Software Developer) commented:
"Also can you elaborate on what exactly you mean by this"
All I meant was that the final desired output is probably not to write those lines to the Java console. I am guessing that you may have further processing to do, and that it would be useful either to keep the "key" separate from the rest of the fields, or to break the entire record up into its fields in the first place.

If that is the case, then you probably DON'T need the last part of the code I posted (the loops that build the output list and print it). All that does is join the fields back together and print them to the console. Instead, you could take the entries from the map and do the required processing on them directly.
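A minimal sketch of that idea, assuming each record has exactly three comma-separated fields with the key first. The map stores the already-split fields, so nothing has to be joined back together; the class and method names are hypothetical, and the println stands in for whatever real processing you need:

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ProcessFields {

    // Keeps the last record per key, already split into its fields (fields[0] is the key).
    public static Map<String, String[]> lastRecordByKey(List<String> records) {
        Map<String, String[]> byKey = new LinkedHashMap<>();
        for (String record : records) {
            String[] fields = record.split(",");
            byKey.put(fields[0], fields); // a later record replaces an earlier one
        }
        return byKey;
    }

    public static void main(String[] args) {
        List<String> input = Arrays.asList(
                "1111,abdc,D23", "2222,ejdk,D67", "3333,eeld,D22",
                "1111,rrrr,Z33", "2222,ertu,T22");

        for (Map.Entry<String, String[]> entry : lastRecordByKey(input).entrySet()) {
            String[] fields = entry.getValue();
            // Placeholder for the "further processing": work with individual fields directly.
            System.out.println("key=" + entry.getKey()
                    + " second=" + fields[1] + " third=" + fields[2]);
        }
    }
}
```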

Thank you.
You're welcome!
kalyangkm (Author) commented:
Thanks for the feedback.