Python - Deduplicating and Capturing Duplicates

Asked by bsumariwalla

I'm using Python 2.7 and I have a list of lists. I'm trying to deduplicate it and also return every duplicate that was removed. How can I do this? I have code that deduplicates okay, but I can't figure out the second part. Consider the following list:

[["apple", "red"],
 ["apple", "red"],
 ["apple", "red"],
 ["apple", "green"],
 ["banana", "yellow"]]

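I'm trying to return two lists. The first holds the unique rows:

[["apple", "red"],
 ["apple", "green"],
 ["banana", "yellow"]]

The second holds the extra copies that were dropped:

[["apple", "red"],
 ["apple", "red"]]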

def csv_deduplicate2(csv_list):
    sorted_by_date = sorted(csv_list, key=lambda row: row[2], reverse=True)
    unique_csv_list = []
    duplicate_csv_list = []
    for sorted_row in sorted_by_date:
        if sorted_row not in unique_csv_list:
            unique_csv_list.append(sorted_row) # Produces a unique list
    for sorted_row in sorted_by_date:
        if sorted_row in unique_csv_list:
            duplicate_csv_list.append(sorted_row) # Produces a list of everything, not just duplicates.
    return unique_csv_list, duplicate_csv_list
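
The second loop above appends every row because every row is, by construction, already in unique_csv_list. One possible fix, offered as a minimal sketch rather than the accepted answer, is to build both lists in a single pass: the first time a row is seen it goes to the unique list, and every later copy goes to the duplicate list. The function name split_duplicates and the seen set are hypothetical names introduced for illustration, and the sketch assumes each row contains only hashable values such as strings. It should run unchanged on Python 2.7 and 3.

```python
def split_duplicates(rows):
    """Return (unique_rows, duplicate_rows), preserving input order.

    Hypothetical helper: assumes every value inside a row is hashable.
    """
    seen = set()            # tuple versions of the rows encountered so far
    unique_rows = []
    duplicate_rows = []
    for row in rows:
        key = tuple(row)    # lists are unhashable, so use a tuple as the set key
        if key in seen:
            duplicate_rows.append(row)   # second or later copy of this row
        else:
            seen.add(key)
            unique_rows.append(row)      # first time this row appears
    return unique_rows, duplicate_rows


fruit = [["apple", "red"], ["apple", "red"], ["apple", "red"],
         ["apple", "green"], ["banana", "yellow"]]
unique, duplicates = split_duplicates(fruit)
print(unique)      # [['apple', 'red'], ['apple', 'green'], ['banana', 'yellow']]
print(duplicates)  # [['apple', 'red'], ['apple', 'red']]
```

If the newest row should be the one that is kept, the date sort from the original function can be applied to the input before calling the helper. The set lookup also avoids rescanning the unique list for every row, which the `not in unique_csv_list` test does.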