bsumariwalla
 asked on

Python - Deduplicating and Capturing Duplicates

Hello,

I'm using Python 2.7 and I have a list of lists.  I'm trying to deduplicate the list of lists and also return every duplicate.  How can I do this?  I have code that deduplicates okay, but I can't figure out the second part.  Consider the following list:

[["apple", "red"],
 ["apple", "red"],
 ["apple", "red"],
 ["apple", "green"],
 ["banana", "yellow"]]

I'm trying to return
Uniques:
[["apple", "red"],
 ["apple", "green"],
 ["banana", "yellow"]]

Duplicates:
[["apple", "red"],
 ["apple", "red"]]

def csv_deduplicate2(csv_list):
    sorted_by_date = sorted(csv_list, key=lambda row: row[2], reverse=True)
    unique_csv_list = []
    duplicate_csv_list = []
    for sorted_row in sorted_by_date:
        if sorted_row not in unique_csv_list:
            unique_csv_list.append(sorted_row) # Produces a unique list
    for sorted_row in sorted_by_date:
        if sorted_row in unique_csv_list:
            duplicate_csv_list.append(sorted_row) # Produces a list of everything, not just duplicates.
    return unique_csv_list, duplicate_csv_list

Python

8/22/2022 - Mon
SOLUTION
gelonida

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
ASKER CERTIFIED SOLUTION
aikimark

THIS SOLUTION ONLY AVAILABLE TO MEMBERS.
aikimark

I don't see the need to sort the list unless it is very large.  In that case, consider searching with the bisect module, which requires a sorted list.
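
For readers who cannot see the solutions above, here is a minimal sketch of one way to do this in a single pass without sorting. It is not the members-only code, just an illustration under some assumptions: the function name split_unique_and_duplicates is hypothetical, every row is a flat list of hashable values (such as strings), and the first occurrence of a row counts as the unique one. It should run unchanged on Python 2.7 and Python 3.

```python
def split_unique_and_duplicates(rows):
    """Return (uniques, duplicates), preserving the input order.

    Hypothetical helper: the first time a row is seen it goes to
    uniques; every later repeat of that row goes to duplicates.
    """
    seen = set()
    uniques = []
    duplicates = []
    for row in rows:
        # Lists are unhashable, so use a tuple copy as the set key.
        key = tuple(row)
        if key in seen:
            duplicates.append(row)
        else:
            seen.add(key)
            uniques.append(row)
    return uniques, duplicates


rows = [
    ["apple", "red"],
    ["apple", "red"],
    ["apple", "red"],
    ["apple", "green"],
    ["banana", "yellow"],
]
uniques, duplicates = split_unique_and_duplicates(rows)
```

With the sample list from the question, uniques holds the three distinct rows and duplicates holds the two extra ["apple", "red"] rows. If it matters which copy is kept as the unique one (the question's code sorts on row[2] first), sort the list before calling the function.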
aikimark

@bsumariwalla

Have you tried the posted solutions?  It is time for you to close this question.
aikimark

@bsumariwalla

Where do you stand with this question?
aikimark

Split points between two correct solutions.