bsumariwalla

asked on

Python - Deduplicating and Capturing Duplicates

Hello,

I'm using Python 2.7 and I have a list of lists. I'm trying to deduplicate it and also return every duplicate. How can I do this? I have code that deduplicates correctly, but I can't figure out the second part. Consider the following list:

[["apple", "red"],
 ["apple", "red"],
 ["apple", "red"],
 ["apple", "green"],
 ["banana", "yellow"]]

I'm trying to return
Uniques:
[["apple", "red"],
 ["apple", "green"],
 ["banana", "yellow"]]

Duplicates:
[["apple", "red"],
 ["apple", "red"]]

def csv_deduplicate2(csv_list):
    sorted_by_date = sorted(csv_list, key=lambda row: row[2], reverse=True)
    unique_csv_list = []
    duplicate_csv_list = []
    for sorted_row in sorted_by_date:
        if sorted_row not in unique_csv_list:
            unique_csv_list.append(sorted_row) # Produces a unique list
    for sorted_row in sorted_by_date:
        if sorted_row in unique_csv_list:
            duplicate_csv_list.append(sorted_row) # Produces a list of everything, not just duplicates.
    return unique_csv_list, duplicate_csv_list
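
The second loop above appends every row because, by that point, every row is already in unique_csv_list. One possible fix, offered as a minimal sketch and not as the accepted answer from this thread, is to decide in a single pass whether each row is a first occurrence or a repeat. The function name split_unique_and_duplicates is hypothetical, the sort on row[2] is left out because the sample rows have only two columns, and the sketch assumes every cell is hashable (for example, a string).

```python
def split_unique_and_duplicates(rows):
    """Return (uniques, duplicates), preserving the input order."""
    seen = set()
    uniques = []
    duplicates = []
    for row in rows:
        key = tuple(row)  # lists are unhashable, so a tuple serves as the set key
        if key in seen:
            duplicates.append(row)  # a repeat of a row already seen
        else:
            seen.add(key)
            uniques.append(row)  # first occurrence of this row
    return uniques, duplicates


rows = [
    ["apple", "red"],
    ["apple", "red"],
    ["apple", "red"],
    ["apple", "green"],
    ["banana", "yellow"],
]
uniques, duplicates = split_unique_and_duplicates(rows)
```

With the sample data, uniques holds the three distinct rows in their original order and duplicates holds the two extra copies of ["apple", "red"]. If the rows need to be ordered by a date column first, the input can be sorted before calling the function.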


SOLUTION
gelonida

This solution is only available to members.
ASKER CERTIFIED SOLUTION
This solution is only available to members.
I don't see the need to sort the list unless it is very large. In that case, you could consider the bisect module, which searches by bisection and requires a sorted list.
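
To illustrate the bisect idea, here is a rough sketch that keeps the list of uniques sorted so that each membership check is a binary search. The function name split_with_bisect is hypothetical, and the sketch assumes rows compare consistently with one another (for example, every cell is a string). Note that list.insert still shifts elements, so only the lookup becomes logarithmic.

```python
import bisect


def split_with_bisect(rows):
    """Return (sorted_uniques, duplicates) using binary search for lookups."""
    sorted_uniques = []  # kept in sorted order so bisect can search it
    duplicates = []
    for row in rows:
        i = bisect.bisect_left(sorted_uniques, row)
        if i < len(sorted_uniques) and sorted_uniques[i] == row:
            duplicates.append(row)  # row is already present
        else:
            sorted_uniques.insert(i, row)  # insert at the position that keeps order
    return sorted_uniques, duplicates


sample = [
    ["apple", "red"],
    ["apple", "red"],
    ["apple", "red"],
    ["apple", "green"],
    ["banana", "yellow"],
]
sorted_uniques, bisect_duplicates = split_with_bisect(sample)
```

Unlike a single pass with a set, this returns the uniques in sorted order, not in order of first appearance.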
@bsumariwalla

Have you tried the posted solutions?  It is time for you to close this question.
@bsumariwalla

Where do you stand with this question?
Split points between two correct solutions.