Newton

asked on

Python: how to remove duplicated rows from output

Hi,
I have two files, File1 and File2, as shown below.

File1
A | APPLE
B | ORANGE

File2
A | 10
B | 15
D | 20
A | 10

I need the following output:
Output 1
A | APPLE | 10
B | ORANGE | 15

But I am getting the output below:
A | APPLE | 10
B | ORANGE | 15
A | APPLE | 10

How can I remove the duplicate rows from the output and write only the duplicate rows to a new file?

My code is as follows:

import pandas as pd
df1 = pd.read_csv('file1.txt', sep='|')
df2 = pd.read_csv('file2.txt', sep='|')
Merge12 = pd.merge(df1, df2, how='left', on='A')
Merge12.to_csv('output.txt')
Subodh Tiwari (Neeraj)

You can either remove the duplicate rows from df1 and df2 first and then merge them, or merge first and remove the duplicate rows from the resulting dataframe.


Merge12.drop_duplicates(keep='first', inplace=True)
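
Both options can be sketched on the sample data from the question. This is only an illustration under a few assumptions: the files have no header row, so the column names key, fruit and qty are made up here, and the separator is a regular expression that also swallows the spaces around each '|'. The data is read from in-memory strings so the sketch runs without the original files.

```python
import io
import pandas as pd

# Sample data from the question, held in memory so the sketch runs as-is.
file1 = io.StringIO("A | APPLE\nB | ORANGE\n")
file2 = io.StringIO("A | 10\nB | 15\nD | 20\nA | 10\n")

# Assumption: the files have no header row, so names are supplied explicitly.
# The regex separator strips the spaces around '|' (needs the python engine).
read_opts = dict(sep=r"\s*\|\s*", engine="python", header=None)
df1 = pd.read_csv(file1, names=["key", "fruit"], **read_opts)
df2 = pd.read_csv(file2, names=["key", "qty"], **read_opts)

# Option 1: remove duplicates from the input, then merge.
merged_a = pd.merge(df1, df2.drop_duplicates(), how="left", on="key")

# Option 2: merge first, then remove duplicates from the result.
merged_b = (
    pd.merge(df1, df2, how="left", on="key")
    .drop_duplicates()
    .reset_index(drop=True)
)

print(merged_a)
print(merged_b)
```

On this data both options should give the same two rows. De-duplicating the input first keeps the intermediate merge result smaller, which may matter for large files.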

Newton

ASKER

The code below worked.

import pandas as pd
df1 = pd.read_csv('file1.txt', sep='|')
df2 = pd.read_csv('file2.txt', sep='|')
df2.drop_duplicates(keep='first',inplace=True)
Merge12 = pd.merge(df1, df2, how='left', on='A')
Merge12.to_csv('output.txt')

Now I want to write only the duplicated rows to a new file. Is the code below the correct way of doing that?

import pandas as pd
df1 = pd.read_csv('file1.txt', sep='|')
df2 = pd.read_csv('file2.txt', sep='|')
df2.drop_duplicates(keep='first',inplace=True)
Merge12 = pd.merge(df1, df2, how='left', on='A')
Merge12.to_csv('output.txt')

df3 = pd.read_csv('file.txt', sep='|')
df3.drop_duplicates(keep='first', inplace=False)
df3.to_csv('duplicatedrow.txt')
ASKER CERTIFIED SOLUTION
Subodh Tiwari (Neeraj)

This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
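
The certified answer is not visible here, so the following is not necessarily what was suggested. It is a minimal sketch of one way to write only the duplicated rows to a new file. In the code from the question, drop_duplicates(keep='first', inplace=False) returns a new dataframe that is never assigned, and in any case it removes the duplicates rather than selecting them. DataFrame.duplicated returns a boolean mask that can select them instead. As assumptions, the sample File2 is read from an in-memory string with no header row, and the column names key and qty are made up for illustration.

```python
import io
import pandas as pd

# Sample File2 from the question, held in memory so the sketch runs as-is.
file2 = io.StringIO("A | 10\nB | 15\nD | 20\nA | 10\n")

# Assumption: no header row; 'key' and 'qty' are illustrative column names.
df2 = pd.read_csv(
    file2, sep=r"\s*\|\s*", engine="python", header=None, names=["key", "qty"]
)

# duplicated(keep="first") marks the second and later copies of each row.
dup_mask = df2.duplicated(keep="first")

duplicates = df2[dup_mask]     # rows that repeat an earlier row
unique_rows = df2[~dup_mask]   # equivalent to df2.drop_duplicates(keep="first")

# 'duplicatedrow.txt' is the output file name used in the question.
duplicates.to_csv("duplicatedrow.txt", sep="|", index=False, header=False)

print(duplicates)
```

Passing keep=False to duplicated would instead mark every copy of a repeated row, including the first one, if all occurrences are wanted in the new file.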
Newton

ASKER

Thank you, Neeraj.

You're welcome, Newton!