
Python DataFrame: select only rows with duplicated values

Newton asked
Last Modified: 2020-03-08
I want to extract only those rows where specific column values are duplicated.

For example, my source file is:

File 1 : CountryA.txt
fruits|count
apple|100
orange|200
orange|245
grapes|230

I need the following output:
output1.txt
fruits|count
orange|200
orange|245

Here is my code. Is this the correct way of doing this?

Df1 = pd.read_csv('CountryA.txt',sep="|")
Df1 = Df1[Df1['fruits'].duplicated()]
Df1.to_csv('output1.txt',sep="|")

Subodh Tiwari (Neeraj), Excel & VBA Expert
CERTIFIED EXPERT
Most Valuable Expert 2018
Awarded 2015
Commented:
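You may try something like this. By default, duplicated() flags only the second and later occurrences of a value, so the first "orange" row is dropped; keep=False flags every occurrence.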

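import pandas as pd

# Read the pipe-delimited source file.
country_A = pd.read_table('CountryA.txt', sep="|")

# keep=False marks every row whose 'fruits' value occurs more than once.
duplicate_rows = country_A[country_A.duplicated(['fruits'], keep=False)]
# index=False keeps the DataFrame index out of the output file.
duplicate_rows.to_csv('output1.txt', sep="|", index=False)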
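
To see the difference between the two calls without touching any files, here is a minimal sketch that uses the sample data from the question. The in-memory StringIO buffer and the variable names (sample, later_only, all_dupes) are assumptions made for illustration; they are not part of the original code.

```python
import io
import pandas as pd

# Sample data copied from the question; StringIO stands in for CountryA.txt.
sample = io.StringIO(
    "fruits|count\n"
    "apple|100\n"
    "orange|200\n"
    "orange|245\n"
    "grapes|230\n"
)
df = pd.read_csv(sample, sep="|")

# Default keep='first': only the second and later occurrences are flagged,
# so the first 'orange' row is left out.
later_only = df[df["fruits"].duplicated()]

# keep=False: every row whose 'fruits' value appears more than once is flagged.
all_dupes = df[df["fruits"].duplicated(keep=False)]

# to_csv() with no path returns the CSV text instead of writing a file.
print(later_only.to_csv(sep="|", index=False))
print(all_dupes.to_csv(sep="|", index=False))
```

The first print shows only the orange|245 row, which is what the code in the question would produce; the second shows both orange rows. Note also that to_csv writes the DataFrame index as an extra leading column unless index=False is passed.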

Author

Commented:
Thanks, Neeraj. My problem is solved.
Subodh Tiwari (Neeraj), Excel & VBA Expert

Commented:
Great. You're welcome!