Link to home
Start Free TrialLog in
Avatar of Sean Girgis
Sean GirgisFlag for United States of America

asked on

How to do join update between Pandas DataFrames based on a common column

I have a Pandas Dataframe that include many columns such as (Server, Region, types, no_of_cores, ....., app_id) and there is a numerical index for each row.

name that Dataframe left.

Say that Dataframe has 100k rows... about a 2000 of those rows has Nan in the app_id. 


I also have another DataFrame that will name it right it has only two columns (Server, appID)

There is about 10000 rows and in the right dataframe and all of them has values in appID.


How can we update the first dataframe (left) ... from Right dataframe updating the field 

right.app_id = left.appID having that Server is the same in both data frames and when only right.app_id is Nan



Here is a cope explanation.. I need to reach desired outcome. I am using python 3.78


import pandas as pd, numpy as np
right = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
right = right.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(right)

print('===================================================')
left = pd.DataFrame(columns=['Server','app_id' ]) 
left = left.append({'Server':'sevB', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevC', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevD', 'app_id': np.NaN },ignore_index=True)
left = left.append({'Server':'sevF', 'app_id': '143|144' },ignore_index=True)
left = left.append({'Server':'sevH', 'app_id': '115|117' },ignore_index=True)
print(left)
print('=================Desired Outcome====================================')
outcome = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
outcome = outcome.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'143|144' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(outcome)

Open in new window


ASKER CERTIFIED SOLUTION
Avatar of Norie
Norie

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
How can we update the first dataframe (left) ... from Right dataframe updating the field
right.app_id = left.appID having that Server is the same in both data frames and when only right.app_id is Nan
This is another alternative solution to you based on your existing sample data and desired output.
import pandas as pd, numpy as np
right = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
right = right.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(right)

print('===================================================')
left = pd.DataFrame(columns=['Server','app_id' ]) 
left = left.append({'Server':'sevB', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevC', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevD', 'app_id': np.NaN },ignore_index=True)
left = left.append({'Server':'sevF', 'app_id': '143|144' },ignore_index=True)
left = left.append({'Server':'sevH', 'app_id': '115|117' },ignore_index=True)
print(left)

print('=================Desired Outcome====================================')
right.loc[right['app_id'].isna(), 'app_id'] = right.merge(left, how='left', on='Server')['app_id_y']

print(right)

Open in new window

User generated image


Avatar of Sean Girgis

ASKER

Yes,
I made a mistake swapping left and right.  
I think these two solutions make sense. I will try them and if they work, I will accept both.

Thank you!

I have a Question for you guys, for those codes to work the Server field in right DataFrame  needs to be unique, is not it... Or it could work in case of Duplicates too.
I mean do you I need first to do data cleaning first by doing a unique on them!!
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial