Avatar of Sean Girgis
Sean Girgis
Flag for United States of America

asked on 

How to do join update between Pandas DataFrames based on a common column

I have a Pandas Dataframe that include many columns such as (Server, Region, types, no_of_cores, ....., app_id) and there is a numerical index for each row.

name that Dataframe left.

Say that Dataframe has 100k rows... about a 2000 of those rows has Nan in the app_id. 


I also have another DataFrame that will name it right it has only two columns (Server, appID)

There is about 10000 rows and in the right dataframe and all of them has values in appID.


How can we update the first dataframe (left) ... from Right dataframe updating the field 

right.app_id = left.appID having that Server is the same in both data frames and when only right.app_id is Nan



Here is a cope explanation.. I need to reach desired outcome. I am using python 3.78


import pandas as pd, numpy as np
right = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
right = right.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(right)

print('===================================================')
left = pd.DataFrame(columns=['Server','app_id' ]) 
left = left.append({'Server':'sevB', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevC', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevD', 'app_id': np.NaN },ignore_index=True)
left = left.append({'Server':'sevF', 'app_id': '143|144' },ignore_index=True)
left = left.append({'Server':'sevH', 'app_id': '115|117' },ignore_index=True)
print(left)
print('=================Desired Outcome====================================')
outcome = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
outcome = outcome.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'143|144' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(outcome)

Open in new window


* Datascience* pandasPython

Avatar of undefined
Last Comment
Norie

8/22/2022 - Mon