asked on

How to do join update between Pandas DataFrames based on a common column

I have a Pandas Dataframe that include many columns such as (Server, Region, types, no_of_cores, ....., app_id) and there is a numerical index for each row.

name that Dataframe left.

Say that Dataframe has 100k rows... about a 2000 of those rows has Nan in the app_id.

I also have another DataFrame that will name it right it has only two columns (Server, appID)

There is about 10000 rows and in the right dataframe and all of them has values in appID.

How can we update the first dataframe (left) ... from Right dataframe updating the field

right.app_id = left.appID having that Server is the same in both data frames and when only right.app_id is Nan

Here is a cope explanation.. I need to reach desired outcome. I am using python 3.78

import pandas as pd, numpy as np
right = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
right = right.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(right)

print('===================================================')
left = pd.DataFrame(columns=['Server','app_id' ]) 
left = left.append({'Server':'sevB', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevC', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevD', 'app_id': np.NaN },ignore_index=True)
left = left.append({'Server':'sevF', 'app_id': '143|144' },ignore_index=True)
left = left.append({'Server':'sevH', 'app_id': '115|117' },ignore_index=True)
print(left)
print('=================Desired Outcome====================================')
outcome = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
outcome = outcome.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'143|144' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(outcome)

Open in new window

ASKER CERTIFIED SOLUTION

Norie

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

David H.H.Lee

How can we update the first dataframe (left) ... from Right dataframe updating the field
right.app_id = left.appID having that Server is the same in both data frames and when only right.app_id is Nan

This is another alternative solution to you based on your existing sample data and desired output.

import pandas as pd, numpy as np
right = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes']) 
right = right.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(right)

print('===================================================')
left = pd.DataFrame(columns=['Server','app_id' ]) 
left = left.append({'Server':'sevB', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevC', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevD', 'app_id': np.NaN },ignore_index=True)
left = left.append({'Server':'sevF', 'app_id': '143|144' },ignore_index=True)
left = left.append({'Server':'sevH', 'app_id': '115|117' },ignore_index=True)
print(left)

print('=================Desired Outcome====================================')
right.loc[right['app_id'].isna(), 'app_id'] = right.merge(left, how='left', on='Server')['app_id_y']

print(right)

Open in new window

Sean Girgis

ASKER

Yes,
I made a mistake swapping left and right.
I think these two solutions make sense. I will try them and if they work, I will accept both.

Thank you!