asked on
How to do join update between Pandas DataFrames based on a common column
I have a Pandas Dataframe that include many columns such as (Server, Region, types, no_of_cores, ....., app_id) and there is a numerical index for each row.
name that Dataframe left.
Say that Dataframe has 100k rows... about a 2000 of those rows has Nan in the app_id.
I also have another DataFrame that will name it right it has only two columns (Server, appID)
There is about 10000 rows and in the right dataframe and all of them has values in appID.
How can we update the first dataframe (left) ... from Right dataframe updating the field
right.app_id = left.appID having that Server is the same in both data frames and when only right.app_id is Nan
Here is a cope explanation.. I need to reach desired outcome. I am using python 3.78
import pandas as pd, numpy as np
right = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes'])
right = right.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':np.NaN , 'notes':'some note'},ignore_index=True)
right =right.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(right)
print('===================================================')
left = pd.DataFrame(columns=['Server','app_id' ])
left = left.append({'Server':'sevB', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevC', 'app_id':'101|102' },ignore_index=True)
left = left.append({'Server':'sevD', 'app_id': np.NaN },ignore_index=True)
left = left.append({'Server':'sevF', 'app_id': '143|144' },ignore_index=True)
left = left.append({'Server':'sevH', 'app_id': '115|117' },ignore_index=True)
print(left)
print('=================Desired Outcome====================================')
outcome = pd.DataFrame(columns=['Server', 'region','no_of_cores', 'Memory', 'app_id' , 'notes'])
outcome = outcome.append({'Server':'sevA', 'region':'East','no_of_cores':8, 'Memory':48, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevB', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'101|102' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevC', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevD', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'105' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevE', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'108' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevF', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'143|144' , 'notes':'some note'},ignore_index=True)
outcome =outcome.append({'Server':'sevG', 'region':'West','no_of_cores':8, 'Memory':72, 'app_id':'102|107' , 'notes':'some note'},ignore_index=True)
print(outcome)
ASKER
I made a mistake swapping left and right.
I think these two solutions make sense. I will try them and if they work, I will accept both.
Thank you!
ASKER
I mean do you I need first to do data cleaning first by doing a unique on them!!
Open in new window