We help IT Professionals succeed at work.
Research Question

Python dataframe join select where not exists

Leo Torres
Leo Torres asked
on
84 Views
Last Modified: 2020-10-30
I have 2 dataframes below I would like to set the NxtCase value

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['CaseID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)
My objective is to compare dfObj to dfObj2 and set the NxtCase value.
The NxtCase value has to Equal the missing letter Case based on ID and not what currently exists in Case column.
For example ID =1 and Case= 'A' then the value NxtCase = 'B' Any ideas on how I may set this value?

Comment
Watch Question

nociSoftware Engineer
CERTIFIED EXPERT
Distinguished Expert 2019

Commented:
The first step would be to provide a working example of your code.
Copy paste of the code above  provokes syntax errors, please clean up.
and name pd is not defined... I can Guess what you mean, but this was not a guessing game. You would like an answer...  Code debugging is a different subject.
NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
Here's the code re-formatted.
import pandas as pd

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['CaseID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

print(dfObj)
print(dfObj2)

Open in new window


However I don't think it's quite right as this is what the 2 data frames look like when output.

dfObj:

   ID Case NxtCase
0  1    A        
1  2    C        
2  3    E      


dfObj2:
  CaseID Case   ID
0    NaN    A  1.0
1    NaN    B  1.0
2    NaN    C  2.0
3    NaN    D  2.0
4    NaN    E  3.0
5    NaN    F  3.0
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Your right I messed up the column names thank you.

import pandas as pd

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['ID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

print(dfObj)
print(dfObj2)


This was What I intended
 ID Case NxtCase
 1    A        
 2    C        
 3    E      

This is my desired result

This was What I intended
 ID Case NxtCase
 1    A        B
 2    C        D
 3    E        F

Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Your right I messed up the column names thank you.

import pandas as pd

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['ID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

print(dfObj)
print(dfObj2)


This was What I intended
 ID Case NxtCase
 1    A        
 2    C        
 3    E      

This is my desired result
 ID Case NxtCase
 1    A        B
 2    C        D
 3    E        F

NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
Leo

What should happen if there isn't a 'next case'?

For example, if the ID was 1 and the case was 'B'?
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Hello Norie.. Thank you

Not Sure I understand your question or concern.

Let me try a different approach

1. There will always be a matching case
2. Each case has 1 ID I have 1 case A i need the query to return the other part of the ID. Lets not use letters

This is the initial table you have to find there partner
1 Jake
2 Cary
3 Mike
4 Nicole

Her is the second table with Partner and Key
1 Jake
2 Jordan
3 Mike
1 Susan
2 Cary
4 Nicole
3 Ana
4 Henry

If you were given these names and the ID represent a couple let say the similar ID would represent a couple.
So that being said these couples are together
Jake/Susan
Jordan/Cary
Mike/Ana
Henry/Nicole

Same as other example just using people instead. The IDs could be any number. See in the first table I already have 1 partner I need the logic to find the other partner and ignore the partner I already have.

I hope that explains it better.


NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
Leo

So are do you mean the ID should be disregarded?
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Not sure I understand by disregarded . You need the ID to match the couple and then compare who you have from the couple to return the other person.
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
I think this is easy I may just not be explaining it correctly. I am not that versed in python but I can do this is SQL. The final select table is what I am trying to accomplish in python

Create table #tmp1(
ID int,
[Case] varchar(20),
[NxtCase] varchar(20)
)

Create table #tmp2(
ID int,
[Case] varchar(20),
)



Insert into #tmp1(ID,[Case])
Select 23, 'A'
Union
Select 51, 'H'
Union
Select 63, 'L'


Insert into #tmp2(ID,[Case])
Select 23, 'A'
Union
Select 51, 'H'
Union
Select 63, 'L'
union
Select 23, 'D'
Union
Select 51, 'O'
Union
Select 63, 'E'



update t1
Set [NxtCase] =  t2.[Case]
from #tmp1 t1
    join #tmp2 t2
        on t1.ID = t2.ID
            and t1.[Case] <> t2.[Case] 

Select * from #tmp1


NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
Have a look at this SQL.
Create table #tmp1(
ID int,
[Case] varchar(20),
[NxtCase] varchar(20)
)

Create table #tmp2(
ID int,
[Case] varchar(20),
)



Insert into #tmp1(ID,[Case])
Select 23, 'D'
Union
Select 51, 'H'
Union
Select 63, 'L'

Select * from #tmp1

Insert into #tmp2(ID,[Case])
Select 23, 'A'
Union
Select 51, 'H'
Union
Select 63, 'L'
union
Select 23, 'D'
Union
Select 51, 'O'
Union
Select 63, 'E'
Union
Select 63, 'G'
Union
Select 23, 'Z'

Select * from #tmp2

update t1
Set [NxtCase] =  t2.[Case]
from #tmp1 t1
    join #tmp2 t2
        on t1.ID = t2.ID
            and t1.[Case] <> t2.[Case] 

Select * from #tmp1

Open in new window

Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Ah ok I see what your concern is now for 63. There are multiple options to return. This is easy. In my case this will not exists 63 will only have 1 pair. Everything will only have 2 parts. There is no case where there is only 1 or 3 parts or more. If that happens then there is a data issue. So my example still holds. Thank you I should have made this more clear.
NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
No problem, that definitely clarifies things.

Don't have time to have a proper look at it right now but I'll try and come up with something tomorrow.
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
any updates?
NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
I've been looking at it but I can't find a simple way to do it.
Analyst Assistant
CERTIFIED EXPERT
Commented:
This problem has been solved!
(Unlock this solution with a 7-day Free Trial)
UNLOCK SOLUTION
Leo TorresSQL Developer
CERTIFIED EXPERT

Author

Commented:
Thank you Norie just saw comment. I thought it was just me I could not figure this out in python. I will test this
I was thinking are there addon out there that provide the ability to write SQL on python that way this type of manipulation would be easier(at least for me). Thank you I will test today. Thank you! 
NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
You can use SQL in Python but I didn't look into that as I thought you wanted a 'pure' Python method, plus you would need a bit of setup depending on which database you wanted to use.
NorieAnalyst Assistant
CERTIFIED EXPERT

Commented:
This is what I came up with using sqlite which comes as standard with Python.

Wasn't able to get the UPDATE bit to work but the correct results appear to be returned.
import pandas as pd
import sqlite3

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['CaseID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

conn = sqlite3.connect('test.db')

dfObj.to_sql('Table1', conn, index=False)

dfObj2.to_sql('Table2', conn, index=False)

query = """
  SELECT table1.[Case], table2.[Case]
  from table1 
  Left join table2
  on table1.[ID] = table2.[ID]
  and table1.[Case] <> table2.[Case];
"""
curr = conn.cursor()

data = curr.execute(query).fetchall()

print(data)

Open in new window

Gain unlimited access to on-demand training courses with an Experts Exchange subscription.

Get Access
Why Experts Exchange?

Experts Exchange always has the answer, or at the least points me in the correct direction! It is like having another employee that is extremely experienced.

Jim Murphy
Programmer at Smart IT Solutions

When asked, what has been your best career decision?

Deciding to stick with EE.

Mohamed Asif
Technical Department Head

Being involved with EE helped me to grow personally and professionally.

Carl Webster
CTP, Sr Infrastructure Consultant
Empower Your Career
Did You Know?

We've partnered with two important charities to provide clean water and computer science education to those who need it most. READ MORE

Ask ANY Question

Connect with Certified Experts to gain insight and support on specific technology challenges including:

  • Troubleshooting
  • Research
  • Professional Opinions