asked on

Python dataframe join select where not exists

I have 2 dataframes below I would like to set the NxtCase value

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['CaseID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

Open in new window

My objective is to compare dfObj to dfObj2 and set the NxtCase value.
The NxtCase value has to Equal the missing letter Case based on ID and not what currently exists in Case column.
For example ID =1 and Case= 'A' then the value NxtCase = 'B' Any ideas on how I may set this value?

noci

The first step would be to provide a working example of your code.
Copy paste of the code above provokes syntax errors, please clean up.
and name pd is not defined... I can Guess what you mean, but this was not a guessing game. You would like an answer... Code debugging is a different subject.

Norie

Here's the code re-formatted.

import pandas as pd

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['CaseID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

print(dfObj)
print(dfObj2)

Open in new window

However I don't think it's quite right as this is what the 2 data frames look like when output.

dfObj:

ID Case NxtCase
0 1 A
1 2 C
2 3 E

dfObj2:
CaseID Case ID
0 NaN A 1.0
1 NaN B 1.0
2 NaN C 2.0
3 NaN D 2.0
4 NaN E 3.0
5 NaN F 3.0

Leo Torres

ASKER

Your right I messed up the column names thank you.

import pandas as pd

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['ID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

print(dfObj)
print(dfObj2)

Open in new window


This was What I intended
 ID Case NxtCase
 1    A        
 2    C        
 3    E      

This is my desired result

This was What I intended
 ID Case NxtCase
 1    A        B
 2    C        D
 3    E        F

Open in new window

Leo Torres

ASKER

Your right I messed up the column names thank you.

import pandas as pd

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['ID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

print(dfObj)
print(dfObj2)

Open in new window

This was What I intended
 ID Case NxtCase
 1    A        
 2    C        
 3    E      

This is my desired result
 ID Case NxtCase
 1    A        B
 2    C        D
 3    E        F

Open in new window

Norie

Leo

What should happen if there isn't a 'next case'?

For example, if the ID was 1 and the case was 'B'?

Leo Torres

ASKER

Hello Norie.. Thank you

Not Sure I understand your question or concern.

Let me try a different approach

1. There will always be a matching case
2. Each case has 1 ID I have 1 case A i need the query to return the other part of the ID. Lets not use letters

This is the initial table you have to find there partner
1 Jake
2 Cary
3 Mike
4 Nicole

Her is the second table with Partner and Key
1 Jake
2 Jordan
3 Mike
1 Susan
2 Cary
4 Nicole
3 Ana
4 Henry

If you were given these names and the ID represent a couple let say the similar ID would represent a couple.
So that being said these couples are together
Jake/Susan
Jordan/Cary
Mike/Ana
Henry/Nicole

Same as other example just using people instead. The IDs could be any number. See in the first table I already have 1 partner I need the logic to find the other partner and ignore the partner I already have.

I hope that explains it better.

Norie

Leo

So are do you mean the ID should be disregarded?

Leo Torres

ASKER

Not sure I understand by disregarded . You need the ID to match the couple and then compare who you have from the couple to return the other person.

Leo Torres

ASKER

I think this is easy I may just not be explaining it correctly. I am not that versed in python but I can do this is SQL. The final select table is what I am trying to accomplish in python

Create table #tmp1(
ID int,
[Case] varchar(20),
[NxtCase] varchar(20)
)

Create table #tmp2(
ID int,
[Case] varchar(20),
)



Insert into #tmp1(ID,[Case])
Select 23, 'A'
Union
Select 51, 'H'
Union
Select 63, 'L'


Insert into #tmp2(ID,[Case])
Select 23, 'A'
Union
Select 51, 'H'
Union
Select 63, 'L'
union
Select 23, 'D'
Union
Select 51, 'O'
Union
Select 63, 'E'



update t1
Set [NxtCase] =  t2.[Case]
from #tmp1 t1
    join #tmp2 t2
        on t1.ID = t2.ID
            and t1.[Case] <> t2.[Case] 

Select * from #tmp1

Open in new window

Norie

Have a look at this SQL.

Create table #tmp1(
ID int,
[Case] varchar(20),
[NxtCase] varchar(20)
)

Create table #tmp2(
ID int,
[Case] varchar(20),
)



Insert into #tmp1(ID,[Case])
Select 23, 'D'
Union
Select 51, 'H'
Union
Select 63, 'L'

Select * from #tmp1

Insert into #tmp2(ID,[Case])
Select 23, 'A'
Union
Select 51, 'H'
Union
Select 63, 'L'
union
Select 23, 'D'
Union
Select 51, 'O'
Union
Select 63, 'E'
Union
Select 63, 'G'
Union
Select 23, 'Z'

Select * from #tmp2

update t1
Set [NxtCase] =  t2.[Case]
from #tmp1 t1
    join #tmp2 t2
        on t1.ID = t2.ID
            and t1.[Case] <> t2.[Case] 

Select * from #tmp1

Open in new window

Leo Torres

ASKER

Ah ok I see what your concern is now for 63. There are multiple options to return. This is easy. In my case this will not exists 63 will only have 1 pair. Everything will only have 2 parts. There is no case where there is only 1 or 3 parts or more. If that happens then there is a data issue. So my example still holds. Thank you I should have made this more clear.

Norie

No problem, that definitely clarifies things.

Don't have time to have a proper look at it right now but I'll try and come up with something tomorrow.

Leo Torres

ASKER

any updates?

Norie

I've been looking at it but I can't find a simple way to do it.

ASKER CERTIFIED SOLUTION

Norie

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Leo Torres

ASKER

Thank you Norie just saw comment. I thought it was just me I could not figure this out in python. I will test this
I was thinking are there addon out there that provide the ability to write SQL on python that way this type of manipulation would be easier(at least for me). Thank you I will test today. Thank you!

Norie

You can use SQL in Python but I didn't look into that as I thought you wanted a 'pure' Python method, plus you would need a bit of setup depending on which database you wanted to use.

Norie

This is what I came up with using sqlite which comes as standard with Python.

Wasn't able to get the UPDATE bit to work but the correct results appear to be returned.

import pandas as pd
import sqlite3

dfObj = pd.DataFrame(columns=['ID', 'Case','NxtCase']) 
dfObj = dfObj.append({'ID': 1, 'Case': 'A', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 2, 'Case': 'C', 'NxtCase': ''}, ignore_index=True) 
dfObj = dfObj.append({'ID': 3, 'Case': 'E', 'NxtCase': ''}, ignore_index=True) 

dfObj2 = pd.DataFrame(columns=['CaseID', 'Case']) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'A' }, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 1, 'Case': 'B'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'C'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 2, 'Case': 'D'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'E'}, ignore_index=True) 
dfObj2 = dfObj2.append({'ID': 3, 'Case': 'F'}, ignore_index=True)

conn = sqlite3.connect('test.db')

dfObj.to_sql('Table1', conn, index=False)

dfObj2.to_sql('Table2', conn, index=False)

query = """
  SELECT table1.[Case], table2.[Case]
  from table1 
  Left join table2
  on table1.[ID] = table2.[ID]
  and table1.[Case] <> table2.[Case];
"""
curr = conn.cursor()

data = curr.execute(query).fetchall()

print(data)

Open in new window