asked on

FETCH, then DELETE

Greetings, experts!

I'm receiving some invalid data from an external source, and I need to "clean it up" before loading it into our main database. Based on two columns in the table, I need to take the FIRST row.

For instance, given the following data:

memberNumber lastName firstName effectiveDate
12345 jarzombek ron 2007-10-01
12345 jarzombek ron 2005-01-01
54321 mcmaster jason 2001-01-01
34567 keyser doug 2006-01-01
34567 keyser doug 2001-11-01

I need:

memberNumber lastName firstName effectiveDate
12345 jarzombek ron 2007-10-01
54321 mcmaster jason 2001-01-01
34567 keyser doug 2006-01-01

I thought about looping through a cursor, record by record. I'd check to see if I already encountered this memberNumber and lastName, and write it to a separate file if I haven't.

Alternately, I thought about deleting the duplicate records as I encounter them in the FETCH loop, but it mildly scares me to delete records out of a cursor I'm in the middle of looping through. (Would that be a problem?)

So, which of these alternatives would you choose? Or do you have another alternative?

I'm using DB2 for iSeries v5r3

Thanks!
DaveSlash

ASKER CERTIFIED SOLUTION

mcmonap

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Member_2_2484401

ASKER

Thanks, mcmonap! That's a great idea.

I think the only problem might be that the FIRST record may not always have the max effectiveDate. If it does, though, then your solution is WAY better than mine.

I'll verify with the sender of the data if I can make that assumption.

Thanks again,
Dave

SOLUTION

David Todd

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

mcmonap

Hi daveslash,

It shouldn't matter if it is the first or not ie if the data was this:
memberNumber lastName firstName effectiveDate
12345 jarzombek ron 2002-10-01
12345 jarzombek ron 2003-01-01
12345 jarzombek ron 2007-10-01
12345 jarzombek ron 1999-01-01

You would get back:
memberNumber lastName firstName effectiveDate
12345 jarzombek ron 2007-10-01

(At least I would in MS SQL!)

Member_2_2484401

ASKER

Hi mcmonap and dtodd,

mcmonap, that again for your help. In your example, what I would want is the FIRST matching row:

12345 jarzombek ron 2002-10-01

dtodd, I'm looking into your suggestion. I'll get back to you.

Thanks!
DaveSlash

Member_2_2484401

ASKER

Ok, here's a bit more info. Thanks to everyone for getting my brain down the right path.

This returns the correct rows to delete:

with KeepThese as (
SELECT memberNumber,
lastName,
min(rrn(mytable)) as minrrn
FROM mytable
group by memberNumber,
lastName
)
select *
from mytable
where rrn(mytable) not in (
select minrrn
from KeepThese
)

But when I change the "select *" to a DELETE, it fails with:

Message ID . . . . . . : SQL0199
Severity . . . . . . . : 30
Message type . . . . . : Diagnostic

Keyword DELETE not expected. Valid tokens: ( SELECT.

Aarrgghh!!

So, I moved the WITH clause down into the query, and it does work (albeit slowly):

delete
from mytable
where rrn(mytable) not in (
select minrrn
from (SELECT memberNumber,
lastName,
min(rrn(mytable)) as minrrn
FROM mytable
group by memberNumber,
lastName) as withTable
)

Lastly, I just split it up into several steps, and it does work, and it works much faster. (For those not familiar with the AS/400's "qtemp" schema, it's basically a place to create temporary data that gets automatically dropped when the database connection closes.)

create table qtemp/KeepThese (
memberNumber integer,
lastName varchar(25),
minrrn integer
)

insert into qtemp/KeepThese
SELECT memberNumber,
lastName,
min(rrn(mytable))
FROM mytable
group by memberNumber,
lastName
with nc

delete
from mytable
where rrn(mytable) not in (
select minrrn
from qtemp/KeepThese
)

Whew! There's got to be a better way!

-- DaveSlash

SOLUTION

momi_sabag

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

Lowfatspread

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

SOLUTION

Kent Olsen

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

mcmonap

Hi Kdo,

>>>>I can't sweat to the inner workings of DB2 for all platforms, but on many database systems, there is no assurance that the rows will be returned to a query in the same order as they are stored. In fact, there's no assurance that they will be returned in the same order on successive queries.<<<<
This is also the case in MSSQL, that is why there were suggestions for creating an auto incrementing primary key during the import to retain the original order - records will be imported sequentially in my experience.

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
This is a fact finding DB2 mission for me now (I know nothing at the moment)! Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?

prmurphy63026

Provided the data is in the correct sorted sequence, you could put a unique key constraint on LastName+FirstName or perhaps just MemberNumber then use the LOAD utility to get the data into the table. The first row will be loaded and the others will fail to load with Unique Key violations....

Member_2_2484401

ASKER

Thanks to all of you for your help. It steered my brain down the correct path, and that's exactly what I needed.

Thanks again,
DaveSlash

Kent Olsen

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
>>This is a fact finding DB2 mission for me now (I know nothing at the moment)! Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?

Well, the DB2 load process can handle this automatically. It will be a lot faster to let the load process work within its own contraints instead of artifically building them.

One of the capabilities of the LOAD utility (process) is that when a record violates any RI, write the record to an exception table. Attempting to write a second record with the same unique / primary key is an RI violation. And since Dave wants to keep only the first one, writing the exceptions to another table is just a simple case of creating the table and adding one parameter to the LOAD command.

Kent