Member_2_2484401 (United States of America)

asked on

FETCH, then DELETE

Greetings, experts!

I'm receiving some invalid data from an external source, and I need to "clean it up" before loading it into our main database. Based on two columns in the table (memberNumber and lastName), I need to keep only the FIRST row for each combination.

For instance, given the following data:

memberNumber  lastName   firstName  effectiveDate
12345         jarzombek  ron        2007-10-01
12345         jarzombek  ron        2005-01-01
54321         mcmaster   jason      2001-01-01
34567         keyser     doug       2006-01-01
34567         keyser     doug       2001-11-01

I need:

memberNumber  lastName   firstName  effectiveDate
12345         jarzombek  ron        2007-10-01
54321         mcmaster   jason      2001-01-01
34567         keyser     doug       2006-01-01

I thought about looping through a cursor, record by record: I'd check whether I had already encountered this memberNumber and lastName, and write the row to a separate file if I hadn't.

Alternately, I thought about deleting the duplicate records as I encounter them in the FETCH loop, but it mildly scares me to delete records out of a cursor I'm in the middle of looping through. (Would that be a problem?)

So, which of these alternatives would you choose?  Or do you have another alternative?
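(For reference, the cursor-based delete being weighed here is what DB2 calls a positioned DELETE, and it is supported as long as the cursor is deletable. A minimal sketch in DB2 SQL procedure form, assuming the staging table is called mytable as in the statements further down; the procedure name is made up for illustration. Note the caveat in the comments: a deletable cursor cannot have an ORDER BY, so this relies on duplicates arriving together, which DB2 does not guarantee.)

-- A sketch only of the positioned-DELETE approach (DB2 SQL procedure form);
-- the procedure name is illustrative.
create procedure dedupeMembers ()
language sql
begin
  declare v_member   integer;
  declare v_last     varchar(25);
  declare v_prevMem  integer default -1;
  declare v_prevLast varchar(25) default '';
  declare v_done     integer default 0;

  -- Caveat: a deletable cursor cannot have ORDER BY, so this relies on
  -- duplicate rows being returned together, which is not guaranteed.
  declare c1 cursor for
    select memberNumber, lastName
    from   mytable
    for update;

  declare continue handler for not found set v_done = 1;

  open c1;
  fetch c1 into v_member, v_last;
  while v_done = 0 do
    if v_member = v_prevMem and v_last = v_prevLast then
      delete from mytable where current of c1;   -- positioned DELETE
    else
      set v_prevMem  = v_member;
      set v_prevLast = v_last;
    end if;
    fetch c1 into v_member, v_last;
  end while;
  close c1;
end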

I'm using DB2 for iSeries v5r3

Thanks!
DaveSlash
ASKER CERTIFIED SOLUTION
mcmonap (United Kingdom of Great Britain and Northern Ireland)
Member_2_2484401 (ASKER)


Thanks, mcmonap!  That's a great idea.

I think the only problem might be that the FIRST record may not always have the max effectiveDate. If it does, though, then your solution is WAY better than mine.

I'll verify with the sender of the data if I can make that assumption.

Thanks again,
Dave
SOLUTION
Hi daveslash,

It shouldn't matter whether it is the first or not, i.e. if the data was this:
memberNumber  lastName   firstName  effectiveDate
12345         jarzombek  ron        2002-10-01
12345         jarzombek  ron        2003-01-01
12345         jarzombek  ron        2007-10-01
12345         jarzombek  ron        1999-01-01

You would get back:
memberNumber  lastName   firstName  effectiveDate
12345         jarzombek  ron        2007-10-01

(At least I would in MS SQL!)
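(The accepted solution itself isn't shown above, but a query that behaves the way mcmonap describes here -- returning the row with the latest effectiveDate for each memberNumber/lastName pair -- could be sketched like this, using the table and column names from the sample data:)

-- A sketch only: keep the row with the maximum effectiveDate per
-- memberNumber/lastName pair by joining back to a grouped maximum.
select t.memberNumber, t.lastName, t.firstName, t.effectiveDate
from   mytable t
join  (select memberNumber, lastName, max(effectiveDate) as maxDate
       from   mytable
       group by memberNumber, lastName) m
  on   t.memberNumber  = m.memberNumber
  and  t.lastName      = m.lastName
  and  t.effectiveDate = m.maxDate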

Hi mcmonap and dtodd,

mcmonap, thanks again for your help. In your example, what I would want is the FIRST matching row:

12345         jarzombek  ron        2002-10-01

dtodd, I'm looking into your suggestion. I'll get back to you.

Thanks!
DaveSlash


Ok, here's a bit more info. Thanks to everyone for getting my brain down the right path.

This returns the correct rows to delete:

with KeepThese as (
  SELECT   memberNumber,
           lastName,
           min(rrn(mytable)) as minrrn
  FROM     mytable
  group by memberNumber,
           lastName
)
select *
from   mytable
where  rrn(mytable) not in (
  select minrrn
  from   KeepThese
)

But when I change the "select *" to a DELETE, it fails with:

Message ID . . . . . . :   SQL0199
Severity . . . . . . . :   30
Message type . . . . . :   Diagnostic

Keyword DELETE not expected. Valid tokens: ( SELECT.

Aarrgghh!!

So, I folded the WITH clause into the query as a nested subquery, and it does work (albeit slowly):

delete
from   mytable
where  rrn(mytable) not in (
  select minrrn
  from   (SELECT   memberNumber,
                   lastName,
                   min(rrn(mytable)) as minrrn
          FROM     mytable
          group by memberNumber,
                   lastName) as withTable
)

Lastly, I just split it up into several steps, and it does work, and it works much faster. (For those not familiar with the AS/400's "qtemp" schema, it's basically a place to create temporary data that gets automatically dropped when the database connection closes.)

create table qtemp/KeepThese (
  memberNumber integer,
  lastName varchar(25),
  minrrn integer
)

insert into qtemp/KeepThese
SELECT   memberNumber,
         lastName,
         min(rrn(mytable))
FROM     mytable
group by memberNumber,
         lastName
with nc

delete
from   mytable
where  rrn(mytable) not in (
  select minrrn
  from   qtemp/KeepThese
)

Whew!  There's got to be a better way!
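(For what it's worth, on releases that support the OLAP function ROW_NUMBER() -- later than the V5R3 mentioned above -- the whole cleanup collapses into a single DELETE. A sketch, reusing mytable and the RRN() trick from the statements above:)

-- A sketch only (requires ROW_NUMBER() support, not available on V5R3):
-- number the rows within each memberNumber/lastName group and delete
-- everything after the first physical row.
delete
from   mytable
where  rrn(mytable) in (
  select rowid
  from   (select rrn(m) as rowid,
                 row_number() over (partition by memberNumber, lastName
                                    order by rrn(m)) as rn
          from   mytable m) as numbered
  where  rn > 1
)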

-- DaveSlash
SOLUTION

SOLUTION

SOLUTION

Hi Kdo,

>>>>I can't swear to the inner workings of DB2 for all platforms, but on many database systems, there is no assurance that the rows will be returned to a query in the same order as they are stored.  In fact, there's no assurance that they will be returned in the same order on successive queries.<<<<
This is also the case in MS SQL, which is why there were suggestions for creating an auto-incrementing primary key during the import to retain the original order; records will be imported sequentially, in my experience.
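(A staging table with such an auto-incrementing column might be sketched like this in DB2 SQL; the table name and column sizes are made up for illustration:)

-- Sketch of a staging table whose identity column preserves arrival order
-- (table name and column sizes are illustrative only).
create table stagingMembers (
  loadSeq       integer generated always as identity
                        (start with 1, increment by 1),
  memberNumber  integer,
  lastName      varchar(25),
  firstName     varchar(25),
  effectiveDate date,
  primary key (loadSeq)
)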

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
This is a fact-finding DB2 mission for me now (I know nothing at the moment)! Can you import your data directly into an exceptions table? Your code suggests it comes from another table in the DB, but your first comment suggests this is a bad way to go?
prmurphy63026

Provided the data is in the correct sorted sequence, you could put a unique key constraint on LastName+FirstName (or perhaps just MemberNumber) and then use the LOAD utility to get the data into the table. The first row will be loaded, and the others will fail to load with unique-key violations....
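(The constraint itself would be straightforward; a sketch, assuming memberNumber and lastName are the de-duplication key as in the earlier examples, with an illustrative constraint name. Note that the key columns must be NOT NULL for a unique constraint:)

-- Sketch: a unique constraint so only the first row per key loads
-- successfully; the constraint name is illustrative only.
alter table mytable
  add constraint uq_member_lastname unique (memberNumber, lastName)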

Thanks to all of you for your help. It steered my brain down the correct path, and that's exactly what I needed.

Thanks again,
DaveSlash

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
>>This is a fact finding DB2 mission for me now (I know nothing at the moment)!  Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?

Well, the DB2 load process can handle this automatically. It will be a lot faster to let the load process work within its own constraints instead of artificially building them.

One of the capabilities of the LOAD utility (process) is that when a record violates any RI constraint, it can write the record to an exception table. Attempting to write a second record with the same unique / primary key is an RI violation. And since Dave wants to keep only the first one, writing the exceptions to another table is just a simple case of creating the table and adding one parameter to the LOAD command.
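(For reference, on DB2 platforms that provide the LOAD command, the exception-table parameter Kent describes looks roughly like this; the file, target, and exception-table names are illustrative, and the exception table must have the same columns as the target:)

-- Sketch of a LOAD with an exception table (DB2 LUW-style syntax; names
-- are illustrative).  Rows that violate the unique key on mytable are
-- written to member_exc instead of ending up in the target.
LOAD FROM members.del OF DEL
  INSERT INTO mytable
  FOR EXCEPTION member_exc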


Kent