Solved

FETCH, then DELETE

Posted on 2007-11-19
13
3,860 Views
Last Modified: 2008-02-01
Greetings, experts!

I'm receiving some invalid data from an external source, and I need to "clean it up" before loading it into our main database. Based on two columns in the table, I need to take the FIRST row.

For instance, given the following data:

memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2007-10-01
12345                 jarzombek  ron          2005-01-01
54321                 mcmaster   jason       2001-01-01
34567                 keyser       doug        2006-01-01
34567                 keyser       doug        2001-11-01

I need:

memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2007-10-01
54321                 mcmaster   jason       2001-01-01
34567                 keyser       doug        2006-01-01

I thought about looping through a cursor, record by record. I'd check to see if I already encountered this memberNumber and lastName, and write it to a separate file if I haven't.

Alternately, I thought about deleting the duplicate records as I encounter them in the FETCH loop, but it mildly scares me to delete records out of a cursor I'm in the middle of looping through. (Would that be a problem?)

So, which of these alternatives would you choose?  Or do you have another alternative?

I'm using DB2 for iSeries v5r3

Thanks!
DaveSlash
0
Comment
Question by:daveslash
  • 4
  • 3
  • 2
  • +4
13 Comments
 
LVL 15

Accepted Solution

by:
mcmonap earned 100 total points
ID: 20316666
Hi daveslash,

This is from a MSSQL perspective so sorry if it doesn't apply.
Can you not import all records into a temporary table, from the temporary table run something like the query below?
These results could then be collected as your "real" data.  If you need to combine with data already imported then you could consider some kind of union, you might also need to convert the date to a proper date/time datatype (not sure what is available in DB2)
SELECT

	memberNumber

	, lastName

	, firstName

	MAX(effectiveDate)

FROM

	tblTemp

GROUP BY

	memberNumber

	, lastName

	, firstName

Open in new window

0
 
LVL 18

Author Comment

by:daveslash
ID: 20316702

Thanks, mcmonap!  That's a great idea.

I think the only problem might be that the FIRST record may not always have the max effectiveDate. If it does, though, then your solution is WAY better than mine.

I'll verify with the sender of the data if I can make that assumption.

Thanks again,
Dave
0
 
LVL 35

Assisted Solution

by:David Todd
David Todd earned 100 total points
ID: 20316741
Hi,

If you cant use the effectiveDate, then in the temp table use an integer identity, and use that to select the records

HTH
  David

PS Personally, I would be more likely to pick Max( effectiveDate ), maybe make sure that it is less than current datetime.
select

	tt.memberNumber

	, tt.LastName

	, tt.FirstName

	, tt.EffectiveDate

from tblTemp tt

where tt.ID in

	(

	SELECT

		min( tti.ID )

	FROM

		tblTemp tti

	GROUP BY

		tti.memberNumber

		, tti.lastName

		, tti.firstName

	)

Open in new window

0
 
LVL 15

Expert Comment

by:mcmonap
ID: 20316799
Hi daveslash,

It shouldn't matter if it is the first or not ie if the data was this:
memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2002-10-01
12345                 jarzombek  ron          2003-01-01
12345                 jarzombek  ron          2007-10-01
12345                 jarzombek  ron          1999-01-01

You would get back:
memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2007-10-01

(At least I would in MS SQL!)
0
 
LVL 18

Author Comment

by:daveslash
ID: 20317024

Hi mcmonap and dtodd,

mcmonap, that again for your help. In your example, what I would want is the FIRST matching row:

12345                 jarzombek  ron          2002-10-01

dtodd, I'm looking into your suggestion. I'll get back to you.

Thanks!
DaveSlash

0
 
LVL 18

Author Comment

by:daveslash
ID: 20317123

Ok, here's a bit more info. Thanks to everyone for getting my brain down the right path.

This returns the correct rows to delete:

with KeepThese as (                  
SELECT memberNumber,
             lastName,              
             min(rrn(mytable)) as minrrn
FROM   mytable              
group by memberNumber,
             lastName
)
select   *
from     mytable
where  rrn(mytable) not in (
   select minrrn
   from   KeepThese
)

But when I change the "select *" to a DELETE, it fails with:

Message ID . . . . . . :   SQL0199
Severity . . . . . . . :   30
Message type . . . . . :   Diagnostic

Keyword DELETE not expected. Valid tokens: ( SELECT.

Aarrgghh!!

So, I moved the WITH clause down into the query, and it does work (albeit slowly):

delete                                        
from     mytable                        
where  rrn(mytable) not in (                
   select minrrn                              
   from  (SELECT memberNumber,
                            lastName,
                            min(rrn(mytable)) as minrrn
             FROM    mytable              
             group by memberNumber,
                           lastName) as withTable
)

Lastly, I just split it up into several steps, and it does work, and it works much faster. (For those not familiar with the AS/400's "qtemp" schema, it's basically a place to create temporary data that gets automatically dropped when the database connection closes.)

create table qtemp/KeepThese (
  memberNumber integer,
  lastName varchar(25),
  minrrn integer
)

insert into qtemp/KeepThese
SELECT memberNumber,
             lastName,
             min(rrn(mytable))
FROM   mytable
group by memberNumber,
            lastName
with nc

delete                        
from      mytable        
where   rrn(mytable) not in (
   select minrrn              
   from   qtemp/KeepThese      
)

Whew!  There's got to be a better way!

-- DaveSlash
0
6 Surprising Benefits of Threat Intelligence

All sorts of threat intelligence is available on the web. Intelligence you can learn from, and use to anticipate and prepare for future attacks.

 
LVL 37

Assisted Solution

by:momi_sabag
momi_sabag earned 100 total points
ID: 20318059
hi
what do you mean by a better way ? you want to be able to do it using 1 delete command that will run just as fast as your last solution ?
it seems as they both should run roughly the same amount of time since the sql is pretty much the same,
but i would think about a different approach
how about this :
1) insert the records to keep into the temp table (as you do)
2) delete all the records from the regular table
3) insert the records from the temp table back into the original table

that should run significantlly faster under the assumption that your table is relativly large, and the amount of records you wish to keep is less than 50% (when you perform delete from table then db2 does a mass delete which is much faster than a regular delete)
anyways, before using this solution you should check if you have
1) triggers on the table - that might cause you some trouble
2) foreign keys on that table

0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 100 total points
ID: 20318061
1) you could declare the cursor as FOR UPDATE and use
     DELETE WHERE CURRENT OF CURSOR

2) do you really need a cursor to perform your updates?

     Insert into yourtable (column list)
       Select (ColumnList)
          from inputtable as A
          Where not exists (select memberno from inputtable as x
                                            where a.memberno=x.memberno
                                             and x.effectivedate>a.effectivedate)

3) if you really want the first record in the "file" rather than the lastest dated one
     then you need to import it into a table with a sequence column ...
     and use that as the indicator for first (ie min(seq) for the memberno
                    Where not exists (select memberno from inputtable as x
                                            where a.memberno=x.memberno
                                             and x.seqno<a.seqno)
         
0
 
LVL 45

Assisted Solution

by:Kdo
Kdo earned 100 total points
ID: 20334969
Hi Dave,

I've been on vacation for the past couple of days :) and just got back to the forum.

Loading the data into a table and then loading the data from that table into the "real" table is a bad idea.  Database files are not sequential host files and shouldn't be treated as such.  I can't sweat to the inner workings of DB2 for all platforms, but on many database systems, there is no assurance that the rows will be returned to a query in the same order as they are stored.  In fact, there's no assurance that they will be returned in the same order on successive queries.

DB2 does have a feature that allows you to do exactly what you want to do at load time.  It's called an "exception table".

It appears that the column "membernumber" is the primary key.  If the "real" table has it defined as a primary key, you can load directly into it.  If not, create another table with the mumbernumber columns as the primary key.

  CREATE TABLE mytable as (SELECT * FROM realtable) definition only;
  ALTER TABLE mytable ADD PRIMARY KEY (membernumber);

Then create an exception table exactly like the original table

  CREATE TABLE mytable_exceptions as (SELECT * FROM mytable) definition only;

Then load the table

  LOAD ... INTO mytable FOR EXCEPTION mytable_exceptions


The first item will get loaded, with all duplicates being loaded to the exception table.



Good Luck,
Kent
0
 
LVL 15

Expert Comment

by:mcmonap
ID: 20342192
Hi Kdo,

>>>>I can't sweat to the inner workings of DB2 for all platforms, but on many database systems, there is no assurance that the rows will be returned to a query in the same order as they are stored.  In fact, there's no assurance that they will be returned in the same order on successive queries.<<<<
This is also the case in MSSQL, that is why there were suggestions for creating an auto incrementing primary key during the import to retain the original order - records will be imported sequentially in my experience.

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
This is a fact finding DB2 mission for me now (I know nothing at the moment)!  Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?
0
 

Expert Comment

by:prmurphy63026
ID: 20383827
Provided the data is in the correct sorted sequence, you could put a unique key constraint on LastName+FirstName or perhaps just MemberNumber then use the LOAD utility to get the data into the table.  The first row will be loaded and the others will fail to load with Unique Key violations....
0
 
LVL 18

Author Comment

by:daveslash
ID: 20383876

Thanks to all of you for your help. It steered my brain down the correct path, and that's exactly what I needed.

Thanks again,
DaveSlash
0
 
LVL 45

Expert Comment

by:Kdo
ID: 20384531

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
>>This is a fact finding DB2 mission for me now (I know nothing at the moment)!  Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?

Well, the DB2 load process can handle this automatically.  It will be a lot faster to let the load process work within its own contraints instead of artifically building them.

One of the capabilities of the LOAD utility (process) is that when a record violates any RI, write the record to an exception table.  Attempting to write a second record with the same unique / primary key is an RI violation.  And since Dave wants to keep only the first one, writing the exceptions to another table is just a simple case of creating the table and adding one parameter to the LOAD command.


Kent
0

Featured Post

What Security Threats Are You Missing?

Enhance your security with threat intelligence from the web. Get trending threat insights on hackers, exploits, and suspicious IP addresses delivered to your inbox with our free Cyber Daily.

Join & Write a Comment

Confronted with some SQL you don't know can be a daunting task. It can be even more daunting if that SQL carries some of the old secret codes used in the Ye Olde query syntax, such as: (+)     as used in Oracle;     *=     =*    as used in Sybase …
This article describes how to use the timestamp of existing data in a database to allow Tableau to calculate the prior work day instead of relying on case statements or if statements to calculate the days of the week.
Internet Business Fax to Email Made Easy - With eFax Corporate (http://www.enterprise.efax.com), you'll receive a dedicated online fax number, which is used the same way as a typical analog fax number. You'll receive secure faxes in your email, fr…
In this tutorial you'll learn about bandwidth monitoring with flows and packet sniffing with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're interested in additional methods for monitoring bandwidt…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

9 Experts available now in Live!

Get 1:1 Help Now