Solved

FETCH, then DELETE

Posted on 2007-11-19
13
3,873 Views
Last Modified: 2008-02-01
Greetings, experts!

I'm receiving some invalid data from an external source, and I need to "clean it up" before loading it into our main database. Based on two columns in the table, I need to take the FIRST row.

For instance, given the following data:

memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2007-10-01
12345                 jarzombek  ron          2005-01-01
54321                 mcmaster   jason       2001-01-01
34567                 keyser       doug        2006-01-01
34567                 keyser       doug        2001-11-01

I need:

memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2007-10-01
54321                 mcmaster   jason       2001-01-01
34567                 keyser       doug        2006-01-01

I thought about looping through a cursor, record by record. I'd check to see if I already encountered this memberNumber and lastName, and write it to a separate file if I haven't.

Alternately, I thought about deleting the duplicate records as I encounter them in the FETCH loop, but it mildly scares me to delete records out of a cursor I'm in the middle of looping through. (Would that be a problem?)

So, which of these alternatives would you choose?  Or do you have another alternative?

I'm using DB2 for iSeries v5r3

Thanks!
DaveSlash
0
Comment
Question by:Dave Ford
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 3
  • 2
  • +4
13 Comments
 
LVL 15

Accepted Solution

by:
mcmonap earned 100 total points
ID: 20316666
Hi daveslash,

This is from a MSSQL perspective so sorry if it doesn't apply.
Can you not import all records into a temporary table, from the temporary table run something like the query below?
These results could then be collected as your "real" data.  If you need to combine with data already imported then you could consider some kind of union, you might also need to convert the date to a proper date/time datatype (not sure what is available in DB2)
SELECT
	memberNumber
	, lastName
	, firstName
	MAX(effectiveDate)
FROM
	tblTemp
GROUP BY
	memberNumber
	, lastName
	, firstName

Open in new window

0
 
LVL 18

Author Comment

by:Dave Ford
ID: 20316702

Thanks, mcmonap!  That's a great idea.

I think the only problem might be that the FIRST record may not always have the max effectiveDate. If it does, though, then your solution is WAY better than mine.

I'll verify with the sender of the data if I can make that assumption.

Thanks again,
Dave
0
 
LVL 35

Assisted Solution

by:David Todd
David Todd earned 100 total points
ID: 20316741
Hi,

If you cant use the effectiveDate, then in the temp table use an integer identity, and use that to select the records

HTH
  David

PS Personally, I would be more likely to pick Max( effectiveDate ), maybe make sure that it is less than current datetime.
select
	tt.memberNumber
	, tt.LastName
	, tt.FirstName
	, tt.EffectiveDate
from tblTemp tt
where tt.ID in
	(
	SELECT
		min( tti.ID )
	FROM
		tblTemp tti
	GROUP BY
		tti.memberNumber
		, tti.lastName
		, tti.firstName
	)

Open in new window

0
Resolve Critical IT Incidents Fast

If your data, services or processes become compromised, your organization can suffer damage in just minutes and how fast you communicate during a major IT incident is everything. Learn how to immediately identify incidents & best practices to resolve them quickly and effectively.

 
LVL 15

Expert Comment

by:mcmonap
ID: 20316799
Hi daveslash,

It shouldn't matter if it is the first or not ie if the data was this:
memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2002-10-01
12345                 jarzombek  ron          2003-01-01
12345                 jarzombek  ron          2007-10-01
12345                 jarzombek  ron          1999-01-01

You would get back:
memberNumber  lastName   firstName  effectiveDate
12345                 jarzombek  ron          2007-10-01

(At least I would in MS SQL!)
0
 
LVL 18

Author Comment

by:Dave Ford
ID: 20317024

Hi mcmonap and dtodd,

mcmonap, that again for your help. In your example, what I would want is the FIRST matching row:

12345                 jarzombek  ron          2002-10-01

dtodd, I'm looking into your suggestion. I'll get back to you.

Thanks!
DaveSlash

0
 
LVL 18

Author Comment

by:Dave Ford
ID: 20317123

Ok, here's a bit more info. Thanks to everyone for getting my brain down the right path.

This returns the correct rows to delete:

with KeepThese as (                  
SELECT memberNumber,
             lastName,              
             min(rrn(mytable)) as minrrn
FROM   mytable              
group by memberNumber,
             lastName
)
select   *
from     mytable
where  rrn(mytable) not in (
   select minrrn
   from   KeepThese
)

But when I change the "select *" to a DELETE, it fails with:

Message ID . . . . . . :   SQL0199
Severity . . . . . . . :   30
Message type . . . . . :   Diagnostic

Keyword DELETE not expected. Valid tokens: ( SELECT.

Aarrgghh!!

So, I moved the WITH clause down into the query, and it does work (albeit slowly):

delete                                        
from     mytable                        
where  rrn(mytable) not in (                
   select minrrn                              
   from  (SELECT memberNumber,
                            lastName,
                            min(rrn(mytable)) as minrrn
             FROM    mytable              
             group by memberNumber,
                           lastName) as withTable
)

Lastly, I just split it up into several steps, and it does work, and it works much faster. (For those not familiar with the AS/400's "qtemp" schema, it's basically a place to create temporary data that gets automatically dropped when the database connection closes.)

create table qtemp/KeepThese (
  memberNumber integer,
  lastName varchar(25),
  minrrn integer
)

insert into qtemp/KeepThese
SELECT memberNumber,
             lastName,
             min(rrn(mytable))
FROM   mytable
group by memberNumber,
            lastName
with nc

delete                        
from      mytable        
where   rrn(mytable) not in (
   select minrrn              
   from   qtemp/KeepThese      
)

Whew!  There's got to be a better way!

-- DaveSlash
0
 
LVL 37

Assisted Solution

by:momi_sabag
momi_sabag earned 100 total points
ID: 20318059
hi
what do you mean by a better way ? you want to be able to do it using 1 delete command that will run just as fast as your last solution ?
it seems as they both should run roughly the same amount of time since the sql is pretty much the same,
but i would think about a different approach
how about this :
1) insert the records to keep into the temp table (as you do)
2) delete all the records from the regular table
3) insert the records from the temp table back into the original table

that should run significantlly faster under the assumption that your table is relativly large, and the amount of records you wish to keep is less than 50% (when you perform delete from table then db2 does a mass delete which is much faster than a regular delete)
anyways, before using this solution you should check if you have
1) triggers on the table - that might cause you some trouble
2) foreign keys on that table

0
 
LVL 50

Assisted Solution

by:Lowfatspread
Lowfatspread earned 100 total points
ID: 20318061
1) you could declare the cursor as FOR UPDATE and use
     DELETE WHERE CURRENT OF CURSOR

2) do you really need a cursor to perform your updates?

     Insert into yourtable (column list)
       Select (ColumnList)
          from inputtable as A
          Where not exists (select memberno from inputtable as x
                                            where a.memberno=x.memberno
                                             and x.effectivedate>a.effectivedate)

3) if you really want the first record in the "file" rather than the lastest dated one
     then you need to import it into a table with a sequence column ...
     and use that as the indicator for first (ie min(seq) for the memberno
                    Where not exists (select memberno from inputtable as x
                                            where a.memberno=x.memberno
                                             and x.seqno<a.seqno)
         
0
 
LVL 45

Assisted Solution

by:Kent Olsen
Kent Olsen earned 100 total points
ID: 20334969
Hi Dave,

I've been on vacation for the past couple of days :) and just got back to the forum.

Loading the data into a table and then loading the data from that table into the "real" table is a bad idea.  Database files are not sequential host files and shouldn't be treated as such.  I can't sweat to the inner workings of DB2 for all platforms, but on many database systems, there is no assurance that the rows will be returned to a query in the same order as they are stored.  In fact, there's no assurance that they will be returned in the same order on successive queries.

DB2 does have a feature that allows you to do exactly what you want to do at load time.  It's called an "exception table".

It appears that the column "membernumber" is the primary key.  If the "real" table has it defined as a primary key, you can load directly into it.  If not, create another table with the mumbernumber columns as the primary key.

  CREATE TABLE mytable as (SELECT * FROM realtable) definition only;
  ALTER TABLE mytable ADD PRIMARY KEY (membernumber);

Then create an exception table exactly like the original table

  CREATE TABLE mytable_exceptions as (SELECT * FROM mytable) definition only;

Then load the table

  LOAD ... INTO mytable FOR EXCEPTION mytable_exceptions


The first item will get loaded, with all duplicates being loaded to the exception table.



Good Luck,
Kent
0
 
LVL 15

Expert Comment

by:mcmonap
ID: 20342192
Hi Kdo,

>>>>I can't sweat to the inner workings of DB2 for all platforms, but on many database systems, there is no assurance that the rows will be returned to a query in the same order as they are stored.  In fact, there's no assurance that they will be returned in the same order on successive queries.<<<<
This is also the case in MSSQL, that is why there were suggestions for creating an auto incrementing primary key during the import to retain the original order - records will be imported sequentially in my experience.

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
This is a fact finding DB2 mission for me now (I know nothing at the moment)!  Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?
0
 

Expert Comment

by:prmurphy63026
ID: 20383827
Provided the data is in the correct sorted sequence, you could put a unique key constraint on LastName+FirstName or perhaps just MemberNumber then use the LOAD utility to get the data into the table.  The first row will be loaded and the others will fail to load with Unique Key violations....
0
 
LVL 18

Author Comment

by:Dave Ford
ID: 20383876

Thanks to all of you for your help. It steered my brain down the correct path, and that's exactly what I needed.

Thanks again,
DaveSlash
0
 
LVL 45

Expert Comment

by:Kent Olsen
ID: 20384531

>>>>Loading the data into a table and then loading the data from that table into the "real" table is a bad idea<<<<
>>This is a fact finding DB2 mission for me now (I know nothing at the moment)!  Can you import your data directly into an exceptions table, your code suggests it comes from another table in the db but your first comment suggest this is a bad way to go?

Well, the DB2 load process can handle this automatically.  It will be a lot faster to let the load process work within its own contraints instead of artifically building them.

One of the capabilities of the LOAD utility (process) is that when a record violates any RI, write the record to an exception table.  Attempting to write a second record with the same unique / primary key is an RI violation.  And since Dave wants to keep only the first one, writing the exceptions to another table is just a simple case of creating the table and adding one parameter to the LOAD command.


Kent
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

PL/SQL can be a very powerful tool for working directly with database tables. Being able to loop will allow you to perform more complex operations, but can be a little tricky to write correctly. This article will provide examples of basic loops alon…
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
Attackers love to prey on accounts that have privileges. Reducing privileged accounts and protecting privileged accounts therefore is paramount. Users, groups, and service accounts need to be protected to help protect the entire Active Directory …
Finding and deleting duplicate (picture) files can be a time consuming task. My wife and I, our three kids and their families all share one dilemma: Managing our pictures. Between desktops, laptops, phones, tablets, and cameras; over the last decade…

737 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question