Rob4077
asked on
SQL to delete all but last 500 records per customer
I have a growing table of records that I would like to cull. The relevant fields are:
MessageDate
CellNo
What I want to do is delete all but the last 500 records for each CellNo. I know how to define the TOP 500 in an SQL statement but how would I define all but the TOP 500?
ASKER
Thanks. How would I modify that to delete all but the last 500 records for EVERY CellNo regardless of date bearing in mind that not all CellNo have 500 records?
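DELETE FROM TABLENAME
WHERE
ID NOT IN
(SELECT TOP 500 ID
FROM TABLENAME T2
WHERE T2.CELLNO = TABLENAME.CELLNO
ORDER BY T2.MESSAGEDATE DESC)
The ORDER BY makes TOP 500 pick the most recent records for each CellNo. If a CellNo has fewer than 500 records, nothing is deleted for that CellNo.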
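For anyone who wants to try the correlated NOT IN pattern on a scratch database first, here is a minimal sketch in Python with the standard sqlite3 module. It is not the Access statement itself: SQLite has no TOP, so the sketch assumes LIMIT as the equivalent, and the table name Queue, the ID column, and the helper names are assumptions made for illustration. A small limit stands in for 500.

```python
import sqlite3


def cull_keep_latest(conn, keep):
    """Delete all but the `keep` most recent rows for each CellNo.

    Assumes a table Queue(ID, CellNo, MessageDate); SQLite's LIMIT
    stands in for TOP here.
    """
    conn.execute(
        """
        DELETE FROM Queue
        WHERE ID NOT IN (
            SELECT T2.ID
            FROM Queue AS T2
            WHERE T2.CellNo = Queue.CellNo
            ORDER BY T2.MessageDate DESC, T2.ID DESC
            LIMIT ?
        )
        """,
        (keep,),
    )
    conn.commit()


def demo_correlated_delete():
    # Build a throwaway in-memory table: five rows for "A", two for "B".
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Queue (ID INTEGER PRIMARY KEY, CellNo TEXT, MessageDate TEXT)"
    )
    rows = [("A", f"2024-01-{d:02d}") for d in range(1, 6)]
    rows += [("B", "2024-01-01"), ("B", "2024-01-02")]
    conn.executemany("INSERT INTO Queue (CellNo, MessageDate) VALUES (?, ?)", rows)
    cull_keep_latest(conn, 3)
    return conn.execute(
        "SELECT CellNo, MessageDate FROM Queue ORDER BY CellNo, MessageDate"
    ).fetchall()
```

With a limit of 3, the CellNo that has five rows should lose its two oldest, and the CellNo that has only two rows should be left alone. The extra ID tie-breaker in the ORDER BY is there so that rows sharing the same MessageDate are ranked deterministically.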
TOP n is influenced by the order applied,
so, in simple terms, apply an order where the unwanted records occur first and use TOP n%.
To retain a hard number of rows (500), I suggest you calculate the total number of rows, work out what 500 is as a percentage of that total, and then delete the unwanted percentage via TOP n%.
Something like:
DECLARE @totrows INT, @nPercent INT, @delPercent INT;
SELECT @totrows = COUNT(*) FROM thatTable;
SET @nPercent = (500 * 100) / @totrows;
SET @delPercent = 100 - @nPercent;
then use "TOP @delPercent%" (which may need to be dynamic SQL; needs confirming).
I'm not sure whether ORDER BY applies to DELETE (I have never tried it), but assuming it doesn't, getting the right records might require selecting IDs from the table-to-cull into a temp table, so that ORDER BY does apply.
Then delete from the table-to-cull,
joined to the temp table (which has the "TOP @delPercent%" IDs in it),
then drop the temp table.
Sorry if this is a bit cryptic, as I have never actually tried what you have asked for, but I think the above might be a workable (if not well explained) approach.
btw: I am assuming approximately 500 would be acceptable.
{+edit: now that I read what I just submitted, maybe percent isn't needed and absolute numbers could be used instead. Except for that, the same method would apply.}
or, use something sensible like the one above mine.
what was I thinking?
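For what it's worth, the temp-table idea with absolute numbers (rather than percentages) can be sketched roughly as follows. This is only an illustration in Python with sqlite3, not the poster's exact method: it swaps TOP n% for a counting subquery that finds rows with at least `keep` newer rows for the same CellNo. The names Queue, ID, ToDelete and cull_via_temp_table are all assumptions.

```python
import sqlite3


def cull_via_temp_table(conn, keep):
    """Collect the IDs to remove in a temp table, delete them, drop the temp table.

    Returns the number of rows deleted. Assumes Queue(ID, CellNo, MessageDate).
    """
    conn.execute("CREATE TEMP TABLE ToDelete (ID INTEGER PRIMARY KEY)")
    # A row is unwanted when at least `keep` rows for the same CellNo are newer
    # (ties on MessageDate are broken by the higher ID counting as newer).
    conn.execute(
        """
        INSERT INTO ToDelete (ID)
        SELECT Q.ID
        FROM Queue AS Q
        WHERE (
            SELECT COUNT(*)
            FROM Queue AS N
            WHERE N.CellNo = Q.CellNo
              AND (N.MessageDate > Q.MessageDate
                   OR (N.MessageDate = Q.MessageDate AND N.ID > Q.ID))
        ) >= ?
        """,
        (keep,),
    )
    cur = conn.execute("DELETE FROM Queue WHERE ID IN (SELECT ID FROM ToDelete)")
    deleted = cur.rowcount
    conn.execute("DROP TABLE ToDelete")
    conn.commit()
    return deleted


def demo_temp_table():
    # Five rows for "A", two for "B", inserted oldest first.
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Queue (ID INTEGER PRIMARY KEY, CellNo TEXT, MessageDate TEXT)"
    )
    rows = [("A", f"2024-01-{d:02d}") for d in range(1, 6)]
    rows += [("B", "2024-01-01"), ("B", "2024-01-02")]
    conn.executemany("INSERT INTO Queue (CellNo, MessageDate) VALUES (?, ?)", rows)
    deleted = cull_via_temp_table(conn, 3)
    remaining = conn.execute(
        "SELECT CellNo, MessageDate FROM Queue ORDER BY CellNo, MessageDate"
    ).fetchall()
    return deleted, remaining
```

Staging the IDs first keeps the "which rows" decision separate from the delete itself, so the staged list can be inspected before anything is removed.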
ASKER
Just to clarify.
The table contains thousands of records. One field is CellNo and another is MessageDate. Of course there are a number of other fields. Currently messages come in and are added to the table as they arrive. In some cases I now have over 2000 records with the same CellNo. In other cases I only have 2 or 3 records with the same CellNo. What I want to do is only keep the most recent 500 records for each CellNo.
I know how to do it by creating a recordset and looping through the whole table but was wondering if there was an easy way of doing it with SQL. If there's not please let me know and I will write the VBA to loop through the recordset.
ASKER
Looking at the answers provided, I think looping through the recordset is going to be a much quicker, simpler and more elegant solution - do you agree?
ASKER
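I had something like the following in mind (untested). I'm starting to think this may be better than trying to figure out some SQL code that will do it.
Dim rs As DAO.Recordset
Dim sSql As String
Dim LastCellNo As Variant
Dim lRecCount As Long
sSql = "SELECT Queue.* FROM Queue ORDER BY CellNo, MessageDate DESC"
Set rs = CurrentDb.OpenRecordset(sSql, dbOpenDynaset)
If Not rs.EOF Then LastCellNo = rs!CellNo
lRecCount = 0
Do Until rs.EOF
    ' Restart the count whenever a new CellNo group begins
    If rs!CellNo <> LastCellNo Then
        lRecCount = 0
        LastCellNo = rs!CellNo
    End If
    lRecCount = lRecCount + 1
    ' Rows are newest first within each CellNo, so anything past 500 is older
    If lRecCount > 500 Then rs.Delete
    ' Always move on; after Delete the current record is no longer valid
    rs.MoveNext
Loop
rs.Close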
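The counting logic in that loop is easy to get subtly wrong (for instance, forgetting to update the remembered CellNo when the group changes), so here is the same single-pass idea as a small Python sketch that can be checked in isolation. The function name and the (id, cell_no) row shape are assumptions for illustration; the input is assumed to be sorted by CellNo and then newest first, as in the ORDER BY above.

```python
def ids_beyond_limit(rows, keep):
    """Return the IDs of rows past the first `keep` in each CellNo group.

    `rows` is an iterable of (row_id, cell_no) pairs, already sorted by
    cell_no and then newest first within each cell_no.
    """
    to_delete = []
    last_cell = object()  # sentinel that never equals a real CellNo
    count = 0
    for row_id, cell_no in rows:
        if cell_no != last_cell:
            # New group: remember it and restart the counter.
            last_cell = cell_no
            count = 0
        count += 1
        if count > keep:
            to_delete.append(row_id)
    return to_delete
```

For example, with five rows for one CellNo and two for another, a limit of 3 should flag only the two oldest rows of the first group.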
ASKER
Thanks. That all makes sense. Appreciate your help.
and thank you! Cheers, Paul
DELETE FROM TABLENAME
WHERE
MESSAGEDATE = ??
AND CELLNO = ??
AND ID NOT IN
(SELECT TOP 500 ID
FROM TABLENAME
WHERE MESSAGEDATE = ?? AND CELLNO = ??)