• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 647
  • Last Modified:

sas-delete records having count(*)>1

Hi,
i am getting count(*)>1 while running below query so i want to delete dups
and just keep count(*)=1 how can i do that

select item_id||':'||put(date,best.) from table
group by item_id||':'||put(date,best.)
having count(*)>1
0
sam2929
Asked:
sam2929
  • 3
  • 2
1 Solution
 
Aloysius LowCommented:
assuming item_id||':'||put(date,best.) provides the unique identifier to the records to be deleted:

delete from table
where item_id||':'||put(date,best.) in (
    select item_id||':'||put(date,best.)
    from table
    group by item_id||':'||put(date,best.)
    having count(*)>1)

have you tried this?
0
 
IanStatisticianCommented:
Hi sam2929,
proc sort data=TABLE out=NEW_TABLE nodupkey;
by item_id;
run;

Open in new window


You may want to also check if "noduprecs" may fit your purpose better than  "nodupkey".
See  http://support.sas.com/documentation/cdl/en/proc/61895/HTML/default/viewer.htm#a000146878.htm
for more information about sort options.

Ian.
0
 
theartfuldazzlerCommented:
Hi

Alternatively, just instruct SQl to select UNIQUE or DISTINCT records on the select statement:

 PROC SQL;
  SELECT DISTINCT item_id||':'||put(date,best.) from table ;
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
IanStatisticianCommented:
proc sort data=TABLE out=NEW_TABLE nodupkey;
by item_id date;
run;

Open in new window


May be required if date field is critical in checking for duplicates.
0
 
Aloysius LowCommented:
reading the question, asker would like to delete records from table where record count is greater than 1, and just keep records where record count is 1, in which case, would be my solution.

however, if asker meant to ask how to delete the duplicates, such that for each and every item_id and date field, only 1 record is left as the end state, then ShannonEE's post above would be right
0
 
IanStatisticianCommented:
so i want to delete dups

I assumed this to mean keep one representative and delete the remaining DUPLICATES though there is a worrying notion about removing information and keeping an arbitrary representative.

 Using a data step it is trivial to keep the first, the last, or in fact the 2nd or 2nd last, or other possibilities in any possible sorting order (using other variables after item_id and date).

More complicated but still possible is to keep the middle record in each group - requires 2 sorts, or to be exact 1 sort and a reversal.


The question as posed wants to delete duplicate records where the duplication is only on the two fields item_id and date, however this throws away more information than the case of deleting duplicates where the duplication is on all fields.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

Join & Write a Comment

Featured Post

Improve Your Query Performance Tuning

In this FREE six-day email course, you'll learn from Janis Griffin, Database Performance Evangelist. She'll teach 12 steps that you can use to optimize your queries as much as possible and see measurable results in your work. Get started today!

  • 3
  • 2
Tackle projects and never again get stuck behind a technical roadblock.
Join Now