Solved

Need to get rid off some duplicate records from the table under specific conditions

Posted on 2004-09-20
11
165 Views
Last Modified: 2010-05-18
I have two tables in the database:
ProductGroup with fields: GroupID, GroupName
Product with fields: ProdID, GroupID, ProdName, DistribCode, ExpiryDate

I have to write a store procedure of cursor (I am not sure what will fit the best) that will get rid of unneeded records.
It must first look for ProdName, if there are some duplicates then must check DistribCode, if can find the same code for this ProdName then must look in GroupName for word "French".If can find it , delete all occurences of the same ProductName with the same DistribCode except one that have "French" in parent table.
If can't find "French" then it must keep just a record with the latest expiry date.

I will give you some data to better describe what I am looking for:

ProductGroup table:

GroupID                           GroupName
-----------------------------------------------------------------------------------------------------------------
5000                                Cheese
5001                                Valley cheese
5002                                French cheese
5003                                Village cheese
5004                                Milk

Product table:

ProdID             GroupID              ProdName                     DistribCode                      ExpiryDate
------------------------------------------------------------------------------------------------------------------------
1000                5000                  Country cheese                  30                              01-09-2004
1001                5001                  Country cheese                  30                              01-09-2004
1002                5001                  Country cheese                  30                              01-10-2004
1003                5000                  Country cheese                100                              01-10-2004
1004                5000                  Country cheese                150                              01-11-2004
1005                5002                  Country cheese                150                              01-09-2004


After running the code I need to have just 3 rows.

ProdID             GroupID              ProdName                     DistribCode                      ExpiryDate
---------------------------------------------------------------------------------------------------------------
1002                5001                  Country cheese                  30                              01-10-2004

because there is no "French" word in GroupName, so record with the latest expiry date was kept in the table

ProdID             GroupID              ProdName                     DistribCode                      ExpiryDate
---------------------------------------------------------------------------------------------------------------
1003                5000                  Country cheese                100                              01-10-2004
because there is only one record for this ProdName and this DistribCode

ProdID             GroupID              ProdName                     DistribCode                      ExpiryDate
---------------------------------------------------------------------------------------------------------------
1005                5002                  Country cheese                150                              01-09-2004
because "French"  word from GroupName has the higher priority than expirydate.

Thx
Iwona


0
Comment
Question by:iwona62
  • 5
  • 5
11 Comments
 
LVL 8

Expert Comment

by:MartinCMS
ID: 12103868
Don't know if this will work for you or not since I don't have the data to test.  Give it a try..... This is a select Statement without dups, not deleting your dups.

select ProdID,pg.GroupID,ProdName,DistribCode,ExpiryDate
From      (select * From       ProductGroup pg join Product p on pg.GroupID = p.GroupID) t1
Where      EXISTS (SELECT 1
            FROM   (select * From       ProductGroup pg join Product p on pg.GroupID = p.GroupID) AS yt2
              WHERE  yt2.ProdName = t1.ProdName AND
                 yt2.GroupName like '%French%' AND
                     yt2.ProdID < t1.ProdID)
0
 
LVL 14

Accepted Solution

by:
Jan_Franek earned 500 total points
ID: 12104230
I would do it this way:

1. gather important data:

select
p.ProdName,
p.DistribCode,
count(1) as duplicates,
max(p.ExpiryDate) as maxdate,
case when exists ( select 1 from ProductGroup pg1, Product p1 where pg1.GroupID = p1.GroupID and p1.ProdName = p.ProdName and p1.DistribCode = p.DistribCode and pg1.GroupName like "French%" ) then 1 else 0 end as french
from Product p
group by p.ProdName, p.DistribCode

this query should show, whether for given ProdName and DistribCode there are duplicate rows, it show maximum ExpiryDate and also it shows where there exists group begining with French.

2. browse through this data using cursor and do appropriate action:

- if there is no duplicate - no action
- else if there is French - delete non French records
- else if there is no French - delete rows with lower expiry date

0
 

Author Comment

by:iwona62
ID: 12104906
Sorry,
I made mistake.
ExpiryDate column  also exist in ProductGroup table, not in Product table.
How this change will affect sql code?
0
Microsoft Certification Exam 74-409

Veeam® is happy to provide the Microsoft community with a study guide prepared by MVP and MCT, Orin Thomas. This guide will take you through each of the exam objectives, helping you to prepare for and pass the examination.

 

Author Comment

by:iwona62
ID: 12105344
Hi Janek,

I used your query actually modifying to be able retrieve also max(expirydate).

select
p.ProdName,
p.DistribCode,
count(1) as duplicates,
max(pg.ExpiryDate) as maxdate,
case when exists ( select 1 from ProductGroup pg1, Product p1 where pg1.GroupID = p1.GroupID and p1.ProdName = p.ProdName and p1.DistribCode = p.DistribCode and pg1.GroupName like "French%" ) then 1 else 0 end as french
from Product p, ProductGroup pg
where p.GroupID= pg.GroupID
group by p.ProdName, p.DistribCode


I have additional question.
How I can handle the situation if I will have more than one record with the same max(expirydate) and/or with word "French".
Actually after running the query I have this issue, even worse because I have "French" and the same dates.
Take a look:

ProductName        DistribCode Duplicates  ExpiryDate               French
----------------------------------------------------------------------------------------
Pills                         30             2           2002-05-24 00:00:00.000     1
Pills                         30             3           2002-05-24 00:00:00.000     1
0
 
LVL 14

Expert Comment

by:Jan_Franek
ID: 12109809
First - it's strangem that you have 2 rows with the same ProductName and DistribCode in the result of that query - there should be just one (because of GROUP BY).

Second - if you really have the problem with more than one record with the same max(expirydate) or French, you have to define more specific criteria - for example - if there are two rows with French, then keep the one with max(expirydate), if there are two with the same expirydate, keep the one with maximum ProdID. But this has to be defined by you or somebody, who assigned you this task.
0
 

Author Comment

by:iwona62
ID: 12113079
Yes, it make sense.

Now when I am trying to delete records from Product table (child table) I have a problem because I have to check conditions in parent table ProductGroup like expirydate.
How I can combine those two tables in delete statement, but delete only from Product.
0
 
LVL 14

Expert Comment

by:Jan_Franek
ID: 12113116
delete Product
from Product p, ProductGroup pg
where ....
0
 

Author Comment

by:iwona62
ID: 12115321
Yes, it works.
If I need to update the child table and have ProductName to be a concatanation of GroupName (from ProductGroup table) and ProductID (from Product table) how the statement will look like?
0
 
LVL 14

Expert Comment

by:Jan_Franek
ID: 12116893
update Products
set ProductName = pg.GroupName + p.ProductID -- may be you need some type conversion here
from Product p, ProductGroup pg
where ....
0
 

Author Comment

by:iwona62
ID: 12116974
Thanks Jan,

Actually I found out.

My statement is:

update Products
set Products.ProductName = Products.ProductName + ProductGroup.DistribCode
from Products
inner join ProductGroup
on ProductGroup.GroupID = Products.GroupID

I am sure that your statement will work too.

Thx for your help and quick response

Iwona
0
 
LVL 14

Expert Comment

by:Jan_Franek
ID: 12116990
Glad to help you :-)
0

Featured Post

VMware Disaster Recovery and Data Protection

In this expert guide, you’ll learn about the components of a Modern Data Center. You will use cases for the value-added capabilities of Veeam®, including combining backup and replication for VMware disaster recovery and using replication for data center migration.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
sql server insert 12 30
Sql server get data from a usp to use in a usp 5 16
Help in Bulk Insert 9 35
SQL invalid column name 5 13
I wrote this interesting script that really help me find jobs or procedures when working in a huge environment. I could I have written it as a Procedure but then I would have to have it on each machine or have a link to a server-related search that …
Slowly Changing Dimension Transformation component in data task flow is very useful for us to manage and control how data changes in SSIS.
Via a live example, show how to extract information from SQL Server on Database, Connection and Server properties
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

810 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question