• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 725

SELECT only duplicate records

Hello,

How do I select the records where a user has submitted exactly the same values multiple times? The primary key field is an auto-increment.

A hypothetical example would be the same user buying the same item in the same quantity.

id, email-id,       item_code, item_qty
1,  abc@mymail.com, 9,         1
.., ..,             ..,        ..
50, abc@mymail.com, 9,         1

Thanks
Asked by: practitioner
5 Solutions
 
Ivo Stoykov commented:
The easiest way to avoid duplicates (if I've understood correctly) is to set a unique key across the columns in which you want to prevent them.
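
For example, a minimal sketch assuming the table and column names from your post (your_table is a placeholder; note that a column named email-id must be backtick-quoted in MySQL):

-- Hypothetical names; a composite unique key over the three value columns
ALTER TABLE your_table
  ADD UNIQUE KEY uniq_order (`email-id`, item_code, item_qty);

Any later INSERT of the same email/item/quantity combination will then fail with a duplicate-key error (or can be absorbed with INSERT IGNORE or ON DUPLICATE KEY UPDATE).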

To select the duplicates, use the HAVING clause, e.g.:

select duplicated_column_name, count(duplicated_column_name) cnt
from your_table
group by duplicated_column_name
having count(duplicated_column_name) > 1;

HTH

Ivo Stoykov
 
practitioner (Author) commented:
Thanks Ivo,

Perhaps I didn't state it properly. That's not what I want; I want to list the duplicate records where the values in all fields except the primary key are identical.
 
lwadwell commented:
SELECT `email-id`, item_code, item_qty, COUNT(*)
FROM your_table
GROUP BY `email-id`, item_code, item_qty
HAVING COUNT(*) > 1;

 
hforhirenpatel commented:
Hi, I think this may help you.

I made a table named customer for a practical example. The data in that table are given below:


 select * from customer;
+-------------+-------+---------+---------------------+-------+
| CUSTOMER_ID | NAME  | ADDRESS | CREATED_DATE        | email |
+-------------+-------+---------+---------------------+-------+
|           1 | hiren | xyz     | 2012-09-20 11:17:57 | pqr   |
|           2 | kp    | zzz     | 2012-09-20 11:18:14 | rst   |
|           3 | sp    | aaa     | 2012-09-20 11:18:39 | yyy   |
|           4 | hiren | xyz     | 2012-09-20 11:18:56 | pqr   |
|           5 | hiren | xyz     | 2012-09-20 11:19:01 | rest  |
|           6 | hiren | z       | 2012-09-20 11:19:13 | pqr   |
|           7 | sp    | nnn     | 2012-09-20 11:19:28 | yyy   |
|           8 | sp    | nnn     | 2012-09-20 11:19:33 | yyy   |
+-------------+-------+---------+---------------------+-------+
8 rows in set (0.00 sec)

Now here is a query that does something like what you want:

select NAME, count(*) from customer group by ADDRESS, NAME, email having count(*) > 1;
+-------+----------+
| NAME  | count(*) |
+-------+----------+
| sp    |        2 |
| hiren |        2 |
+-------+----------+
2 rows in set (0.01 sec)

I think this is the same thing you want.

Generalized query for your table:
select `email-id`, count(*) cnt from TABLE_NAME group by `email-id`, item_code, item_qty having count(*) > 1;
 
practitioner (Author) commented:
Thanks lwadwell and hforhirenpatel,

What if I have 61 columns in the table? I believe this would be a performance hit. Do you suggest some other method to get the duplicate records?
 
tel2 commented:
Q1. How many rows are in your table, practitioner?
Q2. How often are you going to want to run this check?
Q3. How long do the above solutions run for?  Have you tried them yet?
 
hforhirenpatel commented:
Hey practitioner,

If you do not have many rows and you don't need to run this often, you can go with GROUP BY.
 
practitioner (Author) commented:
Hi hforhirenpatel, tel2

As of now I have 35,000+ rows, but I'll be limiting this with a WHERE clause: records for a given email id, and only for the last 3 days. So mostly the query would return nothing; in some cases it would return 2 or 3 records at most.

Thanks
 
tel2 commented:
So what were the answers to my Q2 & Q3?
 
Ivo Stoykov commented:
If you don't know in advance how many columns you're going to check for duplicates, then you must build your query dynamically, including all desired columns in the GROUP BY and HAVING clauses.
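
For illustration, a hedged sketch of one way to build such a query dynamically in MySQL, using INFORMATION_SCHEMA and a prepared statement ('your_db', 'your_table' and the PK name 'id' are placeholders):

-- Collect every column except the auto-increment PK
-- (raise group_concat_max_len first if the table has very many columns)
SELECT GROUP_CONCAT(CONCAT('`', COLUMN_NAME, '`')) INTO @cols
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_SCHEMA = 'your_db'
  AND TABLE_NAME = 'your_table'
  AND COLUMN_NAME <> 'id';

-- Assemble and run the duplicate check over all of those columns
SET @sql = CONCAT('SELECT ', @cols, ', COUNT(*) AS cnt FROM your_table ',
                  'GROUP BY ', @cols, ' HAVING COUNT(*) > 1');
PREPARE stmt FROM @sql;
EXECUTE stmt;
DEALLOCATE PREPARE stmt;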

HTH

Ivo Stoykov
 
practitioner (Author) commented:
@tel2,
Sorry, I missed the last two questions.
We are checking for duplicate orders, so:
2. I'll be running it through cron every 15 minutes.
3. Forever; I've not tried these solutions yet.

@Ivo, thanks
0
 
SAMIR BHOGAYTA (Freelancer and IT Consultant) commented:
Hello, try this example,


SELECT `duped_field1`, `duped_field2`, COUNT(*) `tot`
FROM `table`
GROUP BY `duped_field1`, `duped_field2`
HAVING `tot` > 1;
 
practitioner (Author) commented:
Thanks samirbhogayta,
I've already done that; see the examples given by lwadwell and hforhirenpatel. I'm now looking for an answer on whether it is advisable to group by 60 fields just to check for duplicates. I'm actually trying to figure out whether I can skip some of the fields and still get the desired result.
 
lwadwell commented:
You need to specify as many fields as it takes to define a record as a 'duplicate' ... if it is a duplicate only when all 60 fields match ... then yes, you must specify all 60.

Your statement that you will be doing this every 15 minutes has me puzzled ... it seems like you should be concentrating on stopping the duplicates from being entered in the first place. What do you plan to do with the duplicates?
 
practitioner (Author) commented:
Hi lwadwell,
1.
Okay, so the issue is with the payment gateway: it works only in redirect auth mode, so sometimes users close the browser after paying at the payment gateway, or they are not redirected back to the merchant website for some reason, and thus the order is not logged in the website database.
2.
This happens with at least one order every day; sometimes (not daily) such a user places the order again, thinking that something went wrong and they should retry.
3.
We query the payment gateway every 15 minutes to check whether there is a successful order that is not logged in the website database; if so, it is automatically added to the website (we store the details before submitting to the payment gateway).
4.
So, to handle the latter part of point 2, I'll display a flag against such orders so that the web admin can easily avoid unnecessary multiple deliveries of the same order. The web admin will call up the client/user and delete one of the orders if it really is a duplicate.

We get orders frequently, so I'm checking this through a cron job every 15 minutes.

Everything has been implemented and is working as expected, except the part I posted about; now that I know the logic, it'll be fixed soon.

As far as the 60 (+1 primary key) fields are concerned, I no longer need them all; I've figured out that I can do it with 3 fields: the cust id, order_total, and an order date range within the last 3 days. That's it.

I thank you and everybody else who helped me.
 
lwadwell commented:
Sorry for not replying earlier - I have been very sick the last few days.

a) If the orders recorded via point 3 above are the ones of concern and you want to minimise the database performance impact ... I would consider logging these into another table (as well as the existing tables) that can be used to look for the duplicates. Delete the older entries as they age past 3 days. As you indicate the volume of these is low, a join back to the main table matching on cust id, order total (etc.) where the order id PK is not the same ... may be less intensive, especially if you have an index on cust id.

b) the group by/having approach can be made lighter by only selecting records from the last three days (a sketch follows after the code in (c)).

c) a self join method could be an alternative,
SELECT *
FROM order_table o1
INNER JOIN order_table o2
    ON  o1.custId = o2.custId                            -- same customer
    AND o1.orderId <> o2.orderId                         -- different order
    AND o2.orderDate >= DATE_SUB(NOW(), INTERVAL 3 DAY)
    AND ... etc                                          -- other criteria
WHERE o1.orderDate >= DATE_SUB(NOW(), INTERVAL 3 DAY);
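
To illustrate (b), a minimal sketch, assuming hypothetical column names custId, order_total and orderDate (adjust to the real schema):

-- Hypothetical names; restrict the scan to the last 3 days before grouping
SELECT custId, order_total, COUNT(*) AS cnt
FROM order_table
WHERE orderDate >= DATE_SUB(NOW(), INTERVAL 3 DAY)
GROUP BY custId, order_total
HAVING COUNT(*) > 1;

The WHERE clause trims the scan to the last three days before grouping, which keeps the check cheap even when run every 15 minutes.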
 
tel2 commented:
Hi practitioner,

In my Q3 above, when I said "How long do the above solutions run for?", I meant "How long does it take to run each of the above solutions?" You could just try them from the Linux command line. I'm expecting the answer will be in seconds. If it's low enough, then I suggest you pick one of them and keep things simple.
 
practitioner (Author) commented:
My account was suspended; I'll come back with a detailed answer.
 
practitioner (Author) commented:
Thanks everyone
