Link to home
Start Free TrialLog in
Avatar of practitioner
practitionerFlag for Singapore

asked on

SELECT only duplicate records

Hello,

How do I select the records where a user has submitted exactly same values multiple times, primary key field is an auto.

A hypothetical example would be the same user bought the same item and same quantity.

id, email-id,                item_code, item_qty
1,  abc@mymail.com, 9,                1
.., .., .., ..
.., .., .., ..
.., .., .., ..
50,  abc@mymail.com, 9,               1

Thanks
Avatar of Ivo Stoykov
Ivo Stoykov
Flag of Bulgaria image

The easiest way to avoid duplicates (if I've understood correct) is to set an unique key on each column you'd like to avoid this.

As to select duplicates use HAVING clause - i.e.

select duplicated_column_name, count(duplicated_column_name) cnt
from your_table
group by duplicated_column_name
having count (duplicated_column_name) > 1;

Open in new window


HTH

Ivo Stoykov
Avatar of practitioner

ASKER

Thanks Ivo,

probably I couldn't state properly, I do not want this, I want to list the duplicate records where values in all the fields but primary key are identical.
SOLUTION
Avatar of Lee Wadwell
Lee Wadwell
Flag of Australia image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
ASKER CERTIFIED SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Thanks lwadwell and hforhirenpatel,

what if I have 61 columns in the table? I believe this will be a hit on the performance? Do you suggest some other method to get the duplicate records?
Q1. How many rows are in your table, practitioner?
Q2. How often are you going to want to run this check?
Q3. How long do the above solutions run for?  Have you tried them yet?
Avatar of hforhirenpatel
hforhirenpatel

Hey practitioner,

If you donot have many rows and you don't need to run this often you can go for group by.
Hi hforhirenpatel, tel2

As of now, I've 35000+ rows but I'll be limiting this by specifying the WHERE clause, records of a given email id and only for last 3 days. So mostly the returned value would be 0, I mean none. In some cases it would be 2 or 3 records at the most.

Thanks
So what were the answers to my Q2 & Q3?
If you do not have idea how many columns you;re going to check for duplicates then you must create your query dynamically including all desired columns into the having clause

HTH

Ivo Stoykov
@tel2,
Sorry I missed the last two questions.
We are checking for duplicate orders so
2. I'll be running it through CRON every 15 mts.
3. Forever, I've not tried these solutions yet.

@Ivo, thanks
Hello, try this example,


SELECT `duped_field1`, `duped_field2`, COUNT(*) `tot`
FROM `table`
GROUP BY `duped_field1`, `duped_field2`
HAVING `tot` > 1
Thanks samirbhogayta,
I've already done it, examples given by  lwadwell and hforhirenpatel, I'm now looking for an answer if it is advisable to group by 60 fields just to check duplicate. I'm actually trying to figure out if I do not use all the fields and get the desired result.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Hi lwadwell,
1.
Okay, so the issue is with the payment gateway, it works only in redirect auth, so sometimes users close the browser after paying at the payment gateway, or they are not redirected to the merchant website for some reason thus the order is not logged in the website database.
2.
This happens at least with one order everyday, sometimes (not daily) one of such users places the order again thinking that something went wrong and he should.
3.
We query the payment gateway every 15 mts. to check if there is a successful order but not logged in website database, then it is automatically added (we store the details before submitting to the Payment Gateway) in the website.
4.
So to handle later part of point 2, I'll display a flag against such orders so that web admin can easily avoid multiple deliveries of the same order, unnecessary. The webadmin will call up the client/user and delete one of the orders, if really it is duplicate.

We get orders frequently, so I'm checking this through a CRON every 15 mts.

Everything has been implemented and working fine and as expected except this part I posted for, now I know the logic, it'll be fixed soon.

As far as 60 (+1 primary key) fields are concerned, now I don't need it, I've figured out that I can do it with 3 fields, the cust id, order_total and order date range withing last 3 days. That's it.

I thank you and everybody else who helped me.
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
SOLUTION
Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
My account was suspended, I'll come back with the detailed answer.
Thanks everyone