hotwaterfortea
asked on
Identifying duplicate entries based on differences
I just asked a question about duplicate invoices, and now I have another. I need to be able to identify duplicate invoices that have the same invoice number and are for the same amount but that have different vendor names. For example, in a table called Invoices:
INVOICE_NUMBER, AMOUNT, VENDOR_NAME
001, $100, Geico
001, $100, Geico Insurance
002, $100, Geico
I would want the query to return the first two but not the third, since it has a different invoice number and is probably the next month's bill instead of a duplicate. Any help would be appreciated.
INVOICE_NUMBER, AMOUNT, VENDOR_NAME
001, $100, Geico
001, $100, Geico Insurance
002, $100, Geico
I would want the query to return the first two but not the third, since it has a different invoice number and is probably the next month's bill instead of a duplicate. Any help would be appreciated.
To make sure the vendor names are actually different, try this:
SELECT invoice_number, amount, a.vendor_name AS vendor_a, b.vendor_name AS vendor_b
FROM table_name a, table_name b
WHERE a.invoice_number = b.invoice_number and a.amount = b.amount AND a.vendor_name <> b.vendor_name;
SELECT invoice_number, amount, a.vendor_name AS vendor_a, b.vendor_name AS vendor_b
FROM table_name a, table_name b
WHERE a.invoice_number = b.invoice_number and a.amount = b.amount AND a.vendor_name <> b.vendor_name;
I think this should work:
select * from invoices
where invoice_number in
(select invoice_number
from invoices
group by invoice_number, amount
having count(invoice_number, amount)>1
)
It would probably help if you posted the ID of the previous thread.
SELECT * FROM invoices a
INNER JOIN (
SELECT INVOICE_NUMBER, AMOUNT
FROM Invoices
GROUP BY INVOICE_NUMBER, AMOUNT
HAVING COUNT(DISTINCT VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
INNER JOIN (
SELECT INVOICE_NUMBER, AMOUNT
FROM Invoices
GROUP BY INVOICE_NUMBER, AMOUNT
HAVING COUNT(DISTINCT VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
ASKER
I actually can't get either of those to work.
TerryAtOpus, yours gives me an error message that the wrong number of arguments are used with the function "having count(invoice_number, amount)>1", but even if it didn't, I'm not sure that would solve my problem, since it would be possible to have entries with the same invoice number, amount, and vendor name, and I don't want the query to consider entries as potential duplicates unless there's more than one vendor name involved. I mean, if there's something like this:
001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance
I would want to return all three of those. But if it's just this:
001, $100, Geico
001, $100, Geico
I'm not looking for that right now.
Miss_Sellaneus - Your second option sounds a little more like what I'm looking for, but I don't quite understand. I only have one table for this. Should that be a join?
GRayL - Technically, the only thing these threads have in common is that they're about duplicate invoices; I'm supposed to be finding alternative methods of identifying them, and the definition of duplicate changes depending on what angle I'm using. They aren't really related.
TerryAtOpus, yours gives me an error message that the wrong number of arguments are used with the function "having count(invoice_number, amount)>1", but even if it didn't, I'm not sure that would solve my problem, since it would be possible to have entries with the same invoice number, amount, and vendor name, and I don't want the query to consider entries as potential duplicates unless there's more than one vendor name involved. I mean, if there's something like this:
001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance
I would want to return all three of those. But if it's just this:
001, $100, Geico
001, $100, Geico
I'm not looking for that right now.
Miss_Sellaneus - Your second option sounds a little more like what I'm looking for, but I don't quite understand. I only have one table for this. Should that be a join?
GRayL - Technically, the only thing these threads have in common is that they're about duplicate invoices; I'm supposed to be finding alternative methods of identifying them, and the definition of duplicate changes depending on what angle I'm using. They aren't really related.
hi hotwaterfortea:, have u try my suggestion?
hotwaterfortea,
That second option uses the same table twice, each with a different alias.. 'a' and 'b'.
That second option uses the same table twice, each with a different alias.. 'a' and 'b'.
ASKER
Yuching - At the moment, It says that there's a missing operator in "COUNT(DISTINCT VENDOR_NAME) > 1". I'm playing around with it a little.
Miss_Sellaneus - I see what you're saying, but I just noticed another issue. I'm not sure "a.vendor_name <> b.vendor_name" is going to work, since sometimes there are entries like this:
001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance
In that case, I would want to return all three, since they're all related.
Miss_Sellaneus - I see what you're saying, but I just noticed another issue. I'm not sure "a.vendor_name <> b.vendor_name" is going to work, since sometimes there are entries like this:
001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance
In that case, I would want to return all three, since they're all related.
hotwaterfortea,
Please try this. I tested it. I believe it gives you what you are after.
SELECT invoice_number, amount, vendor_name
FROM table1
WHERE invoice_number & cstr(amount) IN
(SELECT invoice_number & cstr(amount)
FROM table1 GROUP BY invoice_number, amount
HAVING COUNT(*) > 1);
Please try this. I tested it. I believe it gives you what you are after.
SELECT invoice_number, amount, vendor_name
FROM table1
WHERE invoice_number & cstr(amount) IN
(SELECT invoice_number & cstr(amount)
FROM table1 GROUP BY invoice_number, amount
HAVING COUNT(*) > 1);
ASKER
Miss_Sellaneus,
Well, yes and no. It works beautifully on the little dummy database I made to experiment with, but when I booted up my work computer for a minute and tried it on one of the store databases, it returned every row regardless of vendor name. I'll take it into work tomorrow and look at it there; if it works on the dummy, I'm sure I must have done something wrong somewhere.
Well, yes and no. It works beautifully on the little dummy database I made to experiment with, but when I booted up my work computer for a minute and tried it on one of the store databases, it returned every row regardless of vendor name. I'll take it into work tomorrow and look at it there; if it works on the dummy, I'm sure I must have done something wrong somewhere.
then try this, believe that access didnt support count distinct
SELECT * FROM invoices a
INNER JOIN (
SELECT INVOICE_NUMBER, AMOUNT
FROM Invoices
GROUP BY INVOICE_NUMBER, AMOUNT
HAVING COUNT(VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
SELECT * FROM invoices a
INNER JOIN (
SELECT INVOICE_NUMBER, AMOUNT
FROM Invoices
GROUP BY INVOICE_NUMBER, AMOUNT
HAVING COUNT(VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
> the same invoice number and are for
> the same amount but
> that have different vendor names.
You nearly answered your own question (the second query below).
/gustav
> the same amount but
> that have different vendor names.
You nearly answered your own question (the second query below).
/gustav
Access JET SQL style:
SELECT DISTINCT
Invoices_1.Invoice_Number,
Invoices_1.Amount,
Invoices_1.Vendor_Number,
Invoices_2.Vendor_Number
FROM
Invoices AS Invoices_1
INNER JOIN
Invoices AS Invoices_2 ON
(Invoices_1.Amount = Invoices_2.Amount)
AND
(Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)
WHERE
Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number]
AND
Invoices_2.Vendor_Number) > [Invoices_1]![Vendor_Number]
ORDER BY
Invoices_1.Invoice_Number,
Invoices_1.Amount;
Coult be done this way too. Traditional SQL style:
SELECT DISTINCT
Invoices_1.Invoice_Number,
Invoices_1.Amount,
Invoices_1.Vendor_Number,
Invoices_2.Vendor_Number
FROM
Invoices AS Invoices_1,
Invoices AS Invoices_2
WHERE
Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number]
AND
Invoices_1.Amount = [Invoices_2]![Amount]
AND
Invoices_1.Vendor_Number) <> [Invoices_2]![Vendor_Number]
AND
Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY
Invoices_1.Invoice_Number,
Invoices_1.Amount;
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Isn't this simply a case of duplicate invoice numbers and amounts - never mind whether the name is duplicated? In which case:
SELECT a.Invoice_Number, a.Amount, (SELECT True as Duplicate FROM Invoices AS b WHERE b.Invoice_Number = a.Invoice_Number, and b.Amount = a.Amount) as Duplicate FROM Invoices AS a ORDER BY a.Invoice_Number, a.Amount;
SELECT a.Invoice_Number, a.Amount, (SELECT True as Duplicate FROM Invoices AS b WHERE b.Invoice_Number = a.Invoice_Number, and b.Amount = a.Amount) as Duplicate FROM Invoices AS a ORDER BY a.Invoice_Number, a.Amount;
ASKER
The one that ended up working for me was cactus_data's (the traditional SQL-style one), although I had to modify it a little. It was initially returning at least one but not all of the vendors associated with an invoice number/amount - for example, it would return GEICO and GEICO but not GEICO Insurance. All I had to do to was turn
AND
Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Numbe r]
into
AND
(Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Numbe r] OR Invoices_2.Vendor_Number < [Invoices_1].[Vendor_Numbe r])
Thanks to everyone for all your help - believe me, you've made my life that much easier.
AND
Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Numbe
into
AND
(Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Numbe
Thanks to everyone for all your help - believe me, you've made my life that much easier.
You are welcome!
/gustav
/gustav
Any comment on my last post?
SELECT invoice_number, amount, COUNT(*) AS occurances FROM table_name HAVING occurances > 1;