Link to home
Start Free TrialLog in
Avatar of hotwaterfortea
hotwaterfortea

asked on

Identifying duplicate entries based on differences

I just asked a question about duplicate invoices, and now I have another.  I need to be able to identify duplicate invoices that have the same invoice number and are for the same amount but that have different vendor names.  For example, in a table called Invoices:

INVOICE_NUMBER, AMOUNT, VENDOR_NAME
001, $100, Geico
001, $100, Geico Insurance
002, $100, Geico

I would want the query to return the first two but not the third, since it has a different invoice number and is probably the next month's bill instead of a duplicate.  Any help would be appreciated.
Avatar of Missus Miss_Sellaneus
Missus Miss_Sellaneus
Flag of United States of America image

This will find the duplicates. It will also find entries where the invoice #, amount and vendor_name are the same.

SELECT invoice_number, amount, COUNT(*) AS occurances FROM table_name HAVING occurances > 1;
To make sure the vendor names are actually different, try this:

SELECT invoice_number, amount, a.vendor_name AS vendor_a, b.vendor_name AS vendor_b
FROM table_name a, table_name b
WHERE a.invoice_number = b.invoice_number and a.amount = b.amount AND a.vendor_name <> b.vendor_name;
I think this should work:
select * from invoices
where invoice_number in 
 (select invoice_number 
    from invoices 
    group by invoice_number, amount 
    having count(invoice_number, amount)>1
 )

Open in new window

It would probably help if you posted the ID of the previous thread.
Avatar of yuching
yuching

SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(DISTINCT VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
Avatar of hotwaterfortea

ASKER

I actually can't get either of those to work.  

TerryAtOpus, yours gives me an error message that the wrong number of arguments are used with the function "having count(invoice_number, amount)>1", but even if it didn't, I'm not sure that would solve my problem, since it would be possible to have entries with the same invoice number, amount, and vendor name, and I don't want the query to consider entries as potential duplicates unless there's more than one vendor name involved.  I mean, if there's something like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

I would want to return all three of those.  But if it's just this:

001, $100, Geico
001, $100, Geico

I'm not looking for that right now.

Miss_Sellaneus - Your second option sounds a little more like what I'm looking for, but I don't quite understand.  I only have one table for this.  Should that be a join?

GRayL - Technically, the only thing these threads have in common is that they're about duplicate invoices; I'm supposed to be finding alternative methods of identifying them, and the definition of duplicate changes depending on what angle I'm using.  They aren't really related.
hi hotwaterfortea:, have u try my suggestion?
hotwaterfortea,
That second option uses the same table twice, each with a different alias.. 'a' and 'b'.
 
Yuching - At the moment, It says that there's a missing operator in "COUNT(DISTINCT VENDOR_NAME) > 1".  I'm playing around with it a little.

Miss_Sellaneus - I see what you're saying, but I just noticed another issue.  I'm not sure "a.vendor_name <> b.vendor_name" is going to work, since sometimes there are entries like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

In that case, I would want to return all three, since they're all related.
hotwaterfortea,
Please try this. I tested it. I believe it gives you what you are after.

SELECT invoice_number, amount, vendor_name
FROM table1
WHERE invoice_number & cstr(amount) IN
(SELECT invoice_number & cstr(amount)
FROM table1 GROUP BY invoice_number, amount
HAVING COUNT(*) > 1);

Miss_Sellaneus,

Well, yes and no.  It works beautifully on the little dummy database I made to experiment with, but when I booted up my work computer for a minute and tried it on one of the store databases, it returned every row regardless of vendor name.  I'll take it into work tomorrow and look at it there; if it works on the dummy, I'm sure I must have done something wrong somewhere.
then try this, believe that access didnt support count distinct

SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
> the same invoice number and are for
> the same amount but
> that have different vendor names.

You nearly answered your own question (the second query below).

/gustav
Access JET SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1 
INNER JOIN 
  Invoices AS Invoices_2 ON 
    (Invoices_1.Amount = Invoices_2.Amount) 
    AND 
    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)
WHERE 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND
  Invoices_2.Vendor_Number) > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;
 
Coult be done this way too. Traditional SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1, 
  Invoices AS Invoices_2
WHERE 
  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 
  AND 
  Invoices_1.Amount = [Invoices_2]![Amount]
  AND 
  Invoices_1.Vendor_Number) <> [Invoices_2]![Vendor_Number] 
  AND 
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;

Open in new window

ASKER CERTIFIED SOLUTION
Avatar of Gustav Brock
Gustav Brock
Flag of Denmark image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Isn't this simply a case of duplicate invoice numbers and amounts - never mind whether the name is duplicated?  In which case:

SELECT a.Invoice_Number, a.Amount, (SELECT True as Duplicate FROM Invoices AS b WHERE b.Invoice_Number = a.Invoice_Number, and b.Amount = a.Amount) as Duplicate FROM Invoices AS a ORDER BY a.Invoice_Number, a.Amount;
The one that ended up working for me was cactus_data's (the traditional SQL-style one), although I had to modify it a little.  It was initially returning at least one but not all of the vendors associated with an invoice number/amount - for example, it would return GEICO and GEICO but not GEICO Insurance.  All I had to do to was turn

 AND
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

into

AND
(Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number] OR Invoices_2.Vendor_Number < [Invoices_1].[Vendor_Number])

Thanks to everyone for all your help - believe me, you've made my life that much easier.
You are welcome!

/gustav
Any comment on my last post?