• Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 269
  • Last Modified:

Identifying duplicate entries based on differences

I just asked a question about duplicate invoices, and now I have another.  I need to be able to identify duplicate invoices that have the same invoice number and are for the same amount but that have different vendor names.  For example, in a table called Invoices:

INVOICE_NUMBER, AMOUNT, VENDOR_NAME
001, $100, Geico
001, $100, Geico Insurance
002, $100, Geico

I would want the query to return the first two but not the third, since it has a different invoice number and is probably the next month's bill instead of a duplicate.  Any help would be appreciated.
0
hotwaterfortea
Asked:
hotwaterfortea
  • 4
  • 4
  • 3
  • +3
1 Solution
 
Missus Miss_SellaneusCommented:
This will find the duplicates. It will also find entries where the invoice #, amount and vendor_name are the same.

SELECT invoice_number, amount, COUNT(*) AS occurances FROM table_name HAVING occurances > 1;
0
 
Missus Miss_SellaneusCommented:
To make sure the vendor names are actually different, try this:

SELECT invoice_number, amount, a.vendor_name AS vendor_a, b.vendor_name AS vendor_b
FROM table_name a, table_name b
WHERE a.invoice_number = b.invoice_number and a.amount = b.amount AND a.vendor_name <> b.vendor_name;
0
 
Terry WoodsIT GuruCommented:
I think this should work:
select * from invoices
where invoice_number in 
 (select invoice_number 
    from invoices 
    group by invoice_number, amount 
    having count(invoice_number, amount)>1
 )

Open in new window

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
GRayLCommented:
It would probably help if you posted the ID of the previous thread.
0
 
yuchingCommented:
SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(DISTINCT VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
0
 
hotwaterforteaAuthor Commented:
I actually can't get either of those to work.  

TerryAtOpus, yours gives me an error message that the wrong number of arguments are used with the function "having count(invoice_number, amount)>1", but even if it didn't, I'm not sure that would solve my problem, since it would be possible to have entries with the same invoice number, amount, and vendor name, and I don't want the query to consider entries as potential duplicates unless there's more than one vendor name involved.  I mean, if there's something like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

I would want to return all three of those.  But if it's just this:

001, $100, Geico
001, $100, Geico

I'm not looking for that right now.

Miss_Sellaneus - Your second option sounds a little more like what I'm looking for, but I don't quite understand.  I only have one table for this.  Should that be a join?

GRayL - Technically, the only thing these threads have in common is that they're about duplicate invoices; I'm supposed to be finding alternative methods of identifying them, and the definition of duplicate changes depending on what angle I'm using.  They aren't really related.
0
 
yuchingCommented:
hi hotwaterfortea:, have u try my suggestion?
0
 
Missus Miss_SellaneusCommented:
hotwaterfortea,
That second option uses the same table twice, each with a different alias.. 'a' and 'b'.
 
0
 
hotwaterforteaAuthor Commented:
Yuching - At the moment, It says that there's a missing operator in "COUNT(DISTINCT VENDOR_NAME) > 1".  I'm playing around with it a little.

Miss_Sellaneus - I see what you're saying, but I just noticed another issue.  I'm not sure "a.vendor_name <> b.vendor_name" is going to work, since sometimes there are entries like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

In that case, I would want to return all three, since they're all related.
0
 
Missus Miss_SellaneusCommented:
hotwaterfortea,
Please try this. I tested it. I believe it gives you what you are after.

SELECT invoice_number, amount, vendor_name
FROM table1
WHERE invoice_number & cstr(amount) IN
(SELECT invoice_number & cstr(amount)
FROM table1 GROUP BY invoice_number, amount
HAVING COUNT(*) > 1);

0
 
hotwaterforteaAuthor Commented:
Miss_Sellaneus,

Well, yes and no.  It works beautifully on the little dummy database I made to experiment with, but when I booted up my work computer for a minute and tried it on one of the store databases, it returned every row regardless of vendor name.  I'll take it into work tomorrow and look at it there; if it works on the dummy, I'm sure I must have done something wrong somewhere.
0
 
yuchingCommented:
then try this, believe that access didnt support count distinct

SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
0
 
Gustav BrockCIOCommented:
> the same invoice number and are for
> the same amount but
> that have different vendor names.

You nearly answered your own question (the second query below).

/gustav
Access JET SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1 
INNER JOIN 
  Invoices AS Invoices_2 ON 
    (Invoices_1.Amount = Invoices_2.Amount) 
    AND 
    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)
WHERE 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND
  Invoices_2.Vendor_Number) > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;
 
Coult be done this way too. Traditional SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1, 
  Invoices AS Invoices_2
WHERE 
  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 
  AND 
  Invoices_1.Amount = [Invoices_2]![Amount]
  AND 
  Invoices_1.Vendor_Number) <> [Invoices_2]![Vendor_Number] 
  AND 
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;

Open in new window

0
 
Gustav BrockCIOCommented:
Oops, remove two trailing parenthesis ...
Access JET SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1 
INNER JOIN 
  Invoices AS Invoices_2 ON 
    (Invoices_1.Amount = Invoices_2.Amount) 
    AND 
    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)
WHERE 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;
 
Coult be done this way too. Traditional SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1, 
  Invoices AS Invoices_2
WHERE 
  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 
  AND 
  Invoices_1.Amount = [Invoices_2]![Amount]
  AND 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND 
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;

Open in new window

0
 
GRayLCommented:
Isn't this simply a case of duplicate invoice numbers and amounts - never mind whether the name is duplicated?  In which case:

SELECT a.Invoice_Number, a.Amount, (SELECT True as Duplicate FROM Invoices AS b WHERE b.Invoice_Number = a.Invoice_Number, and b.Amount = a.Amount) as Duplicate FROM Invoices AS a ORDER BY a.Invoice_Number, a.Amount;
0
 
hotwaterforteaAuthor Commented:
The one that ended up working for me was cactus_data's (the traditional SQL-style one), although I had to modify it a little.  It was initially returning at least one but not all of the vendors associated with an invoice number/amount - for example, it would return GEICO and GEICO but not GEICO Insurance.  All I had to do to was turn

 AND
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

into

AND
(Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number] OR Invoices_2.Vendor_Number < [Invoices_1].[Vendor_Number])

Thanks to everyone for all your help - believe me, you've made my life that much easier.
0
 
Gustav BrockCIOCommented:
You are welcome!

/gustav
0
 
GRayLCommented:
Any comment on my last post?
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

  • 4
  • 4
  • 3
  • +3
Tackle projects and never again get stuck behind a technical roadblock.
Join Now