Solved

Identifying duplicate entries based on differences

Posted on 2008-10-02
18
253 Views
Last Modified: 2012-05-05
I just asked a question about duplicate invoices, and now I have another.  I need to be able to identify duplicate invoices that have the same invoice number and are for the same amount but that have different vendor names.  For example, in a table called Invoices:

INVOICE_NUMBER, AMOUNT, VENDOR_NAME
001, $100, Geico
001, $100, Geico Insurance
002, $100, Geico

I would want the query to return the first two but not the third, since it has a different invoice number and is probably the next month's bill instead of a duplicate.  Any help would be appreciated.
0
Comment
Question by:hotwaterfortea
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
  • 3
  • +3
18 Comments
 
LVL 12

Expert Comment

by:Missus Miss_Sellaneus
ID: 22629987
This will find the duplicates. It will also find entries where the invoice #, amount and vendor_name are the same.

SELECT invoice_number, amount, COUNT(*) AS occurances FROM table_name HAVING occurances > 1;
0
 
LVL 12

Expert Comment

by:Missus Miss_Sellaneus
ID: 22630005
To make sure the vendor names are actually different, try this:

SELECT invoice_number, amount, a.vendor_name AS vendor_a, b.vendor_name AS vendor_b
FROM table_name a, table_name b
WHERE a.invoice_number = b.invoice_number and a.amount = b.amount AND a.vendor_name <> b.vendor_name;
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 22630006
I think this should work:
select * from invoices
where invoice_number in 
 (select invoice_number 
    from invoices 
    group by invoice_number, amount 
    having count(invoice_number, amount)>1
 )

Open in new window

0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 44

Expert Comment

by:GRayL
ID: 22630082
It would probably help if you posted the ID of the previous thread.
0
 
LVL 11

Expert Comment

by:yuching
ID: 22630242
SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(DISTINCT VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
0
 

Author Comment

by:hotwaterfortea
ID: 22630293
I actually can't get either of those to work.  

TerryAtOpus, yours gives me an error message that the wrong number of arguments are used with the function "having count(invoice_number, amount)>1", but even if it didn't, I'm not sure that would solve my problem, since it would be possible to have entries with the same invoice number, amount, and vendor name, and I don't want the query to consider entries as potential duplicates unless there's more than one vendor name involved.  I mean, if there's something like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

I would want to return all three of those.  But if it's just this:

001, $100, Geico
001, $100, Geico

I'm not looking for that right now.

Miss_Sellaneus - Your second option sounds a little more like what I'm looking for, but I don't quite understand.  I only have one table for this.  Should that be a join?

GRayL - Technically, the only thing these threads have in common is that they're about duplicate invoices; I'm supposed to be finding alternative methods of identifying them, and the definition of duplicate changes depending on what angle I'm using.  They aren't really related.
0
 
LVL 11

Expert Comment

by:yuching
ID: 22630302
hi hotwaterfortea:, have u try my suggestion?
0
 
LVL 12

Expert Comment

by:Missus Miss_Sellaneus
ID: 22630311
hotwaterfortea,
That second option uses the same table twice, each with a different alias.. 'a' and 'b'.
 
0
 

Author Comment

by:hotwaterfortea
ID: 22630344
Yuching - At the moment, It says that there's a missing operator in "COUNT(DISTINCT VENDOR_NAME) > 1".  I'm playing around with it a little.

Miss_Sellaneus - I see what you're saying, but I just noticed another issue.  I'm not sure "a.vendor_name <> b.vendor_name" is going to work, since sometimes there are entries like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

In that case, I would want to return all three, since they're all related.
0
 
LVL 12

Expert Comment

by:Missus Miss_Sellaneus
ID: 22630492
hotwaterfortea,
Please try this. I tested it. I believe it gives you what you are after.

SELECT invoice_number, amount, vendor_name
FROM table1
WHERE invoice_number & cstr(amount) IN
(SELECT invoice_number & cstr(amount)
FROM table1 GROUP BY invoice_number, amount
HAVING COUNT(*) > 1);

0
 

Author Comment

by:hotwaterfortea
ID: 22630630
Miss_Sellaneus,

Well, yes and no.  It works beautifully on the little dummy database I made to experiment with, but when I booted up my work computer for a minute and tried it on one of the store databases, it returned every row regardless of vendor name.  I'll take it into work tomorrow and look at it there; if it works on the dummy, I'm sure I must have done something wrong somewhere.
0
 
LVL 11

Expert Comment

by:yuching
ID: 22631002
then try this, believe that access didnt support count distinct

SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 22631475
> the same invoice number and are for
> the same amount but
> that have different vendor names.

You nearly answered your own question (the second query below).

/gustav
Access JET SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1 
INNER JOIN 
  Invoices AS Invoices_2 ON 
    (Invoices_1.Amount = Invoices_2.Amount) 
    AND 
    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)
WHERE 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND
  Invoices_2.Vendor_Number) > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;
 
Coult be done this way too. Traditional SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1, 
  Invoices AS Invoices_2
WHERE 
  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 
  AND 
  Invoices_1.Amount = [Invoices_2]![Amount]
  AND 
  Invoices_1.Vendor_Number) <> [Invoices_2]![Vendor_Number] 
  AND 
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;

Open in new window

0
 
LVL 50

Accepted Solution

by:
Gustav Brock earned 500 total points
ID: 22631486
Oops, remove two trailing parenthesis ...
Access JET SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1 
INNER JOIN 
  Invoices AS Invoices_2 ON 
    (Invoices_1.Amount = Invoices_2.Amount) 
    AND 
    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)
WHERE 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;
 
Coult be done this way too. Traditional SQL style:
 
SELECT DISTINCT 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount, 
  Invoices_1.Vendor_Number, 
  Invoices_2.Vendor_Number
FROM 
  Invoices AS Invoices_1, 
  Invoices AS Invoices_2
WHERE 
  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 
  AND 
  Invoices_1.Amount = [Invoices_2]![Amount]
  AND 
  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 
  AND 
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]
ORDER BY 
  Invoices_1.Invoice_Number, 
  Invoices_1.Amount;

Open in new window

0
 
LVL 44

Expert Comment

by:GRayL
ID: 22635918
Isn't this simply a case of duplicate invoice numbers and amounts - never mind whether the name is duplicated?  In which case:

SELECT a.Invoice_Number, a.Amount, (SELECT True as Duplicate FROM Invoices AS b WHERE b.Invoice_Number = a.Invoice_Number, and b.Amount = a.Amount) as Duplicate FROM Invoices AS a ORDER BY a.Invoice_Number, a.Amount;
0
 

Author Comment

by:hotwaterfortea
ID: 22638724
The one that ended up working for me was cactus_data's (the traditional SQL-style one), although I had to modify it a little.  It was initially returning at least one but not all of the vendors associated with an invoice number/amount - for example, it would return GEICO and GEICO but not GEICO Insurance.  All I had to do to was turn

 AND
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

into

AND
(Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number] OR Invoices_2.Vendor_Number < [Invoices_1].[Vendor_Number])

Thanks to everyone for all your help - believe me, you've made my life that much easier.
0
 
LVL 50

Expert Comment

by:Gustav Brock
ID: 22639858
You are welcome!

/gustav
0
 
LVL 44

Expert Comment

by:GRayL
ID: 22644459
Any comment on my last post?
0

Featured Post

Enterprise Mobility and BYOD For Dummies

Like “For Dummies” books, you can read this in whatever order you choose and learn about mobility and BYOD; and how to put a competitive mobile infrastructure in place. Developed for SMBs and large enterprises alike, you will find helpful use cases, planning, and implementation.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

This article helps those who get the 0xc004d307 error when trying to rearm (reset the license) Office 2013 in a Virtual Desktop Infrastructure (VDI) and/or those trying to prep the master image for Microsoft Key Management (KMS) activation. (i.e.- C…
Outlook for dependable use in a very small business   This article is about using the Outlook application (part of Microsoft Office) in a very small business, or for homeowners where dependability and reliability are critical requirements. This …
The viewer will learn how to create two correlated normally distributed random variables in Excel, use a normal distribution to simulate the return on different levels of investment in each of the two funds over a period of ten years, and, create a …
Learn how to create and modify your own paragraph styles in Microsoft Word. This can be helpful when wanting to make consistently referenced styles throughout a document or template.

695 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question