Solved

Identifying duplicate entries based on differences

Posted on 2008-10-02
18
221 Views
Last Modified: 2012-05-05
I just asked a question about duplicate invoices, and now I have another.  I need to be able to identify duplicate invoices that have the same invoice number and are for the same amount but that have different vendor names.  For example, in a table called Invoices:

INVOICE_NUMBER, AMOUNT, VENDOR_NAME
001, $100, Geico
001, $100, Geico Insurance
002, $100, Geico

I would want the query to return the first two but not the third, since it has a different invoice number and is probably the next month's bill instead of a duplicate.  Any help would be appreciated.
0
Comment
Question by:hotwaterfortea
  • 4
  • 4
  • 3
  • +3
18 Comments
 
LVL 11

Expert Comment

by:Missus Miss_Sellaneus
ID: 22629987
This will find the duplicates. It will also find entries where the invoice #, amount and vendor_name are the same.

SELECT invoice_number, amount, COUNT(*) AS occurances FROM table_name HAVING occurances > 1;
0
 
LVL 11

Expert Comment

by:Missus Miss_Sellaneus
ID: 22630005
To make sure the vendor names are actually different, try this:

SELECT invoice_number, amount, a.vendor_name AS vendor_a, b.vendor_name AS vendor_b
FROM table_name a, table_name b
WHERE a.invoice_number = b.invoice_number and a.amount = b.amount AND a.vendor_name <> b.vendor_name;
0
 
LVL 35

Expert Comment

by:Terry Woods
ID: 22630006
I think this should work:
select * from invoices

where invoice_number in 

 (select invoice_number 

    from invoices 

    group by invoice_number, amount 

    having count(invoice_number, amount)>1

 )

Open in new window

0
 
LVL 44

Expert Comment

by:GRayL
ID: 22630082
It would probably help if you posted the ID of the previous thread.
0
 
LVL 11

Expert Comment

by:yuching
ID: 22630242
SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(DISTINCT VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
0
 

Author Comment

by:hotwaterfortea
ID: 22630293
I actually can't get either of those to work.  

TerryAtOpus, yours gives me an error message that the wrong number of arguments are used with the function "having count(invoice_number, amount)>1", but even if it didn't, I'm not sure that would solve my problem, since it would be possible to have entries with the same invoice number, amount, and vendor name, and I don't want the query to consider entries as potential duplicates unless there's more than one vendor name involved.  I mean, if there's something like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

I would want to return all three of those.  But if it's just this:

001, $100, Geico
001, $100, Geico

I'm not looking for that right now.

Miss_Sellaneus - Your second option sounds a little more like what I'm looking for, but I don't quite understand.  I only have one table for this.  Should that be a join?

GRayL - Technically, the only thing these threads have in common is that they're about duplicate invoices; I'm supposed to be finding alternative methods of identifying them, and the definition of duplicate changes depending on what angle I'm using.  They aren't really related.
0
 
LVL 11

Expert Comment

by:yuching
ID: 22630302
hi hotwaterfortea:, have u try my suggestion?
0
 
LVL 11

Expert Comment

by:Missus Miss_Sellaneus
ID: 22630311
hotwaterfortea,
That second option uses the same table twice, each with a different alias.. 'a' and 'b'.
 
0
 

Author Comment

by:hotwaterfortea
ID: 22630344
Yuching - At the moment, It says that there's a missing operator in "COUNT(DISTINCT VENDOR_NAME) > 1".  I'm playing around with it a little.

Miss_Sellaneus - I see what you're saying, but I just noticed another issue.  I'm not sure "a.vendor_name <> b.vendor_name" is going to work, since sometimes there are entries like this:

001, $100, Geico
001, $100, Geico
001, $100, Geico Insurance

In that case, I would want to return all three, since they're all related.
0
IT, Stop Being Called Into Every Meeting

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

 
LVL 11

Expert Comment

by:Missus Miss_Sellaneus
ID: 22630492
hotwaterfortea,
Please try this. I tested it. I believe it gives you what you are after.

SELECT invoice_number, amount, vendor_name
FROM table1
WHERE invoice_number & cstr(amount) IN
(SELECT invoice_number & cstr(amount)
FROM table1 GROUP BY invoice_number, amount
HAVING COUNT(*) > 1);

0
 

Author Comment

by:hotwaterfortea
ID: 22630630
Miss_Sellaneus,

Well, yes and no.  It works beautifully on the little dummy database I made to experiment with, but when I booted up my work computer for a minute and tried it on one of the store databases, it returned every row regardless of vendor name.  I'll take it into work tomorrow and look at it there; if it works on the dummy, I'm sure I must have done something wrong somewhere.
0
 
LVL 11

Expert Comment

by:yuching
ID: 22631002
then try this, believe that access didnt support count distinct

SELECT * FROM invoices a
INNER JOIN (
   SELECT INVOICE_NUMBER, AMOUNT
   FROM Invoices
    GROUP BY INVOICE_NUMBER, AMOUNT
    HAVING COUNT(VENDOR_NAME) > 1
) b ON b.INVOICE_NUMBER = a.INVOICE_NUMBER AND b.AMOUNT = a.AMOUNT
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 22631475
> the same invoice number and are for
> the same amount but
> that have different vendor names.

You nearly answered your own question (the second query below).

/gustav
Access JET SQL style:
 

SELECT DISTINCT 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount, 

  Invoices_1.Vendor_Number, 

  Invoices_2.Vendor_Number

FROM 

  Invoices AS Invoices_1 

INNER JOIN 

  Invoices AS Invoices_2 ON 

    (Invoices_1.Amount = Invoices_2.Amount) 

    AND 

    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)

WHERE 

  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 

  AND

  Invoices_2.Vendor_Number) > [Invoices_1]![Vendor_Number]

ORDER BY 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount;
 

Coult be done this way too. Traditional SQL style:
 

SELECT DISTINCT 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount, 

  Invoices_1.Vendor_Number, 

  Invoices_2.Vendor_Number

FROM 

  Invoices AS Invoices_1, 

  Invoices AS Invoices_2

WHERE 

  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 

  AND 

  Invoices_1.Amount = [Invoices_2]![Amount]

  AND 

  Invoices_1.Vendor_Number) <> [Invoices_2]![Vendor_Number] 

  AND 

  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

ORDER BY 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount;

Open in new window

0
 
LVL 49

Accepted Solution

by:
Gustav Brock earned 500 total points
ID: 22631486
Oops, remove two trailing parenthesis ...
Access JET SQL style:

 

SELECT DISTINCT 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount, 

  Invoices_1.Vendor_Number, 

  Invoices_2.Vendor_Number

FROM 

  Invoices AS Invoices_1 

INNER JOIN 

  Invoices AS Invoices_2 ON 

    (Invoices_1.Amount = Invoices_2.Amount) 

    AND 

    (Invoices_1.Invoice_Number = Invoices_2.Invoice_Number)

WHERE 

  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 

  AND

  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

ORDER BY 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount;

 

Coult be done this way too. Traditional SQL style:

 

SELECT DISTINCT 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount, 

  Invoices_1.Vendor_Number, 

  Invoices_2.Vendor_Number

FROM 

  Invoices AS Invoices_1, 

  Invoices AS Invoices_2

WHERE 

  Invoices_1.Invoice_Number = [Invoices_2]![Invoice_Number] 

  AND 

  Invoices_1.Amount = [Invoices_2]![Amount]

  AND 

  Invoices_1.Vendor_Number <> [Invoices_2]![Vendor_Number] 

  AND 

  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

ORDER BY 

  Invoices_1.Invoice_Number, 

  Invoices_1.Amount;

Open in new window

0
 
LVL 44

Expert Comment

by:GRayL
ID: 22635918
Isn't this simply a case of duplicate invoice numbers and amounts - never mind whether the name is duplicated?  In which case:

SELECT a.Invoice_Number, a.Amount, (SELECT True as Duplicate FROM Invoices AS b WHERE b.Invoice_Number = a.Invoice_Number, and b.Amount = a.Amount) as Duplicate FROM Invoices AS a ORDER BY a.Invoice_Number, a.Amount;
0
 

Author Comment

by:hotwaterfortea
ID: 22638724
The one that ended up working for me was cactus_data's (the traditional SQL-style one), although I had to modify it a little.  It was initially returning at least one but not all of the vendors associated with an invoice number/amount - for example, it would return GEICO and GEICO but not GEICO Insurance.  All I had to do to was turn

 AND
  Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number]

into

AND
(Invoices_2.Vendor_Number > [Invoices_1]![Vendor_Number] OR Invoices_2.Vendor_Number < [Invoices_1].[Vendor_Number])

Thanks to everyone for all your help - believe me, you've made my life that much easier.
0
 
LVL 49

Expert Comment

by:Gustav Brock
ID: 22639858
You are welcome!

/gustav
0
 
LVL 44

Expert Comment

by:GRayL
ID: 22644459
Any comment on my last post?
0

Featured Post

Threat Intelligence Starter Resources

Integrating threat intelligence can be challenging, and not all companies are ready. These resources can help you build awareness and prepare for defense.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
microsoft access - xml 10 30
formattig excel from access 3 18
DCount using "OR" 4 19
Syntax using Declare 3 16
In this article we discuss how to recover the missing Outlook 2011 for Mac data like Emails and Contacts manually.
This article descibes how to create a connection between Excel and SAP and how to move data from Excel to SAP or the other way around.
In Microsoft Access, learn the trick to repeating sub-report headings at the top of each page. The problem with sub-reports and headings: Add a dummy group to the sub report using the expression =1: Set the “Repeat Section” property of the dummy…
The viewer will learn how to use a discrete random variable to simulate the return on an investment over a period of years, create a Monte Carlo simulation using the discrete random variable, and create a graph to represent the possible returns over…

743 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now