Solved

SQL to Extract similar rows from a table Oracle

Posted on 2014-01-28
15
467 Views
Last Modified: 2014-01-29
Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
ABC          SDSD45454
XYZ      100000
XYZ           32324KL1
XYZ      100000ADJ
A11      124588
A12      45454545
A13      56565679256



I need to get the data from above table (VEND_TBL) like below.

Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
XYZ      100000
XYZ      100000ADJ

SQL needs to pull those rows which have the same pattern like for example ABC vendor has 1234567ADJ and 1234567
0
Comment
Question by:rsj1977
  • 7
  • 6
  • 2
15 Comments
 
LVL 32

Expert Comment

by:awking00
ID: 39815631
Are the differences always going to be alpha characters?
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815639
is the target pattern for each vendor the one consisting of only numbers?

If not, what are the rules for picking which pattern we should look for?
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815668
Assuming there is only one pattern for each vendor and that pattern is the all-numeric then try this...


SELECT vendor, invoice_id
  FROM (SELECT vendor, invoice_id, COUNT(*) OVER (PARTITION BY vendor) cnt
          FROM (SELECT t.*,
                       MIN(REGEXP_SUBSTR(invoice_id, '[0-9]+')) OVER (PARTITION BY vendor) pattern
                  FROM yourtable t) x
         WHERE invoice_id LIKE '%' || pattern || '%')
 WHERE cnt > 1
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 

Author Comment

by:rsj1977
ID: 39815681
There is no particular pattern , what i learned today is some of the data would have ADJ pattern at the end.
and some of the data would have REKEY as pattern.

First users enter Invoice ID(1234567) and if the invoice needs to be adjusted. They enter a new Invoice with 1234567ADJ or 1234567REKEY something like that.

So when I run the SQL I should be able to see the original Invoice and adjuested Invoice.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815692
your new example still has the pattern as all numeric.

Is that always going to be the case? and, will there always be just one per vendor?

If not, how do we know 123456X vs 123456Y  ?

or  1234567 vs 1234568  ?


if looking at the id isn't sufficient to identify the "original"  - are there other columns?
Is there a date column that we could sort on to find the first one?
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815695
select t1.vendor, t1.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
union all
select t2.vendor, t2.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
order by vendor;
0
 
LVL 32

Accepted Solution

by:
awking00 earned 500 total points
ID: 39815720
A little less cumbersome -
select vendor, invoice_id from
(select vendor, invoice_id,
 count(*) over (partition by vendor, regexp_replace(invoice_id,'([[:alpha:]])',null) order by vendor) cnt
 from vend_tbl)
where cnt > 1;
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815722
also, will the subsequent ids always be the original with a suffix?

or given 123456  could you have ABC123456  ?

In my answer above I assumed prefixes were possible.  It'll still work even if they aren't it's just less efficient.

Change
WHERE invoice_id LIKE '%' || pattern || '%')
to
WHERE invoice_id LIKE pattern || '%')
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815729
note, awking00's approach is making a similar assumption to what I did.

That is,  he is assuming the only difference between your values is the addition of letters
and the original id has no alphabetic characters.'

again, if these assumptions aren't true, please post the rules for identifying which id is the basis for the search
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815730
Note - that would also work assuming there could be records like -
1234567
1234567ADJ
1234567REKEY
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815739
And also for ADJ1234567
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815740
also,  can the additions have non-alphabetic characters?
based on your example here:  
XYZ           32324KL1

It looks like they can.

So, if you had an ID 32324,  awking00's method would not find these to be similar rows.

As you can hopefully tell,  we're just guessing.  We need more information about what the rules of "similar" are and, finding the "original" to compare to
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815742
And, yes, this also assumes the difference is always alpha since I got no response to the contrary from my first question.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815749
Same here, just guessing.  

With the limited sample either version will work; but on a larger scale either or both could fail.
0
 

Author Comment

by:rsj1977
ID: 39815757
Thanks all for your help, I am still trying to get more data analysis from my team, I would update as soon I get it.
0

Featured Post

On Demand Webinar - Networking for the Cloud Era

This webinar discusses:
-Common barriers companies experience when moving to the cloud
-How SD-WAN changes the way we look at networks
-Best practices customers should employ moving forward with cloud migration
-What happens behind the scenes of SteelConnect’s one-click button

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
Oracle Insert not working 10 46
return value in based on value passed 6 37
Need to Understand Resolution to Oracle Error ORA-00600 2 38
plsql job on oracle 18 72
Introduction A previously published article on Experts Exchange ("Joins in Oracle", http://www.experts-exchange.com/Database/Oracle/A_8249-Joins-in-Oracle.html) makes a statement about "Oracle proprietary" joins and mixes the join syntax with gen…
Shell script to create broker configuration file using current broker Configuration, solely for purpose of backup on Linux. Script may need to be modified depending on OS-installation. Please deploy and verify the script in a test environment.
This video shows information on the Oracle Data Dictionary, starting with the Oracle documentation, explaining the different types of Data Dictionary views available by group and permissions as well as giving examples on how to retrieve data from th…
This video shows how to Export data from an Oracle database using the Datapump Export Utility.  The corresponding Datapump Import utility is also discussed and demonstrated.

685 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question