Solved

SQL to Extract similar rows from a table Oracle

Posted on 2014-01-28
15
463 Views
Last Modified: 2014-01-29
Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
ABC          SDSD45454
XYZ      100000
XYZ           32324KL1
XYZ      100000ADJ
A11      124588
A12      45454545
A13      56565679256



I need to get the data from above table (VEND_TBL) like below.

Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
XYZ      100000
XYZ      100000ADJ

SQL needs to pull those rows which have the same pattern like for example ABC vendor has 1234567ADJ and 1234567
0
Comment
Question by:rsj1977
  • 7
  • 6
  • 2
15 Comments
 
LVL 31

Expert Comment

by:awking00
ID: 39815631
Are the differences always going to be alpha characters?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815639
is the target pattern for each vendor the one consisting of only numbers?

If not, what are the rules for picking which pattern we should look for?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815668
Assuming there is only one pattern for each vendor and that pattern is the all-numeric then try this...


SELECT vendor, invoice_id
  FROM (SELECT vendor, invoice_id, COUNT(*) OVER (PARTITION BY vendor) cnt
          FROM (SELECT t.*,
                       MIN(REGEXP_SUBSTR(invoice_id, '[0-9]+')) OVER (PARTITION BY vendor) pattern
                  FROM yourtable t) x
         WHERE invoice_id LIKE '%' || pattern || '%')
 WHERE cnt > 1
0
 

Author Comment

by:rsj1977
ID: 39815681
There is no particular pattern , what i learned today is some of the data would have ADJ pattern at the end.
and some of the data would have REKEY as pattern.

First users enter Invoice ID(1234567) and if the invoice needs to be adjusted. They enter a new Invoice with 1234567ADJ or 1234567REKEY something like that.

So when I run the SQL I should be able to see the original Invoice and adjuested Invoice.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815692
your new example still has the pattern as all numeric.

Is that always going to be the case? and, will there always be just one per vendor?

If not, how do we know 123456X vs 123456Y  ?

or  1234567 vs 1234568  ?


if looking at the id isn't sufficient to identify the "original"  - are there other columns?
Is there a date column that we could sort on to find the first one?
0
 
LVL 31

Expert Comment

by:awking00
ID: 39815695
select t1.vendor, t1.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
union all
select t2.vendor, t2.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
order by vendor;
0
 
LVL 31

Accepted Solution

by:
awking00 earned 500 total points
ID: 39815720
A little less cumbersome -
select vendor, invoice_id from
(select vendor, invoice_id,
 count(*) over (partition by vendor, regexp_replace(invoice_id,'([[:alpha:]])',null) order by vendor) cnt
 from vend_tbl)
where cnt > 1;
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 73

Expert Comment

by:sdstuber
ID: 39815722
also, will the subsequent ids always be the original with a suffix?

or given 123456  could you have ABC123456  ?

In my answer above I assumed prefixes were possible.  It'll still work even if they aren't it's just less efficient.

Change
WHERE invoice_id LIKE '%' || pattern || '%')
to
WHERE invoice_id LIKE pattern || '%')
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815729
note, awking00's approach is making a similar assumption to what I did.

That is,  he is assuming the only difference between your values is the addition of letters
and the original id has no alphabetic characters.'

again, if these assumptions aren't true, please post the rules for identifying which id is the basis for the search
0
 
LVL 31

Expert Comment

by:awking00
ID: 39815730
Note - that would also work assuming there could be records like -
1234567
1234567ADJ
1234567REKEY
0
 
LVL 31

Expert Comment

by:awking00
ID: 39815739
And also for ADJ1234567
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815740
also,  can the additions have non-alphabetic characters?
based on your example here:  
XYZ           32324KL1

It looks like they can.

So, if you had an ID 32324,  awking00's method would not find these to be similar rows.

As you can hopefully tell,  we're just guessing.  We need more information about what the rules of "similar" are and, finding the "original" to compare to
0
 
LVL 31

Expert Comment

by:awking00
ID: 39815742
And, yes, this also assumes the difference is always alpha since I got no response to the contrary from my first question.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815749
Same here, just guessing.  

With the limited sample either version will work; but on a larger scale either or both could fail.
0
 

Author Comment

by:rsj1977
ID: 39815757
Thanks all for your help, I am still trying to get more data analysis from my team, I would update as soon I get it.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Suggested Solutions

Working with Network Access Control Lists in Oracle 11g (part 1) Part 2: http://www.e-e.com/A_9074.html So, you upgraded to a shiny new 11g database and all of a sudden every program that used UTL_MAIL, UTL_SMTP, UTL_TCP, UTL_HTTP or any oth…
How to Create User-Defined Aggregates in Oracle Before we begin creating these things, what are user-defined aggregates?  They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and…
This video explains at a high level with the mandatory Oracle Memory processes are as well as touching on some of the more common optional ones.
This video shows how to recover a database from a user managed backup

758 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

20 Experts available now in Live!

Get 1:1 Help Now