Solved

SQL to Extract similar rows from a table Oracle

Posted on 2014-01-28
15
464 Views
Last Modified: 2014-01-29
Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
ABC          SDSD45454
XYZ      100000
XYZ           32324KL1
XYZ      100000ADJ
A11      124588
A12      45454545
A13      56565679256



I need to get the data from above table (VEND_TBL) like below.

Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
XYZ      100000
XYZ      100000ADJ

SQL needs to pull those rows which have the same pattern like for example ABC vendor has 1234567ADJ and 1234567
0
Comment
Question by:rsj1977
  • 7
  • 6
  • 2
15 Comments
 
LVL 32

Expert Comment

by:awking00
ID: 39815631
Are the differences always going to be alpha characters?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815639
is the target pattern for each vendor the one consisting of only numbers?

If not, what are the rules for picking which pattern we should look for?
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815668
Assuming there is only one pattern for each vendor and that pattern is the all-numeric then try this...


SELECT vendor, invoice_id
  FROM (SELECT vendor, invoice_id, COUNT(*) OVER (PARTITION BY vendor) cnt
          FROM (SELECT t.*,
                       MIN(REGEXP_SUBSTR(invoice_id, '[0-9]+')) OVER (PARTITION BY vendor) pattern
                  FROM yourtable t) x
         WHERE invoice_id LIKE '%' || pattern || '%')
 WHERE cnt > 1
0
 

Author Comment

by:rsj1977
ID: 39815681
There is no particular pattern , what i learned today is some of the data would have ADJ pattern at the end.
and some of the data would have REKEY as pattern.

First users enter Invoice ID(1234567) and if the invoice needs to be adjusted. They enter a new Invoice with 1234567ADJ or 1234567REKEY something like that.

So when I run the SQL I should be able to see the original Invoice and adjuested Invoice.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815692
your new example still has the pattern as all numeric.

Is that always going to be the case? and, will there always be just one per vendor?

If not, how do we know 123456X vs 123456Y  ?

or  1234567 vs 1234568  ?


if looking at the id isn't sufficient to identify the "original"  - are there other columns?
Is there a date column that we could sort on to find the first one?
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815695
select t1.vendor, t1.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
union all
select t2.vendor, t2.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
order by vendor;
0
 
LVL 32

Accepted Solution

by:
awking00 earned 500 total points
ID: 39815720
A little less cumbersome -
select vendor, invoice_id from
(select vendor, invoice_id,
 count(*) over (partition by vendor, regexp_replace(invoice_id,'([[:alpha:]])',null) order by vendor) cnt
 from vend_tbl)
where cnt > 1;
0
PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

 
LVL 73

Expert Comment

by:sdstuber
ID: 39815722
also, will the subsequent ids always be the original with a suffix?

or given 123456  could you have ABC123456  ?

In my answer above I assumed prefixes were possible.  It'll still work even if they aren't it's just less efficient.

Change
WHERE invoice_id LIKE '%' || pattern || '%')
to
WHERE invoice_id LIKE pattern || '%')
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815729
note, awking00's approach is making a similar assumption to what I did.

That is,  he is assuming the only difference between your values is the addition of letters
and the original id has no alphabetic characters.'

again, if these assumptions aren't true, please post the rules for identifying which id is the basis for the search
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815730
Note - that would also work assuming there could be records like -
1234567
1234567ADJ
1234567REKEY
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815739
And also for ADJ1234567
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815740
also,  can the additions have non-alphabetic characters?
based on your example here:  
XYZ           32324KL1

It looks like they can.

So, if you had an ID 32324,  awking00's method would not find these to be similar rows.

As you can hopefully tell,  we're just guessing.  We need more information about what the rules of "similar" are and, finding the "original" to compare to
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815742
And, yes, this also assumes the difference is always alpha since I got no response to the contrary from my first question.
0
 
LVL 73

Expert Comment

by:sdstuber
ID: 39815749
Same here, just guessing.  

With the limited sample either version will work; but on a larger scale either or both could fail.
0
 

Author Comment

by:rsj1977
ID: 39815757
Thanks all for your help, I am still trying to get more data analysis from my team, I would update as soon I get it.
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Suggested Solutions

Title # Comments Views Activity
What is the version of ojdbc6.jar 2 39
case statement in where clause with not exist 15 46
selective queries 7 22
Use of Exception to end a Loop 3 18
Truncate is a DDL Command where as Delete is a DML Command. Both will delete data from table, but what is the difference between these below statements truncate table <table_name> ?? delete from <table_name> ?? The first command cannot be …
How to Create User-Defined Aggregates in Oracle Before we begin creating these things, what are user-defined aggregates?  They are a feature introduced in Oracle 9i that allows a developer to create his or her own functions like "SUM", "AVG", and…
This video shows, step by step, how to configure Oracle Heterogeneous Services via the Generic Gateway Agent in order to make a connection from an Oracle session and access a remote SQL Server database table.
Via a live example, show how to restore a database from backup after a simulated disk failure using RMAN.

920 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

15 Experts available now in Live!

Get 1:1 Help Now