Solved

SQL to Extract similar rows from a table Oracle

Posted on 2014-01-28
15
468 Views
Last Modified: 2014-01-29
Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
ABC          SDSD45454
XYZ      100000
XYZ           32324KL1
XYZ      100000ADJ
A11      124588
A12      45454545
A13      56565679256



I need to get the data from above table (VEND_TBL) like below.

Vendor      INVOICE_ID
ABC      1234567
ABC      1234567ADJ
XYZ      100000
XYZ      100000ADJ

SQL needs to pull those rows which have the same pattern like for example ABC vendor has 1234567ADJ and 1234567
0
Comment
Question by:rsj1977
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 7
  • 6
  • 2
15 Comments
 
LVL 32

Expert Comment

by:awking00
ID: 39815631
Are the differences always going to be alpha characters?
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815639
is the target pattern for each vendor the one consisting of only numbers?

If not, what are the rules for picking which pattern we should look for?
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815668
Assuming there is only one pattern for each vendor and that pattern is the all-numeric then try this...


SELECT vendor, invoice_id
  FROM (SELECT vendor, invoice_id, COUNT(*) OVER (PARTITION BY vendor) cnt
          FROM (SELECT t.*,
                       MIN(REGEXP_SUBSTR(invoice_id, '[0-9]+')) OVER (PARTITION BY vendor) pattern
                  FROM yourtable t) x
         WHERE invoice_id LIKE '%' || pattern || '%')
 WHERE cnt > 1
0
Online Training Solution

Drastically shorten your training time with WalkMe's advanced online training solution that Guides your trainees to action. Forget about retraining and skyrocket knowledge retention rates.

 

Author Comment

by:rsj1977
ID: 39815681
There is no particular pattern , what i learned today is some of the data would have ADJ pattern at the end.
and some of the data would have REKEY as pattern.

First users enter Invoice ID(1234567) and if the invoice needs to be adjusted. They enter a new Invoice with 1234567ADJ or 1234567REKEY something like that.

So when I run the SQL I should be able to see the original Invoice and adjuested Invoice.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815692
your new example still has the pattern as all numeric.

Is that always going to be the case? and, will there always be just one per vendor?

If not, how do we know 123456X vs 123456Y  ?

or  1234567 vs 1234568  ?


if looking at the id isn't sufficient to identify the "original"  - are there other columns?
Is there a date column that we could sort on to find the first one?
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815695
select t1.vendor, t1.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
union all
select t2.vendor, t2.invoice_id
from vend_tbl t1, vend_tbl t2
where t1.invoice_id != t2.invoice_id
and t1.invoice_id = regexp_replace(t2.invoice_id,'([[:alpha:]])',null)
order by vendor;
0
 
LVL 32

Accepted Solution

by:
awking00 earned 500 total points
ID: 39815720
A little less cumbersome -
select vendor, invoice_id from
(select vendor, invoice_id,
 count(*) over (partition by vendor, regexp_replace(invoice_id,'([[:alpha:]])',null) order by vendor) cnt
 from vend_tbl)
where cnt > 1;
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815722
also, will the subsequent ids always be the original with a suffix?

or given 123456  could you have ABC123456  ?

In my answer above I assumed prefixes were possible.  It'll still work even if they aren't it's just less efficient.

Change
WHERE invoice_id LIKE '%' || pattern || '%')
to
WHERE invoice_id LIKE pattern || '%')
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815729
note, awking00's approach is making a similar assumption to what I did.

That is,  he is assuming the only difference between your values is the addition of letters
and the original id has no alphabetic characters.'

again, if these assumptions aren't true, please post the rules for identifying which id is the basis for the search
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815730
Note - that would also work assuming there could be records like -
1234567
1234567ADJ
1234567REKEY
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815739
And also for ADJ1234567
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815740
also,  can the additions have non-alphabetic characters?
based on your example here:  
XYZ           32324KL1

It looks like they can.

So, if you had an ID 32324,  awking00's method would not find these to be similar rows.

As you can hopefully tell,  we're just guessing.  We need more information about what the rules of "similar" are and, finding the "original" to compare to
0
 
LVL 32

Expert Comment

by:awking00
ID: 39815742
And, yes, this also assumes the difference is always alpha since I got no response to the contrary from my first question.
0
 
LVL 74

Expert Comment

by:sdstuber
ID: 39815749
Same here, just guessing.  

With the limited sample either version will work; but on a larger scale either or both could fail.
0
 

Author Comment

by:rsj1977
ID: 39815757
Thanks all for your help, I am still trying to get more data analysis from my team, I would update as soon I get it.
0

Featured Post

Free Tool: Subnet Calculator

The subnet calculator helps you design networks by taking an IP address and network mask and returning information such as network, broadcast address, and host range.

One of a set of tools we're offering as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Truncate is a DDL Command where as Delete is a DML Command. Both will delete data from table, but what is the difference between these below statements truncate table <table_name> ?? delete from <table_name> ?? The first command cannot be …
Using SQL Scripts we can save all the SQL queries as files that we use very frequently on our database later point of time. This is one of the feature present under SQL Workshop in Oracle Application Express.
This video shows how to copy a database user from one database to another user DBMS_METADATA.  It also shows how to copy a user's permissions and discusses password hash differences between Oracle 10g and 11g.
This video shows how to configure and send email from and Oracle database using both UTL_SMTP and UTL_MAIL, as well as comparing UTL_SMTP to a manual SMTP conversation with a mail server.

734 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question