We help IT Professionals succeed at work.
Troubleshooting Question

SQL to group records from the same table where common string is found in different field of different record

44 Views
Last Modified: 2020-09-24
Sample for SQL Experts 20200923.xlsx

Hello Experts,
I inherited a poorly managed list of equipment installation table. It is a pretty large Oracle table with 400+K rows.  I just make an abstract of the sample as per attached XLSX

Short descriptions:
Channel = Partner organization ID, in the table there can be many ID's
KeyID     = Unique ID of that record in the table
Name1   = Installation name 1
Name 2  = Installation name 2

What I would like to do is
- For each Channel ID
  - Group together different records where KeyID1.Name1 = KeyID2.Name2
  - I want to avoid Name1 = Name2 for the same KeyID
  - Each group will have its own GroupID (not mandatory, but would nice if possible)

See Desired outcome as shown in the XLSX as well.

I tried several ways with SQL but could not nail it :-(

I would greatly appreciate your help to provide SQL - thanks a bunch in advance

Note:
In the table there are other columns for Street address, City Name, Country Name, Country Code, Latitude, Longitude (numeric with decimal value) etc. as well.  Most reliable info for these column would be CountryCode as it is ISO3 standard
Comment
Watch Question

Brian CroweDatabase Engineer
CERTIFIED EXPERT
Top Expert 2005
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
HainKurtSr. System Analyst
CERTIFIED EXPERT
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
above solution misses
 
4013_AUS_SYDNEY_00001EYTANAU005
4013_AUS_Sydney_0014013_AUS_Sydney_001

and
 
4013_BGR_Sofia_00001EYGSSOFBG02
4013_BGR_Sofia_000024013_BGR_Sofia_001

since I don't know the logic for that and they do not match...
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION
Sr. System Analyst
CERTIFIED EXPERT
Commented:
This one is on us!
(Get your first solution completely free - no credit card required)
UNLOCK SOLUTION

Author

Commented:
I was tweaking initial advice from Brian and came back here again to see more advices. I check all advices and took all the good parts into it. It works a lot better for me now. THANK YOU all of you so much !!!

Note: about issue like  _0002 and _002
Absolutely, I was thinking fuzzy match will help but I guess there is no fuzzy function in SQL yet (else please enlighten me, I'd love to work with it). My colleague @work do have Talend DQ Module software which I think has fuzzy match, I'll see with them if something can be done to nail this.


HainKurtSr. System Analyst
CERTIFIED EXPERT

Commented:
used regex, to make all NAMES

to trim last 0's
select REGEXP_REPLACE('4013_BGR_Sofia_002', '(_[0]{1,})','_') r from dual;

4013_BGR_Sofia_2
so, final query will be
with t1 as (
select row_number() over (order by i1.name1) r, i1.* from Installation i1
inner join installation i2 on REGEXP_REPLACE(i1.name1, '(_[0]{1,})','_')=REGEXP_REPLACE(i2.name2, '(_[0]{1,})','_') and i1.keyid<>i2.keyid
  ),
  t2 as (
select row_number() over (order by i2.name2) r, i2.* from Installation i1
inner join installation i2 on REGEXP_REPLACE(i1.name1, '(_[0]{1,})','_')=REGEXP_REPLACE(i2.name2, '(_[0]{1,})','_') and i1.keyid<>i2.keyid
  )
 select t1.* from t1
 union
 select t2.* from t2
 order by 1,2,3;

Author

Commented:
Thank you HainKurt
CERTIFIED EXPERT
Most Valuable Expert 2012
Distinguished Expert 2019

Commented:
>>I guess there is no fuzzy function in SQL yet

Depends on how 'fuzzy' you want to get?  There are a few ways to compare strings.

Probably the worst:  soundex.

select soundex('4013_AUS_SYDNEY_00001'), soundex('4013_AUS_Sydney_001') from dual;

Here is why it is the worst:
select soundex('AUS #_*$%@ SIDNEY') from dual;

Probably better UTL_MATCH:
The UTL_MATCH package facilitates matching two records. This is typically used to match names, such as two First Names or two Last Names.
https://docs.oracle.com/en/database/oracle/oracle-database/18/arpls/UTL_MATCH.html

You will need to experiment to see what return value is "good enough" for a match.

The two methods in there have their quirks as well.  Many years ago I tried different methods to match up addresses.  None of the methods above were good enough for our needs. No, I never found a way and that part of the project died.

Maybe they will work for your data.

There are other string compare algorithms out there and a few have been ported to PL/SQL.
Unlock the solution to this question.
Join our community and discover your potential

Experts Exchange is the only place where you can interact directly with leading experts in the technology field. Become a member today and access the collective knowledge of thousands of technology experts.

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.