Solved

Compare SQL columns to find pattern match between 2 tables

Posted on 2014-09-18
9
131 Views
Last Modified: 2014-10-03
I have a table with Item numbers that were mis-typed and I want to find a pattern match to the proper item numbers in the inventory table, is this doable? For example the mis-typed item was BIR32309 but the actual item is BIR35309 but the pattern mismatch can be anywhere in the string
0
Comment
Question by:skull52
  • 4
  • 3
  • 2
9 Comments
 
LVL 8

Expert Comment

by:Ganapathi
ID: 40331019
As far as I know, this is very difficult. This is purely based on the data. You will have to go for almost all combinations character by character and compare it which is too complex. May be we can identify few mistypes but not all.

Post some more data. Will try.
0
 

Author Comment

by:skull52
ID: 40331026
a few is better than none
0
 
LVL 8

Expert Comment

by:Ganapathi
ID: 40331052
Can you post some actual data?
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 40331085
---- test data setup ----

IF OBJECT_ID('tempdb.dbo.#items_maybe_bad') IS NOT NULL
    DROP TABLE #items_maybe_bad
IF OBJECT_ID('tempdb.dbo.#inventory') IS NOT NULL
    DROP TABLE #inventory

CREATE TABLE #items_maybe_bad (
    key_col int NOT NULL,
    item_number varchar(30) NULL
    )
CREATE TABLE #inventory (
    item_number varchar(30) NOT NULL PRIMARY KEY
    )
TRUNCATE TABLE #items_maybe_bad
INSERT INTO #items_maybe_bad
SELECT 10, 'BIR32309' UNION ALL --one char off, should match
SELECT 12, 'BIR-35300' UNION ALL --one char off (after scrubbing), should match
SELECT 20, 'BIR35309' UNION ALL --exact match to existing item, should be ignored
SELECT 30, 'BIR32209' --two chars off, shouldn't match, unless @max_chars_off set to 2+
--
INSERT INTO #inventory
SELECT 'BIR35309'

------------------------------------------------------------------------------------------------------------------------

---- main code ----

DECLARE @max_chars_off_to_match tinyint
SET @max_chars_off_to_match = 1
--SET @max_chars_off_to_match = 2 --uncomment this line, and key_col 30 should also then match

;WITH
cteTally10 AS (
    SELECT 0 AS tally UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally100 AS (
    SELECT ROW_NUMBER() OVER(ORDER BY c1.tally) AS tally
    FROM cteTally10 c1
    CROSS JOIN cteTally10 c2
)
SELECT
    imb.key_col,
    scrubbed_item,
    i.item_number
FROM #items_maybe_bad imb
CROSS APPLY (
    --remove miscellaneous editing char(s) anyone typed in by mistake
    SELECT REPLACE(REPLACE(REPLACE(item_number, ' ', ''), ',', ''), '-', '') AS scrubbed_item
) AS data_scrub_1
INNER JOIN #inventory i ON
    i.item_number <> scrubbed_item AND
    LEN(i.item_number) = LEN(scrubbed_item)
INNER JOIN cteTally100 t ON
    t.tally <= LEN(scrubbed_item)
GROUP BY
    imb.key_col,
    scrubbed_item,
    i.item_number
HAVING
    SUM(CASE WHEN SUBSTRING(i.item_number, t.tally, 1) = SUBSTRING(scrubbed_item, t.tally, 1) THEN 1 ELSE 0 END) >=
    LEN(scrubbed_item) - @max_chars_off_to_match


This was just to get something working, not sure how it will perform; didn't worry about performance tweaking as yet.
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 

Author Comment

by:skull52
ID: 40331140
This a sample from the mis-typed table <MisMatchedFlyr>
ITEMNMBR
BROG01221304                   
BROG01221305                   
BROG0131350604                 
BROG01331226                   
BROG035800228                  
BROG12211604                   
BROG1221604                    
BROG308290891                  
BROGO12212204                  

Open in new window


This is from the inventory table matching to the prefix  <IV00101>
ITEMNMBR
BROG012211304                  
BROG012211305                  
BROG012211604                  
BROG012211605                  
BROG012211813                  
BROG012211814                  
BROG012211914                  
BROG012212113                  
BROG012212114                  
BROG012212204                  
BROG012212205                  
BROG012212303                  
BROG012212304                  
BROG012212305                  
BROG012214324                  
BROG012214624                  
BROG012216307                  
BROG012216607                  
BROG012224607                  

Open in new window

0
 

Author Comment

by:skull52
ID: 40331263
Scott,
I tried the sample you posted and it seems to work so how would I  replace
CREATE TABLE #items_maybe_bad (
    key_col int NOT NULL,
    item_number varchar(30) NULL
    )
CREATE TABLE #inventory (
    item_number varchar(30) NOT NULL PRIMARY KEY
    )
TRUNCATE TABLE #items_maybe_bad <MisMatchedFlyr>
INSERT INTO #items_maybe_bad
SELECT 10, 'BIR32309' UNION ALL --one char off, should match
SELECT 12, 'BIR-35300' UNION ALL --one char off (after scrubbing), should match
SELECT 20, 'BIR35309' UNION ALL --exact match to existing item, should be ignored
SELECT 30, 'BIR32209' --two chars off, shouldn't match, unless @max_chars_off set to 2+
--
INSERT INTO #inventory <IV00101>
SELECT 'BIR35309'

 with data from  the 2 tables in bold above
0
 
LVL 69

Expert Comment

by:ScottPletcher
ID: 40331272
CORRECTION first, to the code:
...
INNER JOIN #inventory i ON
     i.item_number <> scrubbed_item AND
     ABS(LEN(i.item_number) - LEN(scrubbed_item)) <= 1
...


After that:

For a test, you can just insert the sample data you posted above into the two temp tables, then run the (corrected) code using the temp tables.


To run it against the real tables, replace "#items_maybe_bad" and "#inventory" with your actual table names, and make whatever column name and other changes are needed to make the code run.
0
 
LVL 69

Accepted Solution

by:
ScottPletcher earned 500 total points
ID: 40331292
Don't attempt to run the data part above on your real tables -- that data is just dummy test data.

Just change the main query itself to refer to your table names:
...
FROM MisMatchedFlyr imb
 CROSS APPLY (
     --remove miscellaneous editing char(s) anyone typed in by mistake
     SELECT REPLACE(REPLACE(REPLACE(item_number, ' ', ''), ',', ''), '-', '') AS scrubbed_item
 ) AS data_scrub_1
 INNER JOIN IV00101 i ON
...

If you want to make the alias consistent with the table name, you can also do a global replace in SSMS of "imb." with "mmf." and change the FROM to:
FROM MisMatchedFlyer mmf
0
 

Author Comment

by:skull52
ID: 40360318
Sorry I didn't get back sooner was out of town... I tried everything and can't seem to get the results I need. So I am closing this thread. Scott thanks for your assistance. I am going to give you the points as you got me the closest.
0

Featured Post

Backup Your Microsoft Windows Server®

Backup all your Microsoft Windows Server – on-premises, in remote locations, in private and hybrid clouds. Your entire Windows Server will be backed up in one easy step with patented, block-level disk imaging. We achieve RTOs (recovery time objectives) as low as 15 seconds.

Join & Write a Comment

If you find yourself in this situation “I have used SELECT DISTINCT but I’m getting duplicates” then I'm sorry to say you are using the wrong SQL technique as it only does one thing which is: produces whole rows that are unique. If the results you a…
Occasionally there is a need to clean table columns, especially if you have inherited legacy data. There are obviously many ways to accomplish that, including elaborate UPDATE queries with anywhere from one to numerous REPLACE functions (even within…
In this seventh video of the Xpdf series, we discuss and demonstrate the PDFfonts utility, which lists all the fonts used in a PDF file. It does this via a command line interface, making it suitable for use in programs, scripts, batch files — any pl…
This video gives you a great overview about bandwidth monitoring with SNMP and WMI with our network monitoring solution PRTG Network Monitor (https://www.paessler.com/prtg). If you're looking for how to monitor bandwidth using netflow or packet s…

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now