Compare SQL columns to find pattern match between 2 tables

I have a table with Item numbers that were mis-typed and I want to find a pattern match to the proper item numbers in the inventory table, is this doable? For example the mis-typed item was BIR32309 but the actual item is BIR35309 but the pattern mismatch can be anywhere in the string
skull52Asked:
Who is Participating?
 
Scott PletcherConnect With a Mentor Senior DBACommented:
Don't attempt to run the data part above on your real tables -- that data is just dummy test data.

Just change the main query itself to refer to your table names:
...
FROM MisMatchedFlyr imb
 CROSS APPLY (
     --remove miscellaneous editing char(s) anyone typed in by mistake
     SELECT REPLACE(REPLACE(REPLACE(item_number, ' ', ''), ',', ''), '-', '') AS scrubbed_item
 ) AS data_scrub_1
 INNER JOIN IV00101 i ON
...

If you want to make the alias consistent with the table name, you can also do a global replace in SSMS of "imb." with "mmf." and change the FROM to:
FROM MisMatchedFlyer mmf
0
 
GanapathiFacets DeveloperCommented:
As far as I know, this is very difficult. This is purely based on the data. You will have to go for almost all combinations character by character and compare it which is too complex. May be we can identify few mistypes but not all.

Post some more data. Will try.
0
 
skull52Author Commented:
a few is better than none
0
Cloud Class® Course: CompTIA Healthcare IT Tech

This course will help prep you to earn the CompTIA Healthcare IT Technician certification showing that you have the knowledge and skills needed to succeed in installing, managing, and troubleshooting IT systems in medical and clinical settings.

 
GanapathiFacets DeveloperCommented:
Can you post some actual data?
0
 
Scott PletcherSenior DBACommented:
---- test data setup ----

IF OBJECT_ID('tempdb.dbo.#items_maybe_bad') IS NOT NULL
    DROP TABLE #items_maybe_bad
IF OBJECT_ID('tempdb.dbo.#inventory') IS NOT NULL
    DROP TABLE #inventory

CREATE TABLE #items_maybe_bad (
    key_col int NOT NULL,
    item_number varchar(30) NULL
    )
CREATE TABLE #inventory (
    item_number varchar(30) NOT NULL PRIMARY KEY
    )
TRUNCATE TABLE #items_maybe_bad
INSERT INTO #items_maybe_bad
SELECT 10, 'BIR32309' UNION ALL --one char off, should match
SELECT 12, 'BIR-35300' UNION ALL --one char off (after scrubbing), should match
SELECT 20, 'BIR35309' UNION ALL --exact match to existing item, should be ignored
SELECT 30, 'BIR32209' --two chars off, shouldn't match, unless @max_chars_off set to 2+
--
INSERT INTO #inventory
SELECT 'BIR35309'

------------------------------------------------------------------------------------------------------------------------

---- main code ----

DECLARE @max_chars_off_to_match tinyint
SET @max_chars_off_to_match = 1
--SET @max_chars_off_to_match = 2 --uncomment this line, and key_col 30 should also then match

;WITH
cteTally10 AS (
    SELECT 0 AS tally UNION ALL SELECT 1 UNION ALL SELECT 2 UNION ALL SELECT 3 UNION ALL SELECT 4 UNION ALL
    SELECT 5 UNION ALL SELECT 6 UNION ALL SELECT 7 UNION ALL SELECT 8 UNION ALL SELECT 9
),
cteTally100 AS (
    SELECT ROW_NUMBER() OVER(ORDER BY c1.tally) AS tally
    FROM cteTally10 c1
    CROSS JOIN cteTally10 c2
)
SELECT
    imb.key_col,
    scrubbed_item,
    i.item_number
FROM #items_maybe_bad imb
CROSS APPLY (
    --remove miscellaneous editing char(s) anyone typed in by mistake
    SELECT REPLACE(REPLACE(REPLACE(item_number, ' ', ''), ',', ''), '-', '') AS scrubbed_item
) AS data_scrub_1
INNER JOIN #inventory i ON
    i.item_number <> scrubbed_item AND
    LEN(i.item_number) = LEN(scrubbed_item)
INNER JOIN cteTally100 t ON
    t.tally <= LEN(scrubbed_item)
GROUP BY
    imb.key_col,
    scrubbed_item,
    i.item_number
HAVING
    SUM(CASE WHEN SUBSTRING(i.item_number, t.tally, 1) = SUBSTRING(scrubbed_item, t.tally, 1) THEN 1 ELSE 0 END) >=
    LEN(scrubbed_item) - @max_chars_off_to_match


This was just to get something working, not sure how it will perform; didn't worry about performance tweaking as yet.
0
 
skull52Author Commented:
This a sample from the mis-typed table <MisMatchedFlyr>
ITEMNMBR
BROG01221304                   
BROG01221305                   
BROG0131350604                 
BROG01331226                   
BROG035800228                  
BROG12211604                   
BROG1221604                    
BROG308290891                  
BROGO12212204                  

Open in new window


This is from the inventory table matching to the prefix  <IV00101>
ITEMNMBR
BROG012211304                  
BROG012211305                  
BROG012211604                  
BROG012211605                  
BROG012211813                  
BROG012211814                  
BROG012211914                  
BROG012212113                  
BROG012212114                  
BROG012212204                  
BROG012212205                  
BROG012212303                  
BROG012212304                  
BROG012212305                  
BROG012214324                  
BROG012214624                  
BROG012216307                  
BROG012216607                  
BROG012224607                  

Open in new window

0
 
skull52Author Commented:
Scott,
I tried the sample you posted and it seems to work so how would I  replace
CREATE TABLE #items_maybe_bad (
    key_col int NOT NULL,
    item_number varchar(30) NULL
    )
CREATE TABLE #inventory (
    item_number varchar(30) NOT NULL PRIMARY KEY
    )
TRUNCATE TABLE #items_maybe_bad <MisMatchedFlyr>
INSERT INTO #items_maybe_bad
SELECT 10, 'BIR32309' UNION ALL --one char off, should match
SELECT 12, 'BIR-35300' UNION ALL --one char off (after scrubbing), should match
SELECT 20, 'BIR35309' UNION ALL --exact match to existing item, should be ignored
SELECT 30, 'BIR32209' --two chars off, shouldn't match, unless @max_chars_off set to 2+
--
INSERT INTO #inventory <IV00101>
SELECT 'BIR35309'

 with data from  the 2 tables in bold above
0
 
Scott PletcherSenior DBACommented:
CORRECTION first, to the code:
...
INNER JOIN #inventory i ON
     i.item_number <> scrubbed_item AND
     ABS(LEN(i.item_number) - LEN(scrubbed_item)) <= 1
...


After that:

For a test, you can just insert the sample data you posted above into the two temp tables, then run the (corrected) code using the temp tables.


To run it against the real tables, replace "#items_maybe_bad" and "#inventory" with your actual table names, and make whatever column name and other changes are needed to make the code run.
0
 
skull52Author Commented:
Sorry I didn't get back sooner was out of town... I tried everything and can't seem to get the results I need. So I am closing this thread. Scott thanks for your assistance. I am going to give you the points as you got me the closest.
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.