Link to home
Start Free TrialLog in
Avatar of gpvanarsdale
gpvanarsdale

asked on

Query to compare two tables and match at least 2 words

I have two SQL tables.  1 table has the name of prospects (PROSPECTS) and the other table has the names of current customers (CUSTOMERS).  I want to be able to compare all the prospect names against the Customer names and show me the records that match at least two words.  I would like to be able to use the same query to match additional words if needed (up to 4).  This is an urgent project so I will appreciate all the help you experts can provide.

The structures of the tables are:
PROSPECT TABLE:
MCID (float, null)
CUSTOMER (nvarchar(255,null)  -- This field contains prospect name and needs to match 2 words against the Customer table

CUSTOMER TABLE:
MCID (float, null)
CUSTOMER (nvarchar(100,null)  -- This field contains customer names and needs  to match 2 words against the Prospect table

The query should show the following fields prospect name, customer name and 2 words matched
Avatar of sameer2010
sameer2010
Flag of India image

Does it have to be matched for similar MCID or has to be scanned all across?
Avatar of gpvanarsdale
gpvanarsdale

ASKER

No, I do not need to match MCIDs, just the names across the tables
Avatar of Kevin Cross
One solution would be to pull apart each CUSTOMER string with spaces as the delimiter, then count how many of these individual words match or appear in the other CUSTOMER string. This may not perform well. Therefore, before going down that path may I suggest you try something like SOUNDEX or DIFFERENCE.

It is not perfect, but can be useful. I had to match prospects to an existing list the other day, and I compared the SOUNDEX values (or DIFFERENCE equal 4) of the customer name as well as the street address within the same City/State. In other words, if you have other unique attributes for each list, you can use it to make the SOUNDEX/DIFFERENCE comparison close to 100% accurate.
Could you show me an example?
For the SOUNDEX/DIFFERENCE solution:
;WITH prospects(MCID, CUSTOMER) AS (
    SELECT CONVERT(FLOAT, 1), CONVERT(NVARCHAR(255), 'John Smith Co')
    UNION SELECT 2, 'This Company Has Five Words'
    UNION SELECT 3, 'Another Company'
), customers(MCID, CUSTOMER) AS (
    SELECT CONVERT(FLOAT, 10), CONVERT(NVARCHAR(255), 'John & Jane Smith Co')
    UNION SELECT 12, 'A Company With Five Words'
    UNION SELECT 31, 'Another Company'
)
SELECT c.MCID AS CustomerID, c.CUSTOMER AS CustomerName
     , p.MCID AS ProspectID, p.CUSTOMER AS ProspectName
FROM customers c
JOIN prospects p 
  ON DIFFERENCE(p.CUSTOMER, c.CUSTOMER) = 4
;

Open in new window


Note it does not match 'A Company With Five Words' and 'This Company Has Five Words' because they sound like different companies despite the match on three words. Again, in my experience this was preferable but may be an issue for you.

For the other solution, you could start with a function similar to the one explained by Brandon in the following article:
https://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/A_192-Delimited-String-Parsing-in-SQL-Server-2005-and-later.html

Instead of a comma as the delimiter, you would use a space (' '). Once the values are in a table, you can use aggregates and normal JOINs. For example, you could denormalize both tables into new tables with MCID, CustomerWord then JOIN the tables on CustomerWord and count how many times the same two MCID values appear together.
I am not able to make it work.  Could you show me the code if I had the tables and information below?
Since I want to find names of prospects in Prospect table that matches at least 2 words in Customer tables, the resulting query should show that the prospect tables has a 2 word match for record 1 (matches 2 words on record 3 of Customer table) and record 2 matches 2 words on record 2 of Customer table)

Prospects Table                                             Customer Table
Record 1                                                           Record 1
Customer:  John Smith Co                          Customer:  Mary Romero
Record 2                                                           Record 2
Customer:  John and Mary Smith                Customer: Mary Smith & Associates
Record 3                                                           Record 3:
Customer:  Mary Jones                               Customer: John Smith
Use this. You can modify @matchcount to suit your needs. PS: The data contains customer as common word. So when you actually want to compare for 2 matching words, you should use 3 as the match count. I could very well strip off Customer word , but was not sure if you have given this data for understanding purpose or it actually includes word Customer. Anyways, here is the script:
declare @matchcount int=2
declare @customer table(name varchar(100))
insert into @customer values('Customer:  Mary Romero')
insert into @customer values('Customer: Mary Smith & Associates')
insert into @customer values('Customer: John Smith')
declare @prospects table(name varchar(100))
insert into @prospects values('Customer:  John Smith Co')
insert into @prospects values('Customer:  John and Mary Smith')
insert into @prospects values('Customer:  Mary Jones')
;with 
custref as(
SELECT row_number() over(order by name) as cint,name from @customer
),
cust as(
select cint,display_term
 FROM custref
CROSS APPLY sys.dm_fts_parser('"' + name + '"', 1033, 0,0)
),
prosp as(
select t.pint,display_term
 FROM (SELECT row_number() over(order by name) as pint,name from @prospects)t
CROSS APPLY sys.dm_fts_parser('"' + name + '"', 1033, 0,0)
) 
select distinct f.name
from cust a, custref f 
	where exists(
	select 1 from prosp b where b.display_term=a.display_term
	and @matchcount <= 
	(select count(c.display_term) from cust c, prosp d where c.display_term=d.display_term
	and c.cint=a.cint and d.pint=b.pint)
	and a.cint=f.cint
)

Open in new window

Sameer when I run your script I get the message below.  Any suggestions?

Msg 7609, Level 17, State 5, Line 10
Full-Text Search is not installed, or a full-text component cannot be loaded.
Can you get fulltext feature installed? Turn on the fulltext search service as well as enable it through advanced configuration.
I do not have rights to install it and it is grayed out on the table
What does
SELECT FULLTEXTSERVICEPROPERTY ('IsFulltextInstalled') 

Open in new window

return?
Thank you for jumping on @sameer2010! Full-text search is what I thought too, but got pulled away. I think it will perform better than pulling the text apart with function.
Sameer2020,

I checked with IT and they do not have FULL-TEXT SEARCH installed and they do not plan to.  Any other solutions without using full-text search?
Looking at the suggestion above, I see it was just using the parser from the full-text search. If that is the case, you can replace that function with Brandon's in my comment above.
https://www.experts-exchange.com/questions/28231718/Query-to-compare-two-tables-and-match-at-least-2-words.html?anchorAnswerId=39468068#a39468068

It converts the text into a table, so the CROSS APPLY should still work the same.
ASKER CERTIFIED SOLUTION
Avatar of sameer2010
sameer2010
Flag of India image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Sameer,

I will try this weekend and let you know.  Thanks for the code.