Query to compare two tables and match at least 2 words

I have two SQL tables.  1 table has the name of prospects (PROSPECTS) and the other table has the names of current customers (CUSTOMERS).  I want to be able to compare all the prospect names against the Customer names and show me the records that match at least two words.  I would like to be able to use the same query to match additional words if needed (up to 4).  This is an urgent project so I will appreciate all the help you experts can provide.

The structures of the tables are:
PROSPECT TABLE:
MCID (float, null)
CUSTOMER (nvarchar(255,null)  -- This field contains prospect name and needs to match 2 words against the Customer table

CUSTOMER TABLE:
MCID (float, null)
CUSTOMER (nvarchar(100,null)  -- This field contains customer names and needs  to match 2 words against the Prospect table

The query should show the following fields prospect name, customer name and 2 words matched
gpvanarsdaleAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

sameer2010Commented:
Does it have to be matched for similar MCID or has to be scanned all across?
0
gpvanarsdaleAuthor Commented:
No, I do not need to match MCIDs, just the names across the tables
0
Kevin CrossChief Technology OfficerCommented:
One solution would be to pull apart each CUSTOMER string with spaces as the delimiter, then count how many of these individual words match or appear in the other CUSTOMER string. This may not perform well. Therefore, before going down that path may I suggest you try something like SOUNDEX or DIFFERENCE.

It is not perfect, but can be useful. I had to match prospects to an existing list the other day, and I compared the SOUNDEX values (or DIFFERENCE equal 4) of the customer name as well as the street address within the same City/State. In other words, if you have other unique attributes for each list, you can use it to make the SOUNDEX/DIFFERENCE comparison close to 100% accurate.
0
Newly released Acronis True Image 2019

In announcing the release of the 15th Anniversary Edition of Acronis True Image 2019, the company revealed that its artificial intelligence-based anti-ransomware technology – stopped more than 200,000 ransomware attacks on 150,000 customers last year.

gpvanarsdaleAuthor Commented:
Could you show me an example?
0
Kevin CrossChief Technology OfficerCommented:
For the SOUNDEX/DIFFERENCE solution:
;WITH prospects(MCID, CUSTOMER) AS (
    SELECT CONVERT(FLOAT, 1), CONVERT(NVARCHAR(255), 'John Smith Co')
    UNION SELECT 2, 'This Company Has Five Words'
    UNION SELECT 3, 'Another Company'
), customers(MCID, CUSTOMER) AS (
    SELECT CONVERT(FLOAT, 10), CONVERT(NVARCHAR(255), 'John & Jane Smith Co')
    UNION SELECT 12, 'A Company With Five Words'
    UNION SELECT 31, 'Another Company'
)
SELECT c.MCID AS CustomerID, c.CUSTOMER AS CustomerName
     , p.MCID AS ProspectID, p.CUSTOMER AS ProspectName
FROM customers c
JOIN prospects p 
  ON DIFFERENCE(p.CUSTOMER, c.CUSTOMER) = 4
;

Open in new window


Note it does not match 'A Company With Five Words' and 'This Company Has Five Words' because they sound like different companies despite the match on three words. Again, in my experience this was preferable but may be an issue for you.

For the other solution, you could start with a function similar to the one explained by Brandon in the following article:
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/A_192-Delimited-String-Parsing-in-SQL-Server-2005-and-later.html

Instead of a comma as the delimiter, you would use a space (' '). Once the values are in a table, you can use aggregates and normal JOINs. For example, you could denormalize both tables into new tables with MCID, CustomerWord then JOIN the tables on CustomerWord and count how many times the same two MCID values appear together.
0
gpvanarsdaleAuthor Commented:
I am not able to make it work.  Could you show me the code if I had the tables and information below?
Since I want to find names of prospects in Prospect table that matches at least 2 words in Customer tables, the resulting query should show that the prospect tables has a 2 word match for record 1 (matches 2 words on record 3 of Customer table) and record 2 matches 2 words on record 2 of Customer table)

Prospects Table                                             Customer Table
Record 1                                                           Record 1
Customer:  John Smith Co                          Customer:  Mary Romero
Record 2                                                           Record 2
Customer:  John and Mary Smith                Customer: Mary Smith & Associates
Record 3                                                           Record 3:
Customer:  Mary Jones                               Customer: John Smith
0
sameer2010Commented:
Use this. You can modify @matchcount to suit your needs. PS: The data contains customer as common word. So when you actually want to compare for 2 matching words, you should use 3 as the match count. I could very well strip off Customer word , but was not sure if you have given this data for understanding purpose or it actually includes word Customer. Anyways, here is the script:
declare @matchcount int=2
declare @customer table(name varchar(100))
insert into @customer values('Customer:  Mary Romero')
insert into @customer values('Customer: Mary Smith & Associates')
insert into @customer values('Customer: John Smith')
declare @prospects table(name varchar(100))
insert into @prospects values('Customer:  John Smith Co')
insert into @prospects values('Customer:  John and Mary Smith')
insert into @prospects values('Customer:  Mary Jones')
;with 
custref as(
SELECT row_number() over(order by name) as cint,name from @customer
),
cust as(
select cint,display_term
 FROM custref
CROSS APPLY sys.dm_fts_parser('"' + name + '"', 1033, 0,0)
),
prosp as(
select t.pint,display_term
 FROM (SELECT row_number() over(order by name) as pint,name from @prospects)t
CROSS APPLY sys.dm_fts_parser('"' + name + '"', 1033, 0,0)
) 
select distinct f.name
from cust a, custref f 
	where exists(
	select 1 from prosp b where b.display_term=a.display_term
	and @matchcount <= 
	(select count(c.display_term) from cust c, prosp d where c.display_term=d.display_term
	and c.cint=a.cint and d.pint=b.pint)
	and a.cint=f.cint
)

Open in new window

0
gpvanarsdaleAuthor Commented:
Sameer when I run your script I get the message below.  Any suggestions?

Msg 7609, Level 17, State 5, Line 10
Full-Text Search is not installed, or a full-text component cannot be loaded.
0
sameer2010Commented:
Can you get fulltext feature installed? Turn on the fulltext search service as well as enable it through advanced configuration.
0
gpvanarsdaleAuthor Commented:
I do not have rights to install it and it is grayed out on the table
0
sameer2010Commented:
What does
SELECT FULLTEXTSERVICEPROPERTY ('IsFulltextInstalled') 

Open in new window

return?
0
Kevin CrossChief Technology OfficerCommented:
Thank you for jumping on @sameer2010! Full-text search is what I thought too, but got pulled away. I think it will perform better than pulling the text apart with function.
0
gpvanarsdaleAuthor Commented:
Sameer2020,

I checked with IT and they do not have FULL-TEXT SEARCH installed and they do not plan to.  Any other solutions without using full-text search?
0
Kevin CrossChief Technology OfficerCommented:
Looking at the suggestion above, I see it was just using the parser from the full-text search. If that is the case, you can replace that function with Brandon's in my comment above.
http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_28231718.html#a39468068

It converts the text into a table, so the CROSS APPLY should still work the same.
0
sameer2010Commented:
Try this. You can modify Customer: to suit your needs and also modify @matchcount to decide what level of granularity the two should be matched

declare @matchcount int=2
declare @customer table(name varchar(100))
insert into @customer values('Customer: Mary Romero')
insert into @customer values('Customer: Mary Smith & Associates')
insert into @customer values('Customer: John Smith')
declare @prospects table(name varchar(100))
insert into @prospects values('Customer: John Smith Co')
insert into @prospects values('Customer: John and Mary Smith')
insert into @prospects values('Customer: Mary Jones')
;with 
custref as(
SELECT row_number() over(order by name) as cint,name from @customer
),
prospect as(
SELECT row_number() over(order by name) as pint,name from @prospects
),
cust as(
select cint,CHARINDEX(' ',name,1) as cstart,name,SUBSTRING(name,1,CHARINDEX(' ',name,1)) namepart
from custref
union all
select a.cint,case when CHARINDEX(' ',a.name,cstart+1)=0 then len(a.name) else CHARINDEX(' ',a.name,cstart+1) end as cstart,a.name,
SUBSTRING(a.name,cstart+1,(case when CHARINDEX(' ',a.name,cstart+1)=0 then len(a.name) else CHARINDEX(' ',a.name,cstart+1) end)-cstart) namepart
from cust a, custref b
where a.cint=b.cint and a.cstart < len(a.name)
),
prosp as(
select pint,CHARINDEX(' ',name,1) as pstart,name,SUBSTRING(name,1,CHARINDEX(' ',name,1)) namepart
from prospect
union all
select a.pint,case when CHARINDEX(' ',a.name,pstart+1)=0 then len(a.name) else CHARINDEX(' ',a.name,pstart+1) end as cstart,a.name,
SUBSTRING(a.name,pstart+1,(case when CHARINDEX(' ',a.name,pstart+1)=0 then len(a.name) else CHARINDEX(' ',a.name,pstart+1) end)-pstart) namepart
from prosp a, prospect b
where a.pint=b.pint and a.pstart < len(a.name)
)
select distinct a.name
from cust a 
     where exists(
     select 1 from prosp b where b.namepart=a.namepart
	 and b.namepart<>'Customer:'
     and @matchcount <= 
     (select count(c.namepart) from cust c, prosp d where c.namepart=d.namepart
     and c.cint=a.cint and d.pint=b.pint and d.namepart<>'Customer:'))

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
gpvanarsdaleAuthor Commented:
Sameer,

I will try this weekend and let you know.  Thanks for the code.
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.