Help with SELECT to find duplicate records

I have the start of a query--but it's not complete (this is where I need help)--that finds duplicate records based on the criteria below.  Basically, I need to identify any companies where the company name, address1, and city are similar and the state and zip code are the same....My criteria and the start of the query are below, but I have no idea how to add in the rest.  Help?

LIKE: Company name (comp_name)
LIKE: Address 1 (addr_address1)
LIKE: City (addr_city)
EQUAL: State (addr_state)
EQUAL: Zip code (addr_postcode)

SELECT Comp_Name, Addr_Address1, Addr_City, Addr_State, Addr_PostCode, Count(*)
FROM vSearchListCompany
WHERE  Comp_Deleted is null
GROUP BY Comp_Name, Addr_Address1, Addr_City, Addr_State, Addr_PostCode
HAVING Count(*) > 1
saturationAsked:
Who is Participating?
I wear a lot of hats...

"The solutions and answers provided on Experts Exchange have been extremely helpful to me over the last few years. I wear a lot of hats - Developer, Database Administrator, Help Desk, etc., so I know a lot of things but not a lot about one thing. Experts Exchange gives me answers from people who do know a lot about one thing, in a easy to use platform." -Todd S.

Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>where the company name, address1, and city are similar
Define 'are similar', as you can't group based on a LIKE comparison.

Otherwise, lose the COUNT(*)
SELECT Comp_Name, Addr_Address1, Addr_City, Addr_State, Addr_PostCode,
FROM vSearchListCompany
WHERE  Comp_Deleted is null
GROUP BY Comp_Name, Addr_Address1, Addr_City, Addr_State, Addr_PostCode
HAVING Count(Addr_PostCode) > 1

Open in new window

0
saturationAuthor Commented:
I can lose the COUNT(*).   Also, "are similar" would be where the company name, for example, is like any other company name in the system using the LIKE statement and % symbols.  Does that help?  How would I structure the query that way without the count?
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>Also, "are similar" would be where the company name, for example, is like any other company name in the system using the LIKE statement and % symbols.  Does that help?
No.  Again, you can't GROUP BY using a LIKE.  

Would Microsoft be like Microstrategy?  
Would Experts Exchange be like Expert Sex Change?
What criteria would you use to define 'like'?
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

saturationAuthor Commented:
I'm unsure, actually--never written a query like this. Any rules you would suggest starting with looking for similar company names?   The company names would be typed in by human beings, we may ignore any "inc.","co.", or "LLC" at the end...Does that help?  I guess that's a start.
0
Jim HornMicrosoft SQL Server Developer, Architect, and AuthorCommented:
>we may ignore any "inc.","co.", or "LLC" at the end...Does that help?
That's a start.  You'll have to define this in excruitiatingly clear terms, as SQL has no clue if inc is the same as co.

I believe SSIS has a Fuzzy Grouping task that performs this well
T-SQL SOUNDEX also shows some promise, but I haven't used either of these so I can't comment on it.

Otherwise, If we're just talking 'give me the count of rows where company name fragments in table a are found in the company name of table b, then using a subquery it would go like this
CREATE TABLE #a (company_name varchar(255))
CREATE TABLE #b (company_name varchar(255))
INSERT INTO #a (company_name) values ('goo'), ('foo'), ('boo'), ('bar'), ('baz')
INSERT INTO #b (company_name) values ('goofyriffic'), ('banana'), ('oo'), ('bazinga'), ('googoogaga')

SELECT a_name, COUNT(b_name) as the_count
FROM (
	SELECT #a.company_name as a_name, #b.company_name as b_name
	FROM #a 
		JOIN #b ON #b.company_name LIKE CONCAT('%', #a.company_name, '%')) a
GROUP BY a_name

Open in new window

0

Experts Exchange Solution brought to you by

Your issues matter to us.

Facing a tech roadblock? Get the help and guidance you need from experienced professionals who care. Ask your question anytime, anywhere, with no hassle.

Start your 7-day free trial
saturationAuthor Commented:
I just did a little bit of research on Fuzzy Grouping in SSIS and was able to get what I want--I had no idea anything like that existed, and it was rather easy.  Thank you so much for the recommendation and time!
0
It's more than this solution.Get answers and train to solve all your tech problems - anytime, anywhere.Try it for free Edge Out The Competitionfor your dream job with proven skills and certifications.Get started today Stand Outas the employee with proven skills.Start learning today for free Move Your Career Forwardwith certification training in the latest technologies.Start your trial today
Microsoft SQL Server

From novice to tech pro — start learning today.

Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.