Consolidate the query result

erikTsomik
erikTsomik used Ask the Experts™
on
I have the query that return the result properly. What I am trying to do is to get the total per school. However, the schools have different IDs but this is basically the same schools. How can do some type of search to get the calculated as a same school.

	SELECT U.SchoolID,LS.SchoolNameTx
		FROM SessionMap SM
		inner join users u on u.userKey = SM.userKey
		inner join lkup_school LS on LS.SchoolID = U.SchoolID
		order by U.SchoolID

Open in new window


The result returned:
SchoolID	SchoolNameTx
429	       Quince Orchard High School
1189	Walter Johnson High School (WJHS)
1228	Academy of the Holy Cross
1395	Academny of the Holy Cross
1395	Academny of the Holy Cross
2656	Academy of Holy Cross
3144	Academy of the Holy Cross High School (AHC)
3144	Academy of the Holy Cross High School (AHC)
3144	Academy of the Holy Cross High School (AHC)
3144	Academy of the Holy Cross High School (AHC)
3144	Academy of the Holy Cross High School (AHC)
3144	Academy of the Holy Cross High School (AHC)
3450	Academy  of the Holy Cross
4892	Stonehill College

Open in new window

Comment
Watch Question

Do more with

Expert Office
EXPERT OFFICE® is a registered trademark of EXPERTS EXCHANGE®
Most Valuable Expert 2015

Commented:
What's an example of the expected results?
erikTsomikSystem Architect, CF programmer

Author

Commented:
the expected result is something like this

       Quince Orchard High School
	Walter Johnson High School (WJHS)
	Academy  of the Holy Cross
	Stonehill College

Open in new window

Most Valuable Expert 2015
Commented:
Forget that last response, I misread the question.

So you want to list similar schools under one name. What about the "total per school"? Do you also want to include a count(*) - in addition to the school names?
Should you be charging more for IT Services?

Do you wonder if your IT business is truly profitable or if you should raise your prices? Learn how to calculate your overhead burden using our free interactive tool and use it to determine the right price for your IT services. Start calculating Now!

erikTsomikSystem Architect, CF programmer

Author

Commented:
yes that is exactly what I am trying to accomplish. School and the count
Most Valuable Expert 2015
Commented:
There's a few ways you could write it, but something like this should work.  

If there were multiple groupings you want to merge, then creating a mapping table might be better.

SELECT nw.NewSchoolName, COUNT(*) AS TotalRows
FROM   SessionMap SM
		  INNER JOIN users u on u.userKey = SM.userKey
		  INNER JOIN lkup_school LS on LS.SchoolID = U.SchoolID
		  CROSS APPLY (
			    SELECT CASE WHEN LS.SchoolID IN (1228,1395,2656,3144,3450) 
						    THEN 'Academy of the Holy Cross'
						ELSE LS.SchoolNameTx
						END AS NewSchoolName
		) nw
GROUP BY nw.NewSchoolName
ORDER BY nw.NewSchoolName

Open in new window

erikTsomikSystem Architect, CF programmer

Author

Commented:
how can I make it more dynamic . In example below you can see so I am talking about the same schools

Academy of the Holy Cross  AHC	4
Academy of the Holy Cross (AHC)	29
Academy of the Holy Cross High School	1344
Academy of the Holy Cross High School (ACH)	251

Open in new window


Alber Einstein HS	24
Albert Einstein	4
Albert Einstein High School	314
Albert Einstein HS	4
Albert Einstien High School	375

Open in new window

erikTsomikSystem Architect, CF programmer

Author

Commented:
winston chirchill high school	29
Winston Chruchill High School	1842
Winston Churchihl High School	5
Winston Churchil High School	113
Winston Churchill	128
Winston churchill H.S.	214
Winston Churchill High	18
Winston Churchill High School	6107
Winston Churchill High School (WCHS)	3406
Winston Churchill High School Potomac MD	65
Winston Churchill High School, Potomac, MD	64
Winston Churchill Highschool	10
Winston Churchill HS	128
Winston Churchill HS in Potomac	88

Open in new window

Most Valuable Expert 2015

Commented:
For it to be dynamic but still manageable, there'd need to be more of a common pattern in all of the names, like starts with "ABC School%", but there isn't - not for all of them.  

If there's a bunch of these, I'd probably create some sort of grouping or mapping table instead.  That would make for cleaner SQL. Plus you could even create a web screen to add/remove mappings for easy maintenance (optional).
Olaf DoschkeSoftware Developer

Commented:
You may think of a function seeing two names are meaning the same just like you think. SQL Server is quite weak on string functions in general, as most any database is and things like SOUNDEX computing a representation of similarly pronounced words also won't help you to find the equality of names.

You could replace some common abbreviations like H.S., HS and HIGHSCHOOL and either completely remove them or always replace them with one of the ways they're written, perhaps the shortest one, to never get into trouble with prolonging names getting too long.

But in the end you shouldn't consolidate the result, you should consolidate the school name data in lkup_school.

A mapping table is ideal for this, as you can start out with automatism and never have to fear wrong consolidations, as you keep the data as is and only create mappings of several  IDs on lkup_school to point to a new consolidated record, that in the end means one more join in the final query, not limited to this query, you can reuse the mappings, once you have them, and once they are not just computed/estimated but manually confirmed, you may then even really consolidate the records in lkup_school itself and cascade foreign keys in relationships with other tables.

Ideally you'd know some norm and id/number of schools, that's intended for this scenario of data analysis, so you don't have to match data by names.

I guess your current lkup_school is just the result of taking every name very literally and there also is no simple mechanism like you have as a human to see the equivalence.

I see two stages of getting from lkup_school to a mapping table having pairs of (consolidatedSchoolId, lkup_SchoolID):

Step 1: Create a table tmpSchoolNames based on lkup_school.SchoolID and SchoolNameTx as ConsolidatedName
Step 2: Do several string replacements in SchoolNameTx like

UPDATE tmpSchoolNames SET SchoolNameTx = REPLACE(SchoolNameTx,"H.S.", "HS")

Open in new window


Then finally after you made such replacements of varying spellings of similar name parts you can simply get your consolidated data as SELECT min(SchoolID), ConsolidatedName grouped by ConsolidatedName.and at the same time have the list of mappings to all Ids having the same consolidated name.

As we have no idea how large your lkup_school table is, it's hard to recommend how much work to invest, you might never get a 100% solution or it is even simpler doing that manually, but I guess from the examples, there's a lot to go through and so you better find a strategy consolidation by such name parts or also disposable parts. A simple thing to do would drop off anything in parenthesis as unimportant not significant part of a name.

Bye, Olaf.

Do more with

Expert Office
Submit tech questions to Ask the Experts™ at any time to receive solutions, advice, and new ideas from leading industry professionals.

Start 7-Day Free Trial