erikTsomik
asked on
Consolidate the query result
I have the query that return the result properly. What I am trying to do is to get the total per school. However, the schools have different IDs but this is basically the same schools. How can do some type of search to get the calculated as a same school.
The result returned:
SELECT U.SchoolID,LS.SchoolNameTx
FROM SessionMap SM
inner join users u on u.userKey = SM.userKey
inner join lkup_school LS on LS.SchoolID = U.SchoolID
order by U.SchoolID
The result returned:
SchoolID SchoolNameTx
429 Quince Orchard High School
1189 Walter Johnson High School (WJHS)
1228 Academy of the Holy Cross
1395 Academny of the Holy Cross
1395 Academny of the Holy Cross
2656 Academy of Holy Cross
3144 Academy of the Holy Cross High School (AHC)
3144 Academy of the Holy Cross High School (AHC)
3144 Academy of the Holy Cross High School (AHC)
3144 Academy of the Holy Cross High School (AHC)
3144 Academy of the Holy Cross High School (AHC)
3144 Academy of the Holy Cross High School (AHC)
3450 Academy of the Holy Cross
4892 Stonehill College
What's an example of the expected results?
ASKER
the expected result is something like this
Quince Orchard High School
Walter Johnson High School (WJHS)
Academy of the Holy Cross
Stonehill College
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
yes that is exactly what I am trying to accomplish. School and the count
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
how can I make it more dynamic . In example below you can see so I am talking about the same schools
Academy of the Holy Cross AHC 4
Academy of the Holy Cross (AHC) 29
Academy of the Holy Cross High School 1344
Academy of the Holy Cross High School (ACH) 251
Alber Einstein HS 24
Albert Einstein 4
Albert Einstein High School 314
Albert Einstein HS 4
Albert Einstien High School 375
ASKER
winston chirchill high school 29
Winston Chruchill High School 1842
Winston Churchihl High School 5
Winston Churchil High School 113
Winston Churchill 128
Winston churchill H.S. 214
Winston Churchill High 18
Winston Churchill High School 6107
Winston Churchill High School (WCHS) 3406
Winston Churchill High School Potomac MD 65
Winston Churchill High School, Potomac, MD 64
Winston Churchill Highschool 10
Winston Churchill HS 128
Winston Churchill HS in Potomac 88
For it to be dynamic but still manageable, there'd need to be more of a common pattern in all of the names, like starts with "ABC School%", but there isn't - not for all of them.
If there's a bunch of these, I'd probably create some sort of grouping or mapping table instead. That would make for cleaner SQL. Plus you could even create a web screen to add/remove mappings for easy maintenance (optional).
If there's a bunch of these, I'd probably create some sort of grouping or mapping table instead. That would make for cleaner SQL. Plus you could even create a web screen to add/remove mappings for easy maintenance (optional).
You may think of a function seeing two names are meaning the same just like you think. SQL Server is quite weak on string functions in general, as most any database is and things like SOUNDEX computing a representation of similarly pronounced words also won't help you to find the equality of names.
You could replace some common abbreviations like H.S., HS and HIGHSCHOOL and either completely remove them or always replace them with one of the ways they're written, perhaps the shortest one, to never get into trouble with prolonging names getting too long.
But in the end you shouldn't consolidate the result, you should consolidate the school name data in lkup_school.
A mapping table is ideal for this, as you can start out with automatism and never have to fear wrong consolidations, as you keep the data as is and only create mappings of several IDs on lkup_school to point to a new consolidated record, that in the end means one more join in the final query, not limited to this query, you can reuse the mappings, once you have them, and once they are not just computed/estimated but manually confirmed, you may then even really consolidate the records in lkup_school itself and cascade foreign keys in relationships with other tables.
Ideally you'd know some norm and id/number of schools, that's intended for this scenario of data analysis, so you don't have to match data by names.
I guess your current lkup_school is just the result of taking every name very literally and there also is no simple mechanism like you have as a human to see the equivalence.
I see two stages of getting from lkup_school to a mapping table having pairs of (consolidatedSchoolId, lkup_SchoolID):
Step 1: Create a table tmpSchoolNames based on lkup_school.SchoolID and SchoolNameTx as ConsolidatedName
Step 2: Do several string replacements in SchoolNameTx like
Then finally after you made such replacements of varying spellings of similar name parts you can simply get your consolidated data as SELECT min(SchoolID), ConsolidatedName grouped by ConsolidatedName.and at the same time have the list of mappings to all Ids having the same consolidated name.
As we have no idea how large your lkup_school table is, it's hard to recommend how much work to invest, you might never get a 100% solution or it is even simpler doing that manually, but I guess from the examples, there's a lot to go through and so you better find a strategy consolidation by such name parts or also disposable parts. A simple thing to do would drop off anything in parenthesis as unimportant not significant part of a name.
Bye, Olaf.
You could replace some common abbreviations like H.S., HS and HIGHSCHOOL and either completely remove them or always replace them with one of the ways they're written, perhaps the shortest one, to never get into trouble with prolonging names getting too long.
But in the end you shouldn't consolidate the result, you should consolidate the school name data in lkup_school.
A mapping table is ideal for this, as you can start out with automatism and never have to fear wrong consolidations, as you keep the data as is and only create mappings of several IDs on lkup_school to point to a new consolidated record, that in the end means one more join in the final query, not limited to this query, you can reuse the mappings, once you have them, and once they are not just computed/estimated but manually confirmed, you may then even really consolidate the records in lkup_school itself and cascade foreign keys in relationships with other tables.
Ideally you'd know some norm and id/number of schools, that's intended for this scenario of data analysis, so you don't have to match data by names.
I guess your current lkup_school is just the result of taking every name very literally and there also is no simple mechanism like you have as a human to see the equivalence.
I see two stages of getting from lkup_school to a mapping table having pairs of (consolidatedSchoolId, lkup_SchoolID):
Step 1: Create a table tmpSchoolNames based on lkup_school.SchoolID and SchoolNameTx as ConsolidatedName
Step 2: Do several string replacements in SchoolNameTx like
UPDATE tmpSchoolNames SET SchoolNameTx = REPLACE(SchoolNameTx,"H.S.", "HS")
Then finally after you made such replacements of varying spellings of similar name parts you can simply get your consolidated data as SELECT min(SchoolID), ConsolidatedName grouped by ConsolidatedName.and at the same time have the list of mappings to all Ids having the same consolidated name.
As we have no idea how large your lkup_school table is, it's hard to recommend how much work to invest, you might never get a 100% solution or it is even simpler doing that manually, but I guess from the examples, there's a lot to go through and so you better find a strategy consolidation by such name parts or also disposable parts. A simple thing to do would drop off anything in parenthesis as unimportant not significant part of a name.
Bye, Olaf.