25112
asked on
logic to make unique abbreviations of a column
could you suggest code logic that would work down on a column to make it abbreviated into specific characters, but keep it unique?
ASKER
there are 391 entries.. but when you put distinct on left (column1,4), it becomes down to 356. how you would work around to make sure all 391 have unique abbreviations? (preferably take out the spaces)
ASKER
the goal is to keep the abbreviation close to the original but there is no strict rules how it should be done.. and that entries should have a unique abbreviation.
select count(distinct replace(column1,' ', '')) from @test
ASKER
we need to abbreviate to 4 characters..
so
select count(distinct left(replace(column1,' ', ''),4)) from @test
is only 353.
so
select count(distinct left(replace(column1,' ', ''),4)) from @test
is only 353.
ASKER
these are the records where there is duplicates.. is there a logic to make it iterate and find a unique abbreviation from the source?
column1 column2
AAMER AAME
A AMERICA AAME
ACCENT ACCE
ACCENTTREA ACCE
AMER HEIR AMER
AMERICAN AMER
A NOT/AVL ANOT
A NOT/AV ANOT
ASHLEN ASHL
ASHLEY ASHL
BRADFORD BRAD
BRADINGTON BRAD
CANDLEBARN CAND
CANDYM CAND
CARPENTER CARP
CARPENTREE CARP
CHEST/CF CHES
CHESTS UNL CHES
COLONIAL H COLO
COLORSHOP COLO
COUNDESIGN COUN
COUNTRY COUN
COUNTRYV COUN
CRESCENT CRES
CRESTVIEW CRES
DIMPLEX DIMP
DIMPLEXE DIMP
ELECTRICAL ELEC
ELECTROLUX ELEC
ENGLAND ENGL
ENGLANDER ENGL
FRANKE FRAN
FRANKLIN FRAN
FRIGIDAIRE FRIG
FRIGO FRIG
HARDEN HARD
HARDWICK HARD
HERITAGE HERI
HERITAGE L HERI
HERITAGE W HERI
HERITAGEDI HERI
HOMESPICE HOME
HOMESTEAD HOME
INTERACTIV INTE
INTERIOREX INTE
LITTLE VAL LITT
LITTON LITT
MAGNATROL MAGN
MAGNUSSEN MAGN
MOBEL MOBE
MOBEL/CF MOBE
PARK DES PARK
PARKER SO PARK
RANGE AIRE RANG
RANGEAIRE RANG
SUNROC SUNR
SUN RAY SUNR
SUPER/CF SUPE
SUPERIOR SUPE
TEMPLE TEMP
TEMPO TEMP
TOWN HALL TOWN
TOWN SQUAR TOWN
TOWNES/CF TOWN
VAUGHAN VAUG
VAUGHN VAUG
WELL MATE WELL
WELLXTROL WELL
WOODCRAFT WOOD
WOODMARC WOOD
WOODS WOOD
I would maybe go with first initial of each word, then if not enough words are found then use the first and last of the last word. It may eliminate some of the conflicts.
ASKER
can you give an example with the below:
FRANKE FRAN
FRANKLIN FRAN
how does your logic work for the above?
FRANKE FRAN
FRANKLIN FRAN
how does your logic work for the above?
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
With your two test cases, the above algorithm returns:
FRANKE = FRAE
FRANKLIN = FRNN
Kevin
FRANKE = FRAE
FRANKLIN = FRNN
Kevin
As an enhancement, you can strip out special characters like the '/' and '&' and make those spaces which gives you more words.
dbo.fn_DelimitedToTable(replace(replace(column1, '/', ' '), '&', ''), ' ')
Also, if you data may just contain one letter, this is a problem:
substring(w1.word, 2, 1)
I tried a variation with replacing '/' and '&' as well as adding another join as #words w3 -- I can get to 388 distinct values, so again if you play with this based on other rules about your data you know to be true, i.e., if always X length guaranteed, then more substring(word, n, 1) formulas can be used without causing errors in nonexistent index passed to substring.
Hope that all makes sense.
Kevin
dbo.fn_DelimitedToTable(replace(replace(column1, '/', ' '), '&', ''), ' ')
Also, if you data may just contain one letter, this is a problem:
substring(w1.word, 2, 1)
I tried a variation with replacing '/' and '&' as well as adding another join as #words w3 -- I can get to 388 distinct values, so again if you play with this based on other rules about your data you know to be true, i.e., if always X length guaranteed, then more substring(word, n, 1) formulas can be used without causing errors in nonexistent index passed to substring.
Hope that all makes sense.
Kevin
ASKER
truly helped.. thanks Kevin.
ASKER
Open in new window