?
Solved

logic to make unique abbreviations of a column

Posted on 2011-10-03
12
Medium Priority
?
355 Views
Last Modified: 2012-08-13
could you suggest code logic that would work down on a column to make it abbreviated into specific characters, but keep it unique?
0
Comment
Question by:25112
  • 7
  • 4
12 Comments
 
LVL 5

Author Comment

by:25112
ID: 36907501
here is an example..

declare @test table (column1 varchar(100))

insert into @test  
select 	'A   NOT/AV'	union
select 	'A AMERICA'	union
select 	'A B ART'	union
select 	'A NOT/AVL'	union
select 	'A O SMITH'	union
select 	'AAMER'	union
select 	'ABAKKA'	union
select 	'ACCENT'	union
select 	'ACCENTTREA'	union
select 	'ACORN TRAI'	union
select 	'ADJUST'	union
select 	'ADMIRAL'	union
select 	'AIR TEMP'	union
select 	'ALBION'	union
select 	'ALEXANDER'	union
select 	'AMANA'	union
select 	'AMER HEIR'	union
select 	'AMERICAN'	union
select 	'AMES'	union
select 	'AMETEK'	union
select 	'AMISCO'	union
select 	'AOG'	union
select 	'AQUA MATIC'	union
select 	'ARTAGE'	union
select 	'ASEA'	union
select 	'ASHLEN'	union
select 	'ASHLEY'	union
select 	'ASKO'	union
select 	'ASPEN'	union
select 	'ASSURANT'	union
select 	'AUTOTROL'	union
select 	'AVANTI'	union
select 	'B&B CHLOR'	union
select 	'BARCALOUNG'	union
select 	'BARD'	union
select 	'BAROQUE'	union
select 	'BEANPOD'	union
select 	'BEECHDALE'	union
select 	'BELLECRAFT'	union
select 	'BEMIS'	union
select 	'BERRYHILL'	union
select 	'BERTAZZONI'	union
select 	'BEST'	union
select 	'BFGINC'	union
select 	'BLANCO'	union
select 	'BLOMBERG'	union
select 	'BLUESTAR'	union
select 	'BOSCH'	union
select 	'BOYDS'	union
select 	'BRADFORD'	union
select 	'BRADINGTON'	union
select 	'BREEZYACRE'	union
select 	'BRIDGEWATE'	union
select 	'BROAN'	union
select 	'BROOKS'	union
select 	'BROWN'	union
select 	'BUTLER'	union
select 	'CALORIC'	union
select 	'CANDLEBARN'	union
select 	'CANDYM'	union
select 	'CAPEL'	union
select 	'CAPITAL'	union
select 	'CAROLINA'	union
select 	'CARPENTER'	union
select 	'CARPENTREE'	union
select 	'CARRIER'	union
select 	'CATNAPPER'	union
select 	'CBK'	union
select 	'CDM OUTLET'	union
select 	'CELADON'	union
select 	'CHEM-TECH'	union
select 	'CHEST/CF'	union
select 	'CHESTS UNL'	union
select 	'CHRISTOPHE'	union
select 	'CHROM'	union
select 	'CLASSIC'	union
select 	'CLAYTON'	union
select 	'COLONIAL H'	union
select 	'COLORSHOP'	union
select 	'COMFDESIGN'	union
select 	'COMPLEMENT'	union
select 	'CONNER'	union
select 	'COUNDESIGN'	union
select 	'COUNTRY'	union
select 	'COUNTRYV'	union
select 	'CRAFTMAST'	union
select 	'CRESCENT'	union
select 	'CRESTVIEW'	union
select 	'CROSLEY'	union
select 	'CRYSTAL CL'	union
select 	'CULLIGAN'	union
select 	'D&C CORP'	union
select 	'DACOR'	union
select 	'DALYN'	union
select 	'DANBY'	union
select 	'DANIELLAMP'	union
select 	'DCS'	union
select 	'DELONGHI'	union
select 	'DIMPLEX'	union
select 	'DIMPLEXE'	union
select 	'DIVA'	union
select 	'DONIQUE'	union
select 	'DUCANE'	union
select 	'DURAFLAME'	union
select 	'EASTERNACC'	union
select 	'ECHOES'	union
select 	'EDEN PURE'	union
select 	'EDRICH'	union
select 	'ELECTRICAL'	union
select 	'ELECTROLUX'	union
select 	'ELITE'	union
select 	'ELMCREST'	union
select 	'ELMIRA'	union
select 	'ELRAN'	union
select 	'EMERSON'	union
select 	'ENGLAND'	union
select 	'ENGLANDER'	union
select 	'ERIE'	union
select 	'EUROTECH'	union
select 	'EVERPURE'	union
select 	'EXPRESSION'	union
select 	'EZGLIDECO'	union
select 	'FABER'	union
select 	'FAIRFIELD'	union
select 	'FAMILY'	union
select 	'FEDDERS'	union
select 	'FISHERPAYK'	union
select 	'FIVE STAR'	union
select 	'FLECK'	union
select 	'FLEXBED'	union
select 	'FRANKE'	union
select 	'FRANKLIN'	union
select 	'FRIEDRICH'	union
select 	'FRIGIDAIRE'	union
select 	'FRIGO'	union
select 	'FUJITSU'	union
select 	'GAGGENAU'	union
select 	'GARLAND'	union
select 	'GARMIN'	union
select 	'GAYLA''S'	union
select 	'GE'	union
select 	'GEM'	union
select 	'GENERAL'	union
select 	'GIBSON'	union
select 	'GIFTCRAFT'	union
select 	'GOLDSTAR'	union
select 	'GOODMAN'	union
select 	'GOULDS'	union
select 	'GRAHAMDUNN'	union
select 	'GRANDHALL'	union
select 	'GRILLI'	union
select 	'GRISTMILL'	union
select 	'GROFF'	union
select 	'GUARDSMAN'	union
select 	'H'	union
select 	'HACH'	union
select 	'HAIER'	union
select 	'HALIANT'	union
select 	'HALSEY TAY'	union
select 	'HAMILTON'	union
select 	'HARDEN'	union
select 	'HARDWICK'	union
select 	'HARMSCO'	union
select 	'HE ANDERSO'	union
select 	'HERITAGE'	union
select 	'HERITAGE L'	union
select 	'HERITAGE W'	union
select 	'HERITAGEDI'	union
select 	'HOLLAND'	union
select 	'HOLTON'	union
select 	'HOMESPICE'	union
select 	'HOMESTEAD'	union
select 	'HOOKER'	union
select 	'HOOVER'	union
select 	'HORNING'	union
select 	'HOTPOINT'	union
select 	'HUBBARTON'	union
select 	'HUEBSCH'	union
select 	'HWCHAIR'	union
select 	'IDEAL HORI'	union
select 	'IMAX'	union
select 	'INSINKERAT'	union
select 	'INSTALL'	union
select 	'INTERACTIV'	union
select 	'INTERIOREX'	union
select 	'IRON-A-WAY'	union
select 	'ISTA'	union
select 	'J C PENNY'	union
select 	'J EDWARD'	union
select 	'JACKFURN'	union
select 	'JANDAL'	union
select 	'JENN-AIR'	union
select 	'JWEAVERQU'	union
select 	'KALALOU'	union
select 	'KELLER'	union
select 	'KELVINATOR'	union
select 	'KENMORE'	union
select 	'KENNEBUNK'	union
select 	'KEYSTONE'	union
select 	'KEYTON'	union
select 	'KICHLER'	union
select 	'KINCAID'	union
select 	'KINETICO'	union
select 	'KITCHENAID'	union
select 	'KLAUSSNER'	union
select 	'KOHLER'	union
select 	'LAMOTTE'	union
select 	'LANCASTER'	union
select 	'LANE'	union
select 	'LANG'	union
select 	'LATEX'	union
select 	'LDBEAN'	union
select 	'LEA'	union
select 	'LEGGETT'	union
select 	'LG'	union
select 	'LIEBHERR'	union
select 	'LINDSAY'	union
select 	'LITTLE VAL'	union
select 	'LITTON'	union
select 	'LIVINGSTON'	union
select 	'LMI'	union
select 	'LYNX'	union
select 	'MADISON'	union
select 	'MAGICCHEF'	union
select 	'MAGNATROL'	union
select 	'MAGNUSSEN'	union
select 	'MANUEL'	union
select 	'MAPP'	union
select 	'MARATHON'	union
select 	'MARFURN'	union
select 	'MARVEL'	union
select 	'MASTER'	union
select 	'MAXWELL WO'	union
select 	'MAYTAG'	union
select 	'MAZZEI'	union
select 	'MEDALLION'	union
select 	'MIAMI CARE'	union
select 	'MIDWEST'	union
select 	'MIELE'	union
select 	'MISC'	union
select 	'MOBEL'	union
select 	'MOBEL/CF'	union
select 	'MODERN MAI'	union
select 	'MOEN'	union
select 	'MONTGOMERY'	union
select 	'NATURES'	union
select 	'NEEDLE'	union
select 	'NEWCENTURY'	union
select 	'NORCOLD'	union
select 	'NORGE'	union
select 	'NORWESCO'	union
select 	'NULLFURN'	union
select 	'NW'	union
select 	'OASIS'	union
select 	'OCI'	union
select 	'OHIO TABLE'	union
select 	'PACIFICRIM'	union
select 	'PANASONIC'	union
select 	'PARK DES'	union
select 	'PARKER SO'	union
select 	'PAUL B ZIM'	union
select 	'PEERLESS'	union
select 	'PELL-CHLOR'	union
select 	'PENTEK'	union
select 	'PEOPLOUNGE'	union
select 	'PERFORMANC'	union
select 	'PERLICK'	union
select 	'PETERSREV'	union
select 	'PICTUREIT'	union
select 	'PIERI'	union
select 	'PILGRIMSPR'	union
select 	'PLUMBING'	union
select 	'POMROY'	union
select 	'POULAN'	union
select 	'POWELL'	union
select 	'PREMIER'	union
select 	'PRESTIGE'	union
select 	'PRICEPFIST'	union
select 	'PROMINENT'	union
select 	'PROPAC'	union
select 	'PULASKI'	union
select 	'PURETEST'	union
select 	'QUASAR'	union
select 	'QUOIZEL'	union
select 	'RAINSOFT'	union
select 	'RANGE AIRE'	union
select 	'RANGEAIRE'	union
select 	'RCA'	union
select 	'REFERRAL'	union
select 	'RELAXOLOUN'	union
select 	'RENWIL'	union
select 	'RESTONIC'	union
select 	'REVCO'	union
select 	'REVERE'	union
select 	'ROPER'	union
select 	'ROWEFURN'	union
select 	'RUSCO'	union
select 	'RUSSELL'	union
select 	'SADEK'	union
select 	'SALEM'	union
select 	'SAMMOORE'	union
select 	'SAMSUNG'	union
select 	'SANDERS'	union
select 	'SANITRON'	union
select 	'SANYO'	union
select 	'SCOTSMAN'	union
select 	'SEARS'	union
select 	'SERTA'	union
select 	'SERVICESOF'	union
select 	'SHARP'	union
select 	'SHAW'	union
select 	'SHURFLO'	union
select 	'SIEMENS'	union
select 	'SIMMONS'	union
select 	'SJ ELECTRO'	union
select 	'SLIGH'	union
select 	'SOMERSET'	union
select 	'SOUTH TEX'	union
select 	'SPEED QUEE'	union
select 	'STABER'	union
select 	'STANLEY'	union
select 	'STARITE'	union
select 	'STATE'	union
select 	'STEINWORLD'	union
select 	'STENNER'	union
select 	'STERLING'	union
select 	'STRUCTURAL'	union
select 	'STUARTFURN'	union
select 	'STYLECRAFT'	union
select 	'SUB-ZERO'	union
select 	'SULLINS'	union
select 	'SUMMIT'	union
select 	'SUN RAY'	union
select 	'SUNBEAM'	union
select 	'SUNNYDESIG'	union
select 	'SUNROC'	union
select 	'SUPER/CF'	union
select 	'SUPERIOR'	union
select 	'TACHING'	union
select 	'TADIRAN'	union
select 	'TAPPAN'	union
select 	'TECHNETIC'	union
select 	'TEI'	union
select 	'TEMPLE'	union
select 	'TEMPO'	union
select 	'TENDER HRT'	union
select 	'THERMADOR'	union
select 	'TOMKINKADE'	union
select 	'TOWN HALL'	union
select 	'TOWN SQUAR'	union
select 	'TOWNES/CF'	union
select 	'TRADE'	union
select 	'TRAEGER'	union
select 	'TREEMASTER'	union
select 	'TRIP'	union
select 	'TROJAN'	union
select 	'TWINSTAR'	union
select 	'U-LINE'	union
select 	'ULTRASHIEL'	union
select 	'UMA ENTERP'	union
select 	'UNIVERSAL'	union
select 	'USED'	union
select 	'VALLEYVIEW'	union
select 	'VAUGHAN'	union
select 	'VAUGHN'	union
select 	'VENMAR'	union
select 	'VENT A HOO'	union
select 	'VESTA'	union
select 	'VIKING'	union
select 	'VINEYARD'	union
select 	'VORNADO'	union
select 	'WALKER'	union
select 	'WARRANTY'	union
select 	'WASTE KING'	union
select 	'WATERFORD'	union
select 	'WATR FACT'	union
select 	'WAVERLY'	union
select 	'WEBER'	union
select 	'WELBUILT'	union
select 	'WELL MATE'	union
select 	'WELLXTROL'	union
select 	'WESLEYALL'	union
select 	'WESTINGHSE'	union
select 	'WHIRLPOOL'	union
select 	'WINDCREST'	union
select 	'WOLF'	union
select 	'WOODCRAFT'	union
select 	'WOODMARC'	union
select 	'WOODS'	union
select 	'Y&TWOODCR'	union
select 	'ZEPHYR'	

select COUNT(*) from @test
select count(distinct column1) from @test
select count(distinct left(column1,4)) from @test

Open in new window

0
 
LVL 5

Author Comment

by:25112
ID: 36907510
there are 391 entries.. but when you put distinct on left (column1,4), it becomes down to 356. how you would work around to make sure all 391 have unique abbreviations? (preferably take out the spaces)
0
 
LVL 5

Author Comment

by:25112
ID: 36907536
the goal is to keep the abbreviation close to the original but there is no strict rules how it should be done.. and that entries should have a unique abbreviation.
0
Configuration Guide and Best Practices

Read the guide to learn how to orchestrate Data ONTAP, create application-consistent backups and enable fast recovery from NetApp storage snapshots. Version 9.5 also contains performance and scalability enhancements to meet the needs of the largest enterprise environments.

 
LVL 9

Expert Comment

by:chwong67
ID: 36907993
select count(distinct replace(column1,' ', '')) from @test
0
 
LVL 5

Author Comment

by:25112
ID: 36908689
we need to abbreviate to 4 characters..

so
select count(distinct left(replace(column1,' ', ''),4)) from @test
is only 353.
0
 
LVL 5

Author Comment

by:25112
ID: 36910245
these are the records where there is duplicates.. is there a logic to make it iterate and find a unique  abbreviation from the source?
column1	column2
AAMER	AAME
A AMERICA	AAME
ACCENT	ACCE
ACCENTTREA	ACCE
AMER HEIR	AMER
AMERICAN	AMER
A NOT/AVL	ANOT
A   NOT/AV	ANOT
ASHLEN	ASHL
ASHLEY	ASHL
BRADFORD	BRAD
BRADINGTON	BRAD
CANDLEBARN	CAND
CANDYM	CAND
CARPENTER	CARP
CARPENTREE	CARP
CHEST/CF	CHES
CHESTS UNL	CHES
COLONIAL H	COLO
COLORSHOP	COLO
COUNDESIGN	COUN
COUNTRY	COUN
COUNTRYV	COUN
CRESCENT	CRES
CRESTVIEW	CRES
DIMPLEX	DIMP
DIMPLEXE	DIMP
ELECTRICAL	ELEC
ELECTROLUX	ELEC
ENGLAND	ENGL
ENGLANDER	ENGL
FRANKE	FRAN
FRANKLIN	FRAN
FRIGIDAIRE	FRIG
FRIGO	FRIG
HARDEN	HARD
HARDWICK	HARD
HERITAGE	HERI
HERITAGE L	HERI
HERITAGE W	HERI
HERITAGEDI	HERI
HOMESPICE	HOME
HOMESTEAD	HOME
INTERACTIV	INTE
INTERIOREX	INTE
LITTLE VAL	LITT
LITTON	LITT
MAGNATROL	MAGN
MAGNUSSEN	MAGN
MOBEL	MOBE
MOBEL/CF	MOBE
PARK DES	PARK
PARKER SO	PARK
RANGE AIRE	RANG
RANGEAIRE	RANG
SUNROC	SUNR
SUN RAY	SUNR
SUPER/CF	SUPE
SUPERIOR	SUPE
TEMPLE	TEMP
TEMPO	TEMP
TOWN HALL	TOWN
TOWN SQUAR	TOWN
TOWNES/CF	TOWN
VAUGHAN	VAUG
VAUGHN	VAUG
WELL MATE	WELL
WELLXTROL	WELL
WOODCRAFT	WOOD
WOODMARC	WOOD
WOODS	WOOD

Open in new window

0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 36913604
I would maybe go with first initial of each word, then if not enough words are found then use the first and last of the last word. It may eliminate some of the conflicts.
0
 
LVL 5

Author Comment

by:25112
ID: 36913611
can you give an example with the below:

FRANKE      FRAN
FRANKLIN      FRAN

how does your logic work for the above?
0
 
LVL 60

Accepted Solution

by:
Kevin Cross earned 2000 total points
ID: 36913788
Here is an example.

Take a look at this: http://www.experts-exchange.com/Q_23786715.html
I would look at Brandon's solution, though mine works well. See his article for latest version and explanation: http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/SQL-Server-2005/A_192-Delimited-String-Parsing-in-SQL-Server-2005-and-later.html

Using Brandon's approach, you will see a requirement for a numbers table or view. Here is an approach for a concrete util_nums table which comes in handy:
create table util_nums(n int not null)
go
;with   cte0 as (select 1 as c union all select 1),
       cte1 as (select 1 as c from cte0 a, cte0 b),
       cte2 as (select 1 as c from cte1 a, cte1 b),
       cte3 as (select 1 as c from cte2 a, cte2 b),
       cte4 as (select 1 as c from cte3 a, cte3 b),
       cte5 as (select 1 as c from cte4 a, cte4 b),
       nums as (select row_number() over (order by c) as n from cte5)
 insert into util_nums(n)
       select n from nums
       where n <= 1000000
;
alter table util_nums add constraint 
   pk_util_nums primary key clustered(n)with fillfactor =100

Open in new window

Source: http://www.sqlservernation.com/home/the-numbers-table-round-2.html
See source article for further explanation.

Since I am opting for a physical util_nums table, please run [dbo].[fn_DelimitedToTable] DDL script Brandon provided with the minor change from dbo.vw_Nums to dbo.util_Nums.

With both of those created, I am only getting 5 collisions out of 391 with this, but am certain you can tweak the logic based on what you find an appropriate algorithm. Just wanted to show you how to apply the split function in the solution along with some assistance from ranking - http://www.experts-exchange.com/Programming/Languages/SQL_Syntax/A_1555-Analytical-SQL-Where-do-you-rank.html.

Using @table you declared above as test:
create table #words(sentence varchar(100), word varchar(100), pos int, pos_rev int, primary key(sentence, pos));

-- inserts each column1 value multiple times based on individual words determined by spaces
-- pos = position in original column1 value
-- pos_rev = position from the back, i.e., 1 == last word
insert into #words(sentence, word, pos, pos_rev)
select column1, theValue
     , row_number() over(partition by column1 order by ident)
	 , row_number() over(partition by column1 order by ident desc)
from @test
cross apply dbo.fn_DelimitedToTable(column1, ' ')
where theValue <> ''
;

-- take first letter of first word
-- take first letter of second word or second letter of first if only one
-- take middle character of last word (could add a join for third word)
-- take last letter of last word (could add a join for fourth word)
select w1.sentence,
       left(w1.word, 1)
	   +coalesce(left(w2.word, 1), substring(w1.word, 2, 1))
	   +substring(we.word, (len(we.word)+1)/2, 1)
	   +right(we.word, 1) as abbrv
from #words w1
join #words we on we.sentence = w1.sentence and we.pos_rev = 1
left join #words w2 on w2.sentence = w1.sentence and w2.pos = 2
where w1.pos = 1
order by abbrv
;

drop table #words;

Open in new window


I used the temp table to help with performance of the cross apply and then multiple self-joins. As indicated in the code, I did not show with 3rd and 4th word table joins, but could easily do similar coalesce() to grab left(w3.word, 1) or some other character, and so on.

Hope that helps!
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 36913796
With your two test cases, the above algorithm returns:

FRANKE = FRAE
FRANKLIN = FRNN

Kevin
0
 
LVL 60

Expert Comment

by:Kevin Cross
ID: 36913852
As an enhancement, you can strip out special characters like the '/' and '&' and make those spaces which gives you more words.

dbo.fn_DelimitedToTable(replace(replace(column1, '/', ' '), '&', ''), ' ')

Also, if you data may just contain one letter, this is a problem:
substring(w1.word, 2, 1)

I tried a variation with replacing '/' and '&' as well as adding another join as #words w3 -- I can get to 388 distinct values, so again if you play with this based on other rules about your data you know to be true, i.e., if always X length guaranteed, then more substring(word, n, 1) formulas can be used without causing errors in nonexistent index passed to substring.

Hope that all makes sense.

Kevin
0
 
LVL 5

Author Comment

by:25112
ID: 36927757
truly helped.. thanks Kevin.
0

Featured Post

Veeam and MySQL: How to Perform Backup & Recovery

MySQL and the MariaDB variant are among the most used databases in Linux environments, and many critical applications support their data on them. Watch this recorded webinar to find out how Veeam Backup & Replication allows you to get consistent backups of MySQL databases.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

It is possible to export the data of a SQL Table in SSMS and generate INSERT statements. It's neatly tucked away in the generate scripts option of a database.
MSSQL DB-maintenance also needs implementation of multiple activities. However, unprecedented errors can hamper the database management. In that case, deploying Stellar SQL Database Toolkit ensures fast and accurate database and backup repair as wel…
Viewers will learn how to use the INSERT statement to insert data into their tables. It will also introduce the NULL statement, to show them what happens when no value is giving for any given column.
With just a little bit of  SQL and VBA, many doors open to cool things like synchronize a list box to display data relevant to other information on a form.  If you have never written code or looked at an SQL statement before, no problem! ...  give i…
Suggested Courses

864 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question