Solved

v2008 - enhancements tSQL (i hope)

Posted on 2009-05-05
45
383 Views
Last Modified: 2012-05-06
the best means of converting a varchar(16) string to a unique integer
function or otherwise

any ideas?
0
Comment
Question by:dbaSQL
  • 17
  • 12
  • 11
  • +2
45 Comments
 
LVL 60

Expert Comment

by:chapmandew
ID: 24308979
what are you trying to accomplish?  have you tried newid()
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24309470
yep. unfortunately newid isn't going to work.  at least what i know of it, it won't work
i have a mammoth dataset.  way back when it was designed they were unable to give me integer values for the record ID attributes.  instead, we went with varchar(100).  lovely.
i got them to trim it down over time, but the overhead/performance and accuracy of this ID is crap.

they've finally agreed upstream to push me a numeric unique ID.  this applies, of course, to the new data going forward, once my new design is in place.  for the legacy data, i need to dump the ID string into a table, gen a unique ID per record in the existing dataset, and use this in my new ID attribute, which will be a bigint.

mammoth = 3B records

fortunately, i'm in process of moving to 08.  i've built a partitioned structure, and am feeling many of the gains on the new engine, in terms of performance/speed/manageability, etc.  

i have done some digging on cast/convert.  today i fell upon several docs on the 'tSQL enhancements in 08'.   i know there are ample functions simply to do the conversion.  but i am unsure of how to do so uniquely.  
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24309501
but it can't just be random unique number.  it must be converted from the existing ID strings, per record.  this will then be used to map the data to the old string, from the new front end, for reporting/analysis/troubleshooting...  that sort of thing
0
 
LVL 7

Accepted Solution

by:
simonsabin earned 250 total points
ID: 24310390
insert the text values into a table with an identity colmn to generate the new value.

That allows you to then any child tables from this mapping table

create table mapping (oldValue varchar(20) primary key, newValue int identity(1,1))

insert into mapping
select ....
from ...
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24311072
yep, i thought of that first thing.  i've been asked, however, to do something different.  in retro, however, i just don't see why the identity wouldn't work.  i will revoice that tomorrow, we'll see what i get.  i'm being asked to ' just convert that string to a 64bit integer '
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24311869
The most important consideration will be : "Is anything dependnant on the values already established", if the answer is no, then would be inclined to rename the old one if needed historically, and create a new identity column (bigint).
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24311887
And if creating a new identity column, then consider that being the primary clustered index - but with 3billion rows, it is going to take a while with those types of restructures. If the current "value" is already numeric, then could try simply changing it to type bigint and making it an identity. Could do a select with a convert to bigint to see if it crashes and bombs and that would be the acid test... Make sure you have plenty of tempdb space if changing structure like that.
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24311911
Its just not posible

100 chars = 100bytes
64 bit int = 8 bytes

8 <> 100

If you have values 8 chars or less then you could try

cast(cast('12345678' as varbinary(8))as bigint)

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24312735
@simonsabin: correct, but it is kinda dependant on what is in that column more so than the characteristics, can always kill trailing spaces...

declare @char char(20)   -- doing 20 for visual size sake
set @char = '123456'

select convert(bigint,@char),'|'+@char+'|'
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24312750
and is not really limited to 8 bytes as such, it is limited to values that can be represented... e.g.


declare @char char(20)   -- doing 20 for size sake
set @char = '1234567890123456'

select convert(bigint,@char),'|'+@char+'|'
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24317245
You are assuming that he has numbers in his char column. By the fact he wants to generate integers suggests he doesn't have integers to start with.
Therefore converting 'ABCDEFGH'  a number is more dificult

The converting to varbinary works with character data as well.


select cast(cast(cast(cast('ABCEDFGH' as varbinary(8))as bigint) as varbinary(8))as varchar(10))
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24317476
Thank you each for the input.  And yes, simonsabin, you are correct.  The string is mixed alpha numeric.  

I'm forcing the issue on identity.  I am being fought, so for now, I still need to pursue the conversion.  Unfortunately, I still don't have it.

I pumped two records into a temp table.  these are the ID values:

717JHNSZD695
717JHNSZD696

select IDvalue, cast(cast(IDvalue as varbinary(8))as bigint)    from temptable gives this:
717JHNSZD695          3977020738125058906
717JHNSZD696          3977020738125058906

Note, the casted value is the same, yet the source ID differs.

and simonsabin, your last post returns:   717JHNSZ

Surely there is a means of converting the alpha numeric string to integer.  Uniquely?  Hard to say... but, just the conversion?
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24317574
need to increase your scale...

select IDvalue, cast(cast(IDvalue as varbinary(12))as bigint)    
from (
select '717JHNSZD695' as idvalue union all
select '717JHNSZD696') a
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24317585
What is the 'longest' example ?

can always do a conversion, it is a question of will it fit...
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24317767
Nah that doesn't convert, it is automatically taking the last 8 bytes (or 16 of the varbinary)

bigint does go up to a value of 9,223,372,036,854,775,807 so big enough to cover 3 billion rows, it just depends on the content of that column.

0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24318181
by 'longest', do you just mean data length, mark?  the attribute needing conversion is char(16)
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24318764
Without some compression its not possible.

A bigint is just not big enough.

Each char represents a value from A-Z and 0-9 thats 36 characters (assuming not case sensitive and only numbers or letters).To hold a value of 36 requires 6bits. a bigint has 8bytes and so you have 64bits allowing for 10 6 bit values. So the max you can hold is 10 chars. If you only have A-Z then the size required reduces to 5 bits so you can hold 12 characters.

The mapping table with an identity column is the correct solution.

Especially if you have Enterprise edition and are using 2008 as you can compress your small bigint values to less than 8 bytes
0
 
LVL 51

Assisted Solution

by:Mark Wills
Mark Wills earned 250 total points
ID: 24319138
Well, true, but at 16 chars, and in the range of 0-9 and upper(A-Z) then each digit can be represented by it's two digit ascii() value and fit in decimal(32,0)

But agree, indentity is still the best way, just looking for options because of the internal fighting dbaSQL has to face.

Could create a function - would be horribly slow and a 32 digit number is horrid to think about in comparison to a discrete compact identity...
ALTER function udf_convert_to_number(@str varchar(100))

returns decimal(32,0)

as

begin
 

   declare @s varchar(100)

   declare @x varchar(100)

   declare @i int
 

   set @x = upper(rtrim(left(@str,16)))
 

   if patindex('%[^0-9A-Z]%',@x) > 0 return -1
 

   set @s = ''

   set @i = 1
 

   while @i <= len(@x)

   begin

      set @s = @s+convert(varchar,ascii(substring(@x,@i,1)))

      set @i = @i + 1

   end

   return @s

end

Open in new window

0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24319273
A decimal 32 takes up 17 bytes which is 1 more than using char(16) which defeats the reason for coverting to an integer
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24319450
I don't care about size (well yes, I do), all I am doing is giving a possible solution to converting from char to numeric as per:

"I'm forcing the issue on identity.  I am being fought, so for now, I still need to pursue the conversion.  Unfortunately, I still don't have it. "

I think everyone is in total agreement that identity is the single best option, purpose built for this type of scenario, but there is a battle to be fought...

You would think that having two MVP's (though chapmandew has gone quiet), and a mere expert giving advice that it would back up a string case for identity...

Cheers,
Mark Wills (not an MVP)
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24319869
Yes, there is a battle to be fought... unfortunately.  I am, however, actively creating the identity.  I've already created the mapping table, ID, String;  the ID is bigint identity(1,1), the string is going in now in batches of a number i am embarrassed to post.  (i'm the only one on the box right... no biggie....)

but.  I am a little surprised... SQL is SQL.  as you all know, it can do some pretty crazy/clever things, far beyond simply storing the data.  I will be very surprised if this is truly un-doable.

the function i have not looked at yet.  horribly slow and horrid is a problem, given the magnitude of this dataset.   but.  i will keep working at it and see what i can do.

worst case - or best, depending on where you're sitting -- the identity 'conversion' is being created now
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24321025
I have not followed all of this, but if bigint is not big enough you can always define a numeric(38, 0) field as an IDENTITY column, you can then have up to:
99,999,999,999,999,999,999,999,999,999,999,999,999
0
Better Security Awareness With Threat Intelligence

See how one of the leading financial services organizations uses Recorded Future as part of a holistic threat intelligence program to promote security awareness and proactively and efficiently identify threats.

 
LVL 7

Expert Comment

by:simonsabin
ID: 24321259
The whole point is that using a char(16) column as a PK is not good a it is large (16 bytes). Changing to a bigint results in an 8 byte PK.

Use of numeric(38,0) uses 17 Bytes and so is worse than using char(16)
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24321503
Fair enough.  As I stated, I did not read it all through.  I just wanted to make sure that people were not under the illusion that only integers could be IDENTITY.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24321996
The identity conversion should be "easy", but why is there a need to map ?

select indentity(bigint,1,1),  idvalue into my_idvalue_identity_map from my_blud_dee_big_table    -- (excuse the nomenclature - seemed to fit at the time)

As for the function in terms of 'horrid' refers to the resulting number, in terms of speed it depends on the content, try it out...

select top 30000 IDvalue, dbo.udf_convert_to_number(idvalue) from my_same_table_as_above    -- (see, context sensitive table names :)

But then there is the size problem, and being too big is not good...

Suppose you could run through that function, pickup the lowest calculated number and use that as number '1' so long as all that is required is uniqueness. Or, if there is at least 4 numeric digits, can subtract 48 and that would take it from a potentially 32 digit to a 28 digit and that is 13bytes ( and fit into a page a bit better )


0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24322832
User define functions are evil http://sqlblogcasts.com/blogs/simons/archive/2008/11/03/TSQL-Scalar-functions-are-evil-.aspx from a performance perspective.

This will be especially true in a bulk operation.

The map is essential if you have child tables.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24323214
Nice article... Did you just back date that ?

While it might be true, sometimes a scaler function is worth the cost, maybe for no other reason than to prototype. Unless of course you are a developer as well and can do CLR off the top of your head, an area still largely unvisited by a lot of DBA's (sounds like an article in the making)

Surely in a one-off event such as this you might consider toning down the "evil" aspect ;)

The child tables / dependancies still worry a lot, don't think we got an answer from that comment back up toward the top...
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24324549
'....from my_blud_dee_big_table ...'
I just love that.


Still on it... back soon with status.
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24329809
http://sqlblogcasts.com/blogs/simons/archive/2005/01/17/User-Defined-Function-performance-comparison.aspx

Doing a CLR function is very very easy?

I would never touch a multi-statement function they really are evil.

Anything involving more than one row I would never use a user defined function unless you can use the table valued function sub query trick.
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24329934
production outage brewing with my v2008 instance.  back whenever i have resolved
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24329955
Simon, totally capitulate on UDF's...

Not all DBA's can do CLR - have asked several and they basically shrug. A lot of them are pissed that they even have to get into VS for a variety of things...

How are you on linked servers ? 2008 64bit to 2000 and getting intermittant timeouts (deadlocks) increasing in frequency...

dbaSQL has another thread running, and it is nearly 6:00am down here and need some shut eye, maybe you could pop in and help ?

http://www.experts-exchange.com/Microsoft/Development/MS-SQL-Server/Q_24389116.html?cid=1066#a24329844
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24330287
That just bugs be. Its soooo easy and they will get better performance. Its like someone saying they are going to travel by horse back because they can't be bothered to learn to drive a car.

Spend a little time and reap the rewards.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24331507
>>That just bugs be. Its soooo easy and they will get better performance.<<
Then you have not been in this industry very long:  You may have discovered the best thing since sliced bread, but if it does not sell ...

CLR in SQL Server has been out for nearly 5 years now and has had zero traction.  Just do a search on this site for CLR questions and see for yourself.

I hate to be the harbinger of bad news, but as they say around here, perhaps you should wake up and smell the coffee.
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24331658
Why if something bugs me indicates I haven't been in this industry very long. I was No 1 in this forum 10 years ago.

I never said it had traction I just said that it if you want performance then CLR functions are what you should be using and not TSQL user defined functions.

and just to split hairs its coming up for being out for 4 years in November this year.
0
 
LVL 75

Expert Comment

by:Anthony Perkins
ID: 24331891
>>Why if something bugs me indicates I haven't been in this industry very long. <<
Is that what you understood?  Perhaps you should re-read my comments.

>>I was No 1 in this forum 10 years ago.<<
I know I participated in some of your questions back then and then you disappeared for a long time only to show up recently like Rip Van Winkle. :)

>>and just to split hairs its coming up for being out for 4 years in November this year.<<
A few of us don't wait for the RC to come out before we explore all the new features.
0
 
LVL 7

Expert Comment

by:simonsabin
ID: 24331959
I remember your handle now.

I lost interest, I couldn't compete with the likes of angel. Wasn't enjoying it and I felt I should focus on my day job. Was interesting that EE wasn't noticed by Microsoft back then.

I was just being pedantic :) Whilst many get there hands dirty many don't until SP1 is out.

I would be interested to see how long it took, if it ever has, for java to be taken on board in Oracle systems.

I think MS have missed the boat for this and other feature like Service Broker. Although the latter has more traction than CLR IMHO.

I also still find companies on 2000 and I think that is hurting 2005 features. Many apps are also just upgraded and not adapted to new technologies.
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24333441
Hi guys, back again....

There are also some companies still running NT.  I was a gold partner a while back, and we had various incentives to get people to start using the new technology. A lot were licensed, but never installed, and very reluctant to actually take the next step. MS do not make it overly easy, and in fact dbaSQL is suffering as a result in trying to keep a production site up and running while trickle migrating to SQL2008 from 2000. Those stories and experiences are all too common place - regardless of cause, regardless of ability, irregardless of technology.

The main problem is DBA's have been fighting Programmers since the day dot (maybe not, but seems that way), and now, MS are saying thou shalt use Programmer Technology to get thy DBA job done. Sometimes it is "easier" to stick to ones knitting on tried and proven (albeit without performance gains) approaches and methodologies. CLR does not have the takeup it should. There are a lot of technologies within the MS world that fall into the same category. Cool features, but hey, let someone else do the hard yards first, and then if we really need to, we can then think about it. MS do not do themselves any great favours that way. Remember back the fist time you went to do a DTS new thingy in SQL Server - bet any money (have none) that the temptation or first move was fire up this great all encompassing Management Studio.

I don't think it is horse and cart, but more like a point a to point b school and shops jallopy versus a ferrari.

P.S. simon, thanks for getting involved in that other thread :)

0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24354678
I've gone w/the IDENTITY approach.  i have the udf in place, too, but am unable to really validate its use.
no biggie... we all know the IDENTITY is more appropriate anyway

(mr. wills, if you could give me a pointer or two on testing the udf, i would be grateful)

I will split amongst simonsabin and mark wills, and close the inquiry
thank you for all of the input
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24355120
Sure, always happy to help...




-- modify function to include an offset so numbers remain at single digit. 
 

ALTER function udf_convert_to_number(@str varchar(100),@offset char(1))

returns decimal(32,0)

as

begin
 

   declare @s varchar(100)

   declare @x varchar(100)

   declare @i int

   declare @o int
 

   set @x = upper(rtrim(left(@str,16)))
 

   if patindex('%[^0-9A-Z]%',@x) > 0 return -1
 

   if isnull(@offset,'N') = 'Y' set @o = 48 else set @o = 0
 

   set @s = ''

   set @i = 1
 

   while @i <= len(@x)

   begin

      set @s = @s+convert(varchar,ascii(substring(@x,@i,1)) - @o)

      set @i = @i + 1

   end

   return @s

end

GO
 

-- now create our mapping table from the source data
 

if object_id('my_idvalue_identity_map','U') is not null drop table my_idvalue_identity_map  -- WARNING drops a table
 

select identity(bigint,1,1) as id,idvalue into my_idvalue_identity_map

from (

select '012-3456789' as idvalue union all

select '0123456789' as idvalue union all

select 'ABCDEFGHIJKLM' as idvalue union all

select 'NOPQRSTUVWXYZ' as idvalue union all

select 'SZD695' as idvalue union all

select '21SZD695' as idvalue union all

select 'ZNSZD695' as idvalue union all

select 'HNSZD695' as idvalue union all

select '87HNSZD695' as idvalue union all

select 'qas21HNSZD695' as idvalue union all

select '717JHNSZD695' as idvalue union all

select '717JHNSZD696' as idvalue

) a

GO
 

-- to populate the above, could also select from the real table "from my_real_table" instead of "from (...union query...) a"

-- then afterwards if it is going to remain, create the clustered primary key on the ID column, and an index on idvalue with include id
 
 

-- now test / use the function
 

select *,dbo.udf_convert_to_number(idvalue,'N') from my_idvalue_identity_map
 
 
 

-- could also use that function as a third column in the building of my_idvalue_identity_map

Open in new window

0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24355142
NB I did add that offset thingy just to see what happen. use 'N' to ignore - meaning each char = 2 digit number, 'Y' to use meaning digits remain single digit, chars are 2 digit
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24355299
aaah... I hadn't used the N before.  that's where i was tripping.  
just pumped two of my records into a temp table... it works:

717JHNSZD695          554955747278839068545753
717JHNSZD696          554955747278839068545754

good to have in my back pocket.  i know they're going to come back to me on this.
what do you think about unique-ness, mark?  say I ran this on a gazillion records (each with a unique string id)... any problem w/the converted values duplicating?  
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24355543
If you use the 'N' option then no, it is completely reversable back to original character by doiing an ascii() on every 2 digits

e.g. 717H5 = 5549557253     where 55 = 7, 49 = 1, 55 = 7, 72 = H, 53 = 5

Using the 'Y' option, then it depends on the format of the strings in the first plac. If they have a pattern like 999XXXXXX999 for each an every code, then it will be OK, in terms of being unique, but if there are different formats, then it is unpredictable. e.g.  9999 give a 4 digit result, but so does 99X.

Does that make sense ?
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24355557
Perfect sense.
Thank you, sir.
0
 
LVL 17

Author Comment

by:dbaSQL
ID: 24356093
no i just gotta figure out how to use the darned mapping ID to load the new table

>>for the legacy data, i need to dump the ID string into a table, gen a unique ID per record in the existing dataset, and use this in my new ID attribute, which will be a bigint.

I'll get there....
0
 
LVL 51

Expert Comment

by:Mark Wills
ID: 24356517
have a look at that select ... into

as an example...

select top 1000 * into  tst_idvalue_table
from my_blud_dee_big_table

go

select identity(bigint,1,1) as id,idvalue,dbo.udf_convert_to_number(idvalue,'N') as hashed_idvalue into my_idvalue_identity_map
from tst_idvalue_table

go

select * from my_idvalue_identity_map

go

alter table tst_idvalue_table add new_idvalue decimal(32)

go

update tst_idvalue_table set new_idvalue = hashed_idvalue
from tst_idvalue_table
inner join my_idvalue_identity_map m on m.idvalue = tst_idvalue_table.idvalue

go

select * from tst_idvalue_table

-- let me know if I can help any more... But we are really starting to get into the realm of a new question maybe :)
0

Featured Post

PRTG Network Monitor: Intuitive Network Monitoring

Network Monitoring is essential to ensure that computer systems and network devices are running. Use PRTG to monitor LANs, servers, websites, applications and devices, bandwidth, virtual environments, remote systems, IoT, and many more. PRTG is easy to set up & use.

Join & Write a Comment

Suggested Solutions

Title # Comments Views Activity
MS SQL Backup 24 70
Help with SQL Query 23 39
Contained Database Collations 6 20
SQL Server can be started but not accessed 1 17
In this article I will describe the Detach & Attach method as one possible migration process and I will add the extra tasks needed for an upgrade when and where is applied so it will cover all.
For both online and offline retail, the cross-channel business is the most recent pattern in the B2C trade space.
Using examples as well as descriptions, and references to Books Online, show the different Recovery Models available in SQL Server and explain, as well as show how full, differential and transaction log backups are performed
Via a live example, show how to backup a database, simulate a failure backup the tail of the database transaction log and perform the restore.

747 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

13 Experts available now in Live!

Get 1:1 Help Now