HEX in varchar to int or bigint conversion

Hi all,

Im desperately trying to convert a hex value into either an int or big int.

The snag is that the hex number is bulk inserted and the sourcefile does not stipulate 0x infront of the number so bulk insert blows up if i use varbinary to import the field.


Thus im bulk inserting the 10 million+ records into a varchar field, I then need to convert this to int.

I cant afford any form of cursors or string manupulation to do this.   because the operation nees to complete within a few seconds.


In short....  I want to be able to convert to an integer a hex value stored in a varchar where the varchar looks something like this:

MyField
---------
f575e
524eea

etc.
LVL 18
ShogunWadeAsked:
Who is Participating?
 
Scott PletcherSenior DBACommented:
Now that I look at it, I think >= is clearer than >, and I finished the full 8, something like this:

UPDATE ...
SET ... =
      (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn), 1), '0123456789ABCDEF') - 1) +
      CASE WHEN LEN(@hexColumn) >= 2 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 1, 1), '0123456789ABCDEF') - 1) * 16
            ELSE 0 END +
      CASE WHEN LEN(@hexColumn) >= 3 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 2, 1), '0123456789ABCDEF') - 1) * 16 * 16
            ELSE 0 END +
      CASE WHEN LEN(@hexColumn) >= 4 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 3, 1), '0123456789ABCDEF') - 1) * 16 * 16 * 16
            ELSE 0 END +
      CASE WHEN LEN(@hexColumn) >= 5 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 4, 1), '0123456789ABCDEF') - 1) * 16 * 16 * 16 * 16
            ELSE 0 END +
      CASE WHEN LEN(@hexColumn) >= 6 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 5, 1), '0123456789ABCDEF') - 1) * 16 * 16 * 16 * 16 * 16
            ELSE 0 END +
      CASE WHEN LEN(@hexColumn) >= 7 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 6, 1), '0123456789ABCDEF') - 1) * 16 * 16 * 16 * 16 * 16 * 16
            ELSE 0 END +
      CASE WHEN LEN(@hexColumn) >= 8 THEN
            (CHARINDEX(SUBSTRING(@hexColumn, LEN(@hexColumn) - 7, 1), '0123456789ABCDEF') - 1) * 16 * 16 * 16 * 16 * 16 * 16 * 16
            ELSE 0 END
0
 
Scott PletcherSenior DBACommented:
I suggest a "brute force" method, similar to:

UPDATE yourTable
SET intColumn =
      (CHARINDEX(SUBSTRING(hexColumn, LEN(hexColumn), 1), '0123456789ABCDEF') - 1) +
      CASE WHEN LEN(hexColumn) > 1 THEN
            CHARINDEX(SUBSTRING(hexColumn, LEN(hexColumn) - 1, 1), '0123456789ABCDEF') - 1) * 16
            ELSE 0 END +
      CASE WHEN LEN(hexColumn) > 2 THEN
            (CHARINDEX(SUBSTRING(hexColumn, LEN(hexColumn) - 2, 1), '0123456789ABCDEF') - 1) * 16 * 16
            ELSE 0 END +
... and so on, for up to 8 chars ...


This assumes a length of at least one; if needed, add a CASE check to the first calc as well.

This also assumes that no bad hex chars are present.  If necessary, add a WHERE to the UPDATE to insure that no invalid values are converted.

0
 
Scott PletcherSenior DBACommented:
Obviously you can normalize the 16 * 16 ... to a fixed number; I was just in a hurry :-).
0
Ultimate Tool Kit for Technology Solution Provider

Broken down into practical pointers and step-by-step instructions, the IT Service Excellence Tool Kit delivers expert advice for technology solution providers. Get your free copy now.

 
ShogunWadeAuthor Commented:
Thanks for the suggestion Scott, however I tried a very similar technique but it was to slow.   The thing is that this 10 million records is just from one of many files received to by the server every day.  all the data is denormalised and unvalidates and gets cleaned normalised aggregated and archived all within a very time timeline, I was kind of hoping that there would be some form of cast or convert type method of solving the problem.
0
 
ShogunWadeAuthor Commented:
If there is no faster alternative then I will accept your answer.
0
 
Scott PletcherSenior DBACommented:
Hmm, given today's CPUs, I wouldn't have that logic would be *that* slow.  You could use C++ (or assembler) to create a .DLL and use that as an extended SP to do the conversion.  That approach seems to follow what MS does when they need more speed :-)
0
 
ShogunWadeAuthor Commented:
I agree that it is not v slow but given the very tight timeline to perform all the processing the task must complete within a few seconds max.
0
 
ShogunWadeAuthor Commented:
Having re-thought over the problem what i actually need is not necesarily a true conversion....

Il try to explain....

Each record in the file consists of a number of fields   the combination of 4 of them must be unique for a "dialogue" (ie a series of records that relate to each other)   one field is a hex value but it is stored purely for "threading the dialogues together" (ie joining) but must be availble or recalculatable for queries later.    Each "dialogue" is assigned a newid() upon initial normalisation and thus the hex reference is purely informational after this stage.

I thought maybe doing CONVERT(varbinary(20),myhexfield)  

myhexfield is max 9 hex characters at present

however the ultimately I need to be able to store the resultant value in an int datatype.   if i do CONVERT(int,CONVERT(varbinary(20),myhexfield)  
)  i will overflow max int.

In other words I need a crude and quick method of taking a hex value in a varchar field and converting it to any int value that is reversable.   It doesntmatter how slow the reversal is so long as it can be done and the initial convertion is lightning fast
0
 
ShogunWadeAuthor Commented:
The chance of the other three fields in a given dialogue colliding is is small  about 0.5%

0
 
Anthony PerkinsCommented:
I strongly suspect that I am oversimplifying the problem, but perhaps you could resort to dynamic SQL as in:

exec('select cast(' + myhexfield + ' as bigint) from table1')

Anthony
0
 
Scott PletcherSenior DBACommented:
When you said you were storing the result in an INT, I assumed the max length was 8, since an INT can only handle up to x'7FFFFFFF'.
0
 
Scott PletcherSenior DBACommented:
You could use dynamic SQL, but whether or not that would be quicker is problematic, because I think you would have to use sp_executeSQL with an OUTPUT parameter to get the translated value back.
0
 
ShogunWadeAuthor Commented:
Sorry,  im suffereing from brain death today i think......  the hex values i get are unsigned so i planned to offset the value by min int but event still, U are correct my value wont fit into an int so no algorthm applied to it can possibly be truely reversable.

thus i must either use a "lossy" function" or a bigint.

AC, let me thing for a touch on what you are suggesting and see if it will do the trick.
0
 
Scott PletcherSenior DBACommented:
You could use a BIGINT.  And it's easy to convert the value from INT back to HEX, you just reverse the process above.  It would be slower, but you said that's not an issue during reversal.
0
 
Scott PletcherSenior DBACommented:
Oh, btw, BIGINT will slow it down :-(
0
 
ShogunWadeAuthor Commented:
AC, im not quite following,    

Myhexfield is a field within table1.. so i dont entirely follow how this would work.
0
 
ShogunWadeAuthor Commented:
A bit more info and a bit more thought.......


The most significant bits of the hex number can be lossed without to much of a deremental effect in order to squeze the result into an int.  

Perhaps I should have mentioned also that disk space is at a premium,  though cheap, I filled 1.5TB since sepember so saving 4bytes per record is pretty important.
0
 
Scott PletcherSenior DBACommented:
If you can strip the value down to eight chars, then of course you will be OK going to an int.  You could even convert eight chars to an int, and the remaining char to a tinyint -- 5 bytes instead of 8 and likely just as fast, or faster, than a BIGINT.

0
 
Anthony PerkinsCommented:
As Scott has suggested you would have to use sp_executeSQL with OUTPUT, but this is sort of what I had in mind:

declare @X as varchar(20)
Set @X = '0xffffffffff'
exec('select cast(' + @X + ' as bigint)')

Again, I am sure I am not fully understanding what you need, so feel free to ignore this comment <g>

Anthony
0
 
Scott PletcherSenior DBACommented:
You could also, of course, create a table with hex values between 00 and FF and the corresponding decimal values and use a join to get the values.  I originally thought that this would be slower than the calculation method because it would require a join for every group of two bytes in the hex string, in this case 5 joins (since it's 9 chars, the last join would of course be '0' + the ninth char).  But I guess you never know, in theory I guess it could be quicker, since it would be only one data page, although I'd be very surprised.
0
 
ShogunWadeAuthor Commented:
Hmm,  I must admit the same thought crossed my mind, but i havent tested it.    

Ill give it a try in the morning (UK time) and let you know how it goes.   Thanks for the input so far will pick up on this tomorrow.

Ta
0
 
HilaireCommented:
Hi,

I've been following this thread with much interest since the beginning.

Beeing very curious about extending SQL server capabilities, and having previous experience in creating open server in Sybase (looks like SQL server's ODS) to add interface with MOMs (IBM MQseries), i tried to write a piece of C code that could do the job.

It should be quite fast since it uses >> bit-shift operators to build and int value from a HEX string (length 0-8 cars)

The problem I'm facing now is that I don't know how to make this function available in SQL server.
From what I have understood so far, xp_* user-defined procedures are supposed to return recordsets, but I'm not sure they can be used like a normal function in a statement like

update mytable set intvalue=xp_HexToInt(strHexValue)

If the workaround is to call the sp with output parameter inside a cursor, it will take ages and It's not compatible with ShogunWade's requirements

ScottPletcher was suggesting to use a dll, any ideas how to implement it ??

If anybody cares, I can post the C code

Hilaire
0
 
HilaireCommented:
Hi,

from what I have read so far,
it sounds like using an extended stored procedures is not a suitable solution in this situation, since the only way to call it would be

exec xp_strHexToInt(@strIn varchar(8), @intValue int output)

for each row .... :((

I also tried to do the job with a function declared as a method in a active X dll. Calling it using sp_OA* is even worse ...

I think (I may be wrong) that creating user-defined external scalar functions is not possible for the moment.

It seems that the new SQL server version (Yukon ?) due in 2004 will allow to extend SQL Server built-in scalar functions with user libraries made with .Net Studio (.net Framework).

The syntax should be something like

Create assembly MyLib from '\myPath\myDll'

Create function ext_udf_MyFunction(@param1 type, ....) returns output_datatype
external name 'MyLib.MyFunction' deterministic

We'll have to wait ...

Nobody seems to care, but here's my code (its' written anyway)
(no #includes needed (just <stdio.h>))

int strHexToInt(char *val) {
      int retval;
      retval=0;
      switch(*val){
            case 'A':
            case 'a':
                  retval=10; break;
            case 'B':
            case 'b':
                  retval=11; break;
            case 'C':
            case 'c':
                  retval=12; break;
            case 'D':
            case 'd':
                  retval=13; break;
            case 'E':
            case 'e':
                  retval=14; break;
            case 'F':
            case 'f':
                  retval=15; break;
            case '0':
            case '1':
            case '2':
            case '3':
            case '4':
            case '5':
            case '6':
            case '7':
            case '8':
            case '9':
                  retval=(*val)-48; break;
            default:
                  return 0;
      }
      while (*(++val))
      {
            retval <<= 4;
            switch(*val){
                  case 'A':
                  case 'a':
                        retval+=10; break;
                  case 'B':
                  case 'b':
                        retval+=11; break;
                  case 'C':
                  case 'c':
                        retval+=12; break;
                  
                  case 'D':
                  case 'd':
                        retval+=13; break;
                  case 'E':
                  case 'e':
                        retval+=14; break;
                  case 'F':
                  case 'f':
                        retval+=15; break;
                  case '0':
                  case '1':
                  case '2':
                  case '3':
                  case '4':
                  case '5':
                  case '6':
                  case '7':
                  case '8':
                  case '9':
                        retval+=(*val)-48; break;
                  default:
                        return 0;
            }
      }
      return retval;
}





I hope my contribution is not completely off-topic.
Forgive me and my crappy english if it is.

Cheers

Hilaire
0
 
ShogunWadeAuthor Commented:
Hi Hilaire,

Sorry i didnt respond a yesterday,   I was off sick.   Im just catching up on a few things and hopefully in a couple of hours or so will get back onto the thread

Regards

Mike
0
 
arbertCommented:
Just a suggestion, if you only really need to return and integer to link the original "hex" values together couldn't you use the CHECKSUM function around the field you want to get the int/bigint for?  Not sure how slow it would be....

Brett
0
 
ShogunWadeAuthor Commented:
Hi All,

I havent forgotten about this thread,  Im just a touch tied up with other stuff at the mo.

Arbert,   as a temporary workaround I have been using checksum, but unfortunately I an getting to many colisions and it takes about 3 mins to run.  

Havent benchmarked all the other suggestions yet,  hopefully may have time to get back onto this later today.
0
 
ShogunWadeAuthor Commented:
Hi All,

ive been doing some benchmarking and am still struggling with the performance issue.  I suspect that I may need to rethink the problem.

There are some good suggestions and help you have given me. Rather than keeping the question open,  I thought I would close it and split the pointsas below
If this is acceptable to you all, I would suggest I give:

100 to Scott
and 25 to Hilaire

0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.