nebb-tgr
asked on
SQL convert hex stored as image to plain text
Hello,
We have a table where hex data is stored as image datatype. I know that this is text, and I need to convert it. If I search google for "HEX to string convert" and input the data from SQL, I get the correct string. Here is an example:
0x8600020000006C00050C0000 0008000000 4C006F0077 0000006D00 0E22000000 00BD20661A 0000000000 0000000000 0002009000 0000000086 006C000000 00000000
The string includes a lot of padding, but the actual data is in there somewhere. I can strip the padding around it, but I need to know how to get the same result in SQL as the search results on google.
The result I'm getting currently, is the same as when I use this function:
exec ( 'SELECT CONVERT(nvarchar(100),' + '0x8600020000006C00050C000 0000800000 04C006F007 70000006D0 00E2200000 000BD20661 A000000000 0000000000 0000200900 0000000008 6006C00000 000000000' + ')')
We have a table where hex data is stored as image datatype. I know that this is text, and I need to convert it. If I search google for "HEX to string convert" and input the data from SQL, I get the correct string. Here is an example:
0x8600020000006C00050C0000
The string includes a lot of padding, but the actual data is in there somewhere. I can strip the padding around it, but I need to know how to get the same result in SQL as the search results on google.
The result I'm getting currently, is the same as when I use this function:
exec ( 'SELECT CONVERT(nvarchar(100),' + '0x8600020000006C00050C000
We might be able to provide more help if you explain what you're trying to accomplish.
Is it a smaller database, or for indexing or ...
Please elaborate
Regards Marten
Is it a smaller database, or for indexing or ...
Please elaborate
Regards Marten
ASKER
I was unclear. I'm talking about an ascii string encoded and stored as a binary string. The reason for doing so might be because the column needs to house various datatypes, and binary string is the best format to convert all datatypes to. In this database, the datatypes that are stored include integers, decimals, booleans, datetime and strings. I only need to decode the strings..
The string in the example contains the word "Low".
The following string contains the word "55_LT_001A"
0x880001000000650008420000 0008000081 2D002D002D 0000000000 0000000000 0016000000 350035005F 004C005400 5F00300030 0031004100 0000000000 0000000000 0000000000 0000000000 0001
Like I said, search for "hex to string convert" on google and you will find the same. I should have mentioned that we are talking about ascii strings, it would probably explain things better :)
The string in the example contains the word "Low".
The following string contains the word "55_LT_001A"
0x880001000000650008420000
Like I said, search for "hex to string convert" on google and you will find the same. I should have mentioned that we are talking about ascii strings, it would probably explain things better :)
Ok, I recognize the Unicode characters that represent 55_LT_001A. I emphasize Unicode characters are here, not ASCII - you can tell because they are two-byte words, and all the ASCII characters have 00 as the top byte of the Unicode charater (numerically, the first 128 characters of the Unicode alphabet are 16-bit versions of the 7-bit basic ASCII set).
The thing is, from this string I can't tell if the characters are stored in big-endian or little-endian form (that is, low byte followed by high byte, or high byte followed by low byte). They should be little-endian if they follow Intel's standard for integers (which seems reasonable) but I have no idea what stored them. If you can find out, we can help you.
The thing is, from this string I can't tell if the characters are stored in big-endian or little-endian form (that is, low byte followed by high byte, or high byte followed by low byte). They should be little-endian if they follow Intel's standard for integers (which seems reasonable) but I have no idea what stored them. If you can find out, we can help you.
Furthermore, is the string you are attempting to pull always at a specific starting position in the data, and a specific length, or can it vary?
ASKER
I did som manipulation of the data and found that the first portion can be stripped entirely for this purpose.
Complete: 8600020000006C00050C000000 080000004C 006F007700 00006D000E 2200000000 BD20661A00 0000000000 0000000000 0200900000 0000008600 6C00000000 000000
Can be stripped:
8600020000006C00050C000000 08000000
Portion to be decoded:
4C006F00770000006D000E2200 000000BD20 661A000000 0000000000 0000000200 9000000000 0086006C00 0000000000 00
The length of the data to be stripped is always equal. The length of the data behind varies in length.
Don't know if this helps. I really thought this would be an easy task, since it is so simple to translate it manually. If you have a look at the first part of the "Portion to be decoded". 4C -> L
00 -> removed
6F -> o
00 -> removed
77 -> w
More than three 0 in a row -> remove rest of data.
Complete: 8600020000006C00050C000000
Can be stripped:
8600020000006C00050C000000
Portion to be decoded:
4C006F00770000006D000E2200
The length of the data to be stripped is always equal. The length of the data behind varies in length.
Don't know if this helps. I really thought this would be an easy task, since it is so simple to translate it manually. If you have a look at the first part of the "Portion to be decoded". 4C -> L
00 -> removed
6F -> o
00 -> removed
77 -> w
More than three 0 in a row -> remove rest of data.
So, the portion you are stripping away (if you look at this as a character string) is 34-characters long. The '4C', which starts the string you are interested in, is the ASCII representation of the letter 'M', but you are not including it in your decoding, so for clarification, are you stripping the first 34-characters, or are you stripping away until '4C00' and then decoding the rest (until '000000')?
Also, how much data are you working with (e.g. how many rows will be queried, on average, to give you the results you are looking for)?
Also, how much data are you working with (e.g. how many rows will be queried, on average, to give you the results you are looking for)?
Assuming that the binary image has been converted into a varchar(of suffiient size) called @x, this should parse out the text you need:
Your prefix is 34 characters long; CHARINDEX starts with the 35th character, searches for the matching string (six zeroes in sequence) then the results of CHARINDEX are reduced by 1 to put the final character of the returned string at the last character before the sequence of zeroes. Try it out.
select
substring(
substring(@x, 35, LEN(@x)),
1,
CHARINDEX(
'000000',
substring(@x, 35, LEN(@x)
)
) - 1
)
Your prefix is 34 characters long; CHARINDEX starts with the 35th character, searches for the matching string (six zeroes in sequence) then the results of CHARINDEX are reduced by 1 to put the final character of the returned string at the last character before the sequence of zeroes. Try it out.
We have a table where hex data is stored as image datatype.
...
The reason for doing so might be because the column needs to house various datatypes, and binary string is the best format to convert all datatypes to.
Unrelated, but there are two problems with this:
1. The Image data type is deprecated.
2. Even if it was not, SQL Variant is a better choice.
...
The reason for doing so might be because the column needs to house various datatypes, and binary string is the best format to convert all datatypes to.
Unrelated, but there are two problems with this:
1. The Image data type is deprecated.
2. Even if it was not, SQL Variant is a better choice.
ASKER
dbbishop: 34 first characters are peeled of. Since different strings are stored, they might not start with '4C'. Results will vary. Some times 100, some times 1000. This is a one time thing for each project during analysis. So it's not like it will have to be optimized to run perfectly on a daily basis..
cpkilekofp: Your function returns 0, which means it didn't find 000000 in the string.
acperkins: I have recently become aware of this. We didn't make this system, we just use it. We still have to support older systems.
Here is my starting point:
In order to solve this problem, the result out should be "Low"
cpkilekofp: Your function returns 0, which means it didn't find 000000 in the string.
acperkins: I have recently become aware of this. We didn't make this system, we just use it. We still have to support older systems.
Here is my starting point:
DECLARE @varb varbinary(max)
DECLARE @varbTrim varchar(max)
SET @varb = 0x8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000
SELECT @varb AS varb
SET @varbTrim = (@varb)
SELECT @varbTrim AS varbTrim
In order to solve this problem, the result out should be "Low"
You are not initializing the string @x properly. When I initiliaze @x with your complete value, I get '4C006F0077'. I note now that this is insufficient, as the back half (the high-order bits) of the Unicode character have now been chopped from the message, so I have to rethink this a bit...and I have to leave for an appointment in ten minutes. Let's see...
I suggest that it is a sequence of FOUR zeroes you actually need - this would be the equivalent of an ASCII NUL terminating a C-style string. However, you SEARCH for six zeroes, then instead of subtracting one from the result of CHARINDEX, we add one which results in an offset of two characters: the high-order bytes we're currently missing.
Applying our changes, the new code becomes:
and the new result is '4C006F007700'.
I suggest that it is a sequence of FOUR zeroes you actually need - this would be the equivalent of an ASCII NUL terminating a C-style string. However, you SEARCH for six zeroes, then instead of subtracting one from the result of CHARINDEX, we add one which results in an offset of two characters: the high-order bytes we're currently missing.
Applying our changes, the new code becomes:
DECLARE @x VARCHAR(MAX)
SET @x = '8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000'
select
substring(
substring(@x, 35, LEN(@x)),
1,
CHARINDEX(
'000000',
substring(@x, 35, LEN(@x)
)
) + 1
)
and the new result is '4C006F007700'.
Still working on this as time permits. However, I have a question.
In your post here the first 34 characters are '8800010000006500084200000 008000081' . This leaves '2D002D002D000000000000000 0000000160 0000035003 5005F004C0 054005F003 0003000310 0410000000 0000000000 0000000000 0000000000 000000001'
If you use your 'rule' that you pull all data up to three '00's in sequence, that leaves '2D002D002D' and never even pulls the data you are looking for '55_LT_001A'.
Is the above string valid? If so, a lot more than 34 characters need to be discarded to get to '350035005F004C0054005F003 0003000310 041' which is what you state you are looking to decode.
In your post here the first 34 characters are '8800010000006500084200000
If you use your 'rule' that you pull all data up to three '00's in sequence, that leaves '2D002D002D' and never even pulls the data you are looking for '55_LT_001A'.
Is the above string valid? If so, a lot more than 34 characters need to be discarded to get to '350035005F004C0054005F003
Also, the solutions I've seen posted so far are taking a single string (using SQL variables) as a RBAR solution. Can I assume you have a table of these values and you want to view the results as rows of data (possibly along with data from other columns in the table)? What version of SQL Server are you using?
ASKER
You might be correct about the length of the string. I will have to look further into other strings to find out exactly what kind of information is stored in it.
But have you found a way to actually read the string now? All the way from image datatype to plain text? If so, that is good progress. :)
But have you found a way to actually read the string now? All the way from image datatype to plain text? If so, that is good progress. :)
nebb-tgr,
Converting the image to varchar is trivial:
Starting from here, the filter code becomes:
Note that this expression is usable as a field in any query based on a data set that contaings the image.
Converting the image to varchar is trivial:
DECLARE @img IMAGE
DECLARE @b1 VARBINARY(MAX)
DECLARE @x VARCHAR(MAX)
SET @x = CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX),img))
-- @x now contains the hex string representing what was in the image
Starting from here, the filter code becomes:
select
substring(
substring(@x, 35, LEN(@x)),
1,
CHARINDEX(
'000000',
substring(@x, 35, LEN(@x)
)
) + 1
)
Note that this expression is usable as a field in any query based on a data set that contaings the image.
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
Solves the question at hand, but does not work for every case. I will use this function and continue working on the solution.
What is the padding, how are you going to identify it, and strip it away?