Link to home
Start Free TrialLog in
Avatar of nebb-tgr
nebb-tgr

asked on

SQL convert hex stored as image to plain text

Hello,

We have a table where hex data is stored as image datatype. I know that this is text, and I need to convert it. If I search google for "HEX to string convert" and input the data from SQL, I get the correct string. Here is an example:

0x8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000

The string includes a lot of padding, but the actual data is in there somewhere. I can strip the padding around it, but I need to know how to get the same result in SQL as the search results on google.

The result I'm getting currently, is the same as when I use this function:
exec ( 'SELECT CONVERT(nvarchar(100),' + '0x8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000' + ')')
Avatar of D B
D B
Flag of United States of America image

I am not completely certain what you are asking for. This is NOT text. It is a binary string and as such, cannot really be presented as a string, other than how you are doing it above.

What is the padding, how are you going to identify it, and strip it away?
Avatar of Marten Rune
We might be able to provide more help if you explain what you're trying to accomplish.

Is it a smaller database, or for indexing or ...

Please elaborate

Regards Marten
Avatar of nebb-tgr
nebb-tgr

ASKER

I was unclear. I'm talking about an ascii string encoded and stored as a binary string. The reason for doing so might be because the column needs to house various datatypes, and binary string is the best format to convert all datatypes to. In this database, the datatypes that are stored include integers, decimals, booleans, datetime and strings. I only need to decode the strings..

The string in the example contains the word "Low".

The following string contains the word "55_LT_001A"
0x88000100000065000842000000080000812D002D002D000000000000000000000016000000350035005F004C0054005F00300030003100410000000000000000000000000000000000000000000001

Like I said, search for "hex to string convert" on google and you will find the same. I should have mentioned that we are talking about ascii strings, it would probably explain things better :)
Ok, I recognize the Unicode characters that represent 55_LT_001A.  I emphasize Unicode characters are here, not ASCII - you can tell because they are two-byte words, and all the ASCII characters have 00 as the top byte of the Unicode charater (numerically, the first 128 characters of the Unicode alphabet are 16-bit versions of the 7-bit basic ASCII set).

The thing is, from this string I can't tell if the characters are stored in big-endian or little-endian form (that is, low byte followed by high byte, or high byte followed by low byte).  They should be little-endian if they follow Intel's standard for integers (which seems reasonable) but I have no idea what stored them.  If you can find out, we can help you.
Furthermore, is the string you are attempting to pull always at a specific starting position in the data, and a specific length, or can it vary?
I did som manipulation of the data and found that the first portion can be stripped entirely for this purpose.

Complete: 8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000

Can be stripped:
8600020000006C00050C00000008000000

Portion to be decoded:
4C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000

The length of the data to be stripped is always equal. The length of the data behind varies in length.

Don't know if this helps. I really thought this would be an easy task, since it is so simple to translate it manually. If you have a look at the first part of the "Portion to be decoded". 4C -> L
00 -> removed
6F -> o
00 -> removed
77 -> w
More than three 0 in a row -> remove rest of data.
So, the portion you are stripping away (if you look at this as a character string) is 34-characters long. The '4C', which starts the string you are interested in, is the ASCII representation of the letter 'M', but you are not including it in your decoding, so for clarification, are you stripping the first 34-characters, or are you stripping away until '4C00' and then decoding the rest (until '000000')?

Also, how much data are you working with (e.g. how many rows will be queried, on average, to give you the results you are looking for)?
Assuming that the binary image has been converted into a varchar(of suffiient size) called @x, this should parse out the text you need:

select 
substring(
	substring(@x, 35, LEN(@x)), 
	1,
	CHARINDEX(
		'000000', 
		substring(@x, 35, LEN(@x)
		)
	) - 1
)

Open in new window


Your prefix is 34 characters long; CHARINDEX starts with the 35th character, searches for the matching string (six zeroes in sequence) then the results of CHARINDEX are reduced by 1 to put the final character of the returned string at the last character before the sequence of zeroes.  Try it out.
We have a table where hex data is stored as image datatype.
...

The reason for doing so might be because the column needs to house various datatypes, and binary string is the best format to convert all datatypes to.


Unrelated, but there are two problems with this:
1.  The Image data type is deprecated.
2.  Even if it was not, SQL Variant is a better choice.
dbbishop: 34 first characters are peeled of. Since different strings are stored, they might not start with '4C'. Results will vary. Some times 100, some times 1000. This is a one time thing for each project during analysis. So it's not like it will have to be optimized to run perfectly on a daily basis..

cpkilekofp: Your function returns 0, which means it didn't find 000000 in the string.

acperkins: I have recently become aware of this. We didn't make this system, we just use it. We still have to support older systems.

Here is my starting point:
DECLARE @varb varbinary(max)
DECLARE @varbTrim varchar(max)
SET @varb = 0x8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000
SELECT @varb AS varb
SET @varbTrim = (@varb)
SELECT @varbTrim AS varbTrim

Open in new window


In order to solve this problem, the result out should be "Low"
You are not initializing the string @x properly.  When I initiliaze @x with your complete value, I get '4C006F0077'.  I note now that this is insufficient, as the back half (the high-order bits) of the Unicode character have now been chopped from the message, so I have to rethink this a bit...and I have to leave for an appointment in ten minutes.  Let's see...

I suggest that it is a sequence of FOUR zeroes you actually need - this would be the equivalent of an ASCII NUL terminating a C-style string.  However, you SEARCH for six zeroes, then instead of subtracting one from the result of CHARINDEX, we add one which results in an offset of two characters:  the high-order bytes we're currently missing.

Applying our changes, the new code becomes:

DECLARE @x VARCHAR(MAX)
SET @x = '8600020000006C00050C000000080000004C006F00770000006D000E2200000000BD20661A0000000000000000000000020090000000000086006C00000000000000'
select 
substring(
	substring(@x, 35, LEN(@x)), 
	1,
	CHARINDEX(
		'000000', 
		substring(@x, 35, LEN(@x)
		)
	) + 1
) 

Open in new window


and the new result is '4C006F007700'.
Still working on this as time permits. However, I have a question.

In your post here the first 34 characters are '8800010000006500084200000008000081'. This leaves '2D002D002D000000000000000000000016000000350035005F004C0054005F00300030003100410000000000000000000000000000000000000000000001'

If you use your 'rule' that you pull all data up to three '00's in sequence, that leaves '2D002D002D' and never even pulls the data you are looking for '55_LT_001A'.

Is the above string valid? If so, a lot more than 34 characters need to be discarded to get to '350035005F004C0054005F0030003000310041' which is what you state you are looking to decode.
Also, the solutions I've seen posted so far are taking a single string (using SQL variables) as a RBAR solution. Can I assume you have a table of these values and you want to view the results as rows of data (possibly along with data from other columns in the table)? What version of SQL Server are you using?
You might be correct about the length of the string. I will have to look further into other strings to find out exactly what kind of information is stored in it.

But have you found a way to actually read the string now? All the way from image datatype to plain text? If so, that is good progress. :)
nebb-tgr,

Converting the image to varchar is trivial:
DECLARE @img IMAGE
DECLARE @b1 VARBINARY(MAX)
DECLARE @x VARCHAR(MAX)

SET @x = CONVERT(VARCHAR(MAX), CONVERT(VARBINARY(MAX),img))
-- @x now contains the hex string representing what was in the image

Open in new window


Starting from here, the filter code becomes:

select 
substring(
	substring(@x, 35, LEN(@x)), 
	1,
	CHARINDEX(
		'000000', 
		substring(@x, 35, LEN(@x)
		)
	) + 1
)  

Open in new window


Note that this expression is usable as a field in any query based on a data set that contaings the image.
ASKER CERTIFIED SOLUTION
Avatar of D B
D B
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Solves the question at hand, but does not work for every case. I will use this function and continue working on the solution.