We help IT Professionals succeed at work.

Check out our new AWS podcast with Certified Expert, Phil Phillips! Listen to "How to Execute a Seamless AWS Migration" on EE or on your favorite podcast platform. Listen Now

x

how to get native Microsoft characters in SQL Server from sqljdbc

Mark
Mark asked
on
Medium Priority
654 Views
Last Modified: 2012-06-22
I have varchar fields in a SQL Server 2005 database containing text originating from VB forms. This text has extended characters (I don't know if they're unicode or ISO). These characters are things like bullets, apostrophes, etc.

When I view the text using VB, I see the character OK. When I retrieve this text using sqljdbc for presentation on a webpage, I get '?' characters. The '?' characters are returned by sqljdbc. It is not a function of webpage rendering.

How do I get the actual character values stored in the database?
Comment
Watch Question

CERTIFIED EXPERT
Top Expert 2016

Commented:
For known string 's' containing special chars, can you tell me what you expect and then what the following prints?


System.out.println(new sun.misc.HexDumpEncoder(s.getBytes()));

// Also

System.out.println(System.getProperty("file.encoding"));

Open in new window

Mick BarryJava Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:

you need to know what the encoding is and specify it when reading the string from the database.

Author

Commented:
CEHJ: I'd have to add the class for the HexDumpEncoder to do your 1st suggestion, but I have run this through my own hexdumper and the String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string. Viewed in VB, this string is "we're".

your "file.encoding" gives: ISO-8859-1

objects: how would I determine the encoding in the database? and how would I specify the retrieval encoding in sqljdbc?
Mick BarryJava Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
check the database collation setting

Author

Commented:
from SQL Server Management Studion, collating sequence: SQL_Latin1_General_CP1_CI_AS

So, how do I specify this from sqljdbc? I don't see a parameter in the connection properties.
CERTIFIED EXPERT
Top Expert 2016

Commented:
>>String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string.

Can you do the dump directly on the database too please? It could be that someone tried to insert U+2019 instead of the Latin1 apostrophe

Author

Commented:
sorry to be away from this question ... other fires flared up.

CEHJ: Sorry, but how would I do a "dump directly on the database"? Is this something I can do in SQL Server Management Studio with a query? Please advise.

The one I did was the contents of an rs.getString("columnName") in java.
CERTIFIED EXPERT
Top Expert 2016

Commented:
I'm not sure i'm afraid. You could try the following:


System.out.println(new sun.misc.HexDumpEncoder().encode(rs.getBytes("columnName")));

Open in new window

Author

Commented:
OK, a bit of a breakthrough. To get the native code in SQL server do:

select master.dbo.fn_varbintohexstr(cast(COMMENT as varbinary)) as blah from MYTABLE where MEMBERID = '519050' and SEQID = 19002

of course, the table and column names are mine (in uppercase). The results of this yields:

0x416e7920776f726420796574206f6e20686f7720776592726520676f696e

for the string:

Any word yet on how we’re going to proceed?

the apostrohpe in "we're" ends up being 0x92.

BUT, when I retrieve this string in my java program it gets converted to '?'.

so -- full circle ... Is there something I can specify in my jdbc connect string that tells up-stream to give me the 0x92 and not convert to '?'  ? If I can get that I can probably do string.replace() in my java program.

I could specify the following query:

select replace(COMMENT,char(146),'’') from MYTABLE ....

but for various reasons, I'm interested in the jdbc/java solution, if there is one. (For example, the query might be a stored procedure in which case I don't get to specify select columns)
Mick BarryJava Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
> BUT, when I retrieve this string in my java program it gets converted to '?'.

the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?
CERTIFIED EXPERT
Top Expert 2016

Commented:
>>Any word yet on how we’re goin

We're getting there! (notice the difference in the two apostrophes [if this site will take it])

This suggests to me that the encoding is windows-1252 (Cp1252) since that's the first one i tried with the binary string, and it correctly rendered the apostrophe.

You need to ensure that encoding is used by the driver.
CERTIFIED EXPERT
Top Expert 2016

Commented:
>>if this site will take it

It did, fortunately
Mick BarryJava Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
we already knew it was cp1252
CERTIFIED EXPERT
Top Expert 2016

Commented:
>>we already knew it was cp1252

Where?

Author

Commented:
objects: > the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?

Actually, the ? is what is retrieved from the query. I did hex dump on the actual getString("col") returned. I also indexOf'd the string for 0x92 and came up with nothing. I am sure the the character is being converter by jdbc before I get access to it in the program.

CEHJ: > You need to ensure that encoding [cp1252] is used by the driver.

Yes, I agree. ... how? I don't really see a connection string property for this.
CERTIFIED EXPERT
Top Expert 2016

Commented:
Just as a tester, start the vm with the parameter below and see what happens
-Dfile.encoding=windows-1252

Open in new window

Mick BarryJava Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
the driver should be using cp1252 already.
can you post your actual code that is reading the value

Author

Commented:
rs.getString("comment")

to render it for the webpage I do:

rs.getString("comment").replace("\r","").replace("\n","\n<br>"))
CERTIFIED EXPERT
Top Expert 2016

Commented:
Did you try my last suggestion?
Mick BarryJava Developer
CERTIFIED EXPERT
Top Expert 2010

Commented:
> Did you try my last suggestion?


how will that make a difference?

Author

Commented:
CEHJ: OK, I believe I've added that option. I have the catalina.sh script output the JAVA_OPTS when it is run and it shows:

JAVA_OPTS: -Dfile.encoding=windows-1252 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/srv/tomcat/conf/logging.properties

Unfortunately, this doesn't fix it. I still get a 0x3f instead of 0x92 for the rouge apostrophe.

I don't think is a java thing. I think the jdbc driver is converting it when retrieving the data.
CERTIFIED EXPERT
Top Expert 2016
Commented:
Unlock this solution and get a sample of our free trial.
(No credit card required)
UNLOCK SOLUTION
CERTIFIED EXPERT
Top Expert 2016

Commented:
Might be worth trying the JTds driver. Could be worth calling the below in conjunction with an insert Statement using setBytes. It would be interesting to see if the driver could then read the value without any further problems
public static byte[] getUtf8() {
	byte[] result= null;
	String s = "Any word yet on how we\u2019re goin";
	try {
	    result = s.getBytes("UTF-8");
	}
	catch(UnsupportedEncodingException e) {
	    e.printStackTrace();	
	}
	return result;
    }

Open in new window

Commented:
Unlock this solution and get a sample of our free trial.
(No credit card required)
UNLOCK SOLUTION

Author

Commented:
The only practical solution is to replace the funky characters at the SQL Server host side.
Unlock the solution to this question.
Thanks for using Experts Exchange.

Please provide your email to receive a sample view!

*This site is protected by reCAPTCHA and the Google Privacy Policy and Terms of Service apply.

OR

Please enter a first name

Please enter a last name

8+ characters (letters, numbers, and a symbol)

By clicking, you agree to the Terms of Use and Privacy Policy.