asked on

how to get native Microsoft characters in SQL Server from sqljdbc

I have varchar fields in a SQL Server 2005 database containing text originating from VB forms. This text has extended characters (I don't know if they're unicode or ISO). These characters are things like bullets, apostrophes, etc.

When I view the text using VB, I see the character OK. When I retrieve this text using sqljdbc for presentation on a webpage, I get '?' characters. The '?' characters are returned by sqljdbc. It is not a function of webpage rendering.

How do I get the actual character values stored in the database?

CEHJ

For known string 's' containing special chars, can you tell me what you expect and then what the following prints?

System.out.println(new sun.misc.HexDumpEncoder(s.getBytes()));

// Also

System.out.println(System.getProperty("file.encoding"));

Open in new window

Mick Barry

you need to know what the encoding is and specify it when reading the string from the database.

Mark

ASKER

CEHJ: I'd have to add the class for the HexDumpEncoder to do your 1st suggestion, but I have run this through my own hexdumper and the String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string. Viewed in VB, this string is "we're".

your "file.encoding" gives: ISO-8859-1

objects: how would I determine the encoding in the database? and how would I specify the retrieval encoding in sqljdbc?

Mick Barry

check the database collation setting

Mark

ASKER

from SQL Server Management Studion, collating sequence: SQL_Latin1_General_CP1_CI_AS

So, how do I specify this from sqljdbc? I don't see a parameter in the connection properties.

CEHJ

>>String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string.

Can you do the dump directly on the database too please? It could be that someone tried to insert U+2019 instead of the Latin1 apostrophe

Mark

ASKER

sorry to be away from this question ... other fires flared up.

CEHJ: Sorry, but how would I do a "dump directly on the database"? Is this something I can do in SQL Server Management Studio with a query? Please advise.

The one I did was the contents of an rs.getString("columnName") in java.

CEHJ

I'm not sure i'm afraid. You could try the following:

System.out.println(new sun.misc.HexDumpEncoder().encode(rs.getBytes("columnName")));

Open in new window

Mark

ASKER

OK, a bit of a breakthrough. To get the native code in SQL server do:

select master.dbo.fn_varbintohexstr(cast(COMMENT as varbinary)) as blah from MYTABLE where MEMBERID = '519050' and SEQID = 19002

of course, the table and column names are mine (in uppercase). The results of this yields:

0x416e7920776f726420796574206f6e20686f7720776592726520676f696e

for the string:

Any word yet on how we’re going to proceed?

the apostrohpe in "we're" ends up being 0x92.

BUT, when I retrieve this string in my java program it gets converted to '?'.

so -- full circle ... Is there something I can specify in my jdbc connect string that tells up-stream to give me the 0x92 and not convert to '?' ? If I can get that I can probably do string.replace() in my java program.

I could specify the following query:

select replace(COMMENT,char(146),'’') from MYTABLE ....

but for various reasons, I'm interested in the jdbc/java solution, if there is one. (For example, the query might be a stored procedure in which case I don't get to specify select columns)

Mick Barry

> BUT, when I retrieve this string in my java program it gets converted to '?'.

the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?

CEHJ

>>Any word yet on how we’re goin

We're getting there! (notice the difference in the two apostrophes [if this site will take it])

This suggests to me that the encoding is windows-1252 (Cp1252) since that's the first one i tried with the binary string, and it correctly rendered the apostrophe.

You need to ensure that encoding is used by the driver.

CEHJ

>>if this site will take it

It did, fortunately

Mick Barry

we already knew it was cp1252

CEHJ

>>we already knew it was cp1252

Where?

Mark

ASKER

objects: > the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?

Actually, the ? is what is retrieved from the query. I did hex dump on the actual getString("col") returned. I also indexOf'd the string for 0x92 and came up with nothing. I am sure the the character is being converter by jdbc before I get access to it in the program.

CEHJ: > You need to ensure that encoding [cp1252] is used by the driver.

Yes, I agree. ... how? I don't really see a connection string property for this.

CEHJ

Just as a tester, start the vm with the parameter below and see what happens

-Dfile.encoding=windows-1252

Open in new window

Mick Barry

the driver should be using cp1252 already.
can you post your actual code that is reading the value

Mark

ASKER

rs.getString("comment")

to render it for the webpage I do:

rs.getString("comment").replace("\r","").replace("\n","\n<br>"))

CEHJ

Did you try my last suggestion?

Mick Barry

> Did you try my last suggestion?

how will that make a difference?

Mark

ASKER

CEHJ: OK, I believe I've added that option. I have the catalina.sh script output the JAVA_OPTS when it is run and it shows:

JAVA_OPTS: -Dfile.encoding=windows-1252 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/srv/tomcat/conf/logging.properties

Unfortunately, this doesn't fix it. I still get a 0x3f instead of 0x92 for the rouge apostrophe.

I don't think is a java thing. I think the jdbc driver is converting it when retrieving the data.

SOLUTION

CEHJ

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

CEHJ

Might be worth trying the JTds driver. Could be worth calling the below in conjunction with an insert Statement using setBytes. It would be interesting to see if the driver could then read the value without any further problems

public static byte[] getUtf8() {
	byte[] result= null;
	String s = "Any word yet on how we\u2019re goin";
	try {
	    result = s.getBytes("UTF-8");
	}
	catch(UnsupportedEncodingException e) {
	    e.printStackTrace();	
	}
	return result;
    }

Open in new window

ASKER CERTIFIED SOLUTION

Mark

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

Mark

ASKER

The only practical solution is to replace the funky characters at the SQL Server host side.