[Last Call] Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 613
  • Last Modified:

how to get native Microsoft characters in SQL Server from sqljdbc

I have varchar fields in a SQL Server 2005 database containing text originating from VB forms. This text has extended characters (I don't know if they're unicode or ISO). These characters are things like bullets, apostrophes, etc.

When I view the text using VB, I see the character OK. When I retrieve this text using sqljdbc for presentation on a webpage, I get '?' characters. The '?' characters are returned by sqljdbc. It is not a function of webpage rendering.

How do I get the actual character values stored in the database?
0
jmarkfoley
Asked:
jmarkfoley
  • 10
  • 9
  • 6
2 Solutions
 
CEHJCommented:
For known string 's' containing special chars, can you tell me what you expect and then what the following prints?


System.out.println(new sun.misc.HexDumpEncoder(s.getBytes()));

// Also

System.out.println(System.getProperty("file.encoding"));

Open in new window

0
 
objectsCommented:

you need to know what the encoding is and specify it when reading the string from the database.
0
 
jmarkfoleyAuthor Commented:
CEHJ: I'd have to add the class for the HexDumpEncoder to do your 1st suggestion, but I have run this through my own hexdumper and the String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string. Viewed in VB, this string is "we're".

your "file.encoding" gives: ISO-8859-1

objects: how would I determine the encoding in the database? and how would I specify the retrieval encoding in sqljdbc?
0
Technology Partners: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
objectsCommented:
check the database collation setting
0
 
jmarkfoleyAuthor Commented:
from SQL Server Management Studion, collating sequence: SQL_Latin1_General_CP1_CI_AS

So, how do I specify this from sqljdbc? I don't see a parameter in the connection properties.
0
 
CEHJCommented:
>>String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string.

Can you do the dump directly on the database too please? It could be that someone tried to insert U+2019 instead of the Latin1 apostrophe
0
 
jmarkfoleyAuthor Commented:
sorry to be away from this question ... other fires flared up.

CEHJ: Sorry, but how would I do a "dump directly on the database"? Is this something I can do in SQL Server Management Studio with a query? Please advise.

The one I did was the contents of an rs.getString("columnName") in java.
0
 
CEHJCommented:
I'm not sure i'm afraid. You could try the following:


System.out.println(new sun.misc.HexDumpEncoder().encode(rs.getBytes("columnName")));

Open in new window

0
 
jmarkfoleyAuthor Commented:
OK, a bit of a breakthrough. To get the native code in SQL server do:

select master.dbo.fn_varbintohexstr(cast(COMMENT as varbinary)) as blah from MYTABLE where MEMBERID = '519050' and SEQID = 19002

of course, the table and column names are mine (in uppercase). The results of this yields:

0x416e7920776f726420796574206f6e20686f7720776592726520676f696e

for the string:

Any word yet on how we’re going to proceed?

the apostrohpe in "we're" ends up being 0x92.

BUT, when I retrieve this string in my java program it gets converted to '?'.

so -- full circle ... Is there something I can specify in my jdbc connect string that tells up-stream to give me the 0x92 and not convert to '?'  ? If I can get that I can probably do string.replace() in my java program.

I could specify the following query:

select replace(COMMENT,char(146),'’') from MYTABLE ....

but for various reasons, I'm interested in the jdbc/java solution, if there is one. (For example, the query might be a stored procedure in which case I don't get to specify select columns)
0
 
objectsCommented:
> BUT, when I retrieve this string in my java program it gets converted to '?'.

the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?
0
 
CEHJCommented:
>>Any word yet on how we’re goin

We're getting there! (notice the difference in the two apostrophes [if this site will take it])

This suggests to me that the encoding is windows-1252 (Cp1252) since that's the first one i tried with the binary string, and it correctly rendered the apostrophe.

You need to ensure that encoding is used by the driver.
0
 
CEHJCommented:
>>if this site will take it

It did, fortunately
0
 
objectsCommented:
we already knew it was cp1252
0
 
CEHJCommented:
>>we already knew it was cp1252

Where?
0
 
jmarkfoleyAuthor Commented:
objects: > the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?

Actually, the ? is what is retrieved from the query. I did hex dump on the actual getString("col") returned. I also indexOf'd the string for 0x92 and came up with nothing. I am sure the the character is being converter by jdbc before I get access to it in the program.

CEHJ: > You need to ensure that encoding [cp1252] is used by the driver.

Yes, I agree. ... how? I don't really see a connection string property for this.
0
 
CEHJCommented:
Just as a tester, start the vm with the parameter below and see what happens
-Dfile.encoding=windows-1252

Open in new window

0
 
objectsCommented:
the driver should be using cp1252 already.
can you post your actual code that is reading the value
0
 
jmarkfoleyAuthor Commented:
rs.getString("comment")

to render it for the webpage I do:

rs.getString("comment").replace("\r","").replace("\n","\n<br>"))
0
 
CEHJCommented:
Did you try my last suggestion?
0
 
objectsCommented:
> Did you try my last suggestion?


how will that make a difference?
0
 
jmarkfoleyAuthor Commented:
CEHJ: OK, I believe I've added that option. I have the catalina.sh script output the JAVA_OPTS when it is run and it shows:

JAVA_OPTS: -Dfile.encoding=windows-1252 -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager -Djava.util.logging.config.file=/srv/tomcat/conf/logging.properties

Unfortunately, this doesn't fix it. I still get a 0x3f instead of 0x92 for the rouge apostrophe.

I don't think is a java thing. I think the jdbc driver is converting it when retrieving the data.
0
 
CEHJCommented:
>>Unfortunately, this doesn't fix it. I still get a 0x3f instead of 0x92 for the rouge apostrophe.

OK - it was worth a try. The binary dump shows it's stored in windows-1252. Maybe it shouldn't be,  i don't know.

>>I think the jdbc driver is converting it when retrieving the data.

Mmm, possibly because it expects a different binary encoding - i don't know
0
 
CEHJCommented:
Might be worth trying the JTds driver. Could be worth calling the below in conjunction with an insert Statement using setBytes. It would be interesting to see if the driver could then read the value without any further problems
public static byte[] getUtf8() {
	byte[] result= null;
	String s = "Any word yet on how we\u2019re goin";
	try {
	    result = s.getBytes("UTF-8");
	}
	catch(UnsupportedEncodingException e) {
	    e.printStackTrace();	
	}
	return result;
    }

Open in new window

0
 
jmarkfoleyAuthor Commented:
CEHJ: Thanks for sticking with this question. I don't think I'm going to try a different driver. I did that once before with a different version of the JDBC driver and had all kinds of problems. In any case, that is probably more development work than this problem justified. Currently I "fix the glitch" as follows:

replace(replace(n.comment,char(146),'&rsquo;'),char(149),'&bull;') comment

This will replace the Microsoft Codepage 1252, character 146 with the ISO designation for right-single-quote. It also replaces the 1252 bullet character with the ISO bullet string. I think it will be easier to all translations to this query than mess with changing drivers, etc. Moreover, this problem should eventually go away as future input to this table will be through webpage entry, not VB, so we shouldn't really see to many CP1252 characters (except, if someone pastes from e.g. Word into IE it *does* come in as MS characters -- this can be fixed by doing the translation *before* storing in the database).
0
 
jmarkfoleyAuthor Commented:
The only practical solution is to replace the funky characters at the SQL Server host side.
0

Featured Post

[Webinar] Cloud and Mobile-First Strategy

Maybe you’ve fully adopted the cloud since the beginning. Or maybe you started with on-prem resources but are pursuing a “cloud and mobile first” strategy. Getting to that end state has its challenges. Discover how to build out a 100% cloud and mobile IT strategy in this webinar.

  • 10
  • 9
  • 6
Tackle projects and never again get stuck behind a technical roadblock.
Join Now