Mark
asked on
how to get native Microsoft characters in SQL Server from sqljdbc
I have varchar fields in a SQL Server 2005 database containing text originating from VB forms. This text has extended characters (I don't know if they're unicode or ISO). These characters are things like bullets, apostrophes, etc.
When I view the text using VB, I see the character OK. When I retrieve this text using sqljdbc for presentation on a webpage, I get '?' characters. The '?' characters are returned by sqljdbc. It is not a function of webpage rendering.
How do I get the actual character values stored in the database?
When I view the text using VB, I see the character OK. When I retrieve this text using sqljdbc for presentation on a webpage, I get '?' characters. The '?' characters are returned by sqljdbc. It is not a function of webpage rendering.
How do I get the actual character values stored in the database?
you need to know what the encoding is and specify it when reading the string from the database.
ASKER
CEHJ: I'd have to add the class for the HexDumpEncoder to do your 1st suggestion, but I have run this through my own hexdumper and the String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string. Viewed in VB, this string is "we're".
your "file.encoding" gives: ISO-8859-1
objects: how would I determine the encoding in the database? and how would I specify the retrieval encoding in sqljdbc?
your "file.encoding" gives: ISO-8859-1
objects: how would I determine the encoding in the database? and how would I specify the retrieval encoding in sqljdbc?
check the database collation setting
ASKER
from SQL Server Management Studion, collating sequence: SQL_Latin1_General_CP1_CI_ AS
So, how do I specify this from sqljdbc? I don't see a parameter in the connection properties.
So, how do I specify this from sqljdbc? I don't see a parameter in the connection properties.
>>String e.g. "we?re" dumps to 77 65 3F 72 65, so it is putting an actual "?" in the string.
Can you do the dump directly on the database too please? It could be that someone tried to insert U+2019 instead of the Latin1 apostrophe
Can you do the dump directly on the database too please? It could be that someone tried to insert U+2019 instead of the Latin1 apostrophe
ASKER
sorry to be away from this question ... other fires flared up.
CEHJ: Sorry, but how would I do a "dump directly on the database"? Is this something I can do in SQL Server Management Studio with a query? Please advise.
The one I did was the contents of an rs.getString("columnName") in java.
CEHJ: Sorry, but how would I do a "dump directly on the database"? Is this something I can do in SQL Server Management Studio with a query? Please advise.
The one I did was the contents of an rs.getString("columnName")
I'm not sure i'm afraid. You could try the following:
System.out.println(new sun.misc.HexDumpEncoder().encode(rs.getBytes("columnName")));
ASKER
OK, a bit of a breakthrough. To get the native code in SQL server do:
select master.dbo.fn_varbintohexs tr(cast(CO MMENT as varbinary)) as blah from MYTABLE where MEMBERID = '519050' and SEQID = 19002
of course, the table and column names are mine (in uppercase). The results of this yields:
0x416e7920776f726420796574 206f6e2068 6f77207765 9272652067 6f696e
for the string:
Any word yet on how we’re going to proceed?
the apostrohpe in "we're" ends up being 0x92.
BUT, when I retrieve this string in my java program it gets converted to '?'.
so -- full circle ... Is there something I can specify in my jdbc connect string that tells up-stream to give me the 0x92 and not convert to '?' ? If I can get that I can probably do string.replace() in my java program.
I could specify the following query:
select replace(COMMENT,char(146), '’') from MYTABLE ....
but for various reasons, I'm interested in the jdbc/java solution, if there is one. (For example, the query might be a stored procedure in which case I don't get to specify select columns)
select master.dbo.fn_varbintohexs
of course, the table and column names are mine (in uppercase). The results of this yields:
0x416e7920776f726420796574
for the string:
Any word yet on how we’re going to proceed?
the apostrohpe in "we're" ends up being 0x92.
BUT, when I retrieve this string in my java program it gets converted to '?'.
so -- full circle ... Is there something I can specify in my jdbc connect string that tells up-stream to give me the 0x92 and not convert to '?' ? If I can get that I can probably do string.replace() in my java program.
I could specify the following query:
select replace(COMMENT,char(146),
but for various reasons, I'm interested in the jdbc/java solution, if there is one. (For example, the query might be a stored procedure in which case I don't get to specify select columns)
> BUT, when I retrieve this string in my java program it gets converted to '?'.
the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?
the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?
>>Any word yet on how we’re goin
We're getting there! (notice the difference in the two apostrophes [if this site will take it])
This suggests to me that the encoding is windows-1252 (Cp1252) since that's the first one i tried with the binary string, and it correctly rendered the apostrophe.
You need to ensure that encoding is used by the driver.
We're getting there! (notice the difference in the two apostrophes [if this site will take it])
This suggests to me that the encoding is windows-1252 (Cp1252) since that's the first one i tried with the binary string, and it correctly rendered the apostrophe.
You need to ensure that encoding is used by the driver.
>>if this site will take it
It did, fortunately
It did, fortunately
we already knew it was cp1252
>>we already knew it was cp1252
Where?
Where?
ASKER
objects: > the ? is typically just a display issue caused by the font not having a glyph for the character
its not actually a ?
Actually, the ? is what is retrieved from the query. I did hex dump on the actual getString("col") returned. I also indexOf'd the string for 0x92 and came up with nothing. I am sure the the character is being converter by jdbc before I get access to it in the program.
CEHJ: > You need to ensure that encoding [cp1252] is used by the driver.
Yes, I agree. ... how? I don't really see a connection string property for this.
its not actually a ?
Actually, the ? is what is retrieved from the query. I did hex dump on the actual getString("col") returned. I also indexOf'd the string for 0x92 and came up with nothing. I am sure the the character is being converter by jdbc before I get access to it in the program.
CEHJ: > You need to ensure that encoding [cp1252] is used by the driver.
Yes, I agree. ... how? I don't really see a connection string property for this.
Just as a tester, start the vm with the parameter below and see what happens
-Dfile.encoding=windows-1252
the driver should be using cp1252 already.
can you post your actual code that is reading the value
can you post your actual code that is reading the value
ASKER
rs.getString("comment")
to render it for the webpage I do:
rs.getString("comment").re place("\r" ,"").repla ce("\n","\ n<br>"))
to render it for the webpage I do:
rs.getString("comment").re
Did you try my last suggestion?
> Did you try my last suggestion?
how will that make a difference?
how will that make a difference?
ASKER
CEHJ: OK, I believe I've added that option. I have the catalina.sh script output the JAVA_OPTS when it is run and it shows:
JAVA_OPTS: -Dfile.encoding=windows-12 52 -Djava.util.logging.manage r=org.apac he.juli.Cl assLoaderL ogManager -Djava.util.logging.config .file=/srv /tomcat/co nf/logging .propertie s
Unfortunately, this doesn't fix it. I still get a 0x3f instead of 0x92 for the rouge apostrophe.
I don't think is a java thing. I think the jdbc driver is converting it when retrieving the data.
JAVA_OPTS: -Dfile.encoding=windows-12
Unfortunately, this doesn't fix it. I still get a 0x3f instead of 0x92 for the rouge apostrophe.
I don't think is a java thing. I think the jdbc driver is converting it when retrieving the data.
SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Might be worth trying the JTds driver. Could be worth calling the below in conjunction with an insert Statement using setBytes. It would be interesting to see if the driver could then read the value without any further problems
public static byte[] getUtf8() {
byte[] result= null;
String s = "Any word yet on how we\u2019re goin";
try {
result = s.getBytes("UTF-8");
}
catch(UnsupportedEncodingException e) {
e.printStackTrace();
}
return result;
}
ASKER CERTIFIED SOLUTION
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
ASKER
The only practical solution is to replace the funky characters at the SQL Server host side.
Open in new window