ankata
 asked on

Parse an XML file with a Japanese file path in Java

Dear all,

I have a problem when using DocumentBuilder to parse an XML file. When the XML file path is in English, DocumentBuilder parses it fine. But when the path is in Japanese (such as: C:\Documents and Settings\BW_\sample.xml), DocumentBuilder cannot parse the file.

Could anyone please help me?

Many thanks,
Java

Last Comment
ankata

8/22/2022 - Mon
Mick Barry

ankata

ASKER

Thanks for your quick reply. I tried it, but "builder.parse" has no overload that accepts an InputStreamReader. I also tried the approach from "Get Japanese folder name path":
https://www.experts-exchange.com/questions/21906404/Get-Japanese-folder-name-path.html?sfQueryTermInfo=1+vn

but still without success.

Please help!
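
For reference, DocumentBuilder.parse does accept an org.xml.sax.InputSource, and an InputSource can wrap a Reader such as an InputStreamReader. The following is only a minimal sketch of that route: the class and method names are hypothetical, and it assumes the XML content is UTF-8. When a character stream is supplied, the parser uses it as-is rather than detecting the encoding from the XML declaration.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.Reader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Hypothetical helper, not code from the thread.
final class ReaderParseSketch {

    // Opens the file ourselves and hands the parser a Reader via InputSource,
    // so the parser never has to interpret the path string.
    static Document parseViaReader(Path xmlFile)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        try (InputStream in = Files.newInputStream(xmlFile);
             // Assumption: the file content is UTF-8.
             Reader reader = new InputStreamReader(in, StandardCharsets.UTF_8)) {
            return db.parse(new InputSource(reader));
        }
    }
}
```

Because no system ID is set on the InputSource here, relative references inside the XML (for example an external DTD) would not resolve; that is acceptable only for self-contained documents.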
Mick Barry

can you post your code

ankata

ASKER
My code is simple, as shown below. At first filePath is "c:\data\symbol.xml" (an English path) and db.parse(filePath) works. But when filePath is "c:\Çü¿\symbol.xml" (a Japanese path), an exception occurs.
DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();

try {
	DocumentBuilder db = dbf.newDocumentBuilder();

	// Append the file name to the folder path, then parse using the String overload
	filePath += "symbol.xml";
	dom = db.parse(filePath);
} catch (ParserConfigurationException pce) {
	pce.printStackTrace();
} catch (SAXException se) {
	se.printStackTrace();
} catch (IOException ioe) {
	ioe.printStackTrace();
}


Mick Barry

Where is the pathname coming from? What exception are you getting?

CEHJ

Try starting your app as follows:
java -Dfile.encoding=UTF-8 MyXmlApp


Mick Barry

If you are working with a Japanese character set, then it's best to have an appropriate default encoding set up.
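
To see which default encoding the JVM actually picked up (for example after passing -Dfile.encoding), something like the following could be run first. It is only a diagnostic sketch; the class name is hypothetical.

```java
import java.nio.charset.Charset;

// Hypothetical diagnostic class, not code from the thread.
final class EncodingCheck {

    // Reports the file.encoding system property and the charset the JVM
    // uses by default when no charset is given explicitly.
    static String describeDefaults() {
        return "file.encoding=" + System.getProperty("file.encoding")
                + ", defaultCharset=" + Charset.defaultCharset().name();
    }

    public static void main(String[] args) {
        System.out.println(describeDefaults());
    }
}
```

The values depend on the platform, the JDK version, and any -D options, so no particular output is implied here.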

CEHJ

Another thing you can do is run your source through native2ascii; then you can paste the original path in as a comment along with the encoded one, e.g.

native2ascii Source.java.bak Source.java

See image below for an example:
jap-path-enc.png
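
As a rough illustration of what the encoded form looks like in source code: native2ascii (a tool shipped with older JDKs) rewrites non-ASCII characters as \uXXXX escapes, which the Java compiler reads the same way regardless of the source file's encoding. The folder name below is a hypothetical stand-in (the katakana word for "data"), not the actual folder from the question, and the class name is made up.

```java
// Hypothetical example, not code from the thread.
final class EscapedPathSketch {

    // Assumed folder name: three katakana characters spelling "data",
    // written as Unicode escapes so the source file stays pure ASCII.
    static final String FOLDER = "\u30c7\u30fc\u30bf";

    // Builds a Windows-style path that contains the escaped folder name.
    static String buildPath() {
        return "c:\\" + FOLDER + "\\symbol.xml";
    }
}
```

This only removes source-encoding problems; it does not change how the path is later interpreted by the parser.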
ankata

ASKER
Thanks for all your replies.

To objects: - You can assign filePath directly; the path with Japanese characters is my problem. As I said, I can parse successfully when filePath contains only English characters.
                   - I got an IOException in the code above.

To CEHJ: I set Eclipse to build with UTF-8 encoding. I can see the Japanese in my code, but when I write it to the console it fails :(

CEHJ

Which console? Does it support Japanese characters?
ankata

ASKER
I run it in Eclipse and simply print using System.out.println() :)

My application can parse an XML file that contains Japanese characters, but when I call DocumentBuilder.parse with a Japanese folder path I receive an IOException like the one below.

The filePath is in Japanese: "c:\Çü¿\symbol.xml" (translated to English: "c:\data\symbol.xml")

Are there any solutions for this?

Thanks, CEHJ & objects, for your replies and patience :)
java.net.MalformedURLException: unknown protocol: c
	at java.net.URL.<init>(Unknown Source)
	at java.net.URL.<init>(Unknown Source)
	at java.net.URL.<init>(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLEntityManager.setupCurrentEntity(Unknown Source)
	at com.sun.org.apache.xerces.internal.impl.XMLVersionDetector.determineDocVersion(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XML11Configuration.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.XMLParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.parsers.DOMParser.parse(Unknown Source)
	at com.sun.org.apache.xerces.internal.jaxp.DocumentBuilderImpl.parse(Unknown Source)
	at javax.xml.parsers.DocumentBuilder.parse(Unknown Source)
	at Util.XMLSymbolParser.parseXmlFile(XMLSymbolParser.java:65)
	at Util.XMLSymbolParser.main(XMLSymbolParser.java:205)
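
The trace shows the String argument ending up in java.net.URL, which is consistent with DocumentBuilder.parse(String) being documented to take a URI rather than a file-system path. One commonly used way around that is to pass a java.io.File, or to convert the path to a file: URI first. The sketch below illustrates this under those assumptions; it is not the accepted answer from this thread, and the class and method names are hypothetical.

```java
import java.io.File;
import java.io.IOException;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

// Hypothetical helper, not code from the thread.
final class FileParseSketch {

    // Hands the parser a File, so the path string is never read as a URI.
    static Document parseFromPath(String filePath)
            throws IOException, ParserConfigurationException, SAXException {
        DocumentBuilder db = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        return db.parse(new File(filePath));
    }

    // Alternative: convert the path to a file: URI string, which is the
    // form that parse(String) expects.
    static String toSystemId(String filePath) {
        return new File(filePath).toURI().toString();
    }
}
```

In the asker's code this would amount to replacing db.parse(filePath) with db.parse(new File(filePath)); whether that resolves the Japanese-path case on a given JDK would still need to be confirmed by running it.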


ASKER CERTIFIED SOLUTION
Mick Barry

CEHJ

>>Are there any solutions for this?

1. Are you sure that the path is appearing correctly in your editor?
2. Is that file with the exactly correct path visible in Explorer?
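
Both checks can also be made from inside the program, which rules out differences between what the editor displays and what the JVM actually receives. The following is a small diagnostic sketch with hypothetical names: it dumps the code points of the path string and asks the file system whether the path resolves to a readable file.

```java
import java.io.File;
import java.util.stream.Collectors;

// Hypothetical diagnostic class, not code from the thread.
final class PathDiagnostics {

    // Renders each code point as U+XXXX so a mangled (mis-decoded) path
    // can be spotted even when the console cannot display Japanese.
    static String codePoints(String s) {
        return s.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    // True only if the path names an existing regular file that can be read.
    static boolean isReadableFile(String filePath) {
        File f = new File(filePath);
        return f.isFile() && f.canRead();
    }
}
```

If isReadableFile returns true for the Japanese path while parse(String) still fails, the path string itself is intact and the problem lies in how the parser interprets it.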
ankata

ASKER
Yes, that's it:
   1. The path appears correctly in my editor.
   2. The file path is visible in Explorer.