Solved

Problem loading a Unicode XML file (not UTF-8)

Posted on 2003-03-04
Last Modified: 2010-07-27
I'm using Msxml2.DOMDocument and its load() method to load XML documents.
It works fine when loading ASCII or UTF-8 files, but when I try to load Unicode (16-bit) files it won't work.
I don't get any error message, but my document seems to be empty, with no nodes at all.

Here's my example xml:

<?xml version="1.0" encoding="UTF-8"?>
<!-- edited with XML Spy v4.3 U (http://www.xmlspy.com) by fonis (fon) -->
<novosti>
     <naslov-strane>Novosti</naslov-strane>
     <opis-strane>Najnovije informacije</opis-strane>
     <novost id="new">
          <naslov>Primeri za kolokvijum</naslov>
          <datum>6.12.2002</datum>
          <tekst>
               Postavljeni primeri za vežbanje kolokvijuma iz Baza Podataka koji će se održati u subotu u 08:00
          </tekst>
          <jos type="xml" href="literatura-primeri.xml"/>
     </novost>    
</novosti>


Can someone give me a solution?
Question by:cubrovic
2 Comments
 
LVL 27

Accepted Solution

by:
BigRat earned 300 total points
ID: 8070918
I assume you are loading a FILE, not a stream from a web server. In this case:

1) replace the encoding="UTF-8" with encoding="ucs-2"
  OR
   omit the encoding attribute entirely

2) Ensure that the file is stored in Unicode format.

   On WinNT/Win2K use NotePad to store the file in Unicode format.
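
The two steps above can be sketched outside MSXML as well. The following is a minimal Python illustration, not the MSXML code path: it assumes Python's standard xml.etree.ElementTree parser, uses the label "UTF-16" (which that parser recognizes) in place of "ucs-2", and the document text and function names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document. The declaration names UTF-16 so that it
# matches the bytes actually written, instead of claiming UTF-8.
XML_TEXT = (
    '<?xml version="1.0" encoding="UTF-16"?>\n'
    "<novosti>\n"
    "  <naslov-strane>Novosti</naslov-strane>\n"
    "  <tekst>održati će se</tekst>\n"
    "</novosti>\n"
)


def encode_utf16_le_with_bom(text: str) -> bytes:
    """Return the text as little-endian UTF-16 preceded by the FF FE mark."""
    return b"\xff\xfe" + text.encode("utf-16-le")


def parse_utf16(data: bytes) -> ET.Element:
    """Parse raw bytes; the parser picks the decoder from the leading mark."""
    return ET.fromstring(data)


data = encode_utf16_le_with_bom(XML_TEXT)
root = parse_utf16(data)
print(root.tag, root.find("naslov-strane").text)
```

The point of the sketch is only that the declared encoding and the stored bytes agree; how MSXML reports a mismatch is not shown here.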

Files stored in Unicode format (on Intel machines) are stored in little-endian format. This means that each 16-bit character is stored as two bytes, with the low-order byte (=wchar MOD 256) first (lower address) and the high-order byte (=wchar DIV 256) second (higher address). Additionally, the start of the file is marked by the two bytes FF FE.
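
As a small worked example of that byte layout, here is a Python sketch (the function name is hypothetical) that splits one 16-bit code unit into its low byte (value MOD 256) and high byte (value DIV 256) and checks the result against Python's own little-endian UTF-16 encoder.

```python
def split_little_endian(code_unit: int) -> bytes:
    """Split a 16-bit code unit into two bytes, low-order byte first."""
    if not 0 <= code_unit <= 0xFFFF:
        raise ValueError("expected a 16-bit value")
    low = code_unit % 256    # stored at the lower address
    high = code_unit // 256  # stored at the higher address
    return bytes([low, high])


ch = "ž"  # U+017E, one of the characters in the sample document
manual = split_little_endian(ord(ch))
builtin = ch.encode("utf-16-le")
print(manual == builtin)
```

For U+017E the low byte is 7E and the high byte is 01, so the character is stored as 7E 01.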

The parser will notice the first two bytes and will decode the rest of the file properly.

Incidentally, UTF-8 files saved with a byte order mark have EF BB BF as their first three bytes (the mark is optional in UTF-8).
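
A quick way to check which mark a file actually starts with is to look at its first bytes. This is a rough Python sketch using the constants from the standard codecs module; the function name is made up, and it deliberately ignores UTF-32, whose little-endian mark also begins with FF FE.

```python
import codecs
from typing import Optional


def detect_bom(data: bytes) -> Optional[str]:
    """Return the encoding implied by a leading byte order mark, if any."""
    if data.startswith(codecs.BOM_UTF8):      # EF BB BF
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):  # FF FE
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):  # FE FF
        return "utf-16-be"
    return None  # no mark: fall back to the XML declaration


print(detect_bom(b"\xff\xfe<\x00?\x00"))
```

To inspect a real file, read it in binary mode and pass the first few bytes to the function, then compare the result with the encoding named in the XML declaration.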

HTH
 
LVL 7

Author Comment

by:cubrovic
ID: 8082260
It seems to be working now, although I had tried this earlier (with "unicode" rather than "ucs-2"). I edit my XML files in XMLSpy, and it changes the encoding automatically. Anyway, if I have problems with it in the future, that will be a new question with new points.
I thought that the MS DOMDocument did not work with 16-bit characters at all.
Thanks, man.