zeinth
asked on
Python: lxml: encoding
================================= testxml.py ====================
somexmldata = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<res xmlns="http://www.abcxp.com">
<jx xmlns="" xsi:type="typens:output">
<fx xsi:type="name:Fields">
<FA xsi:type="name:ArrayOfField">
<Field xsi:type="name:Field">
<Name>machine1</Name>
<Type>xp</Type>
<Length>4</Length>
<foreignchar>3¿me Arrondissement</foreignchar>
</Field>
<Field xsi:type="name:Field">
<Name>IDFNDFIELD</Name>
<Type>win7</Type>
<Length>10</Length>
<foreignchar>20ème Arrondissement P</foreignchar>
</Field>
</FA>
</fx>
</jx>
</res>
</soap:Body>
</soap:Envelope> """
root = etree.fromstring(somexmldata)
print (etree.tostring.root)
=============================== Script testxml.py ==============
When I run the above testxml.py script, I get this error:
"ValueError: Unicode strings with encoding declaration are not supported."
How can I pass XML data that is a Unicode string with an encoding declaration to the lxml XML parser?
Thanks!
Are you using Python 3 or Python 2?
Unless you have a strong reason to use lxml, you can use the built-in xml.etree.ElementTree. Also fix your last command: it should be etree.tostring(root), not etree.tostring.root. You should also try writing the result to a file, because your console may not be able to display some characters:
#!python3
import xml.etree.ElementTree as ET
somexmldata = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<res xmlns="http://www.abcxp.com">
<jx xmlns="" xsi:type="typens:output">
<fx xsi:type="name:Fields">
<FA xsi:type="name:ArrayOfField">
<Field xsi:type="name:Field">
<Name>machine1</Name>
<Type>xp</Type>
<Length>4</Length>
<foreignchar>3¿me Arrondissement</foreignchar>
</Field>
<Field xsi:type="name:Field">
<Name>IDFNDFIELD</Name>
<Type>win7</Type>
<Length>10</Length>
<foreignchar>20ème Arrondissement P</foreignchar>
</Field>
</FA>
</fx>
</jx>
</res>
</soap:Body>
</soap:Envelope> """
root = ET.fromstring(somexmldata)
with open('output.xml', 'w', encoding='utf-8') as f:
    f.write(ET.tostring(root, encoding='unicode'))
    # encoding='unicode' makes tostring() return a str
print(ET.tostring(root, encoding='ascii'))
# encoding='ascii' makes tostring() return a bytes object
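To make the difference between the two tostring() calls concrete, here is a minimal sketch using the standard library only. The one-element document is a made-up example, not the SOAP envelope from the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical one-element document, used only for illustration.
root = ET.fromstring('<name>20ème</name>')

# encoding='unicode' returns a str that can be written to a text-mode file.
as_text = ET.tostring(root, encoding='unicode')

# encoding='ascii' returns bytes; characters outside ASCII are written
# as numeric character references.
as_bytes = ET.tostring(root, encoding='ascii')

print(type(as_text).__name__, as_text)
print(type(as_bytes).__name__, as_bytes)
```

The str form is the one to pass to f.write() on a file opened in text mode; the bytes form would need a file opened with 'wb'.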
I have lxml installed only for Python 2.7. The equivalent code for Python 2 with lxml looks like this:
#!python2
# -*- coding: utf-8 -*-
from lxml import etree
somexmldata = """<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xs="http://www.w3.org/2001/XMLSchema" xmlns:xsd="http://www.w3.org/2001/XMLSchema">
<soap:Body>
<res xmlns="http://www.abcxp.com">
<jx xmlns="" xsi:type="typens:output">
<fx xsi:type="name:Fields">
<FA xsi:type="name:ArrayOfField">
<Field xsi:type="name:Field">
<Name>machine1</Name>
<Type>xp</Type>
<Length>4</Length>
<foreignchar>3¿me Arrondissement</foreignchar>
</Field>
<Field xsi:type="name:Field">
<Name>IDFNDFIELD</Name>
<Type>win7</Type>
<Length>10</Length>
<foreignchar>20ème Arrondissement P</foreignchar>
</Field>
</FA>
</fx>
</jx>
</res>
</soap:Body>
</soap:Envelope> """
root = etree.fromstring(somexmldata)
with open('output.xml', 'w') as f:
    f.write(etree.tostring(root, encoding='utf-8'))
print etree.tostring(root, encoding='ascii')
ASKER
Sorry for the late reply, and thanks, pepr, for the help. My machine actually has Python 3 and lxml installed, and I am looking for a solution that uses the lxml parser.
I tried to run your lxml code on my machine (which has Python 3), and I get this error:
============= Error message from machine ========
root = etree.fromstring(somexmldata)
File "lxml.etree.pyx", line 2969, in lxml.etree.fromstring (src\lxml\lxml.etree.c:61729)
File "parser.pxi", line 1585, in lxml.etree._parseMemoryDocument (src\lxml\lxml.etree.c:91131)
ValueError: Unicode strings with encoding declaration are not supported.
===========================================
After looking at this error, can you suggest how I can fix my code? Thanks!
As I cannot reproduce it exactly, I can only guess that you should remove the first line, <?xml version="1.0" encoding="utf-8"?>. This is the line that declares the encoding. Removing it makes sense with .fromstring(), because that call is given a Unicode string, and an encoding declaration has no meaning for text that is already decoded.
The declaration makes sense if the XML content is stored in a file. Then you should call:
root = etree.parse("myfile.xml")
Then the encoding declaration inside makes sense.
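Another way to keep the declaration is to hand the parser bytes instead of a str, so that the declared encoding describes real encoded data. The sketch below assumes a shortened, made-up document; the try/except import is only there so the example also runs where lxml is not installed, and both modules accept bytes in fromstring().

```python
# Prefer lxml when it is installed; the standard library module is a
# stand-in so that this sketch still runs without lxml.
try:
    from lxml import etree
except ImportError:
    import xml.etree.ElementTree as etree

# Hypothetical, shortened document (not the SOAP envelope from the question).
xml_text = ('<?xml version="1.0" encoding="utf-8"?>\n'
            '<doc><item>20ème Arrondissement</item></doc>')

# Encode the str to bytes first; the utf-8 declaration now matches the data.
root = etree.fromstring(xml_text.encode('utf-8'))

item_text = root.find('item').text
print(item_text)
```

The encoding passed to encode() should match the one named in the declaration; here both are utf-8.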
ASKER
Now I tried this code:
==========================
f = open("somexmldata.xml", "w")
f.write(somexmldata)
f.close()
tree = etree.parse("somexmldata.xml")
Now I get this error from the code above:
==========================
Traceback (most recent call last):
File "C:\test2.py", line 38, in <module>
f.write(somexmldata)
File "C:\Python33\lib\encodings\cp1252.py", line 19, in encode
return codecs.charmap_encode(input,self.errors,encoding_table)[0]
UnicodeEncodeError: 'charmap' codec can't encode character '\ufffd' in position 709: character maps to <undefined>
I think the fix is to write the XML file to disk in UTF-8 encoding:
f.write(somexmldata) # How do I write the XML file to disk in UTF-8 encoding?
Then I think "root = etree.parse("myfile.xml")" will work in Python 3.
Thanks!
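The traceback can be reproduced in isolation. The following is a minimal sketch, assuming a made-up one-line sample string and a temporary file path: cp1252, the codec named in the traceback, has no mapping for U+FFFD, while UTF-8 can represent it.

```python
import os
import tempfile

# Hypothetical sample text containing U+FFFD, the character named in the traceback.
text = '<foreignchar>3\ufffdme Arrondissement</foreignchar>'

# cp1252 cannot encode U+FFFD, which is what f.write() ran into.
try:
    text.encode('cp1252')
    cp1252_ok = True
except UnicodeEncodeError:
    cp1252_ok = False

# Passing encoding='utf-8' to open() avoids relying on the platform default.
path = os.path.join(tempfile.mkdtemp(), 'somexmldata.xml')
with open(path, 'w', encoding='utf-8') as f:
    f.write(text)

# Read the file back as raw bytes to see what was stored.
with open(path, 'rb') as f:
    raw = f.read()

print(cp1252_ok)
print(raw)
```

Note that U+FFFD is the Unicode replacement character, so its presence suggests the original text was already damaged before it reached this script; writing in UTF-8 preserves it as-is rather than restoring the intended letter.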
ASKER CERTIFIED SOLUTION
ASKER
Wonderful, now the code is working, thanks!
==========================
with open('somexmldata.xml', 'w', encoding='utf-8') as f:
    f.write(somexmldata)
f.close()
tree = etree.parse("somexmldata.xml")
print (tree)
==========================
"A side note: I suggest to get used to the with construct..."
>> I will remember this suggestion.
Thanks!
When you use the with construct, f.close() is called automatically, so the explicit call is not needed. This is the reason the construct was introduced (not only for files; it is used generally for objects of classes that implement this kind of finalisation).
You can still use the open/close pair of functions (without the with), but it is more error-prone.
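A small sketch of the two forms side by side, assuming a throwaway file in a temporary directory (the file name is made up for the illustration):

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'demo.txt')  # hypothetical scratch file

# Form 1: the with statement closes the file when the block ends,
# even if an exception is raised inside it.
with open(path, 'w', encoding='utf-8') as f:
    f.write('hello')
closed_after_with = f.closed

# Form 2: the manual open/close pair needs try/finally to be equally safe.
g = open(path, 'r', encoding='utf-8')
try:
    content = g.read()
finally:
    g.close()

print(closed_after_with, g.closed, content)
```

The second form is more error-prone mainly because the try/finally is easy to forget, in which case an exception between open() and close() leaves the file open.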