Read HTML entity character from XML file

I have an XML file where one tag is <copyright ownership="publisher">&#x000A9; Manoj Kumar corporation.</copyright>, and I want to read that value. The issue is that when I read the &#x000A9; character, it prints some special symbol, and I don't want that: I want to print it on the console exactly as it is in the XML, like: &#x000A9; Manoj Kumar corporation.
So can you help me? I am using DOM for XML parsing.
Regards
Manoj Kumar

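My code is:

```java
import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReadCopyright {
    public static void main(String[] args) throws Exception {
        File file = new File(args[0]); // path to the XML file
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document d = builder.parse(file);
        d.getDocumentElement().normalize();

        // The question is about <copyright>, so read that element
        NodeList ndlist1 = d.getElementsByTagName("copyright");
        for (int i = 0; i < ndlist1.getLength(); i++) {
            Node node = ndlist1.item(i);
            // Text content; the parser has already resolved &#x000A9; to the character itself
            String text = node.getTextContent();
            System.out.println("val=" + text);
        }
    }
}
```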


Asked by: manoj kumar
 
Geert Bormans (Information Architect) commented:
Basically, if you want to put '&#x000A9;' on the console, you will need to reconstruct it.
This is a numeric character reference pointing to character number 169 (hex A9) of the Unicode standard,
which is the copyright sign.
While building the DOM tree in the parsing step, the parser is required by the XML Recommendation
to replace the reference with the character it refers to (the copyright sign itself),
and any following step will pick up that character and render it as expected.
So if you need the entity, you have to reconstruct it yourself from what getTextContent() returns, by mapping the character back.
I don't think you can tell your parser not to resolve the reference.
You could avoid serializing it as a literal copyright sign by setting the output encoding to ASCII, if your processor supports that:
the copyright sign falls outside the ASCII range, so you force the processor to fall back to character references, but that might give you &#169; instead
(both are equivalent in an XML context).
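A minimal Java sketch of both points above, assuming the JDK's built-in DOM parser and identity Transformer (the class name ResolvedReferenceDemo is only for illustration). After parsing, the text starts with the copyright character itself (code point 169), and re-serializing with a US-ASCII output encoding should make the serializer write that character back as a decimal reference.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class ResolvedReferenceDemo {

    // Parses XML from a string; the DOM parser resolves character references here.
    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    // Serializes with an ASCII output encoding, so characters outside ASCII
    // have to be written back as numeric character references.
    static String serializeAsAscii(Document doc) throws Exception {
        Transformer t = TransformerFactory.newInstance().newTransformer();
        t.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
        t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        t.transform(new DOMSource(doc), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        Document doc = parse("<copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright>");
        String text = doc.getDocumentElement().getTextContent();

        // The reference is gone: the first char is the copyright sign itself.
        System.out.println((int) text.charAt(0)); // 169 (hex A9)

        // Expected to contain a decimal reference such as &#169;, not the original &#x000A9;
        System.out.println(serializeAsAscii(doc));
    }
}
```

Note that the serializer chooses its own spelling of the reference, so the original &#x000A9; form is not restored exactly.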
 
manoj kumar (author) commented:
Can you write an example of how to read that value from the XML file?
 
manoj kumar (author) commented:
Can you tell me how to read that character?
 
ste5an (Senior Developer) commented:
What is your use case? You said HTML entities... I guess the problem is more in how your XML is created. If you want to preserve some (HTML) entities, then you MUST put them in a CDATA section:

```csharp
namespace ConsoleCS
{
    using System;
    using System.Xml.Linq;
    using System.Xml.XPath;

    class Program
    {
        static void Main(string[] args)
        {
            // The parser resolves &#x000A9;, so Value is "© Manoj Kumar corporation."
            XDocument document = ReadEncodedDocument();
            XElement copyright = document.XPathSelectElement("copyright");
            Console.WriteLine("Element contains: '{0}'", copyright.Value);

            // Inside CDATA the text is literal, so Value is "&#x000A9; Manoj Kumar corporation."
            document = ReadCdataDocument();
            copyright = document.XPathSelectElement("copyright");
            Console.WriteLine("Element contains: '{0}'", copyright.Value);

            Console.WriteLine("Done.");
            Console.ReadLine();
        }

        static XDocument ReadCdataDocument()
        {
            // Mock.
            XDocument result = XDocument.Parse("<copyright><![CDATA[&#x000A9; Manoj Kumar corporation.]]></copyright>");
            return result;
        }

        static XDocument ReadEncodedDocument()
        {
            // Mock.
            XDocument result = XDocument.Parse("<copyright>&#x000A9; Manoj Kumar corporation.</copyright>");
            return result;
        }
    }
}
```


This is simply how character references work in XML. The only other approach would be your own "parsing" based on plain string reading.
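A rough Java sketch of that "own parsing" idea, assuming the file is small enough to read into a String (for example with Files.readString) and that copyright elements are never nested or hidden inside comments or CDATA. The class and pattern are hypothetical; a regex is not an XML parser, which is why this is a last resort.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawCopyrightReader {

    // Hypothetical pattern for <copyright ...>...</copyright> on the raw text.
    // It ignores comments, CDATA, nesting and namespaces: this is string reading, not XML parsing.
    private static final Pattern COPYRIGHT =
            Pattern.compile("<copyright\\b[^>]*>(.*?)</copyright>", Pattern.DOTALL);

    // Returns the unparsed content of every <copyright> element,
    // so character references such as &#x000A9; are left untouched.
    static List<String> rawCopyrightTexts(String rawXml) {
        List<String> result = new ArrayList<>();
        Matcher m = COPYRIGHT.matcher(rawXml);
        while (m.find()) {
            result.add(m.group(1));
        }
        return result;
    }

    public static void main(String[] args) {
        String raw = "<book><copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright></book>";
        for (String s : rawCopyrightTexts(raw)) {
            System.out.println(s); // &#x000A9; Manoj Kumar corporation.
        }
    }
}
```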
 
manoj kumar (author) commented:
Sir, actually I already have the XML file and I cannot modify it. I just want to simply skip any HTML entity present inside a node. And I am using Java with XML (DOM), not .NET. So how can I do that with a DOM parser?
 
manoj kumar (author) commented:
I have lots of data in the XML; I could not find any tag with a given ID or value.
 
ste5an (Senior Developer) commented:
First of all: all bold is as bad as ALL CAPS. It is considered yelling. Turn it off.

Well, you should have tagged your post Java. Here you need to look at your XML reader: does it provide a hook for entity translation?

From the XML perspective, this is a boundary. If the author had intended to keep an entity as entity code, they would have used CDATA.

Thus again: What is your use-case?
 
Geert Bormans (Information Architect) commented:
What you need to do is all described in my previous comment.
Please read it, try to understand from that post why this happens, and reconsider why you need to reconstruct the entity.
So before you write code to restore it, I think it is smart to think about why you want that,
and maybe we can get the info prior to parsing if you really need it.
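One way to "get the info prior to parsing", sketched in Java under the assumption that you may rewrite the raw text before handing it to DocumentBuilder: escape the ampersand of every numeric character reference, so the parser yields the reference as literal text. The helper names are hypothetical, and the regex also rewrites matching text inside attribute values, comments and CDATA sections. The stripCharRefs variant covers the "simply skip it" requirement by deleting the references instead.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public class PreserveCharRefs {

    // Numeric character references as XML defines them: &#123; or &#x7B;
    private static final String CHAR_REF = "&#(x[0-9A-Fa-f]+|[0-9]+);";

    // Escapes the leading '&' so the parser reports "&#x000A9;" as literal text.
    // Caution: this also rewrites matching text in attributes, comments and CDATA.
    static String protectCharRefs(String rawXml) {
        return rawXml.replaceAll(CHAR_REF, "&amp;#$1;");
    }

    // Variant for "simply skip it": removes the references before parsing.
    static String stripCharRefs(String rawXml) {
        return rawXml.replaceAll(CHAR_REF, "");
    }

    static Document parse(String xml) throws Exception {
        return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
    }

    public static void main(String[] args) throws Exception {
        // In practice the raw text could come from Files.readString(path)
        String raw = "<copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright>";

        Document kept = parse(protectCharRefs(raw));
        System.out.println(kept.getDocumentElement().getTextContent());
        // prints: &#x000A9; Manoj Kumar corporation.

        Document skipped = parse(stripCharRefs(raw));
        System.out.println(skipped.getDocumentElement().getTextContent());
        // prints the text without the reference (the leading space remains)
    }
}
```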
 
manoj kumar (author) commented:
Sir, actually my requirement is: whenever you find this type of entity (&#x000A9;) in the XML file, you can simply skip it.
 
Geert Bormans (Information Architect) commented:
And yes, +1 for not yelling :-)
 
manoj kumar (author) commented:
What does "not yelling" mean?
 
Geert Bormans (Information Architect) commented:
As I explained before, and as ste5an also tries to tell you,
the character reference will be resolved by your document builder,
so in Java code, when reading text nodes, you need to replace "©" with a replacement value.
For the parsed XML, the entity is gone and is now a "©",
and you can only get it back by replacing it again.
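A minimal Java sketch of that replacement, assuming "replacement value" means either an empty string (skip) or a reconstructed hex reference. It walks every text and CDATA node once, so the amount of data in the file does not matter. It cannot tell a character that came from &#x000A9; apart from a literal © typed into the file, and the reconstructed spelling is &#xA9;, not the original &#x000A9;. Class and method names are illustrative.

```java
import java.io.StringReader;
import java.util.function.IntFunction;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class ReplaceResolvedChars {

    // Maps a code point to a hex character reference, e.g. 169 -> "&#xA9;".
    // The original spelling (&#x000A9;, &#169; or a literal ©) cannot be recovered.
    static String toHexRef(int codePoint) {
        return "&#x" + Integer.toHexString(codePoint).toUpperCase() + ";";
    }

    // Rebuilds s, passing every non-ASCII code point through the replacement function.
    static String replaceNonAscii(String s, IntFunction<String> replacement) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                sb.appendCodePoint(cp);
            } else {
                sb.append(replacement.apply(cp));
            }
        });
        return sb.toString();
    }

    // Visits every node once, rewriting text and CDATA content in place.
    static void rewriteTextNodes(Node node, IntFunction<String> replacement) {
        short type = node.getNodeType();
        if (type == Node.TEXT_NODE || type == Node.CDATA_SECTION_NODE) {
            node.setNodeValue(replaceNonAscii(node.getNodeValue(), replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            rewriteTextNodes(children.item(i), replacement);
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<book><title>Demo</title><copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright></book>";
        Document d = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // Map back for display; pass cp -> "" instead to skip the characters.
        rewriteTextNodes(d, ReplaceResolvedChars::toHexRef);
        System.out.println(d.getElementsByTagName("copyright").item(0).getTextContent());
        // prints: &#xA9; Manoj Kumar corporation.
    }
}
```

This is meant for console output only: if the modified DOM is serialized again, the serializer will escape the ampersand as &amp;.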
 
Geert Bormans (Information Architect) commented:
"What does not yelling mean?"
Stop using bold.
 
Geert Bormans (Information Architect) commented:
I would like to make clear that Experts Exchange is about us trying to help you help yourself.
- So I explain what is going on
- You try to understand why this happens and what you can do about it
- then you write the code, not me
 
manoj kumar (author) commented:
Geert Bormans says "so in Java code, when reading text nodes, you need to replace "©" with a replacement value". How can I do that if my XML file contains lots of data? This is not a good programming practice. I simply want to know how to find it in the XML file.
 
ste5an (Senior Developer) commented:
When you want to change how XML behaves, you need to look at the lower levels. Entity resolution happens when the XML document or stream is read.

As I'm not a Java guy, I can only give a pointer, e.g. something like IS_REPLACING_ENTITY_REFERENCES.

But the problem remains: You still cannot be sure whether the character entity encoding is there by intention or not.
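If you want to try the IS_REPLACING_ENTITY_REFERENCES suggestion, this Java sketch turns it off on the JDK's StAX XMLInputFactory. As far as I know, that property governs internal (general) entity references, such as ones declared in a DTD; numeric character references like &#x000A9; are still reported as ordinary characters, which the sketch checks. Treat it as an experiment rather than a fix; the class name is illustrative.

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class StaxCharRefCheck {

    // Collects all character data with entity-reference replacement switched off.
    static String readText(String xml) throws Exception {
        XMLInputFactory f = XMLInputFactory.newInstance();
        // Affects general entity references, not numeric character references.
        f.setProperty(XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES, Boolean.FALSE);
        XMLStreamReader r = f.createXMLStreamReader(new StringReader(xml));
        StringBuilder text = new StringBuilder();
        while (r.hasNext()) {
            // Text may arrive split over several CHARACTERS events, so append each one
            if (r.next() == XMLStreamConstants.CHARACTERS) {
                text.append(r.getText());
            }
        }
        r.close();
        return text.toString();
    }

    public static void main(String[] args) throws Exception {
        String t = readText("<copyright>&#x000A9; Manoj Kumar corporation.</copyright>");
        // Still the resolved copyright sign (code point 169), not "&#x000A9;"
        System.out.println((int) t.charAt(0));
    }
}
```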
 
Geert Bormans (Information Architect) commented:
Please don't teach me lessons on good coding practice.
"this is not a good programming practice"
Reading an XML file and then trying to reconstruct an entity is poor XML programming practice.
Not being able to explain the use case for poor XML usage is poor programming practice.
So my "poor programming practice" was just an attempt to give you a hack for a poor requirement; I deserve more respect.

I tried to explain why this is difficult and should not be done.
Two experts have now explained to you multiple times that
- once the XML is parsed, the entity is gone (resolved)
    + you need a good reason to have a requirement like yours (where is the use case?)
    + you need to tweak the parser, not the Java code, to get what you want

You are a hard listener, good luck.
 
Geert Bormans (Information Architect) commented:
Long thread; at the end of the day, everything required is in my very first comment.