manoj kumar


Read HTML entity character from XML file

I have an XML file where one tag is <copyright ownership="publisher">&#x000A9; Manoj Kumar corporation.</copyright> and I want to read that value. The issue is that when I read the &#x000A9; character, it prints a special symbol (©), but I don't want that: I want to print it on the console exactly as it is in the XML, like: &#x000A9; Manoj Kumar corporation.
So can you help me? I am using DOM for XML parsing.
Regards
Manoj Kumar

My code is:
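import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReadCopyright {
    public static void main(String[] args) throws Exception {
        File file = new File(args[0]); // path to the XML file
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document d = builder.parse(file);
        d.getDocumentElement().normalize();
        // Look up the <copyright> elements the question is about
        NodeList ndlist1 = d.getElementsByTagName("copyright");
        for (int i = 0; i < ndlist1.getLength(); i++) {
            Node node = ndlist1.item(i);
            // getTextContent() returns parsed text: &#x000A9; has already become ©
            String attValue = node.getTextContent();
            System.out.println("val=" + attValue);
        }
    }
}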


ASKER CERTIFIED SOLUTION
Gertone (Geert Bormans)
manoj kumar

ASKER

Can you write an example of how to read that value from the XML file?
Can you tell me how to read that character?
What is your use case? You said HTML entities... I guess the problem lies more in how your XML is created. If you want to preserve some (HTML) entities, then you MUST use a CDATA section:

namespace ConsoleCS
{
    using System;
    using System.Xml.Linq;
    using System.Xml.XPath;

    class Program
    {
        static void Main(string[] args)
        {
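            // Character reference: the parser resolves &#x000A9; to ©, so Value is "© Manoj Kumar corporation."
            XDocument document = ReadEncodedDocument();
            XElement copyright = document.XPathSelectElement("copyright");
            Console.WriteLine("Element contains: '{0}'", copyright.Value);

            // CDATA section: the content is literal, so Value is "&#x000A9; Manoj Kumar corporation."
            document = ReadCdataDocument();
            copyright = document.XPathSelectElement("copyright");
            Console.WriteLine("Element contains: '{0}'", copyright.Value);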

            Console.WriteLine("Done.");
            Console.ReadLine();
        }

        static XDocument ReadCdataDocument()
        {
            // Mock.
            XDocument result = XDocument.Parse("<copyright><![CDATA[&#x000A9; Manoj Kumar corporation.]]></copyright>");
            return result;
        }

        static XDocument ReadEncodedDocument()
        {
            // Mock.
            XDocument result = XDocument.Parse("<copyright>&#x000A9; Manoj Kumar corporation.</copyright>");
            return result;
        }
    }
}
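Since the asker works in Java with DOM rather than .NET, here is a rough sketch of the same comparison using the JDK's built-in DocumentBuilderFactory. The class and method names (CdataVsReference, rootText) are made up for illustration, and the XML strings are mocks, as in the C# version. It shows that the character reference comes back as the single character U+00A9, while the CDATA content comes back exactly as written.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;

public class CdataVsReference {
    // Parses an XML string with the JDK's built-in DOM parser and returns the root element's text.
    static String rootText(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement()
                    .getTextContent();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Character reference: the parser resolves &#x000A9; to the single character U+00A9 (©).
        String encoded = rootText("<copyright>&#x000A9; Manoj Kumar corporation.</copyright>");
        System.out.println(encoded.equals("\u00A9 Manoj Kumar corporation.")); // true

        // CDATA section: the content is taken literally, so the reference text survives.
        String cdata = rootText("<copyright><![CDATA[&#x000A9; Manoj Kumar corporation.]]></copyright>");
        System.out.println(cdata); // &#x000A9; Manoj Kumar corporation.
    }
}
```

The difference is decided by how the file was written, not by the reading code: both calls go through the same rootText method.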



This is simply how XML character references work: the parser resolves them. The only other approach would be to do your own "parsing" by reading the file as a plain string.
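The string-reading approach could be sketched in Java roughly as follows. This is a hack under strong assumptions, not XML parsing: it assumes <copyright> elements are never nested, commented out or wrapped in CDATA, and that the file is UTF-8. The names RawCopyrightReader and rawCopyrights are hypothetical, and Files.readString needs Java 11 or later.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RawCopyrightReader {
    // Matches <copyright ...>...</copyright> non-greedily; \b rejects names like <copyrightYear>,
    // DOTALL lets the content span lines.
    // Assumption: the element is never nested, commented out or wrapped in CDATA.
    private static final Pattern COPYRIGHT =
            Pattern.compile("<copyright\\b[^>]*>(.*?)</copyright>", Pattern.DOTALL);

    // Returns the raw, unparsed content of every <copyright> element, character references intact.
    static List<String> rawCopyrights(String xmlText) {
        List<String> values = new ArrayList<>();
        Matcher m = COPYRIGHT.matcher(xmlText);
        while (m.find()) {
            values.add(m.group(1));
        }
        return values;
    }

    public static void main(String[] args) throws IOException {
        // args[0]: path to the XML file, assumed to be UTF-8 encoded
        String xmlText = Files.readString(Path.of(args[0]), StandardCharsets.UTF_8);
        for (String value : rawCopyrights(xmlText)) {
            System.out.println("val=" + value);
        }
    }
}
```

For the element from the question, rawCopyrights returns a one-element list containing "&#x000A9; Manoj Kumar corporation." unchanged, because no XML parser ever sees the text.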
Sir, actually the XML file already exists and I cannot modify it. I just want to simply skip any HTML entity present inside a node. And I am using Java with XML (DOM), not .NET. So how can I do that with a DOM parser?
I have lots of data in the XML, so I cannot look for a specific tag by id or value.
First of all: All bold is as bad as ALL CAPS. It is considered yelling. Turn this off.

Well, you should have tagged your post Java. Here you need to look at your XML reader: does it provide a hook for entity translation?

From the XML perspective, this is a boundary. If the author had intended to post the entity as entity code, then they would have used CDATA.

Thus again: What is your use-case?
What you need to do is all described in my previous comment.
Please read it, try to understand from that post why this happens, and reconsider why you need to reconstruct the entity.
So before you write some code to restore it, I think it would be smart to think about why you want that,
and maybe we can get the information prior to parsing if you really need it.
Sir, actually my requirement is: whenever you find this type of entity (&#x000A9;) in the XML file, simply skip it.
And yes, about not yelling :-)
What does "not yelling" mean?
As I explained before, and as ste5an also tries to tell you,
the character entity will be resolved by your document builder.
So in Java code, when reading text nodes, you need to replace "©" with a replacement value.
In the parsed XML the entity is gone and is now a "©",
and you can only get it back by replacing it back again.
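A minimal sketch of that advice, assuming the replacement is applied to the parsed DOM: walk every node recursively and, in each text node, replace the resolved character with whatever you want (an empty string to skip it, as the asker requires, or "&#x000A9;" to restore the reference text). ReplaceResolvedChar, parse and replaceInTextNodes are illustrative names, not an existing API; attribute values are left untouched.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

public class ReplaceResolvedChar {
    // Parses an XML string with the JDK's built-in DOM parser.
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    // Recursively visits the tree; in every text node, replaces each occurrence of
    // `target` with `replacement`. Attributes are not child nodes, so they are skipped.
    static void replaceInTextNodes(Node node, String target, String replacement) {
        if (node.getNodeType() == Node.TEXT_NODE) {
            node.setNodeValue(node.getNodeValue().replace(target, replacement));
        }
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            replaceInTextNodes(children.item(i), target, replacement);
        }
    }

    public static void main(String[] args) {
        Document d = parse("<copyright ownership=\"publisher\">&#x000A9; Manoj Kumar corporation.</copyright>");
        // After parsing, the text holds the character U+00A9; drop it to "skip" the entity.
        replaceInTextNodes(d, "\u00A9", "");
        System.out.println("val=" + d.getDocumentElement().getTextContent());
        // prints: val= Manoj Kumar corporation.
    }
}
```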
What does "not yelling" mean?
Stop using bold.
I would like to make clear that Experts Exchange is about us trying to help you help yourself.
- So I explain what is going on
- You try to understand why this happens and what you can do about it
- then you write the code, not me
Geert Bormans says "so in Java code, when reading text nodes, you need to replace "©" with a replacement value". How can I do that if my XML file contains lots of data? This is not good programming practice. I simply want to find it in the XML file.
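If hard-coding "©" does not scale to lots of data, one possible generalization (an assumption, not something proposed in the thread) is to turn every non-ASCII character back into a hexadecimal character reference, padded to five digits like &#x000A9;. The caveat raised later in the thread still applies: the parsed text cannot tell you whether the file contained a literal © or &#x000A9;, so this only reproduces the file if it encoded all non-ASCII characters this way. CharRefEscaper and toCharRefs are hypothetical names.

```java
public class CharRefEscaper {
    // Re-encodes every non-ASCII code point as a hexadecimal character reference padded
    // to five digits, e.g. U+00A9 -> "&#x000A9;" (the style used in the asker's file).
    // Assumption: the original file encoded *all* non-ASCII characters this way.
    static String toCharRefs(String text) {
        StringBuilder out = new StringBuilder(text.length());
        text.codePoints().forEach(cp -> {
            if (cp < 0x80) {
                out.appendCodePoint(cp);                  // plain ASCII is kept as-is
            } else {
                out.append(String.format("&#x%05X;", cp)); // e.g. 0xA9 -> &#x000A9;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(toCharRefs("\u00A9 Manoj Kumar corporation."));
        // prints: &#x000A9; Manoj Kumar corporation.
    }
}
```

Applied to the value returned by getTextContent(), this handles any character, not just ©, which is the point of generalizing it.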
When you want to change the behavior how XML works, you need to look at the lower levels. Entity resolving happens when reading the XML document or stream.

As I'm not a Java guy, I can only point in a direction, e.g. something like IS_REPLACING_ENTITY_REFERENCES.

But the problem remains: You still cannot be sure whether the character entity encoding is there by intention or not.
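To see why a parser setting does not help here, the sketch below switches off entity-reference expansion with DocumentBuilderFactory.setExpandEntityReferences(false), the DOM-side setting closest to the StAX property XMLInputFactory.IS_REPLACING_ENTITY_REFERENCES mentioned above. Numeric character references such as &#x000A9; are not entity references, so, as far as we know, no standard JAXP setting preserves them; only general entities declared in a DTD should survive as EntityReference nodes. ParserLevelCheck and parseRoot are illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

public class ParserLevelCheck {
    // Parses with entity-reference expansion switched off and returns the root element.
    static Element parseRoot(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            // Asks the parser to keep *general* entity references (declared in a DTD)
            // as EntityReference nodes instead of expanding them in place.
            factory.setExpandEntityReferences(false);
            return factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
        } catch (Exception e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // A numeric character reference is not an entity reference: it is still resolved.
        Element charRef = parseRoot("<copyright>&#x000A9; Manoj Kumar corporation.</copyright>");
        System.out.println(charRef.getFirstChild().getNodeType() == Node.TEXT_NODE); // true
        System.out.println(charRef.getTextContent().startsWith("\u00A9"));            // true

        // A general entity declared in an internal DTD should be kept as its own node.
        Element entRef = parseRoot("<!DOCTYPE copyright [<!ENTITY corp \"Manoj Kumar corporation.\">]>"
                + "<copyright>&corp;</copyright>");
        System.out.println(entRef.getFirstChild().getNodeType() == Node.ENTITY_REFERENCE_NODE);
    }
}
```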
Please don't teach me lessons on good coding practice.
"This is not good programming practice"?
Reading an XML file and then trying to reconstruct an entity is poor XML programming practice.
Not being able to explain the use case for such poor XML usage is poor programming practice.
So my "poor programming practice" was just an attempt to give you a hack for a poor requirement. I deserve more respect.

I tried to explain why this is difficult and should not be done.
Two experts have now explained to you multiple times that:
- once the XML is parsed the entity is gone (resolved)
    + you need a good reason to have a requirement as you have (where is the usecase?)
    + you need to tweak the parser, not the java code to get what you want

You are a hard listener, good luck
Long thread; at the end of the day, everything required is in the very first comment.