Read an HTML entity character from an XML file

I have an XML file in which one element is <copyright ownership="publisher">&#x000A9; Manoj Kumar corporation.</copyright> and I want to read its value. The problem is that when I read &#x000A9; it prints a special symbol, which I do not want. I want the console output to match the XML source exactly: &#x000A9; Manoj Kumar corporation.
Can you help me? I am using DOM for XML parsing.
Regards,
Manoj Kumar

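My code is:
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
// file is the java.io.File that holds the XML document
Document d = builder.parse(file);
d.getDocumentElement().normalize();
// This selects <title> elements; pass "copyright" to read the element shown above
NodeList ndlist1 = d.getElementsByTagName("title");
for (int i = 0; i < ndlist1.getLength(); i++) {
    Node node = ndlist1.item(i);
    // getTextContent() returns the text with character references already resolved
    String attValue = node.getTextContent();
    System.out.println("val=" + attValue);
}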

Asked by: manoj kumar
Gertone (Geert Bormans), Information Architect, commented:
Basically, if you want to put '&#x000A9;' on the console, you will need to reconstruct it.
This is a numeric character reference pointing to character number 169 (hex A9) of the Unicode standard,
which is the copyright sign.
While building the DOM tree in the parsing step, the parser is required by the XML Recommendation
to replace the reference with the copyright character itself,
and any following step will pick that character up and render it as expected.
If you need the entity back, you have to reconstruct it after your getTextContent call by mapping the character back to its reference.
I don't think you can tell your parser not to resolve the entity.
You could avoid serialisation into a copyright sign by setting the output encoding to ASCII, if your processor supports that.
The copyright sign falls outside the ASCII range, so you force the processor to revert to character references, but that might result in &#169; instead
(both are equivalent in an XML context).
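
The ASCII output encoding idea from this comment can be tried with the JDK's built-in identity Transformer. This is only a minimal sketch: the class name AsciiSerializeSketch and the helper reserializeAsAscii are made up for illustration, and the exact form of the emitted reference (decimal or hex) depends on the serializer in use.

```java
import java.io.ByteArrayOutputStream;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
class AsciiSerializeSketch {

    // Parses the XML, then serializes the DOM again with US-ASCII as output encoding.
    // Characters that US-ASCII cannot represent must be written as character references.
    static String reserializeAsAscii(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));

            Transformer transformer = TransformerFactory.newInstance().newTransformer();
            transformer.setOutputProperty(OutputKeys.ENCODING, "US-ASCII");
            transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");

            ByteArrayOutputStream out = new ByteArrayOutputStream();
            transformer.transform(new DOMSource(doc), new StreamResult(out));
            return new String(out.toByteArray(), StandardCharsets.US_ASCII);
        } catch (Exception e) {
            throw new IllegalStateException("could not re-serialize XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright>";
        // The copyright sign comes back as a numeric reference, not as the raw character.
        System.out.println(reserializeAsAscii(xml));
    }
}
```

Note that this re-serializes the whole element including its tags, so it suits writing XML back out more than reading a single text value.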
manoj kumar (author) commented:
Can you write an example of how to read that value from the XML file?
manoj kumar (author) commented:
Can you tell me how to read that character?
ste5an, Senior Developer, commented:
What is your use case? Since you mention HTML entities, I guess the problem lies more in how your XML is created. If you want to preserve (HTML) entities as literal text, then you must use a CDATA section:

namespace ConsoleCS
{
    using System;
    using System.Xml.Linq;
    using System.Xml.XPath;

    class Program
    {
        static void Main(string[] args)
        {
            XDocument document = ReadEncodedDocument();
            XElement copyright = document.XPathSelectElement("copyright");
            Console.WriteLine("Element contains: '{0}'", copyright.Value);

            document = ReadCdataDocument();
            copyright = document.XPathSelectElement("copyright");
            Console.WriteLine("Element contains: '{0}'", copyright.Value);

            Console.WriteLine("Done.");
            Console.ReadLine();
        }

        static XDocument ReadCdataDocument()
        {
            // Mock.
            XDocument result = XDocument.Parse("<copyright><![CDATA[&#x000A9; Manoj Kumar corporation.]]></copyright>");
            return result;
        }

        static XDocument ReadEncodedDocument()
        {
            // Mock.
            XDocument result = XDocument.Parse("<copyright>&#x000A9; Manoj Kumar corporation.</copyright>");
            return result;
        }
    }
}

This is just how XML character references work. The only other approach would be to do your own "parsing" based on reading the file as a string.
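
Since the question is about Java and DOM, the same comparison as the C# demo above can be sketched with the JDK's DocumentBuilder. The class name CdataVersusReference and the helper textOf are hypothetical names chosen for this illustration.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class, not part of the original thread.
class CdataVersusReference {

    // Parses the XML string and returns the text content of the root element.
    static String textOf(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        // Plain character reference: the parser resolves it to the copyright sign.
        String encoded = "<copyright>&#x000A9; Manoj Kumar corporation.</copyright>";
        // CDATA section: the same characters are kept as literal text.
        String cdata = "<copyright><![CDATA[&#x000A9; Manoj Kumar corporation.]]></copyright>";

        System.out.println("Element contains: '" + textOf(encoded) + "'");
        System.out.println("Element contains: '" + textOf(cdata) + "'");
    }
}
```

The first line prints the resolved copyright character (how it looks depends on the console encoding); the second prints the literal text &#x000A9; Manoj Kumar corporation.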
manoj kumar (author) commented:
Sir, the XML file already exists and I cannot modify it. I just want any HTML entity present inside a node to be skipped. I am using Java with XML (DOM), not .NET, so how can I do that with a DOM parser?
manoj kumar (author) commented:
I have a lot of data in the XML, so I cannot find a tag by a given id or value.
ste5an, Senior Developer, commented:
First of all: all bold is as bad as ALL CAPS. It is considered yelling. Turn it off.

Well, you should have tagged your post Java. Here you need to look at your XML reader: does it provide a hook for entity translation?

From the XML perspective this is a boundary. If the author had intended to post the entity as literal entity code, then he would have used CDATA.

Thus again: what is your use case?
Gertone (Geert Bormans), Information Architect, commented:
What you need to do is all described in my previous comment.
Please read it, try to understand from it why this happens, and reconsider why you need to reconstruct the entity.
Before you write code to restore it, I believe it is smart to think about why you want that,
and maybe we can get the information prior to parsing if you really need it.
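
One way to "get the information prior to parsing" is to rewrite the raw text before the parser sees it, so that every numeric character reference survives as literal text. The sketch below assumes the document can be held in memory as a String (for a real file it could be read with java.nio.file.Files first). The names PreserveCharRefs, protectCharRefs and firstText are hypothetical. The blunt replacement also touches "&#" inside CDATA sections and comments, so treat it as a hack for a known input, not a general solution.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
class PreserveCharRefs {

    // Turns "&#x000A9;" into "&amp;#x000A9;" so the parser yields the literal
    // text "&#x000A9;" instead of resolving it to the copyright sign.
    static String protectCharRefs(String rawXml) {
        return rawXml.replace("&#", "&amp;#");
    }

    // Parses the protected XML and returns the text of the first element
    // with the given tag name.
    static String firstText(String rawXml, String tagName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(protectCharRefs(rawXml))));
            return doc.getElementsByTagName(tagName).item(0).getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String rawXml = "<book><copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright></book>";
        // prints: val=&#x000A9; Manoj Kumar corporation.
        System.out.println("val=" + firstText(rawXml, "copyright"));
    }
}
```

The original file stays untouched; only the in-memory copy handed to the parser is changed.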
manoj kumar (author) commented:
Sir, my requirement is that whenever an entity of this type (&#x000A9;) is found in the XML file, it should simply be skipped.
Gertone (Geert Bormans), Information Architect, commented:
And yes to not yelling :-)
manoj kumar (author) commented:
What does "not yelling" mean?
Gertone (Geert Bormans), Information Architect, commented:
As I explained before, and as ste5an is also trying to tell you,
the character entity will be resolved by your document builder.
So in Java code, when reading text nodes, you need to replace "©" with a replacement value.
For the parsed XML the entity is gone and is now a "©",
and you can only get it back by replacing it again.
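
The replacement described here does not have to be written per character. A minimal sketch, assuming every non-ASCII character should be turned back into a hex reference padded to five digits like the one in the question, is shown below. The names ReEscapeSketch and escapeNonAscii are hypothetical, and the original spelling of a reference (hex or decimal, amount of padding) cannot be recovered from the parsed DOM, so the format is a guess that happens to match this file.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class, not part of the original thread.
class ReEscapeSketch {

    // Rewrites every code point above the ASCII range as a numeric
    // character reference of the form &#xHHHHH; (upper-case hex, 5 digits).
    static String escapeNonAscii(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (cp > 0x7F) {
                sb.append(String.format("&#x%05X;", cp));
            } else {
                sb.appendCodePoint(cp);
            }
        });
        return sb.toString();
    }

    public static void main(String[] args) throws Exception {
        String xml = "<copyright ownership=\"publisher\">"
                + "&#x000A9; Manoj Kumar corporation.</copyright>";
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));

        // getTextContent() returns the resolved character; escape it again for output.
        String value = doc.getDocumentElement().getTextContent();
        // prints: val=&#x000A9; Manoj Kumar corporation.
        System.out.println("val=" + escapeNonAscii(value));
    }
}
```

Because the escaping runs on whatever getTextContent returns, it works the same for any element in a large file and needs no lookup by id or value.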
Gertone (Geert Bormans), Information Architect, commented:
"What it means for not yelling"
It means: stop using bold.
Gertone (Geert Bormans), Information Architect, commented:
I would like to make clear that Experts Exchange is about us trying to help you help yourself.
- So I explain what is going on
- You try to understand why this happens and what you can do about it
- Then you write the code, not me
manoj kumar (author) commented:
Geert Bormans says: "so in java code when reading text nodes, you need to replace "©" with a replacement value". How can I do that if my XML file contains lots of data? This is not a good programming practice. I simply want to know how to find such entities in the XML file.
ste5an, Senior Developer, commented:
When you want to change how XML processing behaves, you need to look at the lower levels. Entity resolution happens when reading the XML document or stream.

As I'm not a Java guy, I can only point to something like IS_REPLACING_ENTITY_REFERENCES.

But the problem remains: you still cannot be sure whether the character entity encoding is there by intention or not.
Gertone (Geert Bormans), Information Architect, commented:
Please don't teach me lessons on good coding practice.
"this is not a good programing practice"
Trying to reconstruct an entity when reading an XML file is poor XML programming practice.
Not being able to explain the use case for poor XML usage is poor programming practice.
So my "poor programming practice" was just trying to give you a hack for a poor requirement. I deserve more respect.

I tried to explain why this is difficult and should not be done.
Two experts have now explained to you multiple times that
- once the XML is parsed the entity is gone (resolved)
    + you need a good reason to have a requirement like yours (where is the use case?)
    + you need to tweak the parser, not the Java code, to get what you want

You are a hard listener. Good luck.
Gertone (Geert Bormans), Information Architect, commented:
Long thread. At the end of the day, everything required is in this very first comment.