Read numeric character entities from an XML file in Java using a DOM parser

Hi, I am Kumar. I have an issue in my Java application: I want to read an XML file using a DOM parser. The XML file contains many numeric character entities (like ©ê) in different places. My requirement is to find those numeric character entities with a pattern; whenever we find one, we simply skip the content of that node:
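import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// "file" is the java.io.File (or path) of the XML document to read
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document d = builder.parse(file);
d.getDocumentElement().normalize();
NodeList ndlist1 = d.getElementsByTagName("title");
for (int i = 0; i < ndlist1.getLength(); i++) {
    Node node = ndlist1.item(i);
    // Text content of the element, with character references already resolved
    String attValue = node.getTextContent();
    System.out.println("val=" + attValue);
}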

When I print the content of the node, it prints special characters, so my pattern does not find anything. Kindly help me figure out this problem.
manoj kumar Asked:

manoj kumar, Author, Commented:
My XML file is:
<creator xml:id="mafi12169-cr-0001" creatorRole="author" affiliationRef="#mafi12169-aff-0001 #mafi12169-aff-0002">
  <personName><givenNames>Amine&#x000ED;</givenNames><familyName>Ismail&#x000EC;</familyName></personName>
</creator>
<creator xml:id="mafi12169-cr-0002" creatorRole="author" affiliationRef="#mafi12169-aff-0002 #mafi12169-aff-0003" corresponding="yes">
  <personName><givenNames>Huy&#x000EA;n</givenNames><familyName>Pham&#x0005E;</familyName>
Gertone (Geert Bormans), Information Architect, Commented:
I am really puzzled why you opened a new account and a new question, when this question is simply a duplicate of the question you failed to follow up on yesterday.

It has been explained to you that entities are resolved on parse, so they are gone after parsing.
I still don't get the use case; it would be nice to know.

If you need to catch the entities, you need to do it before parsing.
I would suggest that, before you create the DOM object, you read the XML as a text stream
and use a regex to replace all the entities with an obvious marker, so you can catch them again after parsing.
This is a common practice in XML development.
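
To illustrate the point that entities are resolved on parse, here is a minimal sketch using the JDK's built-in DOM parser. The class name `ResolvedEntityDemo` and the helper `rootText` are hypothetical names chosen for this example, and the XML is a single element taken from the sample in the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

class ResolvedEntityDemo {
    // Parses an XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            // Wrapped only to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String text = rootText("<givenNames>Amine&#x000ED;</givenNames>");
        // The reference has been replaced by the character U+00ED,
        // so a search for the "&#x" form finds nothing.
        System.out.println(text.contains("&#x"));       // false
        System.out.println(text.equals("Amine\u00ED")); // true
    }
}
```

This is why a pattern applied to `node.getTextContent()` never matches the `&#x...;` form: by then the text holds the character itself.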
manoj kumar, Author, Commented:
Your level of English is too hard; I cannot understand everything. Can you use simple English words, or translate it to Hindi?
CEHJ Commented:
"i have a requirment to find that Numeric Character Entities with pattern"
Why? And how are you doing that? Please give an example.
David Johnson, CD, MVP, Owner, Commented:
No points, please: a Hindi translation of the comment above.

I am really puzzled why you opened a new account and a new question, when this question is simply a duplicate of the question you failed to follow up on yesterday.

It has been explained to you that entities are resolved on parse, so they are gone after parsing.
I still don't get the use case; it would be nice to know.

If you need to catch the entities, you need to do it before parsing.
I would suggest that, before you create the DOM object, you read the XML as a text stream
and use a regex to replace all the entities with an obvious marker, so you can catch them again after parsing.
This is a common practice in XML development.
manoj kumar, Author, Commented:
Sir, as you told me, I can do everything before creating the DOM object. That works, but the issue is that my requirement is to find a particular node, get the node content, and then check it. If I do everything before creating the DOM object, I am not able to solve that.
Gertone (Geert Bormans), Information Architect, Commented:
No, that is not exactly what I said.
If you apply a regular expression before DOM creation that transforms this
"&#x000ED;"
into
"[[#x000ED]]"
(just as an example)
then the parser will not resolve the entity,
because it is no longer an entity.
After that you can parse,
and in the text node processing you can apply a regex again to the node content
to filter out the [[#x000ED]] and report it.

This is not an elegant solution, though it is common practice.
But for a better solution you need to explain why you want to do this.
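
As a rough sketch of the marker idea, the following uses `java.util.regex` to turn hexadecimal character references into the `[[#x...]]` form and back again. The class name `EntityMarker` and the methods `mask` and `unmask` are hypothetical names for this illustration; the `[[...]]` marker is only the example form used above, and it is assumed not to occur elsewhere in the document.

```java
import java.util.regex.Pattern;

class EntityMarker {
    // Hexadecimal character references such as &#x000ED;
    private static final Pattern REFERENCE = Pattern.compile("&#x([a-fA-F0-9]+);");
    // The placeholder form, e.g. [[#x000ED]]
    private static final Pattern MARKER = Pattern.compile("\\[\\[#x([a-fA-F0-9]+)\\]\\]");

    // Replaces each reference with a marker the XML parser will leave alone.
    static String mask(String rawXml) {
        return REFERENCE.matcher(rawXml).replaceAll("[[#x$1]]");
    }

    // Turns markers back into character references, e.g. before writing output.
    static String unmask(String text) {
        return MARKER.matcher(text).replaceAll("&#x$1;");
    }

    public static void main(String[] args) {
        String masked = mask("<givenNames>Amine&#x000ED;</givenNames>");
        System.out.println(masked);         // <givenNames>Amine[[#x000ED]]</givenNames>
        System.out.println(unmask(masked)); // <givenNames>Amine&#x000ED;</givenNames>
    }
}
```

Note that a plain text replacement like this also touches references inside attribute values, comments, and CDATA sections, which may or may not be what is wanted.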
manoj kumar, Author, Commented:
I cannot understand your English.
CEHJ Commented:
Don't expect us to translate. Do it yourself.
manoj kumar, Author, Commented:
Hello brother, can you write some steps to solve my issue? Please use simple English. I do not know where you are from; I am not so good at English, so please make it simple. Many people on your side are giving suggestions, but I still have no solution because of the English you use.
CEHJ Commented:
Answer our questions, please.
Gertone (Geert Bormans), Information Architect, Commented:
please write me solution i can sell to customer without understanding
manoj kumar, Author, Commented:
There is a language issue on both sides: you know English and I know Hindi. I have already explained my issue, so please figure it out and write some steps so that I can understand and solve it.
Gertone (Geert Bormans), Information Architect, Commented:
step 1:
- read text stream
- replace all "&#x000ED;" with "[[#x000ED]]"
- use regular expression &#x([a-fA-F0-9]+);

step 2:
- parse result

step 3:
- do string operations you need on node.getTextContent()
- use regular expression \[\[#x([a-fA-F0-9]+)\]\]
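
The three steps could look roughly like this in Java. This is a sketch under assumptions, not a definitive implementation: the class `SkipMarkedNodes` and the method `cleanTexts` are hypothetical names, the XML is passed in as a string rather than read from a file, and the sample document in `main` is made up for the demonstration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class SkipMarkedNodes {
    private static final Pattern REFERENCE = Pattern.compile("&#x([a-fA-F0-9]+);");
    private static final Pattern MARKER = Pattern.compile("\\[\\[#x([a-fA-F0-9]+)\\]\\]");

    // Returns the text of every <tagName> element that contained no character reference.
    static List<String> cleanTexts(String rawXml, String tagName) {
        try {
            // Step 1: replace references with markers in the raw text
            String masked = REFERENCE.matcher(rawXml).replaceAll("[[#x$1]]");

            // Step 2: parse the masked text
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(new InputSource(new StringReader(masked)));

            // Step 3: look for markers in the node content and skip those nodes
            NodeList nodes = doc.getElementsByTagName(tagName);
            List<String> kept = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                String text = nodes.item(i).getTextContent();
                if (MARKER.matcher(text).find()) {
                    continue; // this element had a numeric character reference
                }
                kept.add(text);
            }
            return kept;
        } catch (Exception e) {
            // Wrapped only to keep the sketch short
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String xml = "<doc><title>Caf&#x000E9;</title><title>Plain</title></doc>";
        System.out.println(cleanTexts(xml, "title")); // [Plain]
    }
}
```

In a real program the raw text would first be read from the file using the encoding the XML declares, for example with `java.nio.file.Files.readAllBytes` and a `String` constructor that takes a charset.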

manoj kumar, Author, Commented:
Your solution works successfully, thank you. Since yesterday those steps were all I wanted. Thanks to the whole Experts Exchange team.
manoj kumar, Author, Commented:
Thank you very much.