Read numeric character entities from an XML file in Java using a DOM parser

Hi, I am Kumar. I have an issue in my Java application: I want to read an XML file using a DOM parser. The XML file contains many numeric character entities (such as ©ê) in different places. My requirement is to find those numeric character entities with a pattern. Whenever we find one, we simply skip the content of that node:
DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
DocumentBuilder builder = factory.newDocumentBuilder();
Document d = builder.parse(file);
d.getDocumentElement().normalize();
// Print the text content of every <title> element
NodeList ndlist1 = d.getElementsByTagName("title");
for (int i = 0; i < ndlist1.getLength(); i++) {
    Node node = ndlist1.item(i);
    String attValue = node.getTextContent();
    System.out.println("val=" + attValue);
}

When I print the content of a node, it prints special characters instead of the entities, so my pattern does not find anything. Kindly help me figure out this problem.
manoj kumar Asked:
 
Geert Bormans (Information Architect) Commented:
Step 1:
- Read the XML as a text stream.
- Replace every entity such as "&#x000ED;" with a marker such as "[[#x000ED]]".
- Use the regular expression &#x([a-fA-F0-9]+); to find the entities.

Step 2:
- Parse the result.

Step 3:
- Do the string operations you need on node.getTextContent().
- Use the regular expression \[\[#x([a-fA-F0-9]+)\]\] to find the markers.
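The three steps can be sketched in Java roughly as follows. This is only a minimal sketch under some assumptions: the class name EntityMasker and its methods are made up for illustration, the XML is held in a String rather than read from a file, and only hexadecimal references of the form &#x...; are handled, because that is what the regular expression in step 1 matches.

```java
import java.io.StringReader;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical helper class; the name and methods are illustrative only.
class EntityMasker {
    // Step 1 pattern: hexadecimal character references such as &#x000ED;
    private static final Pattern ENTITY = Pattern.compile("&#x([a-fA-F0-9]+);");
    // Step 3 pattern: the marker left behind by mask(), such as [[#x000ED]]
    private static final Pattern MARKER =
            Pattern.compile("\\[\\[#x([a-fA-F0-9]+)\\]\\]");

    // Step 1: rewrite every entity in the raw XML text into a marker
    // that the parser treats as ordinary text.
    static String mask(String rawXml) {
        return ENTITY.matcher(rawXml).replaceAll("[[#x$1]]");
    }

    // Step 2: parse the masked text with the standard DOM parser.
    static Document parse(String maskedXml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(maskedXml)));
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    // Step 3: check whether a node's text content still carries a marker.
    static boolean hasMarker(String text) {
        return MARKER.matcher(text).find();
    }

    public static void main(String[] args) {
        String raw = "<personName><givenNames>Amine&#x000ED;</givenNames>"
                + "<familyName>Ismail</familyName></personName>";
        Document doc = parse(mask(raw));
        String given = doc.getElementsByTagName("givenNames").item(0).getTextContent();
        String family = doc.getElementsByTagName("familyName").item(0).getTextContent();
        // Nodes for which hasMarker() is true are the ones to skip.
        System.out.println(given + " -> " + hasMarker(given));
        System.out.println(family + " -> " + hasMarker(family));
    }
}
```

Note that masking the raw text also rewrites references inside attribute values, comments and CDATA sections, so this sketch suits the case where only element text matters.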
 
manoj kumar (Author) Commented:
My XML file is:
<creator xml:id="mafi12169-cr-0001" creatorRole="author" affiliationRef="#mafi12169-aff-0001 #mafi12169-aff-0002">
  <personName><givenNames>Amine&#x000ED;</givenNames><familyName>Ismail&#x000EC;</familyName></personName>
</creator>
<creator xml:id="mafi12169-cr-0002" creatorRole="author" affiliationRef="#mafi12169-aff-0002 #mafi12169-aff-0003" corresponding="yes">
  <personName><givenNames>Huy&#x000EA;n</givenNames><familyName>Pham&#x0005E;</familyName>
 
Geert Bormans (Information Architect) Commented:
I am really puzzled why you opened a new account and a new question, when this question is simply a duplicate of the one you failed to follow up on yesterday.

It has been explained to you that entities are resolved on parse, so they are gone after parsing.
I still don't get the use case; it would be nice to know.

If you need to catch the entities, you need to do so before parsing.
I would suggest that, before you create the DOM object, you read the XML as a text stream
and use a regex to replace all the entities with an obvious marker, so you can catch them again after parsing.
This is common practice in XML development.
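To see why a pattern applied after parsing finds nothing, here is a small sketch that parses one element and inspects its text content. The class name EntityResolutionDemo and its helper are made up for illustration, and the sample element is taken from the XML posted above.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

// Hypothetical demo class; the name is illustrative only.
class EntityResolutionDemo {
    // Parses a small XML string and returns the text content of its root element.
    static String rootText(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return doc.getDocumentElement().getTextContent();
        } catch (Exception e) {
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String text = rootText("<givenNames>Amine&#x000ED;</givenNames>");
        // The parser has already replaced &#x000ED; with the character U+00ED,
        // so the "&#x" syntax is no longer present in the node content.
        System.out.println("still contains &#x: " + text.contains("&#x"));
        System.out.println("last char is U+00ED: "
                + (text.charAt(text.length() - 1) == 0xED));
    }
}
```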
 
manoj kumar (Author) Commented:
Your level of English is too hard; I cannot understand everything. Can you use simple English words or translate it to Hindi?
 
CEHJ Commented:
"I have a requirement to find those numeric character entities with a pattern"
Why? And how are you doing that? Please give an example.
 
David Johnson, CD, MVP (Owner) Commented:
No points please; this is a Hindi translation of the comment above, rendered here in English:

I am really puzzled why you open a new account and a new question, when this question is a duplicate of the question you failed to follow up on yesterday.

It has been explained to you that entities are resolved on parse, so they are gone after parsing.
I still do not get the use case; it would be nice to know.

If you need to catch the entities, you have to do it before parsing.
I would suggest that, before you create the DOM object, you read the XML as a text stream,
replace all the entities using a regex, and create an obvious marker so that you can catch them again after parsing.
This is a common practice in XML development.
 
manoj kumar (Author) Commented:
Sir, as you told me, I can do everything before creating the DOM object. That works, but the issue is that my requirement is to find a particular node, get the node content, and then check it. If I do everything before creating the DOM object, I am not able to solve that.
 
Geert Bormans (Information Architect) Commented:
No, that is not exactly what I said.
If you apply a regular expression before DOM creation that transforms this
"&#x000ED;"
into
"[[#x000ED]]"
(just as an example),
then the parser will not resolve the entity,
because it is no longer an entity.
After that you can parse,
and in the text node processing you can apply a regex again to the node content
to filter out the [[#x000ED]] and report it.

This is not an elegant solution, though it is common practice.
But for a better solution you need to explain why you want to do this.
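The node-processing half of this could look roughly like the sketch below. It assumes the node content has already been obtained with node.getTextContent() from a document whose entities were rewritten to the [[#x...]] marker before parsing. The class name MarkerReport and its methods are made up for illustration. The restore method is an extra beyond what is described above, for the case where the original character is wanted back in the text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; the name and methods are illustrative only.
class MarkerReport {
    // Matches markers such as [[#x000ED]] and captures the hexadecimal digits.
    private static final Pattern MARKER =
            Pattern.compile("\\[\\[#x([a-fA-F0-9]+)\\]\\]");

    // Lists the hexadecimal code of every marker found in a node's text content.
    static List<String> findCodes(String nodeText) {
        List<String> codes = new ArrayList<>();
        Matcher m = MARKER.matcher(nodeText);
        while (m.find()) {
            codes.add(m.group(1));
        }
        return codes;
    }

    // Turns each marker back into the character it originally stood for.
    // Assumes the digits form a valid Unicode code point.
    static String restore(String nodeText) {
        Matcher m = MARKER.matcher(nodeText);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1), 16);
            String ch = new String(Character.toChars(codePoint));
            m.appendReplacement(out, Matcher.quoteReplacement(ch));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String nodeText = "Amine[[#x000ED]]";
        // Report the entities that were present in this node.
        System.out.println("entities found: " + findCodes(nodeText));
        // A node with an empty list had no entities and can be processed normally.
        System.out.println("entities found: " + findCodes("Ismail"));
    }
}
```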
 
manoj kumar (Author) Commented:
I cannot understand your English.
 
CEHJ Commented:
Don't expect us to translate. Do it yourself.
 
manoj kumar (Author) Commented:
Hello brother, can you write some steps to solve my issue? Please use simple English. I do not know where you are from; I am not so good at English, so please keep it simple. Many people on your side are giving suggestions, but so far I have not got a solution because of the English words you use.
 
CEHJ Commented:
Answer our questions, please.
 
Geert Bormans (Information Architect) Commented:
"Please write me a solution I can sell to my customer without understanding it."
 
manoj kumar (Author) Commented:
There is a language issue on both sides: you know English and I know Hindi. I have already explained my issue, so please figure it out and write some steps so that I can understand and solve it.
 
manoj kumar (Author) Commented:
Your solution works successfully, thank you. Since yesterday I wanted only those steps. Thanks to the whole Experts Exchange team.
 
manoj kumar (Author) Commented:
Thank you very much.
Question has a verified solution.
