How to convert HTML special chars (&amp;) to ASCII (&)?

Posted on 2003-03-16
Medium Priority
Last Modified: 2012-05-04
Hi, I'm parsing an HTML page. I have removed all HTML tags using a StringTokenizer. Is there an easy way to convert all special characters like &amp; (&) and &quot; (") without having an if statement for each character? Any advice would be helpful!

Question by: SimonD

Accepted Solution

CEHJ earned 300 total points
ID: 8147029
I would use a Hashtable for this, although you should first check whether some translation code already exists (I don't know of any):

Hashtable entities = new Hashtable();
entities.put("&amp;", "&");
entities.put("&quot;", "\"");
// ...and so on, one put() per entity

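Look them up as follows:

// inputEntity is the entity text found in the input, e.g. "&amp;".
// Entity names are case-sensitive (&Eacute; and &eacute; differ), so don't lowercase the key.
String character = (String)entities.get(inputEntity);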
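A minimal sketch of how the table lookup could be wired into a full decoder, assuming modern Java (a generic HashMap rather than the raw Hashtable of 2003). The class name EntityDecoder, the decode method and the small entity subset are hypothetical; a real table would list every HTML entity. Here the keys are the bare names (amp, quot) rather than the full &amp; form, so the scanner can slice the name out between & and ;.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the table-driven approach: one map from entity name to replacement
// text, plus a single left-to-right scan of the input.
// EntityDecoder and the entity subset are illustrative assumptions.
class EntityDecoder {
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("nbsp", "\u00A0");
    }

    static String decode(String html) {
        StringBuilder out = new StringBuilder(html.length());
        int i = 0;
        while (i < html.length()) {
            int amp = html.indexOf('&', i);
            if (amp < 0) {
                out.append(html, i, html.length()); // no more entities: copy the rest
                break;
            }
            out.append(html, i, amp); // copy the plain text before the '&'
            int semi = html.indexOf(';', amp + 1);
            String name = semi < 0 ? null : html.substring(amp + 1, semi);
            String replacement = name == null ? null : ENTITIES.get(name);
            if (replacement != null) {
                out.append(replacement);
                i = semi + 1;
            } else {
                // Unknown or unterminated entity: keep the '&' and carry on
                out.append('&');
                i = amp + 1;
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("Tom &amp; Jerry say &quot;hi&quot;"));
    }
}
```

This prints Tom & Jerry say "hi". Because the input is scanned once, "&amp;lt;" comes out as "&lt;" rather than being decoded twice.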

Expert Comment

ID: 8147376
I don't know of any translation code either (you'd think there would be some, though).

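Easy? No. What makes this complicated is that you can't switch on a String in Java (true at the time; Java 7 later added switch on String). You CAN, however, switch on String.hashCode(). Hashtable also finds keys through their hashCode(), but it confirms each match with equals(), because two different strings can share a hash code. A hashCode() switch needs the same equals() check to be as safe as the Hashtable method.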
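On Java 7 or later the hashCode() trick is no longer needed: switch works on String directly, and javac compiles it into a switch on hashCode() followed by an equals() check, which is the collision-safe form of the idea above. A sketch, where EntitySwitch and its lookup method are hypothetical and only a few entities are covered:

```java
// Sketch only: switch on String requires Java 7 or later.
// EntitySwitch and lookup() are hypothetical names for illustration.
class EntitySwitch {
    // Returns the replacement text for one complete entity (including '&' and ';'),
    // or null if the entity is not recognised.
    static String lookup(String entity) {
        switch (entity) {
            case "&amp;":  return "&";
            case "&quot;": return "\"";
            case "&lt;":   return "<";
            case "&gt;":   return ">";
            default:       return null;
        }
    }

    public static void main(String[] args) {
        System.out.println(lookup("&quot;")); // prints the double-quote character
        System.out.println(lookup("&copy;")); // not in the switch, so prints null
    }
}
```

Each new entity still means a new case line, so for a full entity list a table is usually easier to maintain than a switch.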

CEHJ's idea is good. But since the tables will hold only Strings and be fairly static, and you would really want two of them so you can do reverse lookups (java.lang.String -> HTML as well as HTML -> java.lang.String), it might be more efficient to implement your own hash mapping.
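A sketch of the two-table idea, assuming both directions are built from one list of pairs so they cannot drift apart. The class name EntityTables, the encode method and the entity subset are hypothetical, and a plain HashMap stands in for the hand-written hash mapping suggested above; this sketch makes no claim to be faster than one.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch: forward (HTML -> String) and reverse (String -> HTML) tables built
// from the same pairs. Names and entity subset are illustrative assumptions.
class EntityTables {
    private static final String[][] PAIRS = {
        {"&amp;", "&"}, {"&quot;", "\""}, {"&lt;", "<"}, {"&gt;", ">"}
    };
    static final Map<String, String> DECODE = new HashMap<>();
    static final Map<Character, String> ENCODE = new HashMap<>();
    static {
        for (String[] p : PAIRS) {
            DECODE.put(p[0], p[1]);            // "&lt;" -> "<"
            ENCODE.put(p[1].charAt(0), p[0]);  // '<' -> "&lt;"
        }
    }

    // String -> HTML: escape every character that has an entry in the reverse table
    static String encode(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String entity = ENCODE.get(c);
            if (entity != null) {
                out.append(entity);
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode("a < b & \"c\""));
        System.out.println(DECODE.get("&lt;"));
    }
}
```

Here encode("a < b & \"c\"") gives a &lt; b &amp; &quot;c&quot;, and DECODE can serve the HTML -> String direction inside a scanning loop.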

It would be a pain to write, but once written it would be worthwhile, both for efficiency and because it could handle HTML input AND output. Makes you wonder why something like this doesn't exist... Maybe no one's willing to create a Java class around something as shaky as HTML. I know I would hate to be responsible for it. ;)

Expert Comment

ID: 8147641
If you are working with JDK 1.4, you can use the String.replaceAll method.
Let's say you hold the HTML in a String variable 'HTML':

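HTML = HTML.replaceAll("&quot;", "\"");
HTML = HTML.replaceAll("&amp;", "&"); // do &amp; last, or "&amp;quot;" would be decoded twice
and you do that for every special char.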
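A fuller sketch of the replace-per-entity approach, assuming Java 5 or later for String.replace(CharSequence, CharSequence), which replaces literal text. replaceAll treats its first argument as a regex, which happens to be harmless for these entity strings. ReplaceChain, unescape and the entity subset are hypothetical.

```java
// Sketch of decoding by chained literal replacements.
// ReplaceChain and unescape() are hypothetical names; the entity list is a subset.
class ReplaceChain {
    static String unescape(String html) {
        return html
            .replace("&quot;", "\"")
            .replace("&lt;", "<")
            .replace("&gt;", ">")
            // &amp; must come last; otherwise "&amp;lt;" would become "&lt;"
            // and then "<", decoding the text twice.
            .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        System.out.println(unescape("&lt;b&gt; &amp;lt; &quot;x&quot;"));
    }
}
```

That call prints <b> &lt; "x": the escaped &amp;lt; survives as &lt; because &amp; is handled after &lt;.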

Now, I must say that this is a VERY inefficient way, since every call scans the whole string again, but I think it will work.
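One way to avoid rescanning the string once per entity is a single regex pass using Matcher.appendReplacement and appendTail, which, like replaceAll, date from JDK 1.4. This sketch also decodes numeric references such as &#38; and &#x26; (using Character.toChars and Character.isValidCodePoint, Java 5+) and leaves unknown or invalid entities untouched. RegexDecoder and the named-entity subset are assumptions.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Single-pass decoder (sketch): one regex finds every entity and each match is
// looked up once. Class name and the named-entity subset are assumptions.
class RegexDecoder {
    // &#xhex; or &#decimal; or &name;
    private static final Pattern ENTITY =
        Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[a-zA-Z][a-zA-Z0-9]*);");
    private static final Map<String, String> NAMED = new HashMap<>();
    static {
        NAMED.put("amp", "&");
        NAMED.put("quot", "\"");
        NAMED.put("lt", "<");
        NAMED.put("gt", ">");
    }

    // Converts a numeric reference to text, or returns null if it is not a valid code point
    private static String fromCodePoint(String digits, int radix) {
        try {
            int cp = Integer.parseInt(digits, radix);
            return Character.isValidCodePoint(cp) ? new String(Character.toChars(cp)) : null;
        } catch (NumberFormatException e) {
            return null; // too many digits to fit in an int
        }
    }

    static String decode(String html) {
        Matcher m = ENTITY.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String body = m.group(1);
            String replacement;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                replacement = fromCodePoint(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                replacement = fromCodePoint(body.substring(1), 10);
            } else {
                replacement = NAMED.get(body);
            }
            if (replacement == null) {
                replacement = m.group(); // unknown or invalid: keep the original text
            }
            // quoteReplacement stops '$' and '\' in the replacement being treated specially
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(decode("&lt;p&gt; &#38; &#x26; &amp;lt; &copy;"));
    }
}
```

With the subset above this prints <p> & & &lt; &copy;: &copy; is not in the table, so it is left as it was.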

Author Comment

ID: 8170107
Thanks for your suggestions!
