How to convert HTML special chars (&amp;) to ASCII (&)?

Hi, I'm parsing an HTML page. I have removed all HTML tags using a StringTokenizer. Is there an easy way to convert all special characters like &amp; (&) and &quot; (") without having an if statement for each character? Any advice would be helpful!

Thanks.
Asked by: SimonD
 
CEHJ commented:
I would use a Hashtable for this, although you should first check whether some translation code for this already exists (I don't know of any):

// Map each entity (as it appears in the HTML) to the character it stands for
Hashtable entities = new Hashtable();
entities.put("&amp;", "&");

Look them up as follows:

// Entity names are case-sensitive (&Eacute; and &eacute; differ), so use the key as written
String character = (String) entities.get(inputEntity);
etc.
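
To make the lookup idea above concrete, here is a minimal sketch of a single-pass decoder. It is only an illustration: the class name EntityDecoder, the decode method, and the short entity list are assumptions rather than anything from this thread, and it uses a HashMap keyed by the bare entity name (without & and ;) instead of the Hashtable shown above. Numeric references such as &#38; are not handled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper; the name and the (incomplete) entity list are illustrative only.
final class EntityDecoder {
    // Keys are entity names without the leading '&' and trailing ';'.
    private static final Map<String, String> ENTITIES = new HashMap<>();
    static {
        ENTITIES.put("amp", "&");
        ENTITIES.put("lt", "<");
        ENTITIES.put("gt", ">");
        ENTITIES.put("quot", "\"");
        ENTITIES.put("nbsp", "\u00A0");
    }

    // Scans the input once; unknown or malformed entities are copied through unchanged.
    static String decode(String input) {
        StringBuilder out = new StringBuilder(input.length());
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            // Only look for a terminating ';' when we are standing on an '&'.
            int semi = (c == '&') ? input.indexOf(';', i + 1) : -1;
            String replacement =
                (semi > i + 1) ? ENTITIES.get(input.substring(i + 1, semi)) : null;
            if (replacement != null) {
                out.append(replacement);
                i = semi + 1; // skip past the whole entity
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }
}
```

Because the text is scanned once and decoded output is never rescanned, an input like &amp;lt; comes out as &lt; rather than being decoded twice.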
 
functionpointer commented:
I don't know of any translation code either (you'd think there should be some, though).

Easy? No. The thing that makes this complicated is that you can't switch on a String in Java. However, you CAN switch on String.hashCode(). The Hashtable also uses the key's hashCode(), so this should be about as safe as the Hashtable method, provided you guard against two different names sharing a hash code (the Hashtable does that by also checking equals()).
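
As a rough sketch of the hashCode() idea, the following switches on the hash of the entity name and then confirms the match with equals(), since different strings can share a hash code. The class name HashSwitchLookup and the four entities are illustrative assumptions. The case labels are the values of "amp".hashCode() and so on, written as literals because case labels must be compile-time constants. (From Java 7 onward you can switch on the String directly, which makes this workaround unnecessary.)

```java
// Hypothetical helper; name and entity selection are illustrative only.
final class HashSwitchLookup {
    // Returns the character for a bare entity name ("amp", "lt", ...), or null if unknown.
    static String lookup(String name) {
        switch (name.hashCode()) {
            case 96708:   // "amp".hashCode()
                return name.equals("amp") ? "&" : null;
            case 3464:    // "lt".hashCode()
                return name.equals("lt") ? "<" : null;
            case 3309:    // "gt".hashCode()
                return name.equals("gt") ? ">" : null;
            case 3482377: // "quot".hashCode()
                return name.equals("quot") ? "\"" : null;
            default:
                return null;
        }
    }
}
```

The equals() check in each case is what keeps a colliding, unrelated name from being mistaken for a known entity.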

CEHJ's idea is good. However, the data types in the underlying tables are guaranteed to be consistent, and the tables will be fairly static. You would also really want two tables, so that you can do lookups in both directions (java.lang.String -> HTML as well as HTML -> java.lang.String). Given all that, it might be more efficient to implement your own hash mapping.

It would be a pain to write, but once written it would be worthwhile, both for efficiency and because it could be used for HTML input AND output. Makes you wonder why something like this doesn't exist... Maybe no one's willing to create a Java class around something as shaky as HTML. I know I would hate to be responsible for it. ;)
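
The two-table idea could look roughly like the sketch below, which builds both directions from one list of pairs so they cannot drift apart. This uses ordinary HashMaps rather than a hand-written hash mapping, and the names EntityTable, charFor, and escape are made up for illustration; the entity list is deliberately tiny.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical two-way table; names and entity list are illustrative only.
final class EntityTable {
    // Single source of truth: {entity name, character}.
    private static final String[][] PAIRS = {
        {"amp", "&"}, {"lt", "<"}, {"gt", ">"}, {"quot", "\""}
    };
    private static final Map<String, Character> NAME_TO_CHAR = new HashMap<>();
    private static final Map<Character, String> CHAR_TO_NAME = new HashMap<>();
    static {
        for (String[] pair : PAIRS) {
            char ch = pair[1].charAt(0);
            NAME_TO_CHAR.put(pair[0], ch);
            CHAR_TO_NAME.put(ch, pair[0]);
        }
    }

    // HTML -> Java: returns the character for a bare entity name, or null if unknown.
    static Character charFor(String name) {
        return NAME_TO_CHAR.get(name);
    }

    // Java -> HTML: replaces every character that has an entity with "&name;".
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            String name = CHAR_TO_NAME.get(c);
            if (name != null) {
                out.append('&').append(name).append(';');
            } else {
                out.append(c);
            }
        }
        return out.toString();
    }
}
```

For example, EntityTable.escape("a < b & c") yields a &lt; b &amp; c, and EntityTable.charFor("quot") yields the double-quote character.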
 
shji1 commented:
If you are working with JDK 1.4 or later, you can use the String.replaceAll method.
Let's say you hold the HTML in a String variable 'HTML':

HTML = HTML.replaceAll("&amp;", "&");
and you do that for every special char.

Now, I must say that this is a VERY inefficient way to do it, but I think it will work.
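
One detail worth illustrating with the replaceAll approach is ordering: if &amp; is replaced first, text such as &amp;lt; turns into &lt; and is then wrongly decoded again to <. A small sketch, assuming a hypothetical class ReplaceAllDecoder and only four entities, that handles &amp; last:

```java
// Hypothetical helper; the name and the short entity list are illustrative only.
final class ReplaceAllDecoder {
    static String decode(String html) {
        // Each call scans the whole string, which is why this approach is slow.
        String result = html.replaceAll("&lt;", "<");
        result = result.replaceAll("&gt;", ">");
        result = result.replaceAll("&quot;", "\"");
        // &amp; goes last so that "&amp;lt;" ends up as "&lt;", not "<".
        result = result.replaceAll("&amp;", "&");
        return result;
    }
}
```

Note that replaceAll treats its first argument as a regular expression and gives $ and \ special meaning in the replacement, so any entity whose replacement is one of those characters would need escaping.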
 
SimonD (author) commented:
Thanks for your suggestions!