gudii9

encoding of characters

&#224;
&#233;

I have some special characters, shown above, on the screen.

How do I convert them to the corresponding special characters so they display properly, like an 'a' with a cap on top of it or an apostrophe on top, etc.?

Is there any link or online resource to convert these encodings automatically?
ASKER CERTIFIED SOLUTION
CEHJ
This solution is only available to members.
gudii9

ASKER

&#353;      š      \u0161
What is the difference between the above 3 formats?
gudii9

ASKER

Which one is better to keep in my code among these 3?

&#353;      š      \u0161

The first one definitely messes up the display.
I wonder why all of the ones below show up as a question mark:
ě      ?
ł      ?
ź      ?
ś      ?
č      ?
"which one is better to keep in my code among these 3"
What code? Anyway, the first in your list is a character entity (strictly speaking, an HTML numeric character reference). The second is the character itself. The third is a Unicode escape sequence, as used in Java source code.
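To make the difference concrete, here is a minimal Java sketch. The class name ThreeForms is made up for illustration, and it assumes the forms are written as string literals in Java source. A literal š typed directly into the source would behave like the escape, provided the compiler reads the file with the right encoding (for example via javac's -encoding option).

```java
final class ThreeForms {
    // The Unicode escape is translated by the compiler, so this is a one-character string.
    static final String ESCAPED = "\u0161";
    // To Java, an HTML numeric character reference is just six ordinary ASCII characters.
    static final String REFERENCE = "&#353;";

    // Describes the first code point of the string and the string's length.
    static String describe(String s) {
        return String.format("U+%04X (length %d)", s.codePointAt(0), s.length());
    }

    public static void main(String[] args) {
        System.out.println(describe(ESCAPED));   // U+0161 (length 1)
        System.out.println(describe(REFERENCE)); // U+0026 (length 6)
        // 353 in decimal is the same number as 161 in hexadecimal.
        System.out.println(353 == 0x161);        // true
    }
}
```

So in Java code the escape (or the literal character) gives you the character itself, while the reference only turns into that character once something that understands HTML or XML interprets it.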
"i wonder why all below represent question mark"
The question mark is just a placeholder. It tells us that the program couldn't map the character to anything in the encoding it is using, so it prints a ? in its place.
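Here is a small sketch of how that substitution can happen in Java. It assumes ISO-8859-1 as the target encoding purely for illustration (we don't know which encoding your program actually uses), and the class name QuestionMarkDemo is hypothetical.

```java
import java.nio.charset.StandardCharsets;

final class QuestionMarkDemo {
    // Encodes the text to ISO-8859-1 bytes and decodes it again.
    // Characters that ISO-8859-1 cannot represent are replaced by '?'.
    static String roundTripLatin1(String text) {
        byte[] bytes = text.getBytes(StandardCharsets.ISO_8859_1);
        return new String(bytes, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        String text = "\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161";
        // Only the first two characters exist in ISO-8859-1.
        String result = roundTripLatin1(text);
        System.out.println(result.equals("\u00E0 \u00E9 ? ? ? ? ? ?")); // true
    }
}
```

The comparison is printed as a boolean rather than the text itself, so the result does not depend on what the console can display.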
I wrote some test code. First, here is test.html:
<!DOCTYPE html>
<html>
    <body>
        &#224; &#233; &#283; &#322; &#378; &#347; &#269; &#353;
    </body>
</html>


In the browser, it outputs
à é ě ł ź ś č š
I couldn't display the same thing on the command line (maybe an expert will show us how to do that). But I was able to display it in a JFrame.
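import java.awt.FlowLayout;
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

public class T {
    public static void main(String[] args) {
        // Build the UI on the event dispatch thread, as Swing expects.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Testing Unicode");
            JPanel panel = new JPanel();
            panel.setLayout(new FlowLayout());
            // Unicode escapes keep the source file pure ASCII.
            JLabel label = new JLabel("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161");
            label.setFont(new Font("Serif", Font.PLAIN, 28));
            panel.add(label);
            frame.add(panel);
            frame.setSize(500, 300);
            frame.setLocationRelativeTo(null); // center on screen
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}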


It prints the same string of characters à é ě ł ź ś č š in the JLabel.
"I couldn't display the same thing on the command line(maybe an expert will show us how to do that)."

Windows has poor support for Unicode in a console. You might try setting a font in the console that looks promising (e.g. it has 'unicode' in its name) or see
http://technojeeves.com/index.php/aliasjava1/107-enabling-windows-console-for-unicode
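On the Java side, one approach that is sometimes used is to wrap System.out in a PrintStream that encodes as UTF-8. This is only a sketch under assumptions that are not taken from this thread or the linked page: the console has to be switched to UTF-8 itself (on Windows, for example with chcp 65001) and use a font that contains these glyphs, otherwise the output will still look wrong. The class name Utf8Console is made up.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

final class Utf8Console {
    // Wraps a stream so that text written to it is encoded as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        // Whether this displays correctly depends on the console's code page and font.
        out.println("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161");
    }
}
```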
@CEHJ, thanks for explaining that.
Yes - sorry about that. But thanks to rrz for concern
As @rrz has very succinctly explained, the character entity and Unicode code-point values are just different representations of the same character.

But the character will only display (in your browser or application window, or when printed) if the current font contains that character.
It's important to realize that character entities are usually only used in marked-up documents such as HTML/XML.
@CEHJ: good point of clarification!

Outside of that environment, they wouldn't be interpreted, but just treated as character strings.
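If you do end up with such references in a plain Java string and want the real characters, you have to decode them yourself. Below is a minimal sketch that handles only decimal references like &#353; (the class and method names are hypothetical). For hex and named entities, a library such as Apache Commons Text (StringEscapeUtils.unescapeHtml4) is probably the better choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NumericReferenceDecoder {
    // Matches decimal references such as &#353; (hex and named forms are not handled).
    private static final Pattern DECIMAL_REFERENCE = Pattern.compile("&#(\\d{1,7});");

    // Replaces each decimal reference with the character it stands for.
    static String decode(String text) {
        Matcher m = DECIMAL_REFERENCE.matcher(text);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            int codePoint = Integer.parseInt(m.group(1));
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group(); // leave anything out of range untouched
            m.appendReplacement(sb, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        String decoded = decode("&#224; &#233; &#283; &#322; &#378; &#347; &#269; &#353;");
        System.out.println(decoded.equals("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161")); // true
    }
}
```

The input here is the same line used in test.html, and the expected result is the same string used in the JLabel example.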
@Gerwin Jansen,  thanks for the cleanup and thanks for the mouse hover explanation.
@rrz - You're welcome ;)
This question is a duplicate of the question that CEHJ linked in his first comment.
I assisted by answering some of gudii9's follow-up questions.