encoding of characters

&#224;
&#233;

I have some special characters shown as above on the screen.

How do I convert them to the corresponding special characters so they display properly, like an 'a' with a cap on top of it or an apostrophe on top, etc.?

Is there any link or online resource to convert these encodings automatically?

ASKER CERTIFIED SOLUTION

CEHJ

gudii9

ASKER

&#353; š \u0161
What is the difference between the above 3 formats?

rrz

pony10us already showed you the first format at
https://www.experts-exchange.com/questions/29089384/entities-to-special-characters.html?anchorAnswerId=42500959#a42500959
See also
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
The second is just the character itself,
and the third is a Unicode escape sequence; you can look up code points at
https://unicode-table.com/en/
Also read the top paragraphs at
https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html

gudii9

ASKER

Which one of these 3 is better to keep in my code?

&#353; š \u0161

The first one definitely messes up the display.
I wonder why all of the ones below show up as a question mark:
&#283;      ?
&#322;      ?
&#378;      ?
&#347;      ?
&#269;      ?

rrz

Which one of these 3 is better to keep in my code?

What code? Anyway, the first in your list is a character entity. The second is a character. The third is a Unicode escape sequence.
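To make the relationship between the three forms concrete, here is a minimal Java sketch. It assumes the entity in question is the decimal numeric reference for š (353), which is the same value as the hexadecimal 0161 used in the Unicode escape. The class and method names are made up for illustration.

```java
class ThreeFormats {
    // Decimal value taken from the numeric character reference for s-caron.
    static final int CODE_POINT = 353;

    // The same character written as a Java Unicode escape (hex 0161 == decimal 353).
    static final String ESCAPED = "\u0161";

    /** Builds a one-character string from a code point. */
    static String fromCodePoint(int codePoint) {
        return new String(Character.toChars(codePoint));
    }

    public static void main(String[] args) {
        // The code point and the escape denote the same character.
        System.out.println(fromCodePoint(CODE_POINT).equals(ESCAPED)); // true
        // Hex value of the escaped character.
        System.out.println(Integer.toHexString(ESCAPED.codePointAt(0))); // 161
        // To Java, the entity text itself is just six ordinary characters.
        System.out.println("&#353;".length()); // 6
    }
}
```

So the escape and the literal character compile to the same string, while the entity stays as raw text unless something (a browser, or your own code) decodes it.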

I wonder why all of the ones below show up as a question mark

The question mark is just a placeholder. It tells us that the code can't figure out what the encoding represents, so it prints a ? in its place.
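One common way a ? placeholder like this can arise in Java (an assumption here, since we haven't seen the asker's code) is when text is encoded with a charset that has no mapping for a character. The sketch below uses a hypothetical helper, roundTrip, to imitate an output device that only understands one charset.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class QuestionMarkDemo {
    /** Encodes text with the given charset and decodes it again, as a lossy output device would. */
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "\u00E9 \u011B \u0142"; // e-acute, e-caron, l-stroke
        // ISO-8859-1 contains e-acute but not the other two, so they come back as '?'.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1).equals("\u00E9 ? ?")); // true
        // UTF-8 can represent every character, so nothing is lost.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text)); // true
    }
}
```

If the characters survive a UTF-8 round trip but not the one your program actually uses, the charset is the likely culprit rather than the characters themselves.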
I wrote some test code. First, here is test.html

<!DOCTYPE html>
<html>
    <body>
        &#224; &#233; &#283; &#322; &#378; &#347; &#269; &#353;
    </body>
</html>

In the browser, it outputs
à é ě ł ź ś č š
I couldn't display the same thing on the command line (maybe an expert will show us how to do that). But I was able to display it in a JFrame.

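import java.awt.FlowLayout;
import java.awt.Font;
import javax.swing.JFrame;
import javax.swing.JLabel;
import javax.swing.JPanel;
import javax.swing.SwingUtilities;

public class T {
    public static void main(String[] args) {
        // Build the UI on the event dispatch thread, as Swing expects.
        SwingUtilities.invokeLater(() -> {
            JFrame frame = new JFrame("Testing Unicode");
            JPanel panel = new JPanel(new FlowLayout());
            // Unicode escapes keep the source file pure ASCII.
            JLabel label = new JLabel("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161");
            // A logical font such as "Serif" is mapped to an installed font by the JRE.
            label.setFont(new Font("Serif", Font.PLAIN, 28));
            panel.add(label);
            frame.add(panel);
            frame.setSize(500, 300);
            frame.setLocationRelativeTo(null); // center on screen
            frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
            frame.setVisible(true);
        });
    }
}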

It prints the same string of characters à é ě ł ź ś č š in the JLabel.

CEHJ

I couldn't display the same thing on the command line (maybe an expert will show us how to do that).

Windows has poor support for Unicode in a console. You might try setting a console font that looks promising (e.g. one with 'unicode' in its name), or see
http://technojeeves.com/index.php/aliasjava1/107-enabling-windows-console-for-unicode
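On the Java side, one thing that may help is to write to the console through a PrintStream with an explicit encoding instead of relying on the platform default. This is only a sketch: it assumes the console itself has been switched to UTF-8 (on Windows, for example, with chcp 65001) and that the console font contains the glyphs. The class and method names are made up for illustration.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

class ConsoleUtf8 {
    /** Wraps a stream so that text written to it is encoded as UTF-8. */
    static PrintStream utf8Stream(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream console = utf8Stream(System.out);
        // Same characters as in the JFrame example, written as Unicode escapes.
        console.println("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161");
    }
}
```

Whether the characters then show up correctly still depends on the console, so treat this as one half of the fix at most.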

rrz

@CEHJ, thanks for explaining that.

CEHJ

Yes - sorry about that. But thanks to rrz for the concern.

DansDadUK

As @rrz has very succinctly explained, the character entity and Unicode code-point values are just different representations of the same character.

But the character will only display (in your browser or application window, or when printed) if the current font contains that character.

CEHJ

It's important to realize that character entities are usually only used in marked-up documents such as HTML/XML.

DansDadUK

@CEHJ: good point of clarification!

Outside of that environment, they wouldn't be interpreted, but just treated as character strings.
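To illustrate: in plain Java a numeric entity is just text, so it has to be decoded explicitly if you want the real character. Below is a minimal sketch of such a decoder. EntityDecoder is a hypothetical helper, it only handles numeric references (decimal and hexadecimal), and named entities such as &eacute; are left untouched. For real HTML input, an existing library (for example StringEscapeUtils.unescapeHtml4 in Apache Commons Text) is probably a safer choice.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EntityDecoder {
    // Matches decimal and hexadecimal numeric character references.
    private static final Pattern NUMERIC_REF =
            Pattern.compile("&#(?:(\\d{1,7})|[xX]([0-9a-fA-F]{1,6}));");

    /** Replaces numeric character references with the characters they denote. */
    static String decode(String input) {
        Matcher m = NUMERIC_REF.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            int codePoint = m.group(1) != null
                    ? Integer.parseInt(m.group(1))
                    : Integer.parseInt(m.group(2), 16);
            // Leave the reference as-is if it is not a valid Unicode code point.
            String replacement = Character.isValidCodePoint(codePoint)
                    ? new String(Character.toChars(codePoint))
                    : m.group();
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String raw = "&#224; &#233; &#353;";
        // Without decoding, Java sees 20 ordinary characters.
        System.out.println(raw.length()); // 20
        // After decoding, the string holds the three real characters.
        System.out.println(decode(raw).equals("\u00E0 \u00E9 \u0161")); // true
    }
}
```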

rrz

@Gerwin Jansen, thanks for the cleanup and thanks for the mouse hover explanation.

Gerwin Jansen

@rrz - You're welcome ;)

rrz

This question is a duplicate of the question that CEHJ linked in his first comment.
I assisted by answering some of gudii9's follow-up questions.