gudii9
asked on
encoding of characters
à
é
I have some special characters shown as above on the screen.
How do I convert them to the corresponding special characters so they display properly, like an 'a' with a cap on top of it or an apostrophe on top, etc.?
Is there any link or resource online to convert these encodings automatically?
pony10us already showed you the first format at
https://www.experts-exchange.com/questions/29089384/entities-to-special-characters.html?anchorAnswerId=42500959#a42500959
see also
https://en.wikipedia.org/wiki/List_of_XML_and_HTML_character_entity_references
the second is just the character itself
and the third is
https://unicode-table.com/en/
also read the top paragraphs at
https://docs.oracle.com/javase/8/docs/api/java/lang/Character.html
ASKER
Which one is better to keep in my code among these 3?
&scaron; š \u0161
First one definitely messes up the display
I wonder why all of the below are shown as question marks
ě ?
ł ?
ź ?
ś ?
č ?
"Which one is better to keep in my code among these 3?"
What code? Anyway, the first in your list is a character entity. The second is a character. The third is a Unicode escape sequence.
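To make the difference concrete inside Java source, here is a minimal sketch. The class name ThreeForms and its fields are made up for illustration, and the entity spelling &scaron; is an assumption about what the first form looked like. It suggests that the escape and the character itself end up as the same one-character string, while the entity is just ordinary text as far as Java is concerned.

```java
// Sketch only: compares the three notations when they appear in Java code.
final class ThreeForms {
    // Unicode escape: the compiler turns this into the single character U+0161.
    static final String ESCAPED = "\u0161";

    // The character itself, built from its code point so this file stays ASCII-only.
    // Typing the character directly in the source gives the same string, provided
    // javac reads the file with the right encoding (for example -encoding UTF-8).
    static final String LITERAL = new String(Character.toChars(0x0161));

    // Character entity: Java does not interpret it, so it stays as plain text.
    static final String ENTITY = "&scaron;";

    static boolean sameCharacter() {
        return ESCAPED.equals(LITERAL);
    }

    public static void main(String[] args) {
        System.out.println("escape equals character: " + sameCharacter());
        System.out.println("entity length in Java: " + ENTITY.length());
    }
}
```

If that holds for your setup, the escape is the safest of the three to keep in Java source, because it does not depend on the encoding of the source file.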
"I wonder why all of the below are shown as question marks"
The question mark is just a placeholder. It tells us that the code can't figure out what the encoding represents, so it prints a ? in its place.
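One common way to end up with such placeholders in Java is encoding text into a charset that has no slot for the character. The thread does not establish that this is what happened in the asker's case, so treat the following as a sketch of one plausible cause. QuestionMarkDemo and roundTrip are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Sketch only: shows how unmappable characters turn into '?' during encoding.
final class QuestionMarkDemo {
    // Encodes the text in the given charset and decodes it again, roughly what
    // happens when text passes through an output stream that uses that charset.
    static String roundTrip(String text, Charset charset) {
        return new String(text.getBytes(charset), charset);
    }

    public static void main(String[] args) {
        String text = "\u011B \u0142 \u017A \u015B \u010D";
        // None of these code points exist in ISO-8859-1, so getBytes substitutes
        // the charset's replacement byte for each of them.
        System.out.println(roundTrip(text, StandardCharsets.ISO_8859_1));
        // UTF-8 can represent every code point, so the text survives unchanged.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8).equals(text));
    }
}
```

Characters such as \u00E0 and \u00E9 do exist in ISO-8859-1, which may be why some accented letters survive while others do not.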
I wrote some test code. First, here is test.html
<!DOCTYPE html>
<html>
<body>
à é ě ł ź ś č š
</body>
</html>
In the browser, it outputs à é ě ł ź ś č š
I couldn't display the same thing on the command line (maybe an expert will show us how to do that). But I was able to display it in a JFrame.
import java.awt.*;
import javax.swing.*;

public class T {
    public static void main(String[] args) {
        JFrame frame = new JFrame("Testing Unicode");
        JPanel panel = new JPanel();
        panel.setLayout(new FlowLayout());
        // Unicode escapes keep the source file ASCII-only.
        JLabel label = new JLabel("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161");
        // The chosen font must contain glyphs for these characters.
        label.setFont(new Font("Serif", Font.PLAIN, 28));
        panel.add(label);
        frame.add(panel);
        frame.setSize(500, 300);
        frame.setLocationRelativeTo(null); // center the window on screen
        frame.setDefaultCloseOperation(JFrame.EXIT_ON_CLOSE);
        frame.setVisible(true);
    }
}
It prints the same string of characters à é ě ł ź ś č š in the JLabel.
"I couldn't display the same thing on the command line (maybe an expert will show us how to do that)."
Windows has poor support for Unicode in a console. You might try setting a font in the console that looks promising (e.g. it has 'unicode' in its name) or
http://technojeeves.com/index.php/aliasjava1/107-enabling-windows-console-for-unicode
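On the Java side, one thing that may help is writing to the console through a PrintStream with an explicit encoding instead of relying on the platform default. This is a sketch, not a guaranteed fix. Utf8Console and utf8 are hypothetical names, and it assumes the console itself has been switched to UTF-8 (on Windows that is usually attempted with chcp 65001) and uses a font that contains the glyphs.

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

// Sketch only: prints the test string as UTF-8 bytes regardless of the default charset.
final class Utf8Console {
    // Wraps any output stream in a PrintStream that encodes characters as UTF-8.
    static PrintStream utf8(OutputStream out) {
        try {
            return new PrintStream(out, true, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform is required to support UTF-8.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        PrintStream out = utf8(System.out);
        out.println("\u00E0 \u00E9 \u011B \u0142 \u017A \u015B \u010D \u0161");
    }
}
```

Whether the characters then show up correctly still depends on the console's code page and font, which is the part Windows has historically handled poorly.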
@CEHJ, thanks for explaining that.
Yes - sorry about that. But thanks to rrz for concern
As @rrz has very succinctly explained, the character entity and Unicode code-point values are just different representations of the same character.
But the character will only display (in your browser or application window, or when printed) if the current font contains that character.
It's important to realize that character entities are usually only used in marked-up documents such as html/xml
@CEHJ: good point of clarification!
Outside of that environment, they wouldn't be interpreted, but just treated as character strings.
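If entities do turn up in a plain Java string and need to become real characters, they have to be decoded explicitly. Below is a minimal sketch using only the standard library. EntityDecoder is a hypothetical name, and its table of named entities is deliberately tiny and would need extending from the full entity list. A library such as Apache Commons Text (StringEscapeUtils.unescapeHtml4) is probably the better choice in real code.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch only: decodes numeric entities and a few named ones into characters.
final class EntityDecoder {
    // Illustrative subset of named entities; the real list is much longer.
    private static final Map<String, Integer> NAMED = new HashMap<>();
    static {
        NAMED.put("agrave", 0x00E0);
        NAMED.put("eacute", 0x00E9);
        NAMED.put("scaron", 0x0161);
    }

    // Matches hexadecimal, decimal and named entities, each ending in a semicolon.
    private static final Pattern ENTITY =
            Pattern.compile("&(#[xX][0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);");

    static String decode(String input) {
        Matcher m = ENTITY.matcher(input);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            Integer cp = codePoint(m.group(1));
            // Unknown or invalid entities are left exactly as they were.
            String replacement = (cp == null) ? m.group() : new String(Character.toChars(cp));
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    // Returns the code point for an entity body, or null if it cannot be resolved.
    private static Integer codePoint(String body) {
        try {
            int cp;
            if (body.startsWith("#x") || body.startsWith("#X")) {
                cp = Integer.parseInt(body.substring(2), 16);
            } else if (body.startsWith("#")) {
                cp = Integer.parseInt(body.substring(1));
            } else {
                return NAMED.get(body);
            }
            return Character.isValidCodePoint(cp) ? Integer.valueOf(cp) : null;
        } catch (NumberFormatException e) {
            return null; // number too large to be a code point
        }
    }

    public static void main(String[] args) {
        String decoded = decode("&scaron; &#353; &#x161;");
        System.out.println(decoded.equals("\u0161 \u0161 \u0161"));
    }
}
```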
@Gerwin Jansen, thanks for the cleanup and thanks for the mouse hover explanation.
@rrz - You're welcome ;)
This question is a duplicate of the question that CEHJ linked in his first comment.
I assisted by answering some of gudii9's follow up questions.
ASKER
What is the difference between the above 3 formats?