Solved

How to read Unicode / Extended characters on a Mac?

Posted on 2013-11-12
Last Modified: 2014-01-07
We have a CSV file that is downloaded from users who entered information via iPhones and iPads all over the country. Recently, someone entered their name with a non-ASCII character: an e with a downward-sloping accent over it.

When we open it up in Excel, it displays as a capital A with umlauts over it and a squiggle. (Both the Windows and Mac versions of Excel have this problem.)

On Windows, when I open the CSV file in Notepad++, the character displays properly. On Mac, if I open it in TextEdit, it displays as the square root sign followed by the registered trademark symbol.

The hex values for the character are shown below:
Character Hex Values
How do I get Mac to interpret this correctly?

Here's the goal: if we can SEE what the character is supposed to be, then we can re-type the character in Excel so that it prints properly. Ideally, we'd like to use the characters as they were originally encoded, but we'd settle for a hack where we can view the original in ANY program and then re-type it in Excel.
Question by:DrDamnit
15 Comments
 
LVL 53

Expert Comment

by:strung
ID: 39642432
The problem is likely that your default font does not include Unicode characters. Try changing the font to Lucida Sans Unicode.
 
LVL 53

Expert Comment

by:strung
ID: 39642441
See this page for further information on Unicode and how to view it. http://symbolcodes.tlt.psu.edu/web/unicode.html
 
LVL 32

Author Comment

by:DrDamnit
ID: 39642702
Changing the font just makes the name in question slightly bigger, but still incorrect. The link you provided gives me the "how it works" theory, not the "how to make it work" application I need.

I know how it works, as demonstrated by the fact that I posted the hex values of this two-byte single character. The question is how to force the Mac to interpret the two bytes as a single character.
 
LVL 53

Expert Comment

by:strung
ID: 39642832
I am puzzled.

The hex for é is E9 and for è is E8, certainly not C3 or A8.

See Hex to Text converter:  http://www.string-functions.com/hex-string.aspx
Text to Hex converter:  http://www.string-functions.com/string-hex.aspx
 
LVL 53

Expert Comment

by:strung
ID: 39642838
C3 = Ã
A8 =  ¨
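
For reference, here is a minimal Python sketch of how the same two bytes read under three different encodings. It assumes the bytes in the file are exactly C3 A8, as quoted in this thread; the variable names are illustrative only.

```python
# The two bytes quoted in the thread (assumed to be exactly C3 A8).
raw = bytes([0xC3, 0xA8])

# The same bytes decoded three ways.
as_utf8 = raw.decode("utf-8")            # one character
as_cp1252 = raw.decode("cp1252")         # two characters (Windows Western)
as_mac_roman = raw.decode("mac_roman")   # two characters (classic Mac encoding)

for label, text in [
    ("utf-8", as_utf8),
    ("cp1252", as_cp1252),
    ("mac_roman", as_mac_roman),
]:
    print(f"{label:10} {text!r} ({len(text)} character(s))")
```

If that assumption holds, the bytes are the UTF-8 encoding of a single è. The cp1252 reading (Ã followed by ¨) matches what Excel was reported to show, and the Mac Roman reading (√ followed by ®) matches what TextEdit was reported to show.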
 
LVL 53

Expert Comment

by:strung
ID: 39642847
Can you post a portion of the csv file as an attachment?
 
LVL 53

Expert Comment

by:strung
ID: 39642909
This long-winded page seems to suggest the problem may be the encoding that was used to send the CSV document:

http://www.cs.tut.fi/~jkorpela/chars.html#encinfo

It is also possible (per the same page) that the mime settings on your web server may have something to do with the problem.
 
LVL 53

Expert Comment

by:strung
ID: 39642933
I think I am getting closer to the source of the problem. See this page:

http://www.joelonsoftware.com/articles/Unicode.html

which suggests that the e-mails sent by your users should have had a header specifying which Unicode encoding was being used. For the standard UTF-8, it should look like this:

Content-Type: text/plain; charset="UTF-8"

You might check to see if there is such a header in the e-mail. If the header specifies a different Unicode encoding, say, UTF-7, we have to figure out at what point the conversion needs to be made.
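
A small Python sketch of reading the declared charset from such a header and using it to decode the bytes. The header value is the one quoted above; the two-byte body and the fallback to UTF-8 are assumptions made for illustration.

```python
from email.message import Message

# Build a message carrying the header quoted above.
msg = Message()
msg["Content-Type"] = 'text/plain; charset="UTF-8"'

# get_content_charset() returns the charset parameter in lower case,
# or None when the header does not declare one.
declared = msg.get_content_charset()

# Hypothetical body: the two bytes discussed in this thread.
body = bytes([0xC3, 0xA8])

# Fall back to UTF-8 when nothing is declared (an assumption, not a rule).
text = body.decode(declared or "utf-8")
print(declared, repr(text))
```

When no charset is declared, the receiving program has to guess, which is where the garbled output usually comes from.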
 
LVL 53

Expert Comment

by:strung
ID: 39642950
Aha! Open TextEdit, go to File > Open, and navigate to your CSV file. Before you open it, click on the text encoding box, which is set to Automatic by default, and choose a different encoding to see if you can find one that works.

My guess is that the user who sent the file either sent it without a Unicode header (as per my previous message) or the header got stripped out.
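
The same trial-and-error can be done outside TextEdit. Below is a rough Python sketch that decodes the same bytes with several candidate encodings so the results can be compared side by side. The candidate list, the function name `preview_decodings`, and the sample row are all made up for illustration; in practice the bytes would be read from the downloaded CSV file.

```python
# Candidate encodings to try; this list is an assumption, extend as needed.
CANDIDATES = ["utf-8", "cp1252", "mac_roman", "latin-1"]

def preview_decodings(data, encodings=CANDIDATES):
    """Return {encoding name: decoded text, or None if decoding fails}."""
    results = {}
    for name in encodings:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results

# Stand-in for the real file contents (hypothetical sample row).
sample = "Ren\u00e8,Smith".encode("utf-8")

for name, text in preview_decodings(sample).items():
    print(f"{name:10} {text!r}")
```

The encoding whose output shows the name as the user typed it is the one to select when opening the file.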
 
LVL 53

Accepted Solution

by:
strung earned 500 total points
ID: 39642953
You could also try the freeware TextWrangler instead of TextEdit to see if it is able to select the right encoding automatically.
 
LVL 53

Expert Comment

by:strung
ID: 39642966
<link removed - GaryC123>

Apparently it may misinterpret a comma that is part of a Unicode character as a field delimiter. One suggested solution is to use tabs instead of commas to delimit fields.
 
LVL 32

Author Comment

by:DrDamnit
ID: 39645709
Problem character is in B3.
001894.zip
 
LVL 32

Author Comment

by:DrDamnit
ID: 39645714
Some background:

The text is submitted from an iPhone app to the server (data is submitted using an HTTP POST operation and saved in MySQL). Later, the CSVs are downloaded directly from the server (we use a SQL query and a PHP script to build the file). The file attached above is EXACTLY what is downloaded, in its entirety.

The CSV type needs no content header, although I agree one could possibly fix an encoding issue by forcing the file to be viewed properly. In this case, though, I don't see how to do that reliably.
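
One commonly used way to mark a CSV file as UTF-8 without any external header is to start the file with a UTF-8 byte order mark (BOM). The sketch below shows the idea in Python rather than in the PHP script mentioned above; the rows are invented, and whether a given Excel version (particularly on the Mac) honors the BOM is an assumption that would need testing.

```python
import csv
import io

# Hypothetical rows standing in for the result of the SQL query.
rows = [["id", "name"], ["1", "Ren\u00e8"]]

# Build the CSV text in memory; newline="" lets the csv module
# control line endings itself.
buffer = io.StringIO(newline="")
csv.writer(buffer).writerows(rows)

# "utf-8-sig" encodes as UTF-8 and prepends the BOM bytes EF BB BF.
payload = buffer.getvalue().encode("utf-8-sig")

print(payload[:3].hex())  # the three BOM bytes
```

The stored data is unchanged; only three marker bytes are added at the very start of the download.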
 
LVL 53

Expert Comment

by:strung
ID: 39646100
Go to Firefox Preferences > Advanced > Network > Settings

Set proxy settings.

See screenshots attached
Screen-Shot-2013-11-13-at-3.24.1.pdf
Screen-Shot-2013-11-13-at-3.24.2.pdf
 
LVL 32

Author Closing Comment

by:DrDamnit
ID: 39762916
I can't remember how we solved this, but it was with some other program. :-)