How can I find which characters are in Unicode format in a text file?

Dellby
When I copy a large number of file names (from my Windows 7 64-bit computer, Swedish system) into Notepad and try to save the file with the default ANSI option, I get a warning telling me that the file contains characters in Unicode format which will be lost if I save it in ANSI format. Now I wonder which characters those might be, and how I can find out. I have noticed that if I save the file as Unicode, it is much larger than if I save it with the ANSI option.
Commented:
Hi,

You can get this warning in several situations, even when the text does not appear to contain any Unicode characters. One is that the text you are copying contains a non-printable character. Another is that the content contains the byte order mark (BOM: FEFF, or EF BB BF in UTF-8) that is used for Unicode detection, without any other Unicode characters.

In any case, the way to see these "Unicode characters" is to open the file in a hex editor. See the list here:
http://en.wikibooks.org/wiki/X86_Disassembly/Analysis_Tools#Windows_Hex_Editors
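If no hex editor is at hand, a few lines of Python can do a similar job. This is only a minimal sketch: the helper names `detect_bom` and `hex_dump` are made up for illustration, and only the UTF-8 and UTF-16 byte order marks are checked.

```python
import codecs

# Byte order marks to look for (UTF-32 is deliberately left out of this sketch).
BOMS = [
    (codecs.BOM_UTF8, "UTF-8"),         # EF BB BF
    (codecs.BOM_UTF16_LE, "UTF-16 LE"),  # FF FE
    (codecs.BOM_UTF16_BE, "UTF-16 BE"),  # FE FF
]

def detect_bom(data: bytes):
    """Return the name of the BOM the data starts with, or None."""
    for bom, name in BOMS:
        if data.startswith(bom):
            return name
    return None

def hex_dump(data: bytes, width: int = 16) -> str:
    """Format bytes as rows of hex values, each prefixed by its offset."""
    rows = []
    for offset in range(0, len(data), width):
        chunk = data[offset:offset + width]
        hex_part = " ".join(f"{b:02X}" for b in chunk)
        rows.append(f"{offset:08X}  {hex_part}")
    return "\n".join(rows)

# Example: the letter å saved as UTF-8 with a BOM in front.
data = "å".encode("utf-8-sig")
print(detect_bom(data))
print(hex_dump(data))
```

For a real file, read the bytes with `open(path, "rb").read()` and pass them to the same two functions.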
“ANSI” format means that the saved file contains only characters that occur in your old-style, 256-character character set (code page). If your computer is set up for English, this is code page 1252 (Swedish systems use code page 1252 as well). This page shows all 256 positions: http://en.wikipedia.org/wiki/Windows-1252

The warning appears if at least one character in the file is not in code page 1252. You should therefore save the file in a Unicode format unless the difference in file size matters; otherwise the characters that are not in code page 1252 will be replaced, typically by “?”.

If you want to find out which characters are non-ANSI, just save it as ANSI in spite of the warning and find the "?"s in the file.
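As an alternative to hunting for "?"s by eye, the check can be scripted. The sketch below assumes Python is available and that code page 1252 is the ANSI code page in question; `find_non_ansi` is a hypothetical helper name, and Python's `cp1252` codec may differ from Notepad's behaviour in a few edge cases.

```python
import unicodedata

def find_non_ansi(text, encoding="cp1252"):
    """List every character that cannot be encoded in the given code page.

    Returns tuples of (line number, column, character, code point, name).
    """
    hits = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for col, ch in enumerate(line, start=1):
            try:
                ch.encode(encoding)
            except UnicodeEncodeError:
                name = unicodedata.name(ch, "<unnamed>")
                hits.append((line_no, col, ch, f"U+{ord(ch):04X}", name))
    return hits

# Sample text: the first name contains "a" followed by a combining ring
# (not in cp1252); the euro sign in the last name is in cp1252.
sample = "Kommentar a\u030aren 2009.pdf\nplain name.txt\nprice 5\u20ac.txt"
for hit in find_non_ansi(sample):
    print(hit)
```

To check a list saved from Notepad with the Unicode option, read it with `open(path, encoding="utf-16").read()` and pass the result to `find_non_ansi`.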

You might want to check out this freeware as well:
http://www.rj-texted.se/bilder/MainUni-100.png
http://www.rj-texted.se/


Author

Commented:
Thanks!
I managed to narrow down the list by repeatedly saving half of it until I had just one single line left. The file name below, pasted into Notepad, produced the warning about Unicode characters. When I forced it to save as an ANSI-coded file and opened it again, it looked a bit different: the Swedish character å had become an a followed by °.
Kommentar a°ren 2009.pdf
Kommentar a°ren 2009.pdf
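A possible explanation for the a followed by °, which nothing in this thread confirms, is that the file name stores å in decomposed form: a plain "a" followed by the combining ring above (U+030A). The precomposed å (U+00E5) is in code page 1252, but the combining ring is not. The sketch below, in Python and using sample strings made up for illustration, shows the difference and how normalizing to NFC brings the name back into the code page.

```python
import unicodedata

composed = "Kommentar \u00e5ren 2009.pdf"            # å as a single code point
decomposed = unicodedata.normalize("NFD", composed)  # "a" + U+030A

def fits_code_page(text, encoding="cp1252"):
    """Return True if every character can be encoded in the code page."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(fits_code_page(composed))    # the precomposed form fits
print(fits_code_page(decomposed))  # the combining ring does not
print([f"U+{ord(c):04X}" for c in decomposed[10:12]])

# Normalizing back to NFC recombines "a" + ring into a single å.
recomposed = unicodedata.normalize("NFC", decomposed)
print(recomposed == composed)
```

Both forms look identical on screen, which is why the difference only shows up when the text is forced into ANSI.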

So I guess the follow-up question is: does it really matter? Can the use of Swedish characters cause any particular problems when copying and backing up the files back and forth between different devices? For instance, we copy the files to NAS devices such as the Netgear ReadyNAS, and we also use Gbridge to synchronize and copy the files out of the office to another Windows machine. All of the machines run Windows 7 64-bit, BUT some of the PCs have the Swedish version and others the English version.
Hi Dellby

If you're using NFS, Unicode should not be a problem for the NAS device:
http://www.readynas.com/forum/viewtopic.php?f=23&t=2099&start=0
The Unicode issues in Gbridge have been resolved as well:
http://www.gbridge.com/forum/archive/index.php/t-178.html

cheers!
Amb

Author

Commented:
Thanks Amber! That was most helpful!
