• Status: Solved

How can I get Cygwin to recognize file names with special characters such as the n with a tilde above it (ñ)?


I have recently encountered a problem accessing files that have special characters in their names. Specifically, the stat() function in my C program fails on these files. Examples of the special characters include the trademark symbol (™) and the n with a tilde above it (ñ).

I have looked at internationalization, but the recommendation from http://www.cygwin.com/cygwin-ug-net/using-specialnames.html is:
=============================
Filenames with unusual (foreign) characters

Windows filesystems use Unicode encoded as UTF-16 to store filename information. If you don't use the UTF-8 character set (see the section called “Internationalization”) then there's a chance that a filename is using one or more characters which have no representation in the character set you're using.

Note: In the default "C" locale, Cygwin creates filenames using the UTF-8 charset. This will always result in some valid filename by default, but again might impose problems when switching to a non-"C" or non-"UTF-8" charset.

Note: To avoid this scenario altogether, always use UTF-8 as the character set.
=============================

Suggestions on how to access these files?  

Thanks in advance...
Leon
1 Solution
 
Gerwin Jansen (EE MVE, Topic Advisor) commented:
Hello leonvan, it seems that Cygwin is quite clear about the issue you're facing:

"Only by using the UTF-8 charset you can avoid this problem safely." - found here, same page you got your info from, right?

I guess you are trying to access Windows files from Cygwin, and Windows uses UTF-16 to store filename information.

You could try the stat from MKS Toolkit instead of the stat from Cygwin.
 
leonvan (Author) commented:
Solved with the help of Cygwin.com.

If you don't define UNICODE, FindFirstFile/FindNextFile resolve to the ANSI versions of this API, FindFirstFileA/FindNextFileA. Unless you set your LANG/LC_CTYPE/LC_ALL variables to your current Windows ANSI charset *and* call setlocale, Cygwin uses UTF-8 by default. The character ñ therefore has a different multibyte encoding under Cygwin, 0xc3 0xb1, than the single byte 0xf1 it has in, say, Windows codepage 1252. To avoid this problem, use the Unicode API FindFirstFileW/FindNextFileW and convert the filename to the current multibyte charset via wcstombs and friends.
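
For anyone who finds this later, here is a rough sketch in portable C of the two ideas above. It is not the code from my program. The helper names utf8_encode_bmp and wide_name_to_mb are made up for illustration. The first function shows by hand why ñ (U+00F1) becomes the two bytes 0xc3 0xb1 in UTF-8 instead of the single byte 0xf1 from codepage 1252. The second wraps wcstombs, and assumes the wide string would come from something like the cFileName field that FindFirstFileW fills in, and that setlocale has already been called with the charset you want.

```c
#include <stddef.h>
#include <stdlib.h>
#include <wchar.h>

/* Hypothetical helper: encode one code point from the Basic Multilingual
 * Plane (U+0000..U+FFFF, excluding surrogates) as UTF-8.
 * Returns the number of bytes written (1 to 3), or 0 if cp is out of range. */
size_t utf8_encode_bmp(unsigned int cp, unsigned char out[3])
{
    if (cp > 0xFFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;                       /* not a single BMP character */
    if (cp < 0x80) {                    /* plain ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800) {                   /* two bytes: 110xxxxx 10xxxxxx */
        out[0] = (unsigned char)(0xC0 | (cp >> 6));
        out[1] = (unsigned char)(0x80 | (cp & 0x3F));
        return 2;
    }
    /* three bytes: 1110xxxx 10xxxxxx 10xxxxxx */
    out[0] = (unsigned char)(0xE0 | (cp >> 12));
    out[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
    out[2] = (unsigned char)(0x80 | (cp & 0x3F));
    return 3;
}

/* Hypothetical helper: convert a wide-character file name to the multibyte
 * charset of the current locale using wcstombs.
 * Returns 0 on success, -1 if a character cannot be represented in the
 * current charset or if the result (plus its terminating NUL) does not fit. */
int wide_name_to_mb(const wchar_t *wname, char *buf, size_t bufsize)
{
    size_t n = wcstombs(buf, wname, bufsize);

    if (n == (size_t)-1)
        return -1;                      /* unconvertible character */
    if (n >= bufsize)
        return -1;                      /* no room left for the NUL terminator */
    return 0;
}
```

Calling utf8_encode_bmp(0xF1, out) fills out with 0xc3 0xb1, which is the byte sequence a UTF-8 Cygwin program expects in the file name. With wide_name_to_mb, the result depends on the locale in effect when it is called, so the charset chosen via setlocale decides whether ñ converts at all.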
 
leonvan (Author) commented:
Because this is what solved the problem.