Solved

File system with mixed encodings UTF-8, iso-8859-1 and cp850

Posted on 2012-12-30
Last Modified: 2013-01-15
PROBLEM
I have a total mess in my Debian file system. Over the years I've used different encodings, sharing technologies and clients. Every time I change or move my data I mess it up even more.

HISTORY
The clients have always been Windows (NT through Windows 7), and the data has been stored on Linux servers (Openfiler, CentOS/Samba, Debian/OpenVZ/CentOS/Samba and Debian/OpenVZ/CentOS/WebDAV). The encoding has been ISO-8859-1 and/or UTF-8.

The bad thing is that I can have a mix of encodings, e.g. files created in ISO-8859-1 but stored on a UTF-8 system, and files created in UTF-8 and stored in UTF-8. I probably also have files that have been converted incorrectly. Because of the Windows clients, CP850 is probably involved as well.

NEED HELP WITH
I need a script or set of commands to convert the files (file names?) so that everything uses UTF-8 with a correct representation of the Swedish characters ÅÄÖ. The solution needs to handle unknown and mixed encodings.

I've read about a tool called convmv, but I don't know how to use it when I don't know the encodings, or when the encodings are mixed.
Question by:riverman
1 Comment
 
Accepted Solution

by:
ahoffmann earned 1000 total points
ID: 38732467
man convmv
man dos2unix
man iconv

convmv is just for filenames, think: rename cp850-name utf-8-name
you have to select the files yourself that convmv should process
there is no unique mapping between the character sets: the same byte value can stand for different characters, and the same character can be stored as different byte values (see 0x80 in cp1250 and iso8859-15). A human (you) has to decide which encoding is meant (which is obvious, as you, the human, also encoded it ;-)
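
To make the ambiguity concrete, here is a small Python sketch (not part of the original answer; the helper name `readings` is made up for illustration). It decodes the same bytes under several candidate encodings and, for the reverse direction, encodes the same character under each of them:

```python
def readings(raw: bytes, encodings):
    """Decode the same bytes under each candidate encoding (hypothetical helper)."""
    result = {}
    for enc in encodings:
        try:
            result[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            result[enc] = None  # these bytes are not valid in this encoding
    return result

# One byte, two meanings: the euro sign in cp1250,
# an invisible control character in iso8859-15.
for enc, text in readings(b"\x80", ["cp1250", "iso8859-15"]).items():
    print(enc, repr(text))

# And vice versa: one character, different bytes in each encoding.
for enc in ("utf-8", "iso8859-1", "cp850"):
    print(enc, "å".encode(enc))
```

Nothing in the bytes themselves says which reading is the intended one, which is why the files have to be selected by hand.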

iconv and (older) dos2unix convert the character encoding inside the file (the file content); in general you run into the same problems there as with convmv
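
As a rough illustration of what a content conversion does, here is a minimal Python sketch under the assumption that you already know the source encoding of the data. The function name `to_utf8` is hypothetical; on the command line iconv does this job.

```python
def to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Re-encode file content from a known source encoding to UTF-8."""
    # Decoding is strict, so a wrong guess raises UnicodeDecodeError
    # where the codec can detect it.
    text = data.decode(source_encoding)
    return text.encode("utf-8")

# The same Swedish text as stored by an ISO-8859-1 and a CP850 system.
print(to_utf8(b"\xe5\xe4\xf6", "iso-8859-1"))
print(to_utf8(b"\x86\x84\x94", "cp850"))
```

Note that iso-8859-1 and cp850 assign a character to every byte value, so decoding with the wrong one of them never fails; it silently produces wrong characters instead. That is the same problem as with the file names.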

I'd suggest that you first sort your files according to their base character set, as a filename and its content most likely use the same encoding
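
One possible way to do a first sort is sketched below in Python. It is a heuristic, not a definitive detector, and the function names are invented for this example. It assumes only ASCII, UTF-8, ISO-8859-1 and CP850 occur, and uses the fact that CP850 stores Å, Ä, Ö, å, ä and ö as bytes in the range 0x80-0x9F, which ISO-8859-1 reserves for control characters. The question marks in the labels are a reminder that every group still needs a human check.

```python
import os

def guess_encoding(name: bytes) -> str:
    """Heuristic guess at the encoding of a raw file name (review by hand)."""
    if name.isascii():
        return "ascii"            # nothing to convert
    try:
        name.decode("utf-8")
        return "utf-8"            # non-ASCII and valid UTF-8
    except UnicodeDecodeError:
        pass
    # Assumption: bytes 0x80-0x9F are control characters in ISO-8859-1,
    # so a name containing them more likely comes from CP850.
    if any(0x80 <= b <= 0x9F for b in name):
        return "cp850?"
    return "iso-8859-1?"

def sort_names(root: bytes) -> dict:
    """Group every file and directory path under root by guessed encoding."""
    groups = {}
    # A bytes root makes os.walk yield raw, undecoded names.
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            groups.setdefault(guess_encoding(name), []).append(path)
    return groups
```

Calling `sort_names(b".")` returns a dictionary from guessed encoding to a list of paths, and each list can then be reviewed and passed to convmv with the matching source encoding. Names that were already converted wrongly once (double-encoded) usually show up as valid UTF-8 here and are not detected.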