Solved

File system with mixed encodings: UTF-8, ISO-8859-1 and CP850

Posted on 2012-12-30
Last Modified: 2013-01-15
PROBLEM
I have a total mess in my Debian file system. Over the years I've used different encodings, sharing technologies and clients. Every time I change or move my data I mess it up even more.

HISTORY
The clients have always been Windows (NT through Windows 7), and the data has been stored on Linux servers (Openfiler, CentOS/Samba, Debian/OpenVZ/CentOS/Samba and Debian/OpenVZ/CentOS/WebDAV). The encoding has been ISO-8859-1 and/or UTF-8.

The bad thing is that I may have a mix of encodings, for example files created in ISO-8859-1 but stored on a UTF-8 system, alongside files created in UTF-8 and stored in UTF-8. I probably also have files that have been converted incorrectly. Because of the Windows clients, CP850 is probably involved as well.

NEED HELP WITH
I need a script or set of commands to convert the files (file names?) so that everything uses UTF-8 with a correct representation of the Swedish characters ÅÄÖ. The solution needs to handle unknown and mixed encodings.

I've read about a tool called convmv, but I don't know how to use it when I don't know the encodings, or when the encodings are mixed.
Question by: riverman

Accepted Solution

by: ahoffmann (earned 500 total points)
ID: 38732467
man convmv
man dos2unix
man iconv

convmv is only for file names; think: rename cp850-name utf-8-name
You have to select the files that convmv should process yourself.
There is no unique mapping between the character sets: the same byte value stands for different characters in different encodings, and the same character gets different byte values (see 0x80 in cp1250 and iso8859-15). A human (you) has to decide which encoding is meant, which should be obvious since you, the human, also encoded it ;-)
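To illustrate why a human has to make the final call, here is a minimal Python sketch of a guesser for raw file-name bytes. It is only a heuristic, assuming the candidates are limited to UTF-8, ISO-8859-1 and CP850 and that the names contain Swedish letters; `guess_name_encoding` and `to_utf8_name` are hypothetical helper names, not part of convmv.

```python
# Hypothetical helpers: guess how the raw bytes of a file name were encoded.
SWEDISH = set("ÅÄÖåäö")

def guess_name_encoding(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', 'iso-8859-1', 'cp850' or 'unknown'."""
    if all(b < 0x80 for b in raw):
        return "ascii"  # pure ASCII is identical in all three encodings
    try:
        raw.decode("utf-8")
        return "utf-8"  # legacy 8-bit text is rarely valid UTF-8 by accident
    except UnicodeDecodeError:
        pass
    # Assumption: the right legacy encoding is the one that turns the
    # non-ASCII bytes into Swedish letters rather than symbols or controls.
    scores = {}
    for enc in ("iso-8859-1", "cp850"):
        text = raw.decode(enc)  # both codecs define all 256 byte values
        high = [ch for ch in text if ord(ch) >= 0x80]
        scores[enc] = sum(ch in SWEDISH for ch in high) / len(high)
    best = max(scores, key=scores.get)
    if scores[best] == 0 or scores["iso-8859-1"] == scores["cp850"]:
        return "unknown"  # no evidence, or a tie: a human has to decide
    return best

def to_utf8_name(raw: bytes, encoding: str) -> bytes:
    """The 'rename cp850-name utf-8-name' idea: same text, UTF-8 bytes."""
    return raw.decode(encoding).encode("utf-8")

if __name__ == "__main__":
    for enc in ("utf-8", "iso-8859-1", "cp850"):
        raw = "räksmörgås".encode(enc)
        print(raw, "->", guess_name_encoding(raw))
```

Anything reported as 'unknown' still needs a manual look, and a name in some other encoding (cp1252, for instance) will be misjudged by this sketch.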

iconv and (older) dos2unix convert the character encoding inside the file (the file content); in general you run into the same problems as with convmv.
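As a rough illustration of what such a content conversion does, the following Python sketch re-encodes one text file to UTF-8 once you have decided on its source encoding. The function name `convert_to_utf8` and the `.orig` backup suffix are assumptions made for this example, and it should only be pointed at plain text files, never at binary data.

```python
from pathlib import Path

def convert_to_utf8(path: Path, source_encoding: str) -> bool:
    """Re-encode a text file to UTF-8; return True if it was rewritten."""
    data = path.read_bytes()
    try:
        data.decode("utf-8")
        return False  # already valid UTF-8 (or plain ASCII): leave it alone
    except UnicodeDecodeError:
        pass
    # The source encoding is the caller's decision; a wrong choice here
    # produces wrong characters without raising any error.
    text = data.decode(source_encoding)
    backup = path.with_name(path.name + ".orig")  # keep the original bytes
    backup.write_bytes(data)
    path.write_bytes(text.encode("utf-8"))
    return True
```

Skipping files that already decode as UTF-8 avoids converting the same file twice, which is one common way to end up with doubly encoded text.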

I'd suggest that you first sort your files according to their base character set, as a file name and its content are most likely in the same encoding.
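One possible way to do that first sorting pass is sketched below in Python. It walks a tree using byte paths, so no name is decoded behind your back, and puts each name into one of three buckets: plain ASCII, valid UTF-8, or "legacy" (something else that needs a decision). The names `bucket_for`, `sort_names_by_encoding` and `show_candidates` are hypothetical, and the sketch only reports; it renames nothing.

```python
import os

def bucket_for(name: bytes) -> str:
    """Classify one raw file name as 'ascii', 'utf-8' or 'legacy'."""
    if name.isascii():
        return "ascii"
    try:
        name.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        return "legacy"

def sort_names_by_encoding(root: str) -> dict:
    """Walk root with byte paths and group every entry by bucket."""
    groups = {"ascii": [], "utf-8": [], "legacy": []}
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            groups[bucket_for(name)].append(os.path.join(dirpath, name))
    return groups

def show_candidates(raw_name: bytes) -> dict:
    """Decode a legacy name both ways so a human can pick the right one."""
    return {enc: raw_name.decode(enc) for enc in ("iso-8859-1", "cp850")}
```

The "legacy" bucket is the list to review by hand, for example by printing `show_candidates(os.path.basename(path))` for each entry, before handing a selected set of files to convmv with the encoding you decided on.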

Question has a verified solution.
