Solved

File system with mixed encodings: UTF-8, ISO-8859-1 and CP850

Posted on 2012-12-30
Last Modified: 2013-01-15
PROBLEM
My Debian file system is a total mess. Over the years I've used different encodings, sharing technologies and clients. Every time I change or move my data, I mess it up even more.

HISTORY
The clients have always been Windows (NT to Win7), and the data has been stored on Linux servers (Openfiler, CentOS/Samba, Debian/OpenVZ/CentOS/Samba and Debian/OpenVZ/CentOS/WebDAV). The encoding has been ISO-8859-1 and/or UTF-8.

The bad thing is that I can have a mix of encodings, e.g. files created in ISO-8859-1 but stored on a UTF-8 system, and files created in UTF-8 and stored on a UTF-8 system. I probably also have files that have been converted incorrectly. Because of the Windows clients, CP850 is probably involved as well.

NEED HELP WITH
I need a script or set of commands to convert my files (file names?) to UTF-8 with a correct representation of the Swedish characters ÅÄÖ. The solution needs to handle unknown and mixed encodings.

I've read about a tool called convmv, but I don't know how to use it when I don't know the encodings, or when the encodings are mixed.
Question by:riverman

Accepted Solution

by: ahoffmann (earned 500 total points)
ID: 38732467
man convmv
man dos2unix
man iconv

convmv only converts file names; think of it as: rename cp850-name utf-8-name
You have to select the files for convmv to process yourself.
There is no unique mapping between the character sets: the same byte can stand for different characters, and the same character for different bytes (see 0x80 in cp1250 and iso8859-15). A human (you) has to decide which encoding is meant (which is obvious, as you, the human, also encoded it ;-)
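To make the ambiguity concrete, here is a minimal Python sketch (not part of convmv, just an illustration) that guesses which of the three encodings from the question a file name uses. It relies on two assumptions: non-ASCII bytes that decode as strict UTF-8 are treated as UTF-8, and anything else is scored by how many Swedish letters (åäöÅÄÖ) appear when decoded as ISO-8859-1 versus CP850. The function name `guess_encoding` is made up for this example, and the result is only a hint for the human who makes the final call.

```python
# Hypothetical helper: a heuristic, not a reliable detector.
SWEDISH_LETTERS = set("åäöÅÄÖ")


def guess_encoding(raw: bytes) -> str:
    """Return 'ascii', 'utf-8', 'iso-8859-1', 'cp850' or 'unknown'."""
    # Pure ASCII is identical in all three encodings, nothing to convert.
    if all(byte < 0x80 for byte in raw):
        return "ascii"

    # Non-ASCII bytes that form valid UTF-8 are very likely UTF-8.
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass

    # Both legacy codecs accept every byte, so decoding never fails.
    # Score each candidate by the Swedish letters it produces.
    def score(encoding: str) -> int:
        return sum(ch in SWEDISH_LETTERS for ch in raw.decode(encoding))

    latin1, cp850 = score("iso-8859-1"), score("cp850")
    if latin1 == cp850:
        return "unknown"  # no evidence either way: a human has to decide
    return "iso-8859-1" if latin1 > cp850 else "cp850"


if __name__ == "__main__":
    for codec in ("utf-8", "iso-8859-1", "cp850"):
        name = "Räksmörgås.txt".encode(codec)
        print(codec, "->", guess_encoding(name))
```

The scoring only works because the names are expected to contain Swedish letters. For other languages, or for names that were already converted wrongly once, it will return "unknown" or a wrong guess.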

iconv and (older) dos2unix convert the character encoding of the file content; in general you run into the same problems as with convmv.
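The content conversion itself is simple once you have decided on the source encoding. As a rough Python sketch of what such a conversion does (the function name `reencode_to_utf8` is made up for this example, and the source encoding is something you supply, not something the code finds out):

```python
def reencode_to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Convert file content from source_encoding to UTF-8.

    Content that is already valid UTF-8 is returned unchanged, to
    avoid converting the same data twice (assumption: valid UTF-8
    is never legacy-encoded text in disguise).
    """
    try:
        raw.decode("utf-8")
        return raw
    except UnicodeDecodeError:
        return raw.decode(source_encoding).encode("utf-8")


if __name__ == "__main__":
    old = "Åäö".encode("cp850")
    print(reencode_to_utf8(old, "cp850").decode("utf-8"))
```

If the wrong source encoding is passed in, the result is still valid UTF-8 but shows the wrong characters, which is exactly the problem described above.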

I'd suggest that you first sort your files by their base character set, as a file name and its content most likely use the same encoding.
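One way to do that sorting, as a minimal Python sketch: walk the tree with byte paths (so undecodable names don't raise errors) and put every name into one of three buckets. The bucket names and the functions `classify` and `group_names` are made up for this example. The "legacy" bucket is simply "not ASCII and not valid UTF-8"; splitting it into ISO-8859-1 and CP850 is still left to you.

```python
import os
from collections import defaultdict


def classify(name: bytes) -> str:
    """Return 'ascii', 'utf-8' or 'legacy' for a single file name."""
    if all(byte < 0x80 for byte in name):
        return "ascii"
    try:
        name.decode("utf-8")
    except UnicodeDecodeError:
        return "legacy"  # ISO-8859-1, CP850 or something else
    return "utf-8"


def group_names(root) -> dict:
    """Group all file and directory paths below root by classify()."""
    groups = defaultdict(list)
    # A bytes root makes os.walk yield bytes, so no name is ever decoded.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            groups[classify(name)].append(os.path.join(dirpath, name))
    return dict(groups)


if __name__ == "__main__":
    for bucket, paths in sorted(group_names(".").items()):
        print(bucket, len(paths))
```

Nothing is renamed here; the output is just a list to review before you point convmv at each group.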