Solved

File system with mixed encodings: UTF-8, ISO-8859-1 and CP850

Posted on 2012-12-30
Last Modified: 2013-01-15
PROBLEM
I have a total mess in my Debian file system. Over the years I've used different encodings, sharing technologies and clients. Every time I change or move my data I mess it up even more.

HISTORY
The clients have always been Windows (NT to Windows 7), and the data has been stored on Linux servers (Openfiler, CentOS/Samba, Debian/OpenVZ/CentOS/Samba and Debian/OpenVZ/CentOS/WebDAV). The encoding has been ISO-8859-1 and/or UTF-8.

The bad thing is that I can have a mix of encodings, e.g. files created in ISO-8859-1 but stored on a UTF-8 system, and files created in UTF-8 and stored in UTF-8. I probably also have files that have been converted wrongly. Because of the Windows clients, CP850 is probably involved as well.

NEED HELP WITH
I need a script or set of commands to convert the files (file names?) so that everything uses UTF-8 with a correct representation of the Swedish characters ÅÄÖ. The solution needs to handle unknown and mixed encodings.

I've read about a tool called convmv, but I don't know how to use it when I don't know the encodings, or when the encodings are mixed.
Question by:riverman

Accepted Solution

by:
ahoffmann earned 500 total points
ID: 38732467
man convmv
man dos2unix
man iconv

convmv is just for file names, think: rename cp850-name utf-8-name
You have to select the files for convmv to process yourself.
There is no unique mapping between the character sets: the same byte can stand for different characters, and the same character can be stored as different bytes (see 0x80 in cp1250 and iso8859-15). A human (you) has to decide which encoding is meant (which is obvious, as you, the human, also encoded it ;-)
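
To make the ambiguity concrete, here is a small Python sketch (not part of convmv; the helper names `encode_all` and `decode_all` are made up for illustration). It shows how ÅÄÖ are stored in the three encodings from the question, and how one and the same byte sequence reads differently depending on the encoding you assume.

```python
# Illustration only: the same bytes read as different characters depending
# on which encoding you assume. Helper names are hypothetical.
ENCODINGS = ("utf-8", "iso-8859-1", "cp850")


def encode_all(text):
    """Return {encoding: bytes} showing how `text` is stored in each encoding."""
    return {enc: text.encode(enc) for enc in ENCODINGS}


def decode_all(raw):
    """Return {encoding: str or None}; None means `raw` is invalid there."""
    results = {}
    for enc in ENCODINGS:
        try:
            results[enc] = raw.decode(enc)
        except UnicodeDecodeError:
            results[enc] = None
    return results


if __name__ == "__main__":
    # One text, three different byte sequences.
    for enc, raw in encode_all("ÅÄÖ").items():
        print(f"{enc:11} {raw.hex()}")

    # One byte sequence (ISO-8859-1 bytes), three different readings.
    for enc, text in decode_all("ÅÄÖ".encode("iso-8859-1")).items():
        print(f"{enc:11} {text!r}")
```

Note that ISO-8859-1 and CP850 accept any byte sequence, so a successful decode proves nothing by itself. Only UTF-8 can reject input, which is why a name that is not valid UTF-8 still needs a human to decide between the other two.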

iconv and (older versions of) dos2unix convert the character encoding inside the file (the file content). In general you run into the same problems as with convmv.
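
For file content, the conversion itself is simple once you have decided on the source encoding. The following is a minimal Python sketch of what such a conversion does, not a replacement for iconv; `to_utf8`, `convert_file` and the `.utf8` suffix are names chosen for this example only.

```python
from pathlib import Path


def to_utf8(raw, source_encoding):
    """Re-encode bytes from `source_encoding` to UTF-8.

    Decoding is strict, so bytes that are invalid in the assumed source
    encoding raise UnicodeDecodeError instead of being silently mangled.
    """
    return raw.decode(source_encoding).encode("utf-8")


def convert_file(path, source_encoding, suffix=".utf8"):
    """Write a UTF-8 copy next to the original and return the new path.

    The original file is left untouched so that a wrong guess about the
    source encoding can be undone.
    """
    src = Path(path)
    dst = src.with_name(src.name + suffix)
    dst.write_bytes(to_utf8(src.read_bytes(), source_encoding))
    return dst
```

Writing a copy instead of converting in place is deliberate here: with a wrong guess (say ISO-8859-1 for a CP850 file) the conversion still succeeds but produces the wrong characters, and you only notice when you look at the result.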

I'd suggest that you first sort your files according to their base character set, as a file name and its content most likely use the same encoding.
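
A first rough sort can be scripted, as long as you check the result by hand afterwards. The Python sketch below groups file names by a guessed encoding. The heuristic is my own assumption, not something convmv does: valid UTF-8 counts as UTF-8, otherwise the name is decoded as ISO-8859-1 and as CP850 and the reading with more Swedish letters wins. All function names are hypothetical.

```python
import os
from collections import defaultdict

SWEDISH = set("åäöÅÄÖ")
CANDIDATES = ("iso-8859-1", "cp850")  # both can decode any byte sequence


def guess_name_encoding(raw):
    """Guess the encoding of a file name given as bytes.

    Returns "ascii", "utf-8", "iso-8859-1", "cp850" or "unknown".
    Assumption: names that are not valid UTF-8 contain Swedish letters.
    """
    if raw.isascii():
        return "ascii"  # identical in all three encodings, nothing to do
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Score each legacy encoding by how many Swedish letters it yields.
    scores = {enc: sum(ch in SWEDISH for ch in raw.decode(enc))
              for enc in CANDIDATES}
    best = max(scores.values())
    winners = [enc for enc, score in scores.items() if score == best]
    if best == 0 or len(winners) > 1:
        return "unknown"  # leave these for a human to decide
    return winners[0]


def group_by_encoding(root):
    """Walk `root` and return {guessed encoding: [paths as bytes]}."""
    groups = defaultdict(list)
    # Passing bytes to os.walk makes it return the raw, undecoded names.
    for dirpath, dirnames, filenames in os.walk(os.fsencode(root)):
        for name in dirnames + filenames:
            groups[guess_name_encoding(name)].append(os.path.join(dirpath, name))
    return groups


if __name__ == "__main__":
    for enc, paths in sorted(group_by_encoding(".").items()):
        print(f"{enc:11} {len(paths)} names")
```

Names that were already converted wrongly (for example UTF-8 bytes that were converted to UTF-8 a second time) are still valid UTF-8, so they land in the "utf-8" group and have to be spotted by eye. Once a group looks right, it can be handed to convmv with the matching source encoding, e.g. `convmv -f iso-8859-1 -t utf-8 FILES`; as far as I know convmv only prints what it would do until you add `--notest`, but check `man convmv` first.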