Solved

File system with mixed encodings UTF-8, iso-8859-1 and cp850

Posted on 2012-12-30
Last Modified: 2013-01-15
PROBLEM
I have a total mess in my Debian file system. Over the years I've used different encodings, sharing technologies and clients. Every time I change or move my data I mess it up even more.

HISTORY
The clients have always been Windows (NT through Windows 7), and the data has been stored on Linux servers (Openfiler, CentOS/Samba, Debian/OpenVZ/CentOS/Samba and Debian/OpenVZ/CentOS/WebDAV). The encoding has been ISO-8859-1 and/or UTF-8.

The bad thing is that I can have a mix of encodings, e.g. files created in ISO-8859-1 but stored on a UTF-8 system, and files created in UTF-8 and stored in UTF-8. I probably also have files that have been converted wrongly. Because of the Windows clients, CP850 is probably involved as well.

NEED HELP WITH
I need a script or set of commands to convert the files (the file names?) so that everything uses UTF-8 with a correct representation of the Swedish characters ÅÄÖ. The solution needs to handle unknown and mixed encodings.

I've read about a tool called convmv, but I don't know how to use it when I don't know the encodings, or when I have mixed encodings.
Question by:riverman

Accepted Solution

by:
ahoffmann earned 500 total points
ID: 38732467
man convmv
man dos2unix
man iconv

convmv is just for filenames, think: rename cp850-name utf-8-name
you have to select the files to be processed by convmv yourself
there is no unique mapping between the character sets: they use different characters for the same byte value and vice versa (see 0x80 in cp1250 and iso8859-15), so a human (you) has to decide which encoding is meant (which should be obvious, as you, the human, also encoded it ;-)
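
To make the ambiguity concrete, here is a small Python sketch (Python is only my choice for illustration, it is not part of the tools named above). It decodes the same bytes with several codecs from the standard library; the function name and the sample bytes are made up for the example.

```python
# The same byte stands for different characters depending on which
# encoding you assume. Codec names are from Python's standard library.
CODECS = ("cp850", "iso-8859-1", "cp1250", "iso8859-15")


def interpretations(raw: bytes) -> dict:
    """Decode raw under each codec; undecodable bytes become U+FFFD."""
    return {codec: raw.decode(codec, errors="replace") for codec in CODECS}


if __name__ == "__main__":
    # 0x80 is the example from the text; 0xC5 and 0x8F are the byte for
    # the Swedish letter Å in iso-8859-1 and cp850 respectively.
    for raw in (b"\x80", b"\xc5", b"\x8f"):
        print(raw, interpretations(raw))
```

The byte 0xC5 reads as Å under iso-8859-1 but as a box-drawing character under cp850, which is why a tool cannot pick the source encoding for you.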

iconv and (older) dos2unix convert the character encoding inside the file (the file content); in general you run into the same problems as with convmv
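
As a rough illustration of what recoding file content means, here is a minimal Python sketch. It is not how iconv is implemented, only the same idea: interpret the bytes in a source encoding that you have already decided on, then write them back as UTF-8. The function names and paths are hypothetical.

```python
def recode_to_utf8(data: bytes, source_encoding: str) -> bytes:
    """Interpret data as source_encoding and return it encoded as UTF-8."""
    return data.decode(source_encoding).encode("utf-8")


def recode_file(src_path: str, dst_path: str, source_encoding: str) -> None:
    """Write a UTF-8 copy of src_path to dst_path, leaving the original alone."""
    with open(src_path, "rb") as src:
        data = src.read()
    with open(dst_path, "wb") as dst:
        dst.write(recode_to_utf8(data, source_encoding))
```

Note that cp850 and iso-8859-1 can decode any byte, so a wrong guess for source_encoding does not raise an error; it silently produces the wrong characters. Writing to a separate destination keeps the original available for a second attempt.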

I'd suggest that you first sort your files according to their base character set, as a filename and its content most likely use the same encoding
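
One possible way to do that sorting is sketched below in Python. It is a heuristic, not a reliable detector, and it rests on two assumptions of mine: the only candidate encodings are UTF-8, cp850 and iso-8859-1, and the only non-ASCII characters in the names are the Swedish letters åäöÅÄÖ. The function names are hypothetical. The result still needs a human to look over it.

```python
import os
from collections import defaultdict

# Assumption: these are the only non-ASCII letters expected in file names.
SWEDISH = set("åäöÅÄÖ")


def guess_encoding(raw: bytes) -> str:
    """Guess which of the candidate encodings a raw file name uses."""
    if raw.isascii():
        return "ascii"  # identical in all three encodings, nothing to convert
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        pass
    # Look only at the non-ASCII bytes and keep the first codec under
    # which every one of them turns into a Swedish letter.
    high = bytes(b for b in raw if b >= 0x80)
    for codec in ("cp850", "iso-8859-1"):
        if all(ch in SWEDISH for ch in high.decode(codec)):
            return codec
    return "unknown"  # leave these for manual inspection


def group_by_encoding(root: str) -> dict:
    """Walk root with byte paths and group file paths by guessed encoding."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(os.fsencode(root)):
        for name in filenames:
            groups[guess_encoding(name)].append(os.path.join(dirpath, name))
    return dict(groups)
```

Only file names are classified here, not directory names. Names that were converted wrongly in the past will usually land in "utf-8" or "unknown", so those groups deserve a manual check before anything is renamed.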