Solved

Looking for a Linux command-line tool to determine the character encoding of a file

Posted on 2010-11-17
Last Modified: 2012-05-10
I need to convert a large number of files to UTF-8. The problem is that they do not all start out in the same encoding. I want to write a script that determines the current encoding of each file and then runs iconv on it to convert it to UTF-8.

I tried enca -i fileName but got:
$ enca -i fileName
enca: Cannot determine (or understand) your language preferences.
Please use `-L language', or `-L none' if your language is not supported
(only a few multibyte encodings can be recognized then).
Run `enca --list languages' to get a list of supported languages

Thanks,
Frank
Question by:ibanja

Expert Comment

by:deisrobinson
ID: 34155724
You are using enca on text files only, correct?

Accepted Solution

by:
deisrobinson earned 500 total points
ID: 34155742
Look,

"You can (or have to) use -L option to tell it the right language. Suppose, you downloaded some Russian HTML file, 'file.htm', it claims it's windows-1251 but it isn't. So you run

    enca -L ru file.htm"

In your case it would probably be "enca -L en file.htm".

Find out more here:

http://linux.die.net/man/1/enca

Expert Comment

by:Hatrix76
ID: 34156248
Try the file command:

file <filename>

It should give you something like:

UTF-8 Unicode text

best
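
For use in a script, the human-readable description is awkward to parse. A minimal sketch, assuming a reasonably recent version of file that supports the --mime-encoding option, prints only the charset name. The sample file name is hypothetical.

```shell
# Create a small sample file containing one non-ASCII character encoded as UTF-8.
printf 'caf\303\251\n' > sample_utf8.txt

# -b omits the file name from the output; --mime-encoding prints only the charset.
detected=$(file -b --mime-encoding sample_utf8.txt)
echo "$detected"
```

The value in "$detected" is a lowercase charset name that can usually be passed to iconv -f, although file only guesses from the bytes it sees.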

Author Comment

by:ibanja
ID: 34156735
I tried the file command.  I get:

$ file file.tex
file.tex: Non-ISO extended-ASCII text

enca didn't like "en":
$ enca -L en file.tex
enca: Language `en' is unknown or not supported.

So I tried:

$ enca -L none file.tex
Unrecognized encoding

It seems to recognize some encodings but not all.  I guess this is the best I'll get.
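
A "Non-ISO extended-ASCII text" result from file usually means the file contains bytes in the 0x80-0x9F range, which rules out ISO-8859-x and is common in Windows code pages. The sketch below is only an illustration: it assumes a Windows-1252 source encoding, which the thread never confirms, and uses hypothetical file names.

```shell
# Hypothetical sample: "it's" written with a Windows-1252 right single quote (byte 0x92).
printf 'it\222s\n' > quote.txt

# On many systems this reports "Non-ISO extended-ASCII text".
file quote.txt

# Assumption: the file really is Windows-1252. iconv trusts -f and does not verify it.
iconv -f CP1252 -t UTF-8 quote.txt > quote.utf8.txt

# Sanity check: iconv exits non-zero if the result is not valid UTF-8.
iconv -f UTF-8 -t UTF-8 quote.utf8.txt > /dev/null && echo "valid UTF-8"
```

Because almost any byte sequence is valid in a single-byte code page, a successful conversion does not prove the guess was right; the converted text still needs a visual check.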

Author Comment

by:ibanja
ID: 34156773
// You are using enca on text files only correct?
Yes, I'm using it on text files only.


Expert Comment

by:deisrobinson
ID: 34156783
Great. Use the `enca --list languages' command to find out what the abbreviation for English is; I just guessed it would be 'en'. Good luck!

Author Closing Comment

by:ibanja
ID: 34167286
I didn't get any "english" listings, which is why I went with "none". It seems to work.

Thanks!

$ enca --list languages
belarussian: CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855
  bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
      czech: ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
   estonian: ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic
   croatian: CP1250 ISO-8859-2 IBM852 macce CORK
  hungarian: ISO-8859-2 CP1250 IBM852 macce CORK
 lithuanian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
    latvian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
     polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
    russian: KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
     slovak: CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
    slovene: ISO-8859-2 CP1250 IBM852 macce CORK
  ukrainian: CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
    chinese: GBK BIG5 HZ
       none:
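
The workflow described in the question (detect each file's encoding, then run iconv) could be sketched roughly as follows. This is not the script the author ended up with: it swaps in file --mime-encoding for detection instead of enca -L none, and the directory and sample files are hypothetical.

```shell
#!/bin/sh
# Sketch: convert every *.txt file in a directory to UTF-8, skipping files
# whose encoding cannot be guessed. The sample files below are hypothetical.

workdir=$(mktemp -d)
printf 'caf\351\n'     > "$workdir/latin1.txt"   # ISO-8859-1 bytes
printf 'caf\303\251\n' > "$workdir/utf8.txt"     # already UTF-8
printf 'plain\n'       > "$workdir/ascii.txt"    # pure ASCII

for f in "$workdir"/*.txt; do
    # Ask file for the charset name only (no file name, no description).
    enc=$(file -b --mime-encoding "$f")
    case "$enc" in
        utf-8|us-ascii)
            echo "$f: already $enc, left unchanged" ;;
        unknown-8bit|binary)
            # Detection failed; do not guess, leave the file for manual review.
            echo "$f: encoding not recognized, skipped" >&2 ;;
        *)
            # Convert into a temporary file and replace the original only on success.
            if iconv -f "$enc" -t UTF-8 "$f" > "$f.utf8"; then
                mv "$f.utf8" "$f"
                echo "$f: converted from $enc"
            else
                rm -f "$f.utf8"
                echo "$f: iconv failed for $enc, skipped" >&2
            fi ;;
    esac
done
```

Writing to a temporary file first means a failed conversion never truncates the original. Files reported as unrecognized are left untouched so they can be checked by hand.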