Solved

Looking for a Linux command-line tool to determine the character encoding of a file

Posted on 2010-11-17
Last Modified: 2012-05-10
I need to convert a large number of files to UTF-8. The problem is that they do not all start out in the same encoding. I want to write a script that determines the current encoding of each file and then runs iconv on it to convert it to UTF-8.

I tried enca -i fileName but got:
$ enca -i fileName
enca: Cannot determine (or understand) your language preferences.
Please use `-L language', or `-L none' if your language is not supported
(only a few multibyte encodings can be recognized then).
Run `enca --list languages' to get a list of supported languages

Thanks,
Frank
Question by:ibanja
7 Comments
 
LVL 7

Expert Comment

by:deisrobinson
ID: 34155724
You are using enca on text files only, correct?
 
LVL 7

Accepted Solution

by:
deisrobinson earned 500 total points
ID: 34155742
Look,

"You can (or have to) use -L option to tell it the right language. Suppose, you downloaded some Russian HTML file, 'file.htm', it claims it's windows-1251 but it isn't. So you run

    enca -L ru file.htm"

In your case it would probably be "enca -L en file.htm"

Find out more here:

http://linux.die.net/man/1/enca
 
LVL 7

Expert Comment

by:Hatrix76
ID: 34156248
Try the file command:

file <filename>

It should print something like this (the exact wording varies by version):

<filename>: UTF-8 Unicode text

Best
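
For scripting, the human-readable description is awkward to parse. A minimal sketch, assuming a `file` build that supports the `--mime-encoding` option (file 5.x does): with `-b` it prints only a short encoding name such as `utf-8`, `us-ascii` or `iso-8859-1`. The function name `detect_encoding` is hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# detect_encoding FILE...
# Prints "<encoding><TAB><filename>" for every file given.
# Assumption: `file` supports -b (brief) and --mime-encoding.
detect_encoding() {
    for f in "$@"; do
        # -b suppresses the "filename:" prefix, leaving only the encoding name.
        enc=$(file -b --mime-encoding -- "$f")
        printf '%s\t%s\n' "$enc" "$f"
    done
}

# Example call (not run here):
#   detect_encoding *.tex
```

Files that `file` cannot classify are reported as `unknown-8bit`, so the script still has to decide what to do with those.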

Author Comment

by:ibanja
ID: 34156735
I tried the file command.  I get:

$ file file.tex
file.tex: Non-ISO extended-ASCII text

enca didn't like "en":
$ enca -L en file.tex
enca: Language `en' is unknown or not supported.

So I tried:

$ enca -L none file.tex
Unrecognized encoding

It seems to recognize some encodings but not all.  I guess this is the best I'll get.
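
When detection fails like this, one option is to fall back to a guessed encoding for the unrecognized files. The following is only a sketch under stated assumptions, not a tested solution for this set of files: it assumes `file` supports `--mime-encoding` (where "Non-ISO extended-ASCII text" shows up as `unknown-8bit`), and that the caller knows a plausible fallback such as CP1252. The function name `to_utf8` and the `.utf8` output suffix are hypothetical.

```shell
#!/bin/sh
# to_utf8 FALLBACK FILE...
# Writes a UTF-8 copy of each FILE to FILE.utf8; originals are left untouched.
# FALLBACK is the encoding assumed when `file` reports unknown-8bit
# (an assumption supplied by the caller, not something detected).
to_utf8() {
    fallback=$1
    shift
    for f in "$@"; do
        enc=$(file -b --mime-encoding -- "$f")
        case $enc in
            utf-8|us-ascii)
                # Already valid UTF-8 (ASCII is a subset); nothing to convert.
                echo "skip: $f ($enc)"
                continue ;;
            binary)
                echo "skip: $f (binary)" >&2
                continue ;;
            unknown-8bit)
                # Detection failed; use the caller's guess.
                enc=$fallback ;;
        esac
        if iconv -f "$enc" -t UTF-8 < "$f" > "$f.utf8"; then
            echo "converted: $f ($enc)"
        else
            echo "failed: $f ($enc)" >&2
            rm -f -- "$f.utf8"
        fi
    done
}

# Example call (not run here):
#   to_utf8 CP1252 *.tex
```

Writing to a separate `.utf8` file keeps the original intact, so a wrong guess can be spotted and redone with a different fallback.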
 

Author Comment

by:ibanja
ID: 34156773
// You are using enca on text files only, correct?
Yes, I'm using it on text files only.

 
LVL 7

Expert Comment

by:deisrobinson
ID: 34156783
Great. Use the `enca --list languages' command to find out what the abbreviation for English is. I just guessed it would be 'en'. Good luck!
 

Author Closing Comment

by:ibanja
ID: 34167286
There was no "english" entry in the list, which is why I went with "none". It seems to work.

Thanks!

$ enca --list languages
belarussian: CP1251 IBM866 ISO-8859-5 KOI8-UNI maccyr IBM855
  bulgarian: CP1251 ISO-8859-5 IBM855 maccyr ECMA-113
      czech: ISO-8859-2 CP1250 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
   estonian: ISO-8859-4 CP1257 IBM775 ISO-8859-13 macce baltic
   croatian: CP1250 ISO-8859-2 IBM852 macce CORK
  hungarian: ISO-8859-2 CP1250 IBM852 macce CORK
 lithuanian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
    latvian: CP1257 ISO-8859-4 IBM775 ISO-8859-13 macce baltic
     polish: ISO-8859-2 CP1250 IBM852 macce ISO-8859-13 ISO-8859-16 baltic CORK
    russian: KOI8-R CP1251 ISO-8859-5 IBM866 maccyr
     slovak: CP1250 ISO-8859-2 IBM852 KEYBCS2 macce KOI-8_CS_2 CORK
    slovene: ISO-8859-2 CP1250 IBM852 macce CORK
  ukrainian: CP1251 IBM855 ISO-8859-5 CP1125 KOI8-U maccyr
    chinese: GBK BIG5 HZ
       none: