Solved

How to find the character encoding type of a file in Linux?

Posted on 2011-03-21
Last Modified: 2012-05-11
I have a .txt file and I need to determine what character encoding it is using so I can then convert other files to match it.

If I run "file myfile.txt", I get this info:
       "Non-ISO extended-ASCII text, with very long lines"

I know the file is ANSI but I need to determine exactly what type of ANSI file so I can convert other files to match it.

When I check the encodings available in "iconv", I find these possibilities. How do I determine which one is the exact match?

ANSI_X3.4-1968
ANSI_X3.4-1986
ANSI_X3.4
ANSI_X3.110-1983
ANSI_X3.110
ASCII
MS-ANSI
WINDOWS-31J
WINDOWS-874
WINDOWS-936
WINDOWS-1250
WINDOWS-1251
WINDOWS-1252
WINDOWS-1253
WINDOWS-1254
WINDOWS-1255
WINDOWS-1256
WINDOWS-1257
WINDOWS-1258
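
One way to narrow a list like this down is trial decoding: convert a few lines that contain non-ASCII bytes from each candidate encoding to UTF-8 and see which rendering reads correctly. The sketch below assumes GNU grep and iconv; the function name preview_encodings is hypothetical and not from the thread. It cannot prove which encoding was intended, because most single-byte encodings accept almost any byte, so someone still has to judge the output by eye.

```shell
#!/bin/sh
# preview_encodings FILE ENCODING...
# Hypothetical helper: decodes up to three lines of FILE that contain
# non-ASCII bytes under each candidate ENCODING and prints them as UTF-8.
preview_encodings() {
    file=$1
    shift
    for enc in "$@"; do
        # In the C locale every byte is one character, so this bracket
        # expression selects lines with bytes that are neither printable
        # ASCII nor whitespace.
        # iconv exits non-zero when a byte is invalid in the source encoding.
        if out=$(LC_ALL=C grep -a '[^[:print:][:space:]]' "$file" |
                 head -n 3 | iconv -f "$enc" -t UTF-8 2>/dev/null); then
            printf '%s: %s\n' "$enc" "$out"
        else
            printf '%s: (not valid in this encoding)\n' "$enc"
        fi
    done
}
```

A call such as `preview_encodings myfile.txt WINDOWS-1250 WINDOWS-1251 WINDOWS-1252` prints one rendering per candidate; the one whose accented letters and punctuation look right is the likely match.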


Question by:bearclaws75
7 Comments
 
LVL 31

Expert Comment

by:farzanj
ID: 35185608
Just use the command

unix2dos filename

It should convert the file to DOS format. On some systems the command is called

ux2dos
 
LVL 31

Expert Comment

by:farzanj
ID: 35185616
If you want to go the other way,

issue this command

dos2unix filename
 
LVL 31

Expert Comment

by:farzanj
ID: 35185634
I think the character encoding is UTF-8

Author Comment

by:bearclaws75
ID: 35185695
farzanj - I know the file is not UTF-8 because if I run "file otherfile.txt" on a different file, the output is:
     "UTF-8 Unicode text, with very long lines, with CRLF line terminators"

However, I ran "unix2dos myfile.txt" and it converted the file:
     "unix2dos: converting file myfile.txt to DOS format ..."

...but if I run "file myfile.txt", I get the same info:
       "Non-ISO extended-ASCII text, with very long lines"

"unix2dos" is a good command-line utility but, ultimately, I need to determine the exact character encoding so I can update my PHP scripts to generate the proper file type.


 
LVL 31

Expert Comment

by:farzanj
ID: 35185797
Well, I see your point, but you can still call this utility from within PHP. In any case, let me look into it.
 
LVL 31

Accepted Solution

by:farzanj (earned 500 total points)
ID: 35185912
Well, I think it is very simple. Basically you are converting the newline characters, and that is about all. The remaining bytes are ASCII character codes, which are the same in both formats.

So you need to convert each line feed (\n) to a carriage return (\r) followed by a line feed. A simple regular expression can do that.

In other words, you are changing \n to \r\n.
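
The \n to \r\n substitution can be sketched as a small shell filter. This assumes GNU sed, which understands \r in both the pattern and the replacement; the function name to_crlf is hypothetical. Note that this changes only the line endings: any non-ASCII bytes pass through untouched, so the character encoding of the file stays whatever it was.

```shell
#!/bin/sh
# to_crlf: read text on stdin, write it with CRLF line endings on stdout.
to_crlf() {
    # Drop a CR that is already at the end of a line, then append exactly
    # one, so running the filter twice does not produce \r\r\n.
    # LC_ALL=C makes sed treat the input as plain bytes.
    LC_ALL=C sed -e 's/\r$//' -e 's/$/\r/'
}
```

For example, `to_crlf < unixfile.txt > winfile.txt` writes a CRLF copy and leaves the original alone.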
 

Author Closing Comment

by:bearclaws75
ID: 35217944
I found this command which did the trick:

sed 's/\r$//' winfile.txt > unixfile.txt

I still wasn't able to determine the *exact* file encoding but this produced the desired results.
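
The sed command above only strips carriage returns, so it leaves the bytes of every character as they were. If other files also need their encoding converted to match, the two steps can be chained. The sketch below assumes, without confirmation from the thread, that the source files are UTF-8 with CRLF line endings (as "file" reported for otherfile.txt) and that the target encoding has already been identified; WINDOWS-1252 in the usage line is only a guess. The function name match_target is hypothetical.

```shell
#!/bin/sh
# match_target ENCODING
# Hypothetical filter: reads UTF-8 text with CRLF (or LF) line endings on
# stdin and writes it in ENCODING with LF line endings on stdout.
match_target() {
    # Strip the trailing CR first (GNU sed understands \r), then re-encode.
    LC_ALL=C sed 's/\r$//' | iconv -f UTF-8 -t "$1"
}
```

Usage would look like `match_target WINDOWS-1252 < otherfile.txt > converted.txt`. iconv stops with an error if the input contains a character that the target encoding cannot represent, which is itself a useful hint that the guess was wrong.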

Thanks for the help!

Question has a verified solution.
