  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 958

How to find the character encoding type of a file in Linux?

I have a .txt file and I need to determine what character encoding it is using so I can then convert other files to match it.

If I run "file myfile.txt", I get this info:
       "Non-ISO extended-ASCII text, with very long lines"

I know the file is ANSI but I need to determine exactly what type of ANSI file so I can convert other files to match it.

When I check the encodings available in "iconv", I find these candidates. How do I determine which one is the exact match?

ANSI_X3.4-1968
ANSI_X3.4-1986
ANSI_X3.4
ANSI_X3.110-1983
ANSI_X3.110
ASCII
MS-ANSI
WINDOWS-31J
WINDOWS-874
WINDOWS-936
WINDOWS-1250
WINDOWS-1251
WINDOWS-1252
WINDOWS-1253
WINDOWS-1254
WINDOWS-1255
WINDOWS-1256
WINDOWS-1257
WINDOWS-1258
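
One hedged way to narrow the list down is to decode the file under each candidate with iconv and see which result reads correctly. The following is a minimal sketch, not a definitive detector: the function name try_encodings is made up for illustration, and it assumes an iconv build that recognizes the encoding names listed above. Single-byte encodings such as the WINDOWS-125x family accept almost any byte, so a successful decode only means "possible"; the final choice still comes from reading the decoded text.

```shell
# Hypothetical helper: decode FILE under each candidate encoding and
# print the first decoded line, so a human can judge which one looks right.
# Usage: try_encodings FILE ENCODING...
try_encodings() {
  f=$1
  shift
  for enc in "$@"; do
    # iconv exits non-zero when it meets a byte that is invalid in $enc.
    if decoded=$(iconv -f "$enc" -t UTF-8 "$f" 2>/dev/null); then
      printf '%s: %s\n' "$enc" "$(printf '%s\n' "$decoded" | head -n 1)"
    else
      printf '%s: cannot decode\n' "$enc"
    fi
  done
}
```

For example, "try_encodings myfile.txt ASCII WINDOWS-1250 WINDOWS-1251 WINDOWS-1252" prints one line per candidate. Plain ASCII should be rejected for a file that "file" reports as extended-ASCII, while the remaining candidates show how the same bytes render under each code page.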


Asked by: bearclaws75
1 Solution
 
farzanj commented:
Just use the command

unix2dos filename

and it should convert the file to DOS format. On some systems the command is called ux2dos.
 
farzanj commented:
If you want to go the other way, issue this command:

dos2unix filename
 
farzanj commented:
I think the character encoding is UTF-8.
 
bearclaws75 (Author) commented:
farzanj - I know the file is not UTF-8 because if I run "file otherfile.txt" on a different file, the output is:
     "UTF-8 Unicode text, with very long lines, with CRLF line terminators"

However, I ran "unix2dos myfile.txt" and it converted the file:
     "unix2dos: converting file myfile.txt to DOS format ..."

...but if I run "file myfile.txt", I still get the same info:
       "Non-ISO extended-ASCII text, with very long lines"

"unix2dos" is a good command-line utility but, ultimately, I need to determine the exact character encoding so I can update my PHP scripts to generate the proper file type.
 
farzanj commented:
Well, I see your point, but you can still call this utility from within PHP. In any case, let me look into it.
 
farzanj commented:
Well, I think it is very simple. Basically, you are only converting the newline characters. The remaining bytes are ASCII character codes, which are the same in both formats.

So you need to convert each line feed (\n) to a carriage return followed by a line feed (\r\n). A simple regular expression can do that.

So you are changing \n to \r\n.
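
The \n to \r\n substitution described above can be sketched with sed. This is an illustration under assumptions rather than what unix2dos does internally: the function name lf_to_crlf is made up, and the carriage return is produced with printf so the sketch does not depend on sed understanding the \r escape. It first strips any carriage return already at the end of a line, so lines that are already CRLF are not doubled. It changes line endings only and leaves the character encoding untouched.

```shell
# A literal carriage-return character, built portably with printf.
cr=$(printf '\r')

# Hypothetical helper: convert LF line endings to CRLF.
# Reads the named files, or standard input when no file is given.
lf_to_crlf() {
  # 1st expression: drop an existing trailing CR (avoids \r\r\n).
  # 2nd expression: append exactly one CR before each newline.
  sed -e "s/$cr\$//" -e "s/\$/$cr/" "$@"
}
```

For example, "lf_to_crlf unixfile.txt > winfile.txt" writes a CRLF copy and leaves the original file unchanged.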
 
bearclaws75 (Author) commented:
I found this command which did the trick:

sed 's/\r$//' winfile.txt > unixfile.txt

I still wasn't able to determine the *exact* file encoding but this produced the desired results.
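
To confirm what the sed command above changed, one option is to count the lines that end in a carriage return before and after the conversion. This is a minimal sketch; the function name count_crlf_lines is made up for illustration. It reports line endings only and says nothing about the character encoding.

```shell
# A literal carriage-return character, built portably with printf.
cr=$(printf '\r')

# Hypothetical helper: print how many lines of FILE end in CR (i.e. CRLF lines).
count_crlf_lines() {
  # LC_ALL=C makes grep treat the file as raw bytes, whatever its encoding.
  # grep -c exits non-zero when the count is 0, so "|| true" keeps that quiet.
  LC_ALL=C grep -c "$cr\$" "$1" || true
}
```

Running "count_crlf_lines winfile.txt" and then "count_crlf_lines unixfile.txt" should show a positive count for the first file and 0 for the second if the carriage returns were stripped.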

Thanks for the help!
Question has a verified solution.
