• Status: Solved

Is this file UTF-16?

I have a file here that I think is in a form of UTF-16. I need to know which format it uses and I also need to convert it to UTF-8.
090222
Asked: gmk1212

CEHJ Commented:
It looks like it could be. You can convert it with InputStreamReader and OutputStreamWriter, specifying the right encoding for each.
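A minimal sketch of that approach, assuming the input really is text in UTF-16 (which the thread has not established at this point). The class name `Utf16ToUtf8` and the two command-line paths are hypothetical. Java's "UTF-16" charset honours a byte-order mark and falls back to big-endian when there is none.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

class Utf16ToUtf8 {

    // Decodes the input as UTF-16 and re-encodes every character as UTF-8.
    static void convert(InputStream in, OutputStream out) throws IOException {
        Reader reader = new InputStreamReader(in, StandardCharsets.UTF_16);
        Writer writer = new OutputStreamWriter(out, StandardCharsets.UTF_8);
        char[] buf = new char[8192];
        int n;
        while ((n = reader.read(buf)) != -1) {
            writer.write(buf, 0, n);
        }
        writer.flush(); // push any buffered characters to the output stream
    }

    public static void main(String[] args) throws IOException {
        // args[0] = source file, args[1] = destination file (hypothetical paths)
        try (InputStream in = new FileInputStream(args[0]);
             OutputStream out = new FileOutputStream(args[1])) {
            convert(in, out);
        }
    }
}
```

If the file turns out to be little-endian without a byte-order mark, `StandardCharsets.UTF_16LE` would be the charset to try instead.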

VBRocks Commented:
I don't think it is. Where does it come from, and what produced it?

gmk1212 (Author) Commented:
If I open the file with SuperEdi, some of the contents are readable.

It's a file produced by the in-game recording function of Shadowbane, a game that has since been shut down. We're trying to create a server emulator for it: http://community.shadowbaneemulator.com/forums/

CEHJ Commented:
Whatever the encoding, you probably need to understand the file format first. The Unix utility 'strings', set to a 16-bit encoding, produces the attached output (saved as UTF-8).
strings.txt

gmk1212 (Author) Commented:
Do you think that strings.txt contains all of the data in the file, or could a direct conversion be dropping some of the data?

CEHJ Commented:
>>Do you think that strings.txt contains all of the data in the file

It contains it in the proportion

2 * strings.length() / original.length()

, on the assumption that the characters are encodable in one byte, and where

a. 'strings' is my derived file
b. 'original' is your original file
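
As a worked example of that proportion, here is a small sketch. The class name `StringsCoverage` and the file sizes are hypothetical, not measured from the attachments. Note that the derived file also contains one line terminator per string, so the real figure would be slightly lower than this estimate.

```java
class StringsCoverage {

    // Fraction of the original file accounted for by the extracted strings.
    // ASSUMPTION (taken from the post above): each extracted character took
    // two bytes in the original and takes one byte in the derived file.
    static double coverage(long derivedBytes, long originalBytes) {
        if (originalBytes <= 0) {
            throw new IllegalArgumentException("original file must not be empty");
        }
        return 2.0 * derivedBytes / originalBytes;
    }

    public static void main(String[] args) {
        // Hypothetical sizes: 1,000 bytes of strings from an 8,000-byte file.
        System.out.println(coverage(1_000, 8_000)); // prints 0.25
    }
}
```

With these made-up numbers, a quarter of the original file would be string data and the rest something else.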

gmk1212 (Author) Commented:
I think these might be client opcodes, and the strings consist of 2-byte characters, preceded by the string length (4 bytes).
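
A rough sketch of reading one such string, to make the guess concrete. Everything about the layout here is an assumption that would need checking against the file: that the 4-byte length is big-endian, that it counts characters rather than bytes, and that the characters are UTF-16 big-endian. The class name `LengthPrefixedString` is hypothetical.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

class LengthPrefixedString {

    // Reads one string starting at the buffer's current position.
    // ASSUMPTIONS (unconfirmed): big-endian 4-byte length that counts
    // 2-byte characters, followed by that many UTF-16BE code units.
    static String read(ByteBuffer buf) {
        buf.order(ByteOrder.BIG_ENDIAN);
        int charCount = buf.getInt();
        // Reject lengths that cannot fit in what is left of the buffer.
        if (charCount < 0 || charCount > buf.remaining() / 2) {
            throw new IllegalArgumentException("Implausible length: " + charCount);
        }
        byte[] raw = new byte[charCount * 2];
        buf.get(raw);
        return new String(raw, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Length 2, then 'H' and 'i' as 16-bit big-endian characters.
        byte[] sample = {0, 0, 0, 2, 0, 'H', 0, 'i'};
        System.out.println(read(ByteBuffer.wrap(sample))); // prints Hi
    }
}
```

The plausibility check on the length is a cheap way to test the hypothesis: if most candidate offsets fail it, the layout guess is probably wrong.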

CEHJ Commented:
Possibly. Here is the file again, with a hex offset preceding each string.


strings-hexoffs.txt

gmk1212 (Author) Commented:
Sorry for not replying for a few days.

Do you think you could give me a step-by-step explanation of how you produced that last file?

jazzIIIlove Commented:
For the original question, please download the Sublime Text editor, which can open and convert files in whichever UTF encoding you like.

Please see the attached screenshot.

Best regards.
sublime.jpg

gmk1212 (Author) Commented:
Well I've figured out that it isn't plain utf-16. But I still need to convert it to utf-8.

CEHJ Commented:
>>Do you think you could give me a step by step on how you got that last file?

There's little to it, but you must have a Unix-based OS - do you have one?

gmk1212 (Author) Commented:
Yep.

CEHJ Commented:
The command is below, where the last argument is the file name. I've also attached a file detailing Unicode-undefined characters (possible control codes?)


strings -e b -t x 090222

errs
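
For readers without a Unix system, the following sketch approximates what the strings command above does with `-e b` (16-bit big-endian characters) and `-t x` (hex offsets). It is not a reimplementation of GNU strings: it only recognises printable ASCII characters, and the class name `Strings16` is hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

class Strings16 {

    static final int MIN_CHARS = 4; // shortest run worth reporting

    // Returns "hexOffset text" for every run of at least MIN_CHARS printable
    // ASCII characters stored as 16-bit big-endian units (0x00 then the char).
    static List<String> scan(byte[] data) {
        List<String> found = new ArrayList<>();
        int i = 0;
        while (i + 1 < data.length) {
            int start = i;
            StringBuilder run = new StringBuilder();
            while (i + 1 < data.length && data[i] == 0
                    && data[i + 1] >= 0x20 && data[i + 1] <= 0x7e) {
                run.append((char) data[i + 1]);
                i += 2;
            }
            if (run.length() >= MIN_CHARS) {
                found.add(String.format("%x %s", start, run));
            } else {
                i = start + 1; // too short: retry from the next byte
            }
        }
        return found;
    }

    public static void main(String[] args) throws IOException {
        // args[0] is the file to scan, for example 090222
        for (String line : scan(Files.readAllBytes(Paths.get(args[0])))) {
            System.out.println(line);
        }
    }
}
```

Scanning from every byte offset, rather than only even ones, is a deliberate choice here so that strings at odd alignments are not missed.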

CEHJ Commented:
(Done with the following)
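import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;

public class ConvUTF16 {

    public static void main(String[] args) throws IOException {
        // "UTF-16" honours a byte-order mark and assumes big-endian without one.
        // try-with-resources closes the reader even if opening or reading fails.
        try (Reader in = new InputStreamReader(new FileInputStream(args[0]), "UTF-16")) {
            int buf;
            int offs = 0; // offset in 16-bit code units, not bytes
            while ((buf = in.read()) > -1) {
                char c = (char) buf;
                if (!Character.isDefined(c)) {
                    System.out.printf("Character %04x at offset %04x is undefined%n", buf, offs);
                }
                offs++;
            }
        }
    }
}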


jazzIIIlove Commented:
>>Well I've figured out that it isn't plain utf-16. But I still need to convert it to utf-8.
Simply use Save As with the desired encoding in the Sublime Text editor.

Best regards.

CEHJ Commented:
:-)

>>>>

>>Well I've figured out that it isn't plain utf-16. But I still need to convert it to utf-8.
Simply, Save As with encoding in Sublime Text Editor.

>>>>

No - you can't do that. The reason is that a lot of the file is not text at all, so treating the whole file as UTF-16 text will not convert it faithfully.
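
One way to see the problem, sketched under the assumption that the text portions are UTF-16 big-endian: decode some bytes as UTF-16 and encode them again. Real text survives the round trip, but arbitrary binary data may not. The class name `RoundTripCheck` and the sample bytes are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripCheck {

    // True only if decoding the bytes as UTF-16BE and encoding the result
    // again reproduces the input exactly.
    static boolean survivesUtf16RoundTrip(byte[] data) {
        String decoded = new String(data, StandardCharsets.UTF_16BE);
        return Arrays.equals(data, decoded.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        byte[] text = {0x00, 'O', 0x00, 'K'};
        // 0xD800 is a high surrogate with no low surrogate after it,
        // which is not valid UTF-16 but is perfectly possible in binary data.
        byte[] binary = {(byte) 0xD8, 0x00, 0x00, 0x41};
        System.out.println(survivesUtf16RoundTrip(text));   // prints true
        System.out.println(survivesUtf16RoundTrip(binary)); // prints false
    }
}
```

Even bytes that do decode cleanly would be rewritten as UTF-8, which changes binary fields such as lengths or opcodes, so only the extracted strings should be converted.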