• Status: Solved

Detect whether a text file contains single-byte or Unicode text.

I need to detect whether a text file contains Unicode text or single-byte text. After a little research I suspect it might be as simple as counting the number of high-ASCII characters and, if there are more than some small number, assuming the file is Unicode. Is there a better way? Is there an API call to detect the type of text in a string or byte array?

Kevin
zorvek (Kevin Jones)
mvidas Commented:
Kevin,

Take a look at http://codesnipers.com/?q=node/68
It gives some good information about determining whether it is or not. It does mention the IsTextUnicode API (http://www.ex-designz.net/apidetail.asp?api_id=471), though it seems it is not compatible with newer Unicode types. Looks like you're gonna have to build a function for it. I'd offer to help, but I know you know what you're doing.

Of course, there are similar functions at http:Q_21836497.html#16611812 though they don't look to be as detailed as the article at codesnipers seems to say they should be.

Matt
 
EDDYKT Commented:
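Does this work?

Private Function IsUnicode(s As String) As Boolean
    ' Caveat: VB holds every String internally as UTF-16 (2 bytes per
    ' character), so LenB(s) is 2 * Len(s) for any non-empty string and
    ' this returns True regardless of the encoding of the source file.
    IsUnicode = (Len(s) <> LenB(s))
End Function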
 
nffvrxqgrcfqvvc Commented:
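Option Explicit

' lpBuffer is declared As Any so that raw file bytes can be passed.
' (A ByVal String argument is converted to ANSI by VB before the call,
' so the API would never see the original bytes.)
Private Declare Function IsTextUnicode Lib "advapi32" ( _
    ByRef lpBuffer As Any, _
    ByVal cb As Long, _
    ByRef lpi As Long) As Long

Public Function isUni(buf() As Byte) As Boolean
    Dim cb As Long
    Dim flags As Long

    cb = UBound(buf) - LBound(buf) + 1
    If cb > 1 Then
        ' &HF requests the ASCII16, STATISTICS, CONTROLS and SIGNATURE tests;
        ' on return, flags holds the tests that passed.
        flags = &HF
        isUni = (IsTextUnicode(buf(LBound(buf)), cb, flags) <> 0)
    End If
    ' You must pass at least 2 bytes to check; otherwise isUni stays False.
End Function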
 
zorvek (Kevin Jones) (Consultant, Author) Commented:
Sorry...not done yet. I am still trying to get the code from egl1044 to function correctly. I'll be posting more comments soon seeking additional assistance with this.

Kevin
 
[ fanpages ] (IT Services Consultant) Commented:
PS.
[ http://www.experts-exchange.com/Programming/Programming_Languages/Cplusplus/Q_20136466.html ]

"...Since the sequence 0xFEFF is exceedingly rare at the outset of regular non-Unicode text files, it can serve as an implicit marker or signature to identify the file as a Unicode file. Applications that read both Unicode and non-Unicode text files should use the presence of this sequence as an indicator that the file is most likely a Unicode file. (Compare this technique to using the MS-DOS EOF marker to terminate text files.)

When an application finds 0xFEFF at the beginning of a text file, it typically processes the file as though it were a Unicode file, although it may also perform further heuristic checks to verify that this is true. Such a check could be as simple as testing whether the variation in the low-order bytes is much higher than the variation in the high-order bytes. For example, if ASCII text is converted to Unicode text, every second byte is zero. Also, checking both for the linefeed and carriage-return characters (0x000A and 0x000D) and for even or odd file size can provide a strong indicator of the nature of the file.

When an application finds 0xFFFE at the beginning of a text file, it interprets it to mean the file is a byte-reversed Unicode file. The application can either swap the order of the bytes or alert the user that an error has occurred.

The Unicode byte-order mark character is not found in any code page, so it disappears if data is converted to ANSI. Unlike other Unicode characters, it is not replaced by a default character when it is converted. If a byte-order mark is found in the middle of a file, it is not interpreted as a Unicode character and has no effect on text output.

The Unicode value 0xFFFF is illegal in plain text files and cannot be passed between Win32 functions. The value 0xFFFF is reserved for an application's private use."
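
A minimal VB6 sketch of the two checks the quoted text describes: looking for the byte-order mark at the start of the data, then the "every second byte is zero" heuristic. The names BomKind, DetectBom and LooksLikeUtf16 are hypothetical, and the 40% threshold is an arbitrary assumption, not something taken from the quoted article or from any API. It only considers 16-bit Unicode and does not try to recognise UTF-8.

```vb
Option Explicit

' Hypothetical result codes for DetectBom.
Public Enum BomKind
    bomNone = 0       ' no signature found
    bomUtf16LE = 1    ' bytes FF FE on disk (read as 0xFEFF on Windows)
    bomUtf16BE = 2    ' bytes FE FF on disk (the byte-reversed 0xFFFE case)
End Enum

' Checks the first two of the n valid bytes in buf for a byte-order mark.
Public Function DetectBom(buf() As Byte, ByVal n As Long) As BomKind
    Dim lo As Long

    DetectBom = bomNone
    If n < 2 Then Exit Function

    lo = LBound(buf)
    If buf(lo) = &HFF And buf(lo + 1) = &HFE Then
        DetectBom = bomUtf16LE
    ElseIf buf(lo) = &HFE And buf(lo + 1) = &HFF Then
        DetectBom = bomUtf16BE
    End If
End Function

' Heuristic for data without a signature: ASCII text converted to
' 16-bit Unicode has a zero in every second byte, whereas single-byte
' text files almost never contain zero bytes at all.
Public Function LooksLikeUtf16(buf() As Byte, ByVal n As Long) As Boolean
    Dim i As Long
    Dim lo As Long
    Dim zerosEven As Long
    Dim zerosOdd As Long
    Dim pairs As Long
    Dim most As Long

    If n < 2 Then Exit Function   ' too little data; result stays False

    lo = LBound(buf)
    For i = 0 To n - 1
        If buf(lo + i) = 0 Then
            If (i Mod 2) = 0 Then zerosEven = zerosEven + 1 Else zerosOdd = zerosOdd + 1
        End If
    Next i

    ' Assumed threshold: zeros fill at least 40% of one half of the byte pairs.
    pairs = n \ 2
    If zerosOdd > zerosEven Then most = zerosOdd Else most = zerosEven
    LooksLikeUtf16 = (most * 10 >= pairs * 4)
End Function
```

To try it, read the leading bytes of the file into a Byte array (Open ... For Binary, then Get), call DetectBom first, and fall back to LooksLikeUtf16 only when it returns bomNone. Text made mostly of non-Latin characters has few zero bytes, so the heuristic can miss such files; treat the result as a hint rather than proof.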


BFN,

fp.
 
zorvek (Kevin Jones) (Consultant, Author) Commented:
I still have not had time to get this to work. My tests thus far suggest that it does not work, but I do not yet have enough information to post follow-up information or questions. As none of the above answers has been proven to work, I cannot allow any of them to be selected as an answer, as that would provide false information to future viewers of this question. I also do not have the time right now, nor the appropriate Windows installations, to fully test the above scenarios or any derivatives of them.

I therefore ask that the question either be left alone for the time being or deleted. If deleted I will repost at a later date with as much of the information above as is relevant.

Remember that being a responsible EE member is not just maintaining questions, it's making sure the EE database provides good information to future viewers.

Kevin
 
zorvek (Kevin Jones) (Consultant, Author) Commented:
I have not been able to get any of the above solutions to work yet. But I am confident an answer does lie somewhere above. The problem I have is the machine I need to test these potential solutions is only occasionally available to me and I am being pulled in other directions. I, like you, like a clean TA and try to encourage askers to clean up sooner versus later. But I also appreciate the occasional difficult situation and the need to add good content to the database.

So, for the record, I am confident that an answer to this problem lies above. However, I have been unable to get any of the above answers to work reliably. By closing the question I will be unable to post additional information after one week so the final correct answer will remain a challenge for any who follow.

Since you have forced my hand (I don't want the above information deleted) I'm going to mark all of the answers above as correct and you, Mr. Rollins, can live with the fact that the database now has one more incomplete thread.

Kevin