Tech or Treat! Write an article about your scariest tech disaster to win gadgets!Learn more

x
?
Solved

Handling UTF8 Unix flat file, while maintaining international text characters

Posted on 2004-09-17
8
Medium Priority
?
1,800 Views
Last Modified: 2008-03-10
I am trying to write a series of VBA macros to parse and validate flat files, ready for import into a database.  The database, file format and technology are not changeable.

The flat file is sent to us, containing data that should be added, as a Unix text file, with UTF-8 encoding and no Byte Object Marker.  However, it can contain any and all number of foreign characters (Chinese, Taiwanese, Japanese, Sri Lankan, Arabic, etc).  

In its current state, my Excel application can read data, validate and recompile the desired output to another sheet, ready for export, without affecting the characters.  

The primary problem occurs when I try to export the file.  The exported file format has to match that of the imported file.  When I view the original file in SC Unipad, I can see the file is a Unix UTF-8 file, with no Byte Order Marker.  The output file, however, is a Unix UTF-16 Little Endian file, with a Byte Object Marker.

I am using the File System Object; creating the file with
    Set FileObj = FSO.CreateTextFile(OutputFile, Overwrite, True)

and writing with
    FileObj.Write (Sheets(OutSheet).Cells(RowCount, 1).Value)
    FileObj.Write EndLine                                                        ' Endline = Chr(10)

to write each individual row.

I have tried using the WideCharToMultiByte() external function (with Codepage = 65001) in a UTF8Encode() function, as suggested in previous questions, so the write line is
    FileObj.Write (UTF8Encode (Sheets(OutSheet).Cells(RowCount, 1).Value))

but this still creates a UTF-16 file and the foreign characters are lost in favour of ANSI(?) counterparts.

I figure the UTF-16 output might be due to the Unicode flag in FSO.CreateTextFile() being set to True, but I cannot see any other way of maintaining the characters.

I would prefer to avoid opening SC Unipad in a shell and using it to open the UTF-16 file, remove the BOM and convert to UTF-8, as it would rely too heavily on third party software.

Any help anyone can give would be greatly appreciated...
0
Comment
Question by:jimbobmcgee
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 5
  • 3
8 Comments
 
LVL 16

Author Comment

by:jimbobmcgee
ID: 12142251
Anybody?

...

:)
0
 
LVL 16

Author Comment

by:jimbobmcgee
ID: 12190911
Please?

...

:)
0
 
LVL 3

Accepted Solution

by:
Arundel_Castle earned 2000 total points
ID: 12200047
Try this link Jim Bob. :)

http://www.cyberactivex.com/UnicodeTutorialVb.htm

By the way my name is also Magee. Magee and McGee are the same name.
0
Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 3

Expert Comment

by:Arundel_Castle
ID: 12200056
Thanks for the points Jim, by the way my first name is also James:) Though I have no bob.
0
 
LVL 3

Expert Comment

by:Arundel_Castle
ID: 12200075
Actually thats a pretty interesting question in working with multinational encoding from a unix based system.
0
 
LVL 3

Expert Comment

by:Arundel_Castle
ID: 12200385
You don't have to give me the extra points for this question Jim. But it would be nice if you did. I would reach my experts total. I haven't used this site in a couple of years but I was an expert before.
0
 
LVL 16

Author Comment

by:jimbobmcgee
ID: 12200599
>> Actually thats a pretty interesting question in working with multinational encoding from a unix based system.

It was a pain!!  I think my original problem was due to opening the text stream in unicode (utf16).  The second problem I had was due to my UTF8Encode function using WideCharToMultiByte and then returning a Left(...) on the result.  I think this converted the string back to UTF16.

Either way, all cleared up by the link you posted...
0
 
LVL 3

Expert Comment

by:Arundel_Castle
ID: 12200765
Thanks for the points Jim, by the way it does sound like a nightmare to write for. Thanks again.
0

Featured Post

VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

There are many ways to remove duplicate entries in an SQL or Access database. Most make you temporarily insert an ID field, make a temp table and copy data back and forth, and/or are slow. Here is an easy way in VB6 using ADO to remove duplicate row…
If you have ever used Microsoft Word then you know that it has a good spell checker and it may have occurred to you that the ability to check spelling might be a nice piece of functionality to add to certain applications of yours. Well the code that…
Get people started with the process of using Access VBA to control Excel using automation, Microsoft Access can control other applications. An example is the ability to programmatically talk to Excel. Using automation, an Access application can laun…
This lesson covers basic error handling code in Microsoft Excel using VBA. This is the first lesson in a 3-part series that uses code to loop through an Excel spreadsheet in VBA and then fix errors, taking advantage of error handling code. This l…
Suggested Courses

647 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question