Link to home
Create AccountLog in
Microsoft Excel

Microsoft Excel

--

Questions

--

Followers

Top Experts

Avatar of Brad Shaffer
Brad Shaffer🇺🇸

Check CSV file encoding for UTF-8 in Excel
Hi All -  Anyone have ideas on how to use excel vba (presumably file system object) to detect if a csv file is in utf-8 ?

Thanks for any opinions here. 

Zero AI Policy

We believe in human intelligence. Our moderation policy strictly prohibits the use of LLM content in our Q&A threads.


Avatar of HainKurtHainKurt🇨🇦

here, there is a function that returns the encoding...

http://www.vbaexpress.com/forum/showthread.php?25095-Solved-Check-the-file-encode-type 

looks like it resolves file and returns

Public Enum abCharsets
abError = 0
abANSI = 1
abUnicode = 2
abUnicodeBigEndian = 3
abUTF8 = 4
ebUnknown = 5
End Enum

Open in new window


Avatar of Brad ShafferBrad Shaffer🇺🇸

ASKER

Thank you.  I ran across that one before - but I'm having trouble with

User defined type not defined on this line:

Public Function ReturnCharset(ByVal filePath As String, Optional verifyANSI As Boolean = True) As abCharsets

Trying to follow the code - I can't tell if there is an additional reference required or just something missing running in Excel 2016.

Any ideas ?


Avatar of Brad ShafferBrad Shaffer🇺🇸

ASKER

Actually - Looks like my declarations were in the wrong place - trying again !

Reward 1Reward 2Reward 3Reward 4Reward 5Reward 6

EARN REWARDS FOR ASKING, ANSWERING, AND MORE.

Earn free swag for participating on the platform.


Avatar of Brad ShafferBrad Shaffer🇺🇸

ASKER

So, the script detects UTF8 ;  if the csv  file is opened and saved as UTF8 in notepad.
If the csv file is created by an upstream system ; and just opened in notepad ; it shows encoding=UTF8 but the script detects as US-ASCII.  
If you save it - then the script detects it.
Unfortunately there are so many csv files to go through - its not feasible to open/save them in notepad.
I'm going to keep searching for other possibilities
Thank you

Avatar of HainKurtHainKurt🇨🇦

are you saying script is not working properly?
can you attach sample files for different formats

Avatar of Brad ShafferBrad Shaffer🇺🇸

ASKER

I'm unable to get it to identify the UTF8 file without me opening the file in notepad ; saving it as encoding=utf8.
If I get a system generated file that appears UTF8 ; it always translates to US ASCII by the script

I've included two files - unfortunately I had to modify them to remove sensitive data .

The Test_US-ASCII.csv is a file that was provided to me appearing to be utf8 but the script identifies as ASCII
(when I first opened in notepad, it showed encoding=utf8) but again script=ascii

The second file was also generated by system - At first the script identified as ASCII.  I opened it, saved it as UTF8 and now the script identifies it as UTF8

Its really not so important that I identify what type it is - I just need to know if it is UTF8 or not UTF8.

The above is how it works on my local c.
Interestingly, if I run it with the files on a file share ; they both come back ASCII
Ideally - my script will loop through all csv files on a file share (if possible) to determine which are UTF8

Thanks for any tips

Test_US-ASCII.csvTest_UTF.csv

Free T-shirt

Get a FREE t-shirt when you ask your first question.

We believe in human intelligence. Our moderation policy strictly prohibits the use of LLM content in our Q&A threads.


Avatar of HainKurtHainKurt🇨🇦

check the file's first 3 chars
if it is "\xef\xbf\xbe", i.e. ï¿¾ in the Latin-1 character set, then it is UTF-8

User generated image
User generated image
maybe this is better works for you...


Avatar of HainKurtHainKurt🇨🇦

or the char codesö from the script

Const bytByte0UTF8_c As Byte = 239
Const bytByte1UTF8_c As Byte = 187
Const bytByte2UTF8_c As Byte = 191 

Open in new window


Avatar of Brad ShafferBrad Shaffer🇺🇸

ASKER

ok. that makes sense viewing them in hex in notepad++
The core problem as I understand it is - The upstream system is creating a csv (presumed in utf8 but maybe not now) and words should translate to this : àdvèrsè  but are showing like this in excel:  Ã dvèrsè   They were then assumed by another programmer on the upstream system to be UTF-8.   So, the task given to me was to loop through a directory of about 100 spreadsheets created daily and identify these spreadsheets that would have these problem characters so they can be handled differently.  Presumably identify those that are UTF-8.

I could send the test file which also comes in as US-ASCII by the script but I have to strip the sensitive content; and the first few characters in hex look like below :

User generated image 

The files that show no character issues come in as US-ASCII by the script and the files that do have character issues come in as US-ASCII by the script ; so either UTF8 is not the problem or there is a required script adjustment ; or the BOM characters are not being set when the CSV is created.

Do you think there is a better way to identify these csv files that are going to have characters like above because using this method the csv files with problem characters are looking the same encoding as those without problem characters ?

Thanks again for all your time. 

Reward 1Reward 2Reward 3Reward 4Reward 5Reward 6

EARN REWARDS FOR ASKING, ANSWERING, AND MORE.

Earn free swag for participating on the platform.


Avatar of Brad ShafferBrad Shaffer🇺🇸

ASKER

Correction ; Here are the first few characters of the file with problem characters before making any changes:
User generated imageAnd although the first characters would have changed, after I stripped out sensitive content ; attached  is a test file showingEE_test.csv

ASKER CERTIFIED SOLUTION
Avatar of HainKurtHainKurt🇨🇦

Link to home
membership
Log in or create a free account to see answer.
Signing up is free and takes 30 seconds. No credit card required.
Create Account
Microsoft Excel

Microsoft Excel

--

Questions

--

Followers

Top Experts

Microsoft Excel topics include formulas, formatting, VBA macros and user-defined functions, and everything else related to the spreadsheet user interface, including error messages.