

Data Import from a CSV file

Posted on 2014-08-07
Medium Priority
Last Modified: 2014-08-07
I have been supplied with a text file of comma-separated values which needs to be imported into a database.

When read in programmatically, the string has an ASCII NUL between every character,
eg.   D A V 0 0 1 , J u l i e , D a v i e s - J o n e s ,

In notepad the file displays correctly as DAV001,Julie,Davies-Jones

The supplier of the file is downloading the file from a web site of an international company so I am unable to contact the originator.

Since Notepad is apparently stripping out the NULs, is there a Windows API function I can utilise to replicate the Notepad functionality?
Question by:TomPreen
LVL 30

Accepted Solution

Olaf Doschke earned 2000 total points
ID: 40245964
Notepad is not stripping out the NULs; rather, Notepad is capable of identifying Unicode, while FoxPro is not.

The simple solution: strip off CHR(0):
ansi = CHRTRAN(unicodetext, CHR(0), '')

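For readers outside VFP, the same NUL-stripping idea can be sketched in Python (illustrative only; the byte string below is a made-up sample, not the questioner's actual data):

```python
# Raw bytes as VFP reads them: UTF-16 LE text treated as if it were ANSI,
# so every ASCII character is followed by a NUL byte.
raw = b'D\x00A\x00V\x000\x000\x001\x00,\x00J\x00u\x00l\x00i\x00e\x00'

# The quick-and-dirty fix: drop every NUL byte, mirroring CHRTRAN(..., CHR(0), '').
ansi = raw.replace(b'\x00', b'').decode('latin-1')
print(ansi)  # DAV001,Julie
```

Note this only works because the payload is plain ASCII; any character above 255 would be mangled, which is why the STRCONV() route below is safer.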

The more correct solution:
ansi = STRCONV(unicodetext, 6, 1252, 1)


You'll have an additional problem: most probably the file starts with a few bytes called a BOM (byte order mark), marking the rest of the file as Unicode, so record 1 or the first field name might contain unwanted characters.
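A hedged Python sketch of the cleaner route: decode the bytes as UTF-16 (the codec consumes a leading BOM automatically) and re-encode to the Windows-1252 ANSI codepage, which is conceptually what the STRCONV() call above does:

```python
# FF FE is the UTF-16 LE BOM, followed by the UTF-16 LE payload.
raw = b'\xff\xfe' + 'DAV001,Julie'.encode('utf-16-le')

# The 'utf-16' codec reads the BOM, picks the byte order, and strips the BOM.
text = raw.decode('utf-16')

# Re-encode to the ANSI codepage (1252) for a legacy consumer.
ansi = text.encode('cp1252')
print(text)  # DAV001,Julie
```

Because the BOM is consumed by the decoder, the "unwanted characters in record 1" problem disappears as a side effect.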

Another simple solution is to save the file again with Notepad via Save As, setting the encoding to ANSI instead of Unicode.

Bye, Olaf.
LVL 23

Expert Comment

ID: 40245968
What I normally do when I have conversion issues is open the CSV in Excel, because it keeps its original information, and then copy and paste it into Notepad. Normally this does the trick. The only other option is to ask for a correctly configured file.

Author Closing Comment

ID: 40246116

I have done hundreds of imports using FoxPro over the last 10 years and I have never seen this issue before.

LVL 30

Expert Comment

by:Olaf Doschke
ID: 40246260

More typical for internet services is UTF-8 encoding, which encodes the 128 most commonly used characters (e.g. letters, digits) in the same way as ASCII and many ANSI codepages. That's why you might never have stumbled upon this, even if you have already processed UTF-8 data, as long as it only used these 128 characters.
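That ASCII compatibility of UTF-8 is easy to verify (Python used here purely for illustration):

```python
# For the 128 ASCII characters, UTF-8 bytes are identical to ASCII bytes,
# which is why pure-ASCII UTF-8 data imports cleanly into ANSI-only tools.
assert 'DAV001,Julie'.encode('utf-8') == 'DAV001,Julie'.encode('ascii')

# A non-ASCII character, however, becomes a multi-byte sequence.
print('é'.encode('utf-8'))  # b'\xc3\xa9'
```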

Unicode (specifically ISO 10646) is capable of encoding more characters: it defines 17 planes of 65,536 potential characters each. Unicode Version 7 is "just" using 113,021 of these potential codes.

MS mostly uses UTF-16 for Unicode, which uses 16 bits (two bytes) for most characters. No character is encoded with a single byte, not even the simple ones of ASCII or of the ANSI codepages, so you are sure to stumble upon this encoding problem the first time you get UTF-16 data.
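The interleaved NULs the questioner saw fall straight out of this: in UTF-16 LE every ASCII character occupies two bytes, the low byte being the character and the high byte zero. A quick Python illustration:

```python
# Encode three ASCII characters as UTF-16 LE and inspect the bytes.
encoded = 'DAV'.encode('utf-16-le')
print(encoded)       # b'D\x00A\x00V\x00'
print(len(encoded))  # 6 bytes for 3 characters
```

Read byte-by-byte by an ANSI-only tool, that buffer is exactly the "D A V" with NULs in between from the question.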

Character/text/data encoding (in the sense of codepages, not in the sense of cryptographic encoding) is a topic you should at least be aware of, even if you have stumbled upon it for the first time. Unicode is one of the weaknesses of VFP you can only partly cope with. You can easily store Unicode in binary fields, but the native VFP controls will not display Unicode, and you get all kinds of symptoms with it. Some operations will irreversibly change unknown characters to '?', so you are always better off keeping the original files for post-processing into something VFP can work with, once such conversion losses in the data are recognized.

Bye, Olaf.

Author Comment

ID: 40246393
The file in question had one other thing that I have not seen.

The first two characters were ASCII 255 and ASCII 254.

Is this a signal that it is Unicode?

Also, Excel does not recognize the file as a CSV file in the conventional way by creating a column for each comma-separated value; all values for each line appeared in column A, separated by commas.
LVL 30

Expert Comment

by:Olaf Doschke
ID: 40246421
>The first two characters were ASCII 255 and ASCII 254.
This is the BOM I mentioned earlier. Look it up on Wikipedia and you will get it explained.

>Is this a signal that it is Unicode?
Yes and no. Only some applications set and expect a BOM at the beginning of a file; in other cases files may simply be Unicode without any hint in the file itself. Today XML is more common, and it has standard tags and attributes defining the encoding.

You may use the first two bytes as indicator, but not as proof.
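A sketch of such BOM sniffing (Python, illustrative; the function name is made up, and as Olaf says, the result is only a hint, not proof):

```python
def guess_encoding(first_bytes: bytes) -> str:
    """Guess a text encoding from a leading BOM; a hint, not proof."""
    if first_bytes.startswith(b'\xff\xfe'):
        return 'utf-16-le'
    if first_bytes.startswith(b'\xfe\xff'):
        return 'utf-16-be'
    if first_bytes.startswith(b'\xef\xbb\xbf'):
        return 'utf-8-sig'
    return 'unknown'  # no BOM: could be ANSI, BOM-less UTF-8, ...

# The questioner's file started with bytes 255, 254 -> FF FE.
print(guess_encoding(b'\xff\xfeD\x00'))  # utf-16-le
```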

Bye, Olaf.
LVL 46

Expert Comment

ID: 40246432
That might be the BOM (byte order mark).
