FOPEN( ) Function
http://msdn2.microsoft.com
FEOF( ) Function
http://msdn2.microsoft.com
FGETS( ) Function
http://msdn2.microsoft.com
FCLOSE( ) Function
http://msdn2.microsoft.com
Main Topics
Browse All TopicsHello experts. Have a csv file that I download and parse every day to create a products table.
The csv file contains 26 fields and about 11,000 records. It is (automatically) run through several prgs and several tables are created using the data from the file. This takes around 45 minutes to complete.
The problem is recently the supplied csv file has, from time to time, had a few records with incomplete data. Some records are missing several fields, creating a file that cannot be correctly parsed. This creates a particular problem when, for example, the import process is running unattended and I return an hour later just to find that there is an error in the csv file and the import has halted.
So my question is, can I run this csv file through some sort of 'verification' process first; something that would quickly check the number of columns before each line feed and provide me with a report of those lines that fail? And if so, how to do?
This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.
Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.
If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.
Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.
Access the answers to your technology questions today.
30-day free trial. Register in 60 seconds.
Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Try it out and discover for yourself.
30-day free trial. Register in 60 seconds.
Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.
FOPEN( ) Function
http://msdn2.microsoft.com
FEOF( ) Function
http://msdn2.microsoft.com
FGETS( ) Function
http://msdn2.microsoft.com
FCLOSE( ) Function
http://msdn2.microsoft.com
A really easy way is to create a cursor with one memo field, append one blank record to it then issue Append Memo <fieldname> From <filename> . You now have the entire file in a memo field that you can step thorugh a line at a time and examine. For example you can use ALINES() to load it into an array or just step through it with MLINE. You can count commas in each line with OCCURS() to locate records with incorrect field counts. You can throw "bad" lines into the memo field of another cursor to review later and append all the "good" lines together and re-export them as csv and then feed the "scrubbed" version to your regular import process. We do this a lot.
Each field is enclosed in quotes, i.e. "field" and separated by a comma, i.e. "field1","field2"
Some fields do contain quotes within the field data.
I am not sure of the difference between a line feed or a return, or how to tell which is used, but one is used to end each line.
Additionally, there are three lines of junk data (column names, etc) at the top of the file that I would like to ignore before processing, if possible.
Sample File:
junk (file date, etc)
more junk (file origination, etc.)
columnname1, columnname2
"record1-field1","record1-
"record2-field1","record2-
This example is a complete program that will read and store your data from your text file to an array in memory. The array litterly becomes dymanic in that it will expand as your file grows larger - I hate to waste space. Additionally, it will read and store up to ten columns from the file as well. This program could easily be scaled down but I wanted to give you plenty of exmples and comments to help you down your road.
CharlesWard, I really appreciate your efforts.
I did create a prg, however when I ran it, it looked as though somewhere something got stuck, perhaps in a loop somewhere. Let me explain.
The FP prog I use was originally started by someone who is no longer with us (not dead, just no longer woks here ...). In this program, I do not get the typical foxpro window, so output that prints to the screen never appears.
I know somehow the FP window is supressed in the main prg, and I have tried to re-enable it so I can run the prg in the standard FP window, but with no luck.
As such I am also unable to use any of the debugging tools, which I'm sure would have saved me much time over the years. But that is a different matter...
In any event, I placed a button on a form to call the prog that you provided. However, once the prog started running, it just never ended (I let it run for over two hours while I was gone). With no visible output I can't really tell what or where something is happening or not happening.
I am guessing perhaps the problem has to do with the large file size - the csv file contains 26 fields and about 11,000 records. Can my PC hold all this data in an array with only 1GB RAM (the csv filesize itself is just 6.5MB)?
That file size is not large at all, nor the record count or the field count, especially considering the fact that you are using VFP9. I do these types of quicky programs myself using the low lefile file functions, whicj is why I recommended it to begin with.
One area in the low level file functions I'm particularly sensitive to is in the use of the FGETS() function, which I depend on heavily because it automatically grabs the next chunk of data until it finds a line feed and carriage return combination. The caveat though is to keep in mind that by default it grabs 254 bytes. If you hapen to have some long text strings within those 26 fields that force the byte count over 254 bytes, you might need to provide a higher number to the FGETS() function so that it doesn't limit itself to only 254 bytes before it stops looking for that line feed/carriage reeturn combination. VFP9 allows us to specify up to 8,192 bytes at a time. I don't know if this is part of the problem with the above code. But, as I said, I am sensitive to it as I have been caught by the lower 254 byte default in the past. Just add a number higher than the assumed default and give the code another try. At least you can then rule that part out as the only problem if it doesn't work or behaves differently.
FGETS(xmFileHandle ,1024)
FGETS( ) Function
http://msdn2.microsoft.com
FGETS( ) returns a maximum of 8192 bytes. If you omit nBytes, FGETS( )returns 254 bytes by default.
It is looping forever somewhere in here:
IF xmColumn <= 10 && Only for array columns <= 10
xmArray [ xmPointer, xmColumn ] = LEFT( xmStringIn, xmAtPos ) && Store the data to the array
xmNewString = SUBSTR( xmStringIn, xmAtPos + 2, LEN( xmStringIn ) ) && Remove the Stored Data from the remaining data
xmStringIn = xmNewString && Resultant data is no thw search string again
&& This line will remove the heading and trailing " from the data
&& xmArray [ xmPointer, xmColumn ] = SUBSTR( xmArray [ xmPointer, xmColumn ], 2, LEN( xmArray [ xmPointer, xmColumn ] ) - 2 )
ENDIF
The counter reached 50,000 in just a matter of seconds. There are only 11,000 records so I'm assuming this must be where the problem lies (unless where I placed the counter would result in it counting something other than actual records).
DO WHILE AT( '",' , xmStringIn ) > 0 && The Search Delimiter is ",
xmAtPos = AT( '",' , xmStringIn ) && What is the location of the delimiter
xmColumn = xmColumn + 1 && Which Array column do you want to place the value in?
WAIT WINDOW NOWAIT "Processing, please wait..." + ALLTRIM(STR(lnCount))
IF xmColumn <= 10 && Only for array columns <= 10
xmArray [ xmPointer, xmColumn ] = LEFT( xmStringIn, xmAtPos ) && Store the data to the array
xmNewString = SUBSTR( xmStringIn, xmAtPos + 2, LEN( xmStringIn ) ) && Remove the Stored Data from the remaining data
xmStringIn = xmNewString && Resultant data is no thw search string again
&& This line will remove the heading and trailing " from the data
&& xmArray [ xmPointer, xmColumn ] = SUBSTR( xmArray [ xmPointer, xmColumn ], 2, LEN( xmArray [ xmPointer, xmColumn ] ) - 2 )
ENDIF
lnCount = lnCount + 1
ENDDO
The loop cycles thru Search Delimiters. Therefore your counter counts Search Delimiters, not records.
If there is a record with more than 10 Search Delimiters xmColumn will eventually reach 11. After that xmStringIn will not be changed, and the loop will not terminate.
Please use the code snippet. It makes code reading easier.
I know I'm missing something here, as I'm sure this should be fairly simple. I've changed all the 10s throughout the file to 26 to match the number of fields in the file.
Now it begins to process the file, however it hangs on record #1754 every time. And it doesn't matter what the contents of line 1754 are, I have deleted it over and over, deleted it along with 5 preceding and succeeding lines, still it hangs on line 1754.
Frmadmirer,
The answer as to if your PC can hold that much data depends on the amount of free stack and memory space available with the other applications running on your system. Additionally, there are additional settings that can be configured in the config.fpw file.
The example works fine on smaller files - probably less than 1,000 lines without issue. Keep in mind, if it was me retrieving those values, I would probably save them to a database table as I was retrieving them - as I read a value I would write them out to a table, maybe a temporary table with the structure line:
Row Numeric 8
Column Numeric 8
TextVal Character 80
Then as I retrieved a value, I would create an entry into the table and place that informaiton into the table. As I don't know all the types of data you are retrieveing, I am assuming strictly text. By doing that you would not utilize an array.
****
My guess is that what is happening to you is that as the array grows larger, each time it is resized takes cpu time - you may want to use the following example instead of the one in the example that I gave you - it will probably make it considerable faster if you have that many records you are reading:
IF xmPointer > xmMax
xmMax = xmMax + 1000 &&resize by an additional 1000 elements
DECLARE xmArray [ xmMax, 26 ]
ENDIF
At a resize adjustment of every 26 as in the previous example, at 17,000+ records the program was resizing 653+ times for multiple elements and then reinitilizing those new entries each time, probably swapping to hard drive at times. By using the above line change and by dropping the initilization of the additional 1000 new elements, the software will only resize about 16 times for the same 1000 rows.
As I said, the code example was basically designed to show you how to traverse the text file. Another thought for you is to add the following line everytime it resizes so you can monitor for testing purposes:
Messagebox ( 'Currently Resizing To: ' + AllTrim ( Str ( xmMax ) ) )
This would appear of course over 17 times but it would allow you to monitor the progress of the file until it had completed.
Charles
CharlesWard,
Thanks again for the suggestions. I tried changing to 1000, but it still hangs at exactly 1074 like before. I'm not sure what to try next.
temdley,
I then gave your suggestion a whirl. All looked good, especially since I had been unable to get past 1074 records before, but then at around 2200 records the loop started slowing down dramatically, and at around 2800 records I got:
File c:\..... temp is too large.
at which point I had to cancel. Tried it again, same results.
Any suggestions?
tmedley,
Have worked with yours some more, placing the data into tables instead of cursors. I can now process the entire file, however I wind up with around 2550 error lines (there are actually about 4) in the errors table and 1713 in the scrubbed table. The other 7500 or so lines are not in either.
I am not very skilled in FP, but I was wondering if the problem might be related to the fact that the last field one each line does not end with the "," that is used to separate all the other fields, and that perhaps when detecting/counting the delimiters this should be taken into account?
"record1-field1","record1-
"record2-field1","record2-
Since some fields contain commas withing the data, I changed from looking just for commas to looking for "," .
CharlesWard,
I am still working with yours as well, as I have not yet been able to get either of the solutions presented to work in my particular situation/with my data.
Sadly I have worked for more than a week with many, many incarnations of the suggestions here, but still have nothing that works and no way to verify the file.
Because this must be completed as it is delaying other work, I am turning this over to a contract programmer. I do, as always, appreciate the efforts and help that were provided.
Sorry you need to do that but fully understand. This is generally something VFP is really good at but when you are trying to import data and the values in fields sometimes contain the delimiter itself it always gets messy. If you want to email me the raw file that is stopping at 1074 with 1 method and generating way to many reported errors with the other method I would be glad to give it a look and try to post a snippet back here that works or at least tell you what is generating the bad behavior.
tmedley@goibs.com
Using email to resolve a problem
http://www.experts-exchang
Trying this again...
Some of the suggestions here were very helpful in my previous attempt at rectifying this problem, and I felt like we were on the verge of victory. Sadly though to date there is still no working solution that will allow me to verify the integrity of the csv file prior to it simply failing.
I am once again working with the script that this error affects, so I am revisiting this problem in hopes of finding a suitable solution.
Any help greatly appreciated!
Business Accounts
Answer for Membership
by: CarlWarnerPosted on 2007-12-19 at 12:22:32ID: 20502575
You could write a simple Fox routine using low level file functions to scream on through the entire CSV file one line at a time checking quickly that each line in the CSV file matches the rules you specify; i.e., a minimum or maximum length requirement or a count of delimiters or the like.
Use FOPEN() to create an open file handle to that CSV file and then use the file handle for subsequent operations like FGETS() to put a single line in a memory variable for your evaluation.