Question

Verify Integrity of CSV File

Asked by: formadmirer

Hello experts. Have a csv file that I download and parse every day to create a products table.
The csv file contains 26 fields and about 11,000 records. It is (automatically) run through several prgs and several tables are created using the data from the file. This takes around 45 minutes to complete.

The problem is recently the supplied csv file has, from time to time, had a few records with incomplete data. Some records are missing several fields, creating a file that cannot be correctly parsed. This creates a particular problem when, for example, the import process is running unattended and I return an hour later just to find that there is an error in the csv file and the import has halted.

So my question is, can I run this csv file through some sort of 'verification' process first; something that would quickly check the number of columns before each line feed and provide me with a report of those lines that fail? And if so, how to do?

This Question has been solved and asker verified All Experts Exchange premium technology solutions are available to subscription members.

Subscribe now for full access to Experts Exchange and get

Instant Access to this Solution

  • Plus...
  • 30 Day FREE access, no risk, no obligation
  • Collaborate with the world's top tech experts
  • Unlimited access to our exclusive solution database
  • Never be left without tech help again

Subscribe Now

Asked On
2007-12-19 at 11:44:40ID23034294
Tags

csv

,

checking

,

files

,

verify

Topic

FoxPro Database

Participating Experts
5
Points
500
Comments
28

Trusted by hundreds of thousands everyday for fast, accurate and reliable tech support.

  • "The time we save is the biggest benefit of Experts Exchange to Warner Bros. What could take multiple guys 2 hours or more each to find is accessed in around 15 minutes on Experts Exchange." Mike Kapnisakis, Warner Bros.
  • "Our team likes having a resource that is more secure than just using Google and most experts using this service really know their stuff. It's nice to look here first versus using Google." Dayna Sellner, Lockheed Martin
  • "Anytime that I've been stumped with a problem, 9 out of 10 times Experts Exchange has either the accepted solution or an open discussion of the potential solution to the problem." Kenny Red, eBay Inc.

See what Experts Exchange can do for you.

Got a question?

We've got the answer.

Experts Exchange has been collecting answers to technology questions since 1996…3 million and counting! If you have a question, chances are we already have your answer.

Screenshot of Experts Exchange Knowledgebase

Need individual assistance?

Our experts are ready to help.

If you can't find the exact answer you're looking for, ask our exclusive community of 50,000 experts. You’ll get a personalized answer from a trusted professional.

Screenshot of Experts Exchange Knowledgebase

Want to learn from the best?

Read articles from industry experts.

Thousands of free tech tips, tricks, how-to’s and tutorials are available in our peer reviewed articles section. See for yourself how smart our experts are, no login required.

Screenshot of an Article

Working on a long term project?

Store your work and research.

Save solutions to your questions, answers you’ve discovered through searching plus helpful articles in your personal knowledgebase for easy future access.

Screenshot of Experts Exchange Knowledgebase

Access the answers to your technology questions today.

Subscribe Now

30-day free trial. Register in 60 seconds.

What Makes Experts Exchange Unique?

Members of the expert community talk about why the experience at Experts Exchange is different than what you will find anywhere else.

Trusted by the world's most respected brands.

image of each brand's logo

Faithfully serving IT professionals since 1996.

Experts Exchange Logo

Try it out and discover for yourself.

Subscribe Now

30-day free trial. Register in 60 seconds.

Related Solutions

  1. Install Windows XP Pro SP1 unattended (winnt32) inclu…
    Hi I need some information on how to do the following. I wanna install Windows XP Pro SP1 unattended (winnt32) including dynamic updates and integrate with SP2. The whole installation should be done over the network, connecting to Share folders, but the most important is ins...
  2. How do I integrate RSS feed into the SharePoint site?
    How do I integrate RSS feed into the SharePoint site? I am looking for a RSS feed of free stock ticker and breaking news. Even if I have found oone, how do I implement it in Sharepoint? Is there anyone may offer some ideas?

Free Tech Articles

  1. WARNING: 5 Reasons why you should NEVER fix a computer for free.
    It is in our nature to love the puzzle. We are obsessed. The lot of us. We love puzzles. We love the challenge. We thrive on finding the answer. We hate disarray. It bothers us deep in our soul. W...
  2. SCCM OSD Basic troubleshooting
    SCCM 2007 OSD is a fantastic way to deploy operating systems, however, like most things SCCM issues can sometimes be difficult to resolve due to the sheer volume of logs to sift through and the dispe...
  3. Migrate Small Business Server 2003 to Exchange 2010 and Windows 2008 R2
    This guide is intended to provide step by step instructions on how to migrate from Small Business Server 2003 to Windows 2008 R2 with Exchange 2010. For this migration to work you will need the fo...
  4. Create a Win7 Gadget
    This article shows you how to create a simple "Gadget" -- a sort of mini-application supported by Windows 7 and Vista. Gadgets can be dropped anywhere on the desktop to provide instant information, ...
  5. Outlook continually prompting for username and password
    There have been a lot of questions recently regarding Outlook prompting for a username and password whilst using Exchange 2007. There are a few reasons why this would happen and I will try to cover t...
  6. Backup Exchange 2010 Information Store using Windows Backup
    There seems to be quite a lot of confusion around the ability to backup Exchange 2010 using the built in Windows Backup feature. This stems from the omission of this feature prior to Exchange 2007 s...

Cloud Class Webinars

  1. Avoiding Bugs in Microsoft Access
    Alison Balter takes and in-depth look at avoiding bugs in Access. In this webinar you will learn about using the immediate window to debug your applications, invoking the debugger, using breakpoints to troubleshoot, stepping through code, setting the next statement to execute, ...
  2. Top 10 Best New Features in Visio 2010
    Scott Helmers gives live demonstrations of the top 10 new features in Visio 2010. This webinar will teach you how to create compelling diagrams by adding shapes to the page with a single click, linking the shapes in a diagram to data in Excel (or SQL Server, or SharePoint), ...
  3. IT Consultant Business Secrets Revealed
    Michael Munger, Experts Exchange tech pro and IT consultant, pulls back the curtain on his very successful businesses and answers question on every IT consultant and business owner should know about. He shares secrets on what he did to solve the 5 most common problems in IT, ...
  4. Disaster Recovery and Business Continuity
    Quest CTO, Mike Billon, gives an overview of the steps involved in building a dunamic disaster recovery plan. Through case studies and an examination of software/hardware tooles for monitoring and testing, you'll gain a better understandin of where you are, where you want ...
  5. Organize Your Visio Diagrams with Containers and Lists
    Scott Helmers uses cross functional flowcharts, wireframe diagrams, data graphic legends and seating charts to teach you: how to ustilize all three new structured diagram components in Visio 2010, the best practices for organizeing shapes in previous version of Visio, how to organize ...
  6. How to Us Objects, Properties, Events and Methods in Microsoft Access
    Alison Dalter gives an in-depbth look at objects, properties, events and methods in Microsoft Access. In this webinar you will learn about using the object browser, referring to objects, working with properties and methods, working with object variables, understanding the ...

Join the Community

Give a Little. Get a Lot.

Join the community of experts here and help other tech pros by answering question in your area of expertise. You can earn FREE access to all Experts Exchange's premium features and resources.

Join the Community

Answers

 

by: CarlWarnerPosted on 2007-12-19 at 12:22:32ID: 20502575

You could write a simple Fox routine using low level file functions to scream on through the entire CSV file one line at a time checking quickly that each line in the CSV file matches the rules you specify; i.e., a minimum or maximum length requirement or a count of delimiters or the like.

Use FOPEN() to create an open file handle to that CSV file and then use the file handle for subsequent operations like FGETS() to put a single line in a memory variable for your evaluation.

 

by: CaptainCyrilPosted on 2007-12-19 at 20:39:20ID: 20504868

You can also create the cursor and use the APPEND FROM data.csv TYPE CSV command. And then search the fields in the cursor for missing fields.

I have examples of data around 32,000 to 65,000 lines and it parses it in less than a minute.

 

by: tmedleyPosted on 2007-12-20 at 11:51:26ID: 20509378

A really easy way is to create a cursor with one memo field, append one blank record to it then issue Append Memo <fieldname> From <filename> . You now have the entire file in a memo field that you can step thorugh a line at a time and examine. For example you can use ALINES() to load it into an array or just step through it with MLINE. You can count commas in each line with OCCURS() to locate records with incorrect field counts. You can throw "bad" lines into the memo field of another cursor to review later and append all the "good" lines together and re-export them as csv and then feed the "scrubbed" version to your regular import process. We do this a lot.

 

by: CharlesWardPosted on 2007-12-21 at 10:16:57ID: 20515358

Can you give me an example of how the data is represented in the file, the delimiters ( what seperates one column from another ) ect.  This verification routine will be about 10 lines or less.

 

by: formadmirerPosted on 2008-01-03 at 07:36:31ID: 20573550

Each field is enclosed in quotes, i.e. "field" and separated by a comma, i.e. "field1","field2"
Some fields do contain quotes within the field data.

I am not sure of the difference between a line feed or a return, or how to tell which is used, but one is used to end each line.

Additionally, there are three lines of junk data (column names, etc) at the top of the file that I would like to ignore before processing, if possible.

Sample File:

junk (file date, etc)
more junk (file origination, etc.)
columnname1, columnname2
"record1-field1","record1-field2"
"record2-field1","record2-field2"

 

by: CaptainCyrilPosted on 2008-01-03 at 07:54:21ID: 20573741

Usually at the end of each line you find CRLF which is Carriage Return and Line Feed

CHR(13)+CHR(10)

If you read the file with APPEND FROM or FGETS you will not need to worry about CRFL.

OR

#DEFINE CRLF CHR(13)+CHR(10)

 

by: CharlesWardPosted on 2008-01-03 at 09:09:45ID: 20574446

This example is a complete program that will read and store your data from your text file to an array in memory.  The array litterly becomes dymanic in that it will expand as your file grows larger - I hate to waste space.  Additionally, it will read and store up to ten columns from the file as well.  This program could easily be scaled down but I wanted to give you plenty of exmples and comments to help you down your road.

LOCAL   xmFileHandle, xmFileResult, xmStringIn, xmAtPos, xmPointer, xmArray, xmMax, xmColumn, xmLoop, ymLoop, xmNewString
 
xmFileHandle = FOPEN( 'C:\test.txt' )   && Open the input file
 
xmMax = 10   && Maximum array size allowed - this is dymanic in this example
xmPointer = 0	&& Variable to hold the current array pointer
DECLARE xmArray [ xmMax, 10 ]  && Declare an array to place the column values read from the file
 
FOR xmLoop = 1 TO xmMax   && Always a good idea to initilize your variables and arrays
    FOR ymLoop = 1 TO 10
        xmArray [ xmLoop, ymLoop ] = ''
    NEXT 
NEXT 
 
DO WHILE FEOF ( xmFileHandle ) = .F.	  && Start at the top of the file all the way to the end
   xmStringIn = ALLTRIM( FGETS( xmFileHandle ) )  && Read a line from the test file
   
    IF LEFT( xmStringIn, 1 ) = '"' .AND. RIGHT( xmStringIn, 1 ) = '"'	&& Only Conerned about the lines with DATA
      xmStringIn = xmStringIn + ','										&& Add a comma to the end of the string so all data is formatted "DATA",
      xmPointer  = xmPointer + 1										&& Move to the next Array Position to place the data values
 
      IF xmPointer > xmMax						&& This will resize the array if it needs to be large
         xmMax = xmMax + 10						&& lets resize by an additional 10 elements
         DECLARE xmArray [ xmMax, 10 ]			&& Resize the Array to Make it Larger - it leaves the origional elements alone!
 
         FOR xmLoop = xmPointer TO xmMax		&& Always a good idea to initilize your variables and arrays
             FOR ymLoop = 1 TO 10
                 xmArray [ xmLoop, ymLoop ] = ''
             NEXT 
         NEXT 
      ENDIF 
 
      xmColumn   = 0													&& Reset the column counter for placing data read into the array
 
      DO WHILE AT( '",' , xmStringIn ) > 0				&& The Search Delimiter is ",
         xmAtPos  = AT( '",' , xmStringIn )				&& What is the location of the delimiter
         xmColumn = xmColumn + 1						&& Which Array column do you want to place the value in?
 
         IF xmColumn <= 10								&& Only for array columns <= 10
            xmArray [ xmPointer, xmColumn ] = LEFT( xmStringIn, xmAtPos )		&& Store the data to the array
            xmNewString = SUBSTR( xmStringIn, xmAtPos + 2, LEN( xmStringIn ) )	&& Remove the Stored Data from the remaining data
            xmStringIn = xmNewString											&& Resultant data is no thw search string again
            
			&& This line will remove the heading and trailing " from the data
            && xmArray [ xmPointer, xmColumn ] = SUBSTR( xmArray [ xmPointer, xmColumn ], 2, LEN( xmArray [ xmPointer, xmColumn ] ) - 2 )
         ENDIF 
 
      ENDDO 
 
   ENDIF 
ENDDO 
 
IF xmPointer > 0
   DECLARE xmArray [ xmPointer, 10 ]	&& Last - Resize the Array down to the exact size of the data read so there is no wasted space.
ENDIF 
 
xmFileResult = FCLOSE( xmFileHandle )	&& Close the input file
 
Clear									&& Clear the output device - you would drop this on a form
FOR xmLoop = 1 TO xmPointer				&& Want to see the results?
    ? 'Row: ' + ALLTRIM( STR( xmLoop ) )
    FOR ymLoop = 1 TO 10
        ? '     Column: ' + STR( ymLoop,2 ) + '    Value:' + xmArray [ xmLoop, ymLoop ]
    NEXT 
    ?
NEXT 

                                              
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:
63:
64:
65:
66:

Select allOpen in new window

 

by: formadmirerPosted on 2008-01-03 at 21:23:34ID: 20580149

CharlesWard, I really appreciate your efforts.
I did create a prg, however when I ran it, it looked as though somewhere something got stuck, perhaps in a loop somewhere. Let me explain.

The FP prog I use was originally started by someone who is no longer with us (not dead, just no longer woks here ...). In this program, I do not get the typical foxpro window, so output that prints to the screen never appears.
I know somehow the FP window is supressed in the main prg, and I have tried to re-enable it so I can run the prg in the standard FP window, but with no luck.

As such I am also unable to use any of the debugging tools, which I'm sure would have saved me much time over the years. But that is a different matter...

In any event, I placed a button on a form to call the prog that you provided. However, once the prog started running, it just never ended (I let it run for over two hours while I was gone). With no visible output I can't really tell what or where something is happening or not happening.

I am guessing perhaps the problem has to do with the large file size - the csv file contains 26 fields and about 11,000 records. Can my PC hold all this data in an array with only 1GB RAM (the csv filesize itself is just 6.5MB)?

 

by: CarlWarnerPosted on 2008-01-03 at 21:38:19ID: 20580197

That file size is not large at all, nor the record count or the field count, especially considering the fact that you are using VFP9.  I do these types of quicky programs myself using the low lefile file functions, whicj is why I recommended it to begin with.

One area in the low level file functions I'm particularly sensitive to is in the use of the FGETS() function, which I depend on heavily because it automatically grabs the next chunk of data until it finds a line feed and carriage return combination.  The caveat though is to keep in mind that by default it grabs 254 bytes.  If you hapen to have some long text strings within those 26 fields that force the byte count over 254 bytes, you might need to provide a higher number to the FGETS() function so that it doesn't limit itself to only 254 bytes before it stops looking for that line feed/carriage reeturn combination.  VFP9 allows us to specify up to 8,192 bytes at a time.  I don't know if this is part of the problem with the above code.  But, as I said, I am sensitive to it as I have been caught by the lower 254 byte default in the past.  Just add a number higher than the assumed default and give the code another try.  At least you can then rule that part out as the only problem if it doesn't work or behaves differently.

FGETS(xmFileHandle ,1024)

FGETS( ) Function
http://msdn2.microsoft.com/en-us/library/f4ytfbz9(VS.80).aspx

FGETS( ) returns a maximum of 8192 bytes. If you omit nBytes, FGETS( )returns 254 bytes by default.

 

by: formadmirerPosted on 2008-01-03 at 22:32:58ID: 20580326

Still nothing. I tried 1024, then 2048, then all the way to 8192.
So I added a wait window counter and it appears to be hanging on the very first line.

Any ideas?

Honestly, I tried to look, but your code is decades advanced over the little bit I can do, so I haven't a clue...

 

by: formadmirerPosted on 2008-01-03 at 22:42:43ID: 20580343

It is looping forever somewhere in here:


    IF xmColumn <= 10                                                && Only for array columns <= 10
            xmArray [ xmPointer, xmColumn ] = LEFT( xmStringIn, xmAtPos )            && Store the data to the array
            xmNewString = SUBSTR( xmStringIn, xmAtPos + 2, LEN( xmStringIn ) )      && Remove the Stored Data from the remaining data
            xmStringIn = xmNewString                                                                  && Resultant data is no thw search string again
       
                  && This line will remove the heading and trailing " from the data
            && xmArray [ xmPointer, xmColumn ] = SUBSTR( xmArray [ xmPointer, xmColumn ], 2, LEN( xmArray [ xmPointer, xmColumn ] ) - 2 )
         ENDIF

 

by: CarlWarnerPosted on 2008-01-03 at 22:44:19ID: 20580346

First off, that's not my code.  I would try something different.  But, it always depends on the problem I'm trying to solve and the file I have at hand.

 

by: formadmirerPosted on 2008-01-03 at 22:46:52ID: 20580357

The counter reached 50,000 in just a matter of seconds. There are only 11,000 records so I'm assuming this must be where the problem lies (unless where I placed the counter would result in it counting something other than actual records).  


    DO WHILE AT( '",' , xmStringIn ) > 0                        && The Search Delimiter is ",
         xmAtPos  = AT( '",' , xmStringIn )                        && What is the location of the delimiter
         xmColumn = xmColumn + 1                                    && Which Array column do you want to place the value in?
WAIT WINDOW NOWAIT "Processing, please wait..." + ALLTRIM(STR(lnCount))  
         IF xmColumn <= 10                                                && Only for array columns <= 10
            xmArray [ xmPointer, xmColumn ] = LEFT( xmStringIn, xmAtPos )            && Store the data to the array
            xmNewString = SUBSTR( xmStringIn, xmAtPos + 2, LEN( xmStringIn ) )      && Remove the Stored Data from the remaining data
            xmStringIn = xmNewString                                                                  && Resultant data is no thw search string again
       
                  && This line will remove the heading and trailing " from the data
            && xmArray [ xmPointer, xmColumn ] = SUBSTR( xmArray [ xmPointer, xmColumn ], 2, LEN( xmArray [ xmPointer, xmColumn ] ) - 2 )
         ENDIF
  lnCount = lnCount + 1
      ENDDO

 

by: ramromPosted on 2008-01-04 at 01:49:50ID: 20580817

The loop cycles thru Search Delimiters. Therefore your counter counts Search Delimiters, not records.
If there is a record with more than 10 Search Delimiters xmColumn will eventually reach 11. After that xmStringIn will not be changed, and the loop will not terminate.

Please use the code snippet. It makes code reading easier.

 

by: formadmirerPosted on 2008-01-04 at 07:07:59ID: 20582727

Sorry, I've just recently noticed the snippet option but didn't know what it was for. I'll be sure to make use of it from now on.

 

by: formadmirerPosted on 2008-01-04 at 07:42:19ID: 20583060

I know I'm missing something here, as I'm sure this should be fairly simple. I've changed all the 10s throughout the file to 26 to match the number of fields in the file.

Now it begins to process the file, however it hangs on record #1754 every time. And it doesn't matter what the contents of line 1754 are, I have deleted it over and over, deleted it along with 5 preceding and succeeding lines, still it hangs on line 1754.

LOCAL   xmFileHandle, xmFileResult, xmStringIn, xmAtPos, xmPointer, xmArray, xmMax, xmColumn, xmLoop, ymLoop, xmNewString, lnCount
lnCount = 0
xmFileHandle = FOPEN( 'C:\sourcedata\feeds\list.csv' )   && Open the input file
 
xmMax = 26   && Maximum array size allowed - this is dymanic in this example
xmPointer = 0	&& Variable to hold the current array pointer
DECLARE xmArray [ xmMax, 26 ]  && Declare an array to place the column values read from the file
 
FOR xmLoop = 1 TO xmMax   && Always a good idea to initilize your variables and arrays
    FOR ymLoop = 1 TO 26
        xmArray [ xmLoop, ymLoop ] = ''
    NEXT 
NEXT 
 
DO WHILE FEOF ( xmFileHandle ) = .F.	  && Start at the top of the file all the way to the end
WAIT WINDOW NOWAIT "Processing, please wait..." + ALLTRIM(STR(lnCount))  
   xmStringIn = ALLTRIM( FGETS( xmFileHandle, 1024 ) )  && Read a line from the test file
 	
    IF LEFT( xmStringIn, 1 ) = '"' .AND. RIGHT( xmStringIn, 1 ) = '"'	&& Only Concerned about the lines with DATA
      xmStringIn = xmStringIn + ','										&& Add a comma to the end of the string so all data is formatted "DATA",
      xmPointer  = xmPointer + 1										&& Move to the next Array Position to place the data values
 
      IF xmPointer > xmMax						&& This will resize the array if it needs to be large
         xmMax = xmMax + 26						&& lets resize by an additional 26 elements
         DECLARE xmArray [ xmMax, 26 ]			&& Resize the Array to Make it Larger - it leaves the origional elements alone!
 
         FOR xmLoop = xmPointer TO xmMax		&& Always a good idea to initilize your variables and arrays
             FOR ymLoop = 1 TO 26
                 xmArray [ xmLoop, ymLoop ] = ''
             NEXT 
         NEXT 
      ENDIF 
 
      xmColumn   = 0									&& Reset the column counter for placing data read into the array
 
      DO WHILE AT( '",' , xmStringIn ) > 0				&& The Search Delimiter is ",
         xmAtPos  = AT( '",' , xmStringIn )				&& What is the location of the delimiter
         xmColumn = xmColumn + 1						&& Which Array column do you want to place the value in?
 
         IF xmColumn <= 26								&& Only for array columns <= 26
            xmArray [ xmPointer, xmColumn ] = LEFT( xmStringIn, xmAtPos )		&& Store the data to the array
            xmNewString = SUBSTR( xmStringIn, xmAtPos + 2, LEN( xmStringIn ) )	&& Remove the Stored Data from the remaining data
            xmStringIn = xmNewString											&& Resultant data is no thw search string again
       
			&& This line will remove the heading and trailing " from the data
            && xmArray [ xmPointer, xmColumn ] = SUBSTR( xmArray [ xmPointer, xmColumn ], 2, LEN( xmArray [ xmPointer, xmColumn ] ) - 2 )
         ENDIF 
 
      ENDDO 
 
   ENDIF 
  lnCount = lnCount + 1 
ENDDO 
 
IF xmPointer > 0
   DECLARE xmArray [ xmPointer, 26 ]	&& Last - Resize the Array down to the exact size of the data read so there is no wasted space.
ENDIF 
 
xmFileResult = FCLOSE( xmFileHandle )	&& Close the input file
 
Clear	
CREATE TABLE (pcMainData) + "csv_result" (rowcount N(4,0))

                                              
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:
46:
47:
48:
49:
50:
51:
52:
53:
54:
55:
56:
57:
58:
59:
60:
61:
62:

Select allOpen in new window

 

by: formadmirerPosted on 2008-01-04 at 07:44:56ID: 20583089

trash the very bottom line, it's just junk that somehow got copied in when I pasted.

 

by: tmedleyPosted on 2008-01-04 at 08:41:58ID: 20583747

Here is another alternative using memo fields which is not as fast but very easy to follow coding wise

* Set the environment
* I want to overwite files w/o warnings
* I want the maximum size for memowidth so that new lines will only be generated by the CR LF at the end of record
SET SAFETY OFF
SET MEMOWIDTH TO 8192
 
* Set number of commas to look for
* Then create a holder for the orginal data, scrubbed data and errors
NumberOfCommas = 27
CREATE CURSOR datOriginal (datMemo M)
APPEND BLANK
CREATE CURSOR datScrubbed (datMemo M)
APPEND BLANK
CREATE CURSOR datErrors (datMemo M)
APPEND BLANK
 
* Pull the data from the CSV file into the memo field of the first cursor
* Then step through each line of the memo skipping the first 3 header lines by starting at line 4
* Copy good records to Second cursor, Bad records to the third cursor
* The first time a record is added to one of the output cursors do not add a CR LF
SELECT datOriginal 
APPEND MEMO datMemo FROM OriginalData.csv
FOR i = 4 TO MEMLINES(datMemo)
	myStringToCheck = MLINE(datMemo,i)
	IF OCCURS(',', myStringToCheck) = NumberOfCommas
		IF EMPTY(datScrubbed.datMemo)
			REPLACE datScrubbed.datMemo WITH myStringToCheck IN datScrubbed
		ELSE
			REPLACE datScrubbed.datMemo WITH datScrubbed.datMemo + CHR(13) + CHR(10) + myStringToCheck IN datScrubbed
		ENDIF
	ELSE
		IF EMPTY(datErrors.datMemo)
			REPLACE datErrors.datMemo WITH myStringToCheck IN datErrors
		ELSE
			REPLACE datErrors.datMemo WITH datErrors.datMemo + CHR(13) + CHR(10) + myStringToCheck IN datErrors
		ENDIF
	ENDIF
NEXT i
 
* Copy good records to a new csv file you can now use for import to continue
* Copy bad records to an error file
SELECT datScrubbed
COPY MEMO datMemo TO DataToImport.csv
SELECT datErrors
COPY MEMO datMemo TO ImportErrors.csv
                                              
1:
2:
3:
4:
5:
6:
7:
8:
9:
10:
11:
12:
13:
14:
15:
16:
17:
18:
19:
20:
21:
22:
23:
24:
25:
26:
27:
28:
29:
30:
31:
32:
33:
34:
35:
36:
37:
38:
39:
40:
41:
42:
43:
44:
45:

Select allOpen in new window

 

by: CharlesWardPosted on 2008-01-04 at 10:32:48ID: 20584855

Frmadmirer,

The answer as to if your PC can hold that much data depends on the amount of free stack and memory space available with the other applications running on your system.  Additionally, there are additional settings that can be configured in the config.fpw file.  

The example works fine on smaller files - probably less than 1,000 lines without issue.  Keep in mind, if it was me retrieving those values, I would probably save them to a database table as I was retrieving them - as I read a value I would write them out to a table, maybe a temporary table with the structure line:
      Row      Numeric 8
      Column  Numeric 8
      TextVal  Character 80

Then as I retrieved a value, I would create an entry into the table and place that informaiton into the table.  As I don't know all the types of data you are retrieveing, I am assuming strictly text.  By doing that you would not utilize an array.

****

My guess is that what is happening to you is that as the array grows larger, each time it is resized takes cpu time - you may want to use the following example instead of the one in the example that I gave you - it will probably make it considerable faster if you have that many records you are reading:

   IF xmPointer > xmMax
       xmMax = xmMax + 1000      &&resize by an additional 1000 elements
       DECLARE xmArray [ xmMax, 26 ]
   ENDIF

At a resize adjustment of every 26 as in the previous example, at 17,000+ records the program was resizing 653+ times for multiple elements and then reinitilizing those new entries each time, probably swapping to hard drive at times.  By using the above line change and by dropping the initilization of the additional 1000 new elements, the software will only resize about 16 times for the same 1000 rows.

As I said, the code example was basically designed to show you how to traverse the text file.  Another thought for you is to add the following line everytime it resizes so you can monitor for testing purposes:

Messagebox ( 'Currently Resizing To: ' + AllTrim ( Str ( xmMax ) ) )

This would appear of course over 17 times but it would allow you to monitor the progress of the file until it had completed.  

Charles

 

by: formadmirerPosted on 2008-01-04 at 18:00:49ID: 20587458

Thank you so much. I have been pulled off of this for today - working with some php stuff right now - but will give the new info a run later this evening and see how things work out.

 I really do appreciate the help.

 

by: formadmirerPosted on 2008-01-09 at 18:45:34ID: 20624582

CharlesWard,
Thanks again for the suggestions. I tried changing to 1000, but it still hangs at exactly 1074 like before. I'm not sure what to try next.

temdley,
I then gave your suggestion a whirl. All looked good, especially since I had been unable to get past 1074 records before, but then at around 2200 records the loop started slowing down dramatically, and at around 2800 records I got:
File c:\.....  temp is too large.
at which point I had to cancel. Tried it again, same results.

Any suggestions?

 

by: formadmirerPosted on 2008-01-10 at 10:00:32ID: 20629371

tmedley,
Have worked with yours some more, placing the data into tables instead of cursors. I can now process the entire file, however I wind up with around 2550 error lines (there are actually about 4) in the errors table and 1713 in the scrubbed table. The other 7500 or so lines are not in either.

I am not very skilled in FP, but I was wondering if the problem might be related to the fact that the last field one each line does not end with the "," that is used to separate all the other fields, and that perhaps when detecting/counting the delimiters this should be taken into account?

"record1-field1","record1-field2","record1-field3"
"record2-field1","record2-field2","record2-field3"

Since some fields contain commas withing the data, I changed from looking just for commas to looking for "," .


CharlesWard,
I am still working with yours as well, as I have not yet been able to get either of the solutions presented to work in my particular situation/with my data.

 

by: formadmirerPosted on 2008-01-14 at 19:12:26ID: 20659748

Sadly I have worked for more than a week with many, many incarnations of the suggestions here, but still have nothing that works and no way to verify the file.

Because this must be completed as it is delaying other work, I am turning this over to a contract programmer. I do, as always, appreciate the efforts and help that were provided.

 

by: tmedleyPosted on 2008-01-14 at 20:41:07ID: 20660180

Sorry you need to do that but fully understand. This is generally something VFP is really good at but when you are trying to import data and the values in fields sometimes contain the delimiter itself it always gets messy. If you want to email me the raw file that is stopping at 1074 with 1 method and generating way to many reported errors with the other method I would be glad to give it a look and try to post a snippet back here that works or at least tell you what is generating the bad behavior.
tmedley@goibs.com

 

by: CarlWarnerPosted on 2008-01-14 at 20:53:07ID: 20660227

Using email to resolve a problem
http://www.experts-exchange.com/help.jsp#hi15

 

by: formadmirerPosted on 2008-08-01 at 17:06:02ID: 22142639

Trying this again...

Some of the suggestions here were very helpful in my previous attempt at rectifying this problem, and I felt like we were on the verge of victory. Sadly though to date there is still no working solution that will allow me to verify the integrity of the csv file prior to it simply failing.

I am once again working with the script that this error affects, so I am revisiting this problem in hopes of finding a suitable solution.

Any help greatly appreciated!

 

by: formadmirerPosted on 2008-08-05 at 12:19:49ID: 31483694

I rated 'A', 'yes', 'yes', and 'yes' because, even though in the end I still do not have a solution that works, I don't feel that it is due to your lack of trying to help or with the quality of the answers you provided, but rather a unique situation caused by an irregular csv source file.

20120131-EE-VQP-002

3 Ways to Join

30-Day Free Trial

The Experts

98% positive feedback on 31,087 answers since March 2000. angeliii is a Microsoft Most Valuable Professional for his work with MS SQL Server & Develoment.

He has also proven his knowledge of Visual Basic Programming, PHP Scripting and Oracle Databases.

The Experts

97% positive feedback on 10,752 answers since July 2000. lrmoore has more than 18 years experience in the networking industry.

The six-time Mircosoft MVPs specialties include firewalls, virtual private networking, and network management.

Testimonials

"...and excellent source for support... Kind of like having your very own IT dept." Electriciansnet

Testimonials

"I was apprehensive at signing up at first. However... it has already made my life as an IT administrator much easier." JaCrews

Testimonials

"WOW! You guys have great, active, and knowledgeable people on here." moore50

Business Clients

Business Clients

In the Press

"If you’ve got a question... Experts Exchange can supply an answer.”

In the Press

"...an invaluable aid for both IT professionals and those who require tech support."

In the Press

"where IT professionals provide quick answers on just about any topic"

Business Account Plans

Loading Advertisement...