IainMacb (United Kingdom) asked:

Not enough memory to complete this operation

I am trying to import some data from a series of XML files to a FoxPro database.

The XML files are big (typically 200MB), and have most of the data held in attributes rather than 'proper' XML elements. I've had to write quite a lengthy piece of code to interpret all of this.

When I try running this code on a single input XML file, it works OK and I'm happy with the data conversion.

But when I try running this on a production series of input XML files (I've got 59 of them, representing six weeks' data), the routine gradually slows down and eventually crashes with the message

    There is not enough memory to complete this operation

I have tried closing everything down except for FoxPro - and still get the same message.

I have tried timing how long it takes to process each successive file:

    File    Size (MB)    Time (secs)    Speed (MB/s)
    1       204          309            0.66
    2       122          320            0.38
    3       244          924            0.26
    4       272          1618           0.17
    5       233          1721           0.14
    6       271          2466           0.11
    7       243          crashes

As I say, I've got a total of 59 files to process. Not good!

I've tried Googling the error message. Most of the results are about displaying big files or lots and lots of menus. But they do point me towards the garbage collection function SYS(1104).

I tried running the debugger at the end of each file's processing, and inspecting the contents of SYS(1016), the user object memory in use.

That produced some interesting results. I would typically see a displayed value of about 4M, which would then drop back to 2M, as if FoxPro was doing its behind-the-scenes automatic garbage collection. And when that was happening, the timings for the processing changed to:

    File    Size (MB)    Time (secs)    Speed (MB/s)
    1       204          286            0.71
    2       122          143            0.85
    3       244          405            0.60
    4       272          494            0.55
    5       233          384            0.61
    6       271          507            0.53

Which seemed bearable.

So it looked as if there was an issue about garbage collection. I've looked at where I might be wasting memory. I have genuinely closed the input file with FCLOSE(). I've used the .RELEASE() and .DESTROY() methods to get rid of the most obvious objects in the processing routine - and they're defined as LOCALs, so they ought to be released when the routine RETURNs.

But what I suspect is that FoxPro never gets an idle state during all of this processing in which it can engage in its garbage collection. The debug version may yet turn out to be the fastest!

I've tried forcing garbage collection by calling SYS(1104), but that doesn't do any good.

I've tried forcing an idle state by calling INKEY(60), but that doesn't do any good either.

I'm running out of ideas - can anybody point me in the right direction?
Pavel Celba (Czechia) replied:

I don't know your code, so I cannot say what's wrong with it. Even when you use local variables for object instantiation, the objects can remain in memory if there is some pending reference, etc.

I hope we are talking about VFP 9 SP2.

The "Not enough memory" error can also arise when processing long memo fields in memory, and it appears even when there is enough memory. The trick is to find the most problematic command in your code and use some replacement for it. Did you check the Task Manager to inspect the memory consumed by the VFP process?

The run under the debugger points to some garbage, but it is not always possible to clean it up.

You may try the following cleanup after each processed file:
RELEASE ALL
CLOSE ALL
CLEAR ALL
= SYS(1104)
DOEVENTS FORCE
WAIT WINDOW "Cleaning the environment up... " TIMEOUT 5   && 5 seconds is enough for any garbage collection

If it does not help, then the only workaround (besides better code) is to process each XML file in a separate VFP session, which you may invoke by the RUN command or by the ShellExecute Win32 API function call.
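
For illustration only, a minimal sketch of that per-file approach. It assumes a worker EXE (hypothetically named importone.exe) that takes the XML file name on its command line and quits when done; the folder names are made up and the completion handling is only noted in a comment:

Declare Integer ShellExecute In shell32.dll ;
	Integer hWnd, String lpOperation, String lpFile, ;
	String lpParameters, String lpDirectory, Integer nShowCmd

Local lnCount, lnI, lcXmlFile, lnResult
Local laXmlFiles[1]
lnCount = ADir(laXmlFiles, "d:\data\*.xml")   && hypothetical source folder
For lnI = 1 To lnCount
	lcXmlFile = "d:\data\" + laXmlFiles[lnI, 1]
	* ShellExecute returns immediately, so the worker would need to signal
	* completion (e.g. write a flag file) before the next file is launched.
	lnResult = ShellExecute(0, "open", "c:\import\importone.exe", '"' + lcXmlFile + '"', "", 1)
EndFor
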
Olaf Doschke replied:

The phenomenon is not unknown. I have done data migrations, which work best in short portions. I never had to modularize migrations, but when running them in the IDE and coming back the next morning, the VFP IDE session was unusable until restarting it. So finally that is a solution: write an EXE importing only one file at a time and RUN or ShellExecute that EXE for each XML file.

http://www.foxpert.com/docs/howfoxproworks.en.htm has something about garbage collection, too. Also see the paragraph about freeing memory with Sys(3050). So you might try that:

lcMemory = Sys(3050,1)          && remember the current foreground buffer size (returned as a string)
Sys(3050,1,1)                   && shrink the buffer so VFP hands memory back to Windows
Sys(3050,1,Val(lcMemory))       && restore the original setting


Good Luck!

Bye, Olaf.
IainMacb (Asker) replied:
OK, some answers.

Yes, VFP 9 SP2.

It's running on a Dell laptop under Windows 8.1. The Dell has 8GB of memory.

I've altered the code in the light of your comments. I've got a lot of other things open while I process the XML files, so I'm not keen on closing all the files and releasing all of the variables. But I have added DOEVENTS FORCE and the WAIT command.

But running this doesn't change the behaviour - it still slows down with each XML file.

Looking at Task Manager while the routine in question is running, the memory use starts by varying between 335MB and 535MB, bouncing up and down all of the time.

But ... that value gets bigger as we get through the files. Later on it's bouncing between 650 and 960 MB.

After it has crashed, the value is 353.9MB.

After Olaf added his further suggestion, I tried forcing the foreground memory with SYS(3050), but it has no effect. The routine still slows down, and memory usage doesn't look much different.

I'm minded to go with the debugger approach as a temporary fix.
DOEVENTS and WAIT WINDOW are less important commands in the environment cleanup.
The growing memory consumption points to problems in releasing objects and other resources in your routine. If you cannot find the place in your code where it happens (it can also happen inside VFP itself), and CLEAR ALL does not fit your template, then a standalone EXE for each XML file is the right solution.
I agree with Pavel: if you can't get to the bottom of the problem, then making the XML import an external module is a solution.

You might have problems with deallocating MSXML components, although you said you have written your own code to extract data, as it was in attributes rather than in elements.

Perhaps take a look at ALINES() and STREXTRACT() for parsing the XML.
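
For illustration, a rough sketch of what that can look like; the element and attribute names here are invented, and it assumes (as described above) that each element sits on one line with double-quoted attribute values:

lcLine = [<reading station="AB12" time="2014-06-01T10:15:00" value="3.75" />]   && made-up sample line

* STREXTRACT() returns the text between the two delimiters:
lcStation = StrExtract(lcLine, [station="], ["])      && -> "AB12"
lnValue   = Val(StrExtract(lcLine, [value="], ["]))   && -> 3.75

* ALINES() can split a whole file (or a block read with FREAD) into such lines first:
* lnLines = ALines(laLines, FileToStr("small.xml"))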

Bye, Olaf.
IainMacb (Asker) replied:

Yes, a lot of use of STREXTRACT().

Not sure that I see how ALINES() would be useful in these circumstances.
ASKER CERTIFIED SOLUTION by Olaf Doschke (Germany)
(solution text available to Experts Exchange members only)
IainMacb (Asker) replied:

Thanks for that. I can see the logic of ALINES().

Fortunately, the XML that I'm having to deal with is well-organised as far as each element is concerned. They're all self-contained within one line of text and I can read the file quite painlessly by doing FOPEN and then reading each line with FGETS().

Your comment about the "immutability" of strings being the source of my problem is probably right.

I had been reading each element's attributes into an object and returning that from a .READ_XML_ATTRIBUTES() method.

But in the light of the comments from yourself and from Pcelba, I have revisited this bit of code. It's now using STREXTRACT() to do most of the work. This has speeded everything up (now 2-2.5 times faster). At present I'm still running it with a DEBUG at the end of each file import, so can't say if this has also resolved the memory issue.
SOLUTION
(solution text available to Experts Exchange members only)
IainMacb (Asker) replied:

I think that the key thing here is "when VFP waits for user input".

In the big production run, with 59 input files, it doesn't get a chance to do garbage collection because there isn't any user input.

In the DEBUG mode it stops and waits for something from me. The 'Watch' window is reporting the value of SYS(1016) - and I can see this reduce quite markedly.
In one of your previous posts you stated that "INKEY(60) did not help", which I would suppose to be sufficient (or even oversized) for garbage collection, but who knows...
I was thinking of a scenario where you read the XML into a string variable and then cut it up roughly, e.g. extracting <table1>...</table1>, then <table2>...</table2> etc., and then go down to record and field level. Since you read in lines with FGETS(), your memory consumption should be smaller.
Since you have accelerated things by a factor of 2, you might add wait states even within the reading of a single XML file, every 1000 lines or so.
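
Just to sketch the idea (the handle and counter names are assumptions, loosely matching the FOPEN/FGETS approach described earlier, not the actual routine):

lnLineNo = 0
Do While Not FEof(lnHandle)        && lnHandle assumed opened earlier with FOPEN()
	lcLine = FGets(lnHandle)
	lnLineNo = lnLineNo + 1
	* ... parse lcLine here ...
	If lnLineNo % 1000 = 0
		DoEvents                   && brief yield so VFP gets a chance to do its housekeeping
	EndIf
EndDo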

You might also try:
*****************************************************************************************
* Function....:	 ReduceMemory()
* Author......:  Bernard Bout
* Date........:  05/12/2007 3:03:15 PM
* Returns.....:
* Parameters..:
* Notes.......:  reduces memory usage for vfp
*****************************************************************************************
Function ReduceMemory()

Declare Integer SetProcessWorkingSetSize In kernel32 As SetProcessWorkingSetSize  ;
	Integer hProcess , ;
	Integer dwMinimumWorkingSetSize , ;
	Integer dwMaximumWorkingSetSize
Declare Integer GetCurrentProcess In kernel32 As GetCurrentProcess
nProc = GetCurrentProcess()
bb = SetProcessWorkingSetSize(nProc,-1,-1)
Endfunc


Actually, that's what I think SYS(3050) uses internally; nevertheless, all of this is worth a try in your situation.
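
For what it's worth, a sketch of where such a call could sit in the import loop (the loop, array and routine names are placeholders, not the actual code):

For lnI = 1 To lnFileCount                  && lnFileCount / laFiles assumed filled earlier
	Do import_one_xml With laFiles[lnI]     && hypothetical per-file import routine
	= Sys(1104)                             && purge cached memory
	= ReduceMemory()                        && then trim the process working set
EndFor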

Bye, Olaf.

Edit: Another function to monitor memory consumption of a vfp9.exe process:
Function displaymem()
   Local loWMI, loVFPProcesses
   loWMI = Getobject("winmgmts:")
   loVFPProcesses = loWMI.ExecQuery("Select WorkingSetSize From win32_process Where Name = 'vfp9.exe'")
   ? loVFPProcesses.itemindex(0).Properties_.Item("WorkingSetSize").Value
Endfunc


(Warning: This has no error handling about whether a VFP9 IDE runs or not.)
SOLUTION
(solution text available to Experts Exchange members only)
Olaf Doschke replied:

The following is not a comprehensive benchmark, but reading in a file and looping over an array once vs. scanning a cursor once does seem to speak for using arrays:

lcText = Replicate("<rec></rec>"+Chr(13)+Chr(10),1000000)
Set Safety Off
StrToFile(lcText,"H:\test\long.txt",0)
StrToFile(lcText,"H:\test\long.txt",1)
StrToFile(lcText,"H:\test\long.txt",1)
Set Safety On

t0 = Seconds()
lcFile = FileToStr("H:\test\long.txt")
For lnCount = 1 To ALines(laLines,lcFile)
   lcLine = laLines[lnCount]
EndFor
t1 = Seconds()
? "Array:",t1-t0

Create Cursor curLines (cLine C(254))
Append From ("H:\test\long.txt") TYPE DELIMITED 
Scan
   lcLine = curLines.cLine
EndScan 
t2 = Seconds()
? "Cursor",t2-t1



For me this gives 1.7 seconds for the array vs 1.9 seconds for the cursor. Times vary a bit, but the arrays always win.

The big advantage of cursors is their versatility, including being able to index them, query them, filter them, relate them to other data, etc. But an array is a collection of memory variables you can address as fast as you can address memory; it does not have the overhead of a table structure, and is therefore preferable for sequential processing.

Bye, Olaf.

PS:
t0 = Seconds()
lnH = fopen("H:\test\long.txt",0)
Do while not Feof(lnH)
   lcLine = Fgets(lnH)
EndDo
Fclose(lnH)
t1 = Seconds()
? "fgets:",t1-t0



This takes much longer, but that's the nature of reading a file line by line vs. as a whole. It's costly to load just a small number of bytes repeatedly. You can accelerate that by reading larger blocks with FREAD() and splitting them, as reading in a whole 500MB in one go surely isn't very performance-friendly either.
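
A minimal sketch of that block-reading idea (the file name and block size are just examples); the only subtlety is carrying an incomplete last line over into the next block:

#Define BLOCKSIZE 81920            && roughly 80K per FREAD(); tune to taste

Local lnH, lcBlock, lcCarry, lnLines, lnI
Local laLines[1]
lcCarry = ""
lnH = FOpen("H:\test\long.txt", 0)
Do While Not FEof(lnH)
	lcBlock = lcCarry + FRead(lnH, BLOCKSIZE)
	lnLines = ALines(laLines, lcBlock)
	If Not FEof(lnH) And Not (Right(lcBlock,1) $ Chr(13)+Chr(10))
		* The block ended mid-line: keep the tail for the next pass.
		lcCarry = laLines[lnLines]
		lnLines = lnLines - 1
	Else
		lcCarry = ""
	EndIf
	For lnI = 1 To lnLines
		* ... parse laLines[lnI] here ...
	EndFor
EndDo
FClose(lnH)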

You have to find the balance of file access vs. memory usage. Overall, this difference might just add a few seconds to a process that mainly spends its time parsing the XML and moving the data into further cursors or DBFs anyway, so the difference between this sequential read and a scan/loop is negligible.

Bye, Olaf.
IainMacb (Asker) replied:

OK, I've now managed to import the 59 files with their 11.9GB of data.

Importing was complicated by the fact that five of the files were corrupt. It's always difficult to decide what to do with these: throw the whole thing away, or try to 'lock on' to the XML structure further down the file? I ended up finding that the corrupt ones were duplicated by others, but no-one had passed that message on to me.

Meanwhile, various suggestions had been made about speeding up the reading of the files. So I did a small test program, just to read the first of the files (a fairly typical size: 209MB). This produced the following:

(1) reading the file one line at a time using FGETS() = 22.0 seconds
(2) reading the whole file using FILETOSTR() = I gave up after 10 minutes!
(3) reading the file in batches using FREAD() and ALINES() = 2.2 seconds
(4) reading the file into a cursor = 7.6 seconds

I did some further experimentation with (3), trying the effect of different batch sizes. For my particular file and my particular memory, I get:

    10K-byte batches  = 3.16s
    100K batches      = 2.30s
    1M batches        = 2.48s
    10M batches       = 3.05s

The 2.2s time was achieved with 80K batches.

In the end I went for the read-to-cursor approach. Not as fast as read-in-batches, but it meant that less of the existing code had to be altered.

Processing the data

But ... that's obviously a small portion of the overall total time. Most of the time (and most of the memory wastage) must be happening during the sorting-out of the data. The input file contains a big hierarchy of XML elements; I'm trying to transcribe this into a three-level hierarchy in 'my' DBF files.

I had started with a very general routine that would read an XML start tag and return an object containing names and values for the attributes that it encountered. That's attached as the READ_XML_ATTRIBUTES() file. Renamed as .TXT because EE doesn't like .PRG extensions. Please don't spend any time suggesting improvements to this - I've only included it so as to show you where I started from. It's probably the main source of the memory problems.

I don't need all of the attributes that are available, so I've changed the logic so that I look for the specific ones that I want, as in the second of the attached files, READ_XML_ATTRIBUTE(). As you can see, this is a lot simpler.

All of this meant that the processing didn't slow down and eventually run out of memory. What's more, the basic processing of each file was quicker.

For all of which many thanks. You've all contributed suggestions which have been included in the final (sorry - current) version!
read-xml-attributes.txt
read-xml-attribute.txt
Olaf Doschke replied:

Both your PRGs look OK; the first, general routine even covers the case of attribute values not delimited by quotes.

Indeed you exceed the string limit of 16MB if you read in such a full file with FILETOSTR(). It's not a hard limit (you can try with a 32MB or 100MB file), but the limit documented in the System Capacities help chapter about string variables is not there only for caution, I think.

Your test and the final decision are fine, I think, as the overall time surely is not very dependent on just the file read time. My test was more about the processing/scanning of an array vs. a cursor, but overall that difference isn't that big either, and code that processes lines is indeed easier to write when reading the text file in via APPEND and using SCAN...ENDSCAN on it.

>It's probably the main source of the memory problems.
That suspicion about your first PRG might be true, as you may have used up memory by generating many record objects and passing them back as return values. It's not an illegal or dirty option: the loRecord object you return is kept alive in the caller's code, while the loRecord variable is, in theory, released. But VFP can't release the loRecord variable until it has been received by the caller. That might mean loRecord is not really released along with the function's other local variables before the function returns, and so causes the memory consumption.

You might try not generating the object in the function, but passing in a custom object from outside to add the properties to, and I'd recommend CreateObject("EMPTY") instead of CreateObject("Custom"). EMPTY has no AddProperty method, but you can use AddProperty(object, "property", value).
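
To illustrate the idea roughly (the attribute names and the routine name are invented, not the asker's actual READ_XML_ATTRIBUTES code):

* The caller creates one lightweight EMPTY object and hands it in to be filled.
Local loRecord, lcLine
lcLine   = [<reading station="AB12" value="3.75" />]   && made-up sample line
loRecord = CreateObject("Empty")
= read_attributes_into(loRecord, lcLine)               && hypothetical variant of the parsing routine

Function read_attributes_into(toRecord, tcLine)
	* EMPTY has no AddProperty method of its own, so the AddProperty() function is used instead.
	AddProperty(toRecord, "station", StrExtract(tcLine, [station="], ["]))
	AddProperty(toRecord, "value",   Val(StrExtract(tcLine, [value="], ["])))
EndFunc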

Bye, Olaf.
Olaf

Thanks for that, I had indeed missed the 16MB limit in the system capacities. Though I'm surprised it didn't send back an error message.
Good work Iain!

The FILETOSTR() result is interesting and I will have to test it; I would not expect such a bad result. Of course, a 200MB string in a variable is not a good option, BUT VFP 9 allows it. Processing such a string is restricted to a subset of VFP functions only, and "Not enough memory" is obviously your nightmare...

And I have to agree with Olaf - returning the object is potentially dangerous. You could create the (Empty) object in the calling routine and pass it as a parameter. Or you could create it as a private variable in the calling routine, and then you don't need to pass it at all (which is not so obvious nowadays, but it should work). BUT if it works now ...

IainMacb (Asker) replied:

Updated
For the benefit of anyone who comes and consults this solution in the future, I'd suggest that you read the whole of the conversation rather than focussing on the three 'accepted answers' - I've picked those out as being the ones that pointed me in certain directions. They're as follows (in no particular order):

Pcelba's comment about strings being immutable. That got me to focus on minimising the volume of strings that were building up.

CaptainCyril's suggestion about importing into a cursor. That got used in the solution that I adopted.

Olaf_Doschke's suggestion about parsing the content using ALINES(). That would seem to be faster. It didn't get used in the solution because it was slightly more complicated than the cursor approach.

And my own comments have got some useful information about timings!