sfun28

asked on

garbage collection in loop

Folks, lets say I have a loop as follows:

foreach (Object myObject in myObjectCollection)
{
    Object veryLargeObject = GetVeryLargeObject();
    // do some stuff
    // no longer need veryLargeObject
}

GetVeryLargeObject returns an object that takes up a lot of memory.  Over time, this loop causes an OutOfMemoryException.  Is there a way to explicitly garbage collect veryLargeObject? I tried doing the following, but it doesn't seem to work:

veryLargeObject = null;
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();

Is this because I'm iterating in the scope of a method? Are all the created veryLargeObject instances candidates for garbage collection only AFTER I exit the method? What's the best practice here?
Fernando Soto

Hi sfun28;

You can try this to see if it helps.

foreach (Object myObject in myObjectCollection)
{
    Object veryLargeObject = GetVeryLargeObject();
    // do some stuff
    // no longer need veryLargeObject
}

GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

Fernando
hi,
You probably know which objects in your class are heavy, so first do a bit of cleanup on your end by overriding the Dispose method in the veryLargeObject class. Second, try a using block; then you can call the GC for extra cleanup if required.

foreach (Object myObject in myObjectCollection)
{
    // note: this only compiles if GetVeryLargeObject() returns a type
    // that implements IDisposable
    using (var veryLargeObject = GetVeryLargeObject())
    {
        // do some stuff
        // veryLargeObject is disposed automatically here
    }
}
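
For what it's worth, a disposable wrapper would have to look something like this minimal sketch (the class name and payload are illustrative - we don't know what GetVeryLargeObject actually returns):

class VeryLargeObject : IDisposable
{
    // illustrative payload - the actual heavy data is unknown
    private object[,] data = new object[10000, 100];

    public void Dispose()
    {
        // drop the reference so the array becomes collectable immediately,
        // instead of waiting for the wrapper itself to be collected
        data = null;
        GC.SuppressFinalize(this);
    }
}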
That was a problem in .NET 1.1 because of memory fragmentation.
In your case the approach seems OK, because the GC.Collect in the loop will clean up the memory.
Does the very large object hold a reference to any of your main instances? If so, that reference is keeping the large object alive. I would use a memory profiler to determine what is happening inside your code.
Are you running a debug or release build? In a debug build, the GC won't collect it - the reference is kept alive by something (dunno what...) even though you've set your local variable to null and called the GC.

So.. you might want to put the contents of the foreach {...} iteration code into a separate method, and call that, i.e.
foreach (Object myObject in myObjectCollection)
{
  DoStuffOn( myObject );
}

private void DoStuffOn( object x )
{
    Object veryLargeObject = GetVeryLargeObject();
    // do some stuff
    // no longer need veryLargeObject
}


.. then you might not even need the GC calls


You could also put a finaliser method on your large object to prove that it's getting GC'd - that might help during testing to confirm it's being cleaned up.
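
Something along these lines would do (diagnostic only, and assuming the large object's type is one you control - remove the finaliser afterwards, since finalisers delay collection):

class VeryLargeObject
{
    ~VeryLargeObject()
    {
        // fires when the GC actually collects the object
        System.Diagnostics.Debug.WriteLine( "VeryLargeObject finalised" );
    }
}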


sfun28

ASKER

This is driving me crazy... My large object is just a very large 2D array. I don't construct this array - it's returned to me when I call some Excel interop code. Basically Excel returns a grid as a 2D array of values (strings, doubles, etc.).


Fernando - GC.Collect() is the same as GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced)

akhileshcoer - using/dispose is not relevant in this issue because it's an array

mas_oz2003 - no references to any main objects.  the array just contains strings/doubles/etc.
I think you're right about fragmentation. Just to test fragmentation I held references to 100 of these objects in the loop (by storing references in a List<object>). After the 100th item I cleared the List and forced the GC to collect. I figured that a contiguous block of 100 large items freed at the same time would solve any fragmentation issue, but it didn't.
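
(The test looked roughly like this sketch - GetVeryLargeObject and myObjectCollection are from the earlier snippets, and the batch size of 100 is as described:)

List<object> batch = new List<object>();
foreach (Object myObject in myObjectCollection)
{
    batch.Add( GetVeryLargeObject() );
    if (batch.Count == 100)
    {
        // free 100 large objects at once; if fragmentation were the
        // problem, this should leave one large contiguous free block
        batch.Clear();
        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
}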

andrewjb - release build. I could easily try putting the calls in a separate method. So you think this is a scope issue? Technically speaking, when there are no references to an object, shouldn't the GC consider freeing it?

I have a feeling that something else is happening here. Perhaps Excel is pinning the memory? I don't really understand pinning, but I read about it somewhere.


Not an issue in release builds. In debug builds, even though YOU don't have references any more, something does (the debugger, I guess?). In a release build, so long as you've dropped all references (setting local vars to null etc.) then I do believe that GC works OK.

Is it literally an array being returned, just with simple types in? Not wrappers around the array or anything. Just an object[][] or something?

In which case, I'd guess that the Excel calls are holding stuff open. Are you closing/disposing any of the excel references that you're supposed to?

We probably need a working-but-simple sample app, and a spreadsheet to go with it, that demonstrate the problem...

sfun28

ASKER

Yup. Literally an array - object[][] filled with simple types. I'm meticulous about releasing COM objects and disposing. As it is, this particular case only creates 2 COM objects and I've triple-checked that I'm disposing of them. Good idea about working up a simple sample - all I'm really doing is looping through a few hundred spreadsheets and copying the values from each spreadsheet into one master spreadsheet that I keep open. I've even tried quitting and reloading the master spreadsheet periodically through the life of the loop, but it doesn't seem to do anything. Unless you have any other ideas, I might work up a simple app and see if I can repro this issue.
Does sound like it could well be an Excel 'feature' :-) A working simple sample will probably help.

You could get a copy of Process Explorer from Sysinternals - it has some info on one of the process tabs about .NET memory usage in an app... that may or may not give you some clues.

I quite like AutomatedQA's memory profiler. There'll be a demo version you could download and use to try looking at this - one of the profilers is a live memory profiler that tells you what objects are alive. Though it's a non-trivial learning curve to get the best out of the tool.

A.
Can you post your code in which you get the "large object" from Excel?

I am interested to see how you are interacting with Excel: how you create the objects and how you get the values into your big object. You need to use Marshal.ReleaseComObject(ExcelCOMObj) for every COM object, on top of the GC calls. As a best practice, I prefer to create just one Excel Application object for the lifetime of the application (or until all the Excel files have been read) and just use it every time you need to read an Excel file.

Note: the Application object is a COM interface using the STA threading model, which is very difficult to interact with, and it will not die away easily (try opening Excel with a workbook, doing some interaction, closing Excel, and then looking at Process Explorer). If I were you I would try to export the Excel information as CSV values and read it with a StreamReader. (This also opens up the possibility of using threads to improve performance when loading the objects from files.)
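
A rough sketch of that lifecycle (workbookPaths is illustrative, error handling is trimmed, and C# 4 optional parameters are assumed - older compilers need Type.Missing placeholders on Open/Close):

using Microsoft.Office.Interop.Excel;
using System.Runtime.InteropServices;

Application excel = new Application();   // one instance for the app's lifetime
Workbooks books = excel.Workbooks;       // capture intermediates so they can be released
try
{
    foreach (string path in workbookPaths)
    {
        Workbook book = books.Open( path );
        try
        {
            // ... read the values you need ...
        }
        finally
        {
            book.Close( false );                // don't save
            Marshal.ReleaseComObject( book );   // release every COM object you touch
        }
    }
}
finally
{
    Marshal.ReleaseComObject( books );
    excel.Quit();
    Marshal.ReleaseComObject( excel );
}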
sfun28

ASKER

I hear ya!  All of the best practices you mention are things I incorporate.  Having been deep in Excel interop for the past 2 years, I've learned that there are lots of tricks to working with Excel.

Here's the method I'm using. Note that both "copy" and "paste" are released via Marshal.FinalReleaseComObject() after this method is called. Fairly straightforward stuff - copy from one range to another.

I profiled the memory and found that there isn't a leak (private bytes stay low, and the large object heap remains fairly constant).

Here's the strange part (just discovered) - the OutOfMemoryException is being thrown on the set_Value line!!! So it's not that I'm retrieving a large pasteValue and then the system pukes. This happens when I'm SENDING the large dataset to Excel. And I've got plenty of free RAM. How is that even possible? I wouldn't be surprised if I just discovered another Excel "gotcha". The Excel worksheet that I'm trying to paste into is about 1.4 GB in memory. I thought Excel 2007 was limited only by the memory on the box. Just as a test, I saved the worksheet after each iteration of my loop, so when I get the OutOfMemoryException, I can look at the Excel worksheet pre-exception. Then I asked the system to resume pasting where it left off. Sure enough, after 1 or 2 worksheets, I get an OutOfMemoryException. So somehow Excel has grown so large that subsequent set_Value calls don't work. Thoughts?

private void CopyPaste( Range copy , Range paste )
{
    Object pasteValue = copy.get_Value( XlRangeValueDataType.xlRangeValueDefault );
    paste.set_Value( XlRangeValueDataType.xlRangeValueDefault , pasteValue );
}
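
(For context, the calling pattern described above looks roughly like this - the sheet variables and range addresses are illustrative:)

Range copy = sourceSheet.get_Range( "A1" , "Z1000" );
Range paste = masterSheet.get_Range( "A1" , "Z1000" );
try
{
    CopyPaste( copy , paste );
}
finally
{
    // both ranges released after the call, as noted above
    Marshal.FinalReleaseComObject( copy );
    Marshal.FinalReleaseComObject( paste );
}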
ASKER CERTIFIED SOLUTION
Miguel Oz

(The accepted solution is only available to Experts Exchange members.)
sfun28

ASKER

Good question. There is a good reason for not using copy.Copy(paste) - I had to write the test code to remind myself why I stopped using that method. copy.Copy(paste) preserves formulas, which leads to data corruption when multiple spreadsheets are being normalized into a single spreadsheet. Of course, I could always Copy, then PasteSpecial, but that presents 2 problems. First is the use of the clipboard. I avoid that at all costs - the Windows clipboard is unreliable, and data corruption could occur if a user uses the clipboard while a program is also using it. Second, PasteSpecial doesn't preserve errors, so #REF!, etc. get converted into their underlying integer values. I've found that the best thing to do is to copy the value of the range into memory, iterate through the values converting all integers to their respective error wrappers, then push that object to the paste worksheet.
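
(The error-wrapper pass looks roughly like this sketch; the method name is illustrative. It relies on Excel handing cell errors back as Int32 values of the form 0x800A0000 + xlErrXXX - i.e. -2146826288 (#NULL!) through -2146826246 (#N/A) - while ordinary numeric cells come back as doubles:)

private static void WrapExcelErrors( object[,] values )
{
    // Excel returns 1-based 2D arrays, so use the declared bounds
    for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
    {
        for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
        {
            if (values[r, c] is int)
            {
                int v = (int)values[r, c];
                if (v >= -2146826288 && v <= -2146826246)
                {
                    // marshal back as VT_ERROR so Excel shows #REF! etc.
                    // instead of the raw integer
                    values[r, c] = new System.Runtime.InteropServices.ErrorWrapper( v );
                }
            }
        }
    }
}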

Are you sure about Windows only having 2GB available? I'm on an x64 box with an x64 build - I believe I can address way more than 2GB of memory. I don't quite understand your 4GB total comment, nor your comment about AppDomain. Excel is out-of-process. If private bytes are not increasing, then it means that .NET itself isn't growing in memory, right? What part does Excel play in that memory space?
OK, you did not mention before that you are using an x64 box. The answer is "No", because even though your program can address more memory:
- You are stuck with Excel's 32-bit limitation every time you make an Excel call (Excel 2010, however, will support 64 bits).
- The CopyPaste method creates an object (pasteValue) that does not go away until the GC runs.
- Excel automation is based on STA COM.
Check:
http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/thread/213b90cb-5aea-41be-b935-fa995b6322ca/
http://blogs.msdn.com/excel/archive/2009/08/28/excel-2010-now-with-more-bits.aspx
sfun28

ASKER

I'm still confused about what "You are stuck with Excel's 32-bit limitation every time you make an Excel call" means. What is the limitation, and do you mean on the Excel process side or the .NET process side? The Excel website doesn't specify how large the Excel process can get in memory:
http://office.microsoft.com/en-us/excel/HP100738491033.aspx
Is it 2GB? If so, then it makes perfect sense that my Excel file is blowing up. Or is that a limitation of STA COM in general?
sfun28

ASKER

This site says that Excel 2007 is limited to 2GB:
http://www.decisionmodels.com/memlimitsc.htm

I guess that's the answer - my master spreadsheet is getting too big.