garbage collection in loop

sfun28
Folks, let's say I have a loop as follows:

foreach (Object myObject in myObjectCollection)
{
    Object veryLargeObject = GetVeryLargeObject();
    // do some stuff
    // no longer need veryLargeObject
}

GetVeryLargeObject returns an object that takes up a lot of memory.  Over time, this loop causes an OutOfMemoryException.  Is there a way to explicitly garbage collect veryLargeObject? I tried doing the following, but it doesn't seem to work:

veryLargeObject = null;
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();
GC.WaitForPendingFinalizers();

Is this because I'm iterating within the scope of a method? Are all the created veryLargeObject instances candidates for garbage collection only AFTER I exit the method? What's the best practice here?
Fernando Soto, Retired
Distinguished Expert 2017

Commented:
Hi sfun28;

You can try this to see if it helps.

foreach (Object myObject in myObjectCollection)
{
    Object veryLargeObject = GetVeryLargeObject();
    // do some stuff
    // no longer need veryLargeObject
}

GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced);

Fernando
Hi,
You probably know which objects in your class are heavy, so first do a bit of cleanup on your end by implementing IDisposable (a Dispose method) on the veryLargeObject class. Second, try a using block; then you can call the GC for extra cleanup if required:

foreach (Object myObject in myObjectCollection)
{
    // the declared type must implement IDisposable for "using" to compile
    using (VeryLargeObject veryLargeObject = GetVeryLargeObject())
    {
        // do some stuff
        // no longer need veryLargeObject
    }
}
Miguel Oz, Senior Software Engineer
Top Expert 2009

Commented:
That was a problem in .NET 1.1 because of memory fragmentation.
In your case the approach seems OK, because the GC.Collect after the loop will clean up the memory.
Does the very large object hold a reference to any of your main instances? If so, that reference is keeping the large object alive. I would use a memory profiler to determine what is happening inside your code.

Commented:
Are you running a debug or release build? In a debug build, the GC won't collect the object: the JIT keeps local variables alive until the end of the method, so the reference is kept alive even though you've set your local variable to null and called the GC.

So... you might want to put the contents of the foreach {...} iteration into a separate method, and call that, i.e.
foreach (Object myObject in myObjectCollection)
{
  DoStuffOn( myObject );
}

private void DoStuffOn( object x )
{
    Object veryLargeObject = GetVeryLargeObject();
     // do some stuff
    // no longer need veryLargeObject    
}


...then you might not even need the GC calls.


You could also put a finaliser method on your large object - that might help you verify during testing that it's actually getting GC'd.
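For instance, a throwaway finaliser like this (VeryLargeObject is a placeholder name, not the asker's actual type) prints a line on the finalizer thread each time the GC reclaims an instance, which makes collection visible during testing:

```csharp
using System;

class VeryLargeObject
{
    // Simulate a large allocation so collections are easy to provoke.
    private byte[] data = new byte[100 * 1024 * 1024];

    ~VeryLargeObject()
    {
        // Runs on the finalizer thread when the GC reclaims this instance.
        Console.WriteLine("VeryLargeObject finalized");
    }
}
```

Remember to remove the finaliser once you're done diagnosing: objects with finalisers survive at least one extra GC cycle, which is exactly what you don't want for large objects in production.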


Author

Commented:
This is driving me crazy... My large object is just a very large 2D array. I don't construct this array - it's returned to me when I call some Excel interop code. Basically Excel returns a grid as a 2D array of values (strings, doubles, etc.).


Fernando - GC.Collect() is the same as GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced)

akhileshcoer - using/Dispose is not relevant to this issue because it's an array

mas_oz2003 - no references to any main objects.  the array just contains strings/doubles/etc.
I think you're right about fragmentation.  Just to test fragmentation I held references to 100 of these objects in the loop (by storing references in a List<object>).  After the 100th item I cleared the List and forced GC to collect.  I figured that a continuous block of 100 large items freed at the same time would solve any fragmentation issue, but it didn't.

andrewjb - release build. I could easily try putting the calls in a separate method. So you think this is a scope issue? Technically speaking, when there are no references to an object, shouldn't the GC consider freeing it?

I have a feeling that something else is happening here.  Perhaps Excel is pinning the memory?  I don't really understand pinning, but I read somewhere about that.


Commented:
Not an issue in release builds. In debug builds, even though YOU don't have references any more, something does (the debugger, I guess?). In a release build, so long as you've dropped all references (setting local vars to null etc.), then I do believe the GC works OK.

Is it literally an array being returned, just with simple types in? Not wrappers around the array or anything. Just an object[][] or something?

In which case, I'd guess that the Excel calls are holding stuff open. Are you closing/disposing any of the excel references that you're supposed to?

We probably need a working-but-simple sample app, and a spreadsheet to go with it, that demonstrate the problem...

Author

Commented:
Yup, literally an array - object[][] filled with simple types. I'm meticulous about releasing COM objects and disposing. As it is, this particular case only creates 2 COM objects, and I've triple-checked that I'm disposing of them. Good idea about working up a simple sample - all I'm really doing is looping through a few hundred spreadsheets and copying the values from each spreadsheet into one master spreadsheet that I keep open. I've even tried quitting and reloading the master spreadsheet periodically through the life of the loop, but it doesn't seem to do anything. Unless you have any other ideas, I might work up a simple app and see if I can repro this issue.

Commented:
Does sound like it could well be an Excel 'feature' :-) A working simple sample will probably help.

You could get a copy of Process Explorer from Sysinternals - it has some info on one of the process tabs about .NET memory usage in an app... that may or may not give you some clues.

I quite like AutomatedQA's memory profiler. There'll be a demo version you could download and use to look at this - one of the profilers is a live memory profiler that tells you what objects are alive. Though it's a non-trivial learning curve to get the best out of the tool.

A.
Miguel Oz, Senior Software Engineer
Top Expert 2009

Commented:
Can you post your code in which you get the "large object" from Excel?

I am interested to see how you are interacting with Excel: how you create the objects and how you take the values into your big object. You need to call Marshal.ReleaseComObject(ExcelCOMObj) for every COM object, on top of the GC calls. As a best practice, I prefer to create just one Excel Application object for the lifetime of the application (or until all the Excel files have been read) and just use it every time you need to read an Excel file.

Note: The Application object is a COM interface with an STA threading model, which is very difficult to interact with, and it will not die away easily (try opening Excel with a workbook, doing some interaction, closing Excel, and then looking at Process Explorer). If I were you, I would try to export the Excel information as CSV and read it with a StreamReader. (This opens up the possibility of using threads to improve performance when loading the objects from files.)
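A minimal sketch of that release pattern (the method name, variable names, and file path are placeholders, not the asker's actual code) - the key point is that every Range/Worksheet/Workbook reference you touch gets its own Marshal.FinalReleaseComObject call:

```csharp
using System;
using System.Runtime.InteropServices;
using Excel = Microsoft.Office.Interop.Excel;

static class ExcelReader
{
    // Sketch only: open one workbook, pull its values into a managed
    // array, and release every COM object we explicitly touched.
    static void ReadWorkbook()
    {
        Excel.Application app = new Excel.Application();
        try
        {
            Excel.Workbook book = app.Workbooks.Open(@"C:\data\input.xls");
            try
            {
                Excel.Worksheet sheet = (Excel.Worksheet)book.Worksheets[1];
                Excel.Range used = sheet.UsedRange;
                object[,] values = (object[,])used.get_Value(
                    Excel.XlRangeValueDataType.xlRangeValueDefault);
                // ...copy values into managed structures here...
                Marshal.FinalReleaseComObject(used);
                Marshal.FinalReleaseComObject(sheet);
            }
            finally
            {
                book.Close(false);
                Marshal.FinalReleaseComObject(book);
            }
        }
        finally
        {
            app.Quit();
            Marshal.FinalReleaseComObject(app);
            GC.Collect();
            GC.WaitForPendingFinalizers();
        }
    }
}
```

Strictly speaking, app.Workbooks and book.Worksheets create runtime callable wrappers of their own; in a tight loop those intermediates should be captured in locals and released the same way.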

Author

Commented:
I hear ya! All of the best practices you mention are things I already incorporate. Having been deep in Excel interop for the past 2 years, I've learned that there are lots of tricks to working with Excel.

Here's the method I'm using. Note that both "copy" and "paste" are released via Marshal.FinalReleaseComObject() after this method is called. Fairly straightforward stuff - copy from one range to another.

I profiled the memory and found that there isn't a leak (private bytes are low, and the large object heap remains fairly constant).

Here's the strange part (just discovered) - the OutOfMemoryException is being thrown on the set_Value line!!! So it's not that I'm retrieving a large pasteValue and then the system pukes; this happens when I'm SENDING the large dataset to Excel. And I've got plenty of free RAM. How is that even possible? I wouldn't be surprised if I just discovered another Excel "gotcha". The Excel worksheet that I'm trying to paste into is about 1.4 GB in memory. I thought Excel 2007 was limited only by the memory on the box. Just as a test, I saved the worksheet after each iteration of my loop, so when I get the OutOfMemoryException I can look at the Excel worksheet pre-exception. Then I asked the system to resume pasting where it left off. Sure enough, after 1 or 2 worksheets, I get an OutOfMemoryException. So somehow Excel has grown so large that subsequent set_Value calls don't work. Thoughts?

private void CopyPaste( Range copy , Range paste )
{
    Object pasteValue = copy.get_Value( XlRangeValueDataType.xlRangeValueDefault );
    paste.set_Value( XlRangeValueDataType.xlRangeValueDefault , pasteValue );
}
Miguel Oz, Senior Software Engineer
Top Expert 2009
Commented:
Any reason why you are using the CopyPaste method instead of
copy.Copy(paste); // It consumes less memory and avoids two COM calls to Excel.
Note: pasteValue will not go away until the GC collects it, so if you are calling it multiple times you will reach the Windows limit. Remember that in Windows you only have 2GB of data space available for any Windows application.
Note: 4GB total (memory + program) for Excel and any Excel interaction; remember that you are sharing the memory with Excel itself.

It is very hard to provide more insights without doing a code review. I have a VSTO add-in that populates thousands of rows in multiple worksheets without any issues, but I minimize my interaction with Excel as much as possible. As you correctly point out, interacting with Excel is a big pain and booby traps are everywhere. (It helps that VSTO creates a new AppDomain, so my program gets 2GB for its own use, not shared with Excel.)

http://en.wikipedia.org/wiki/Visual_Studio_Tools_for_Office

Author

Commented:
Good question. There is a good reason for not using copy.Copy(paste) - I had to write test code to remind myself why I stopped using that method. copy.Copy(paste) preserves formulas, which leads to data corruption when multiple spreadsheets are being normalized into a single spreadsheet. Of course, I could always Copy, then PasteSpecial, but that presents two problems. First is the use of the clipboard; I avoid that at all costs - the Windows clipboard is unreliable, and data corruption could occur if a user touches the clipboard while a program is also using it. Also, PasteSpecial doesn't preserve errors, so #REF!, etc. get converted into their underlying integer values. I've found that the best thing to do is to copy the value of the range into memory, iterate through the values converting all integers to their respective error wrappers, and then push that object to the paste worksheet.
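One plausible sketch of that "error wrapper" pass (the method name is hypothetical, and the assumption is that cell errors arrive as Int32 CVErr codes, which ErrorWrapper marshals back to Excel as VT_ERROR so the cells show #REF! etc. again):

```csharp
using System.Runtime.InteropServices;

static class ErrorRewrapper
{
    // Walk the 2D value array from Range.get_Value and replace any
    // raw CVErr integer with an ErrorWrapper before set_Value.
    static void RewrapErrors(object[,] values)
    {
        for (int r = values.GetLowerBound(0); r <= values.GetUpperBound(0); r++)
        {
            for (int c = values.GetLowerBound(1); c <= values.GetUpperBound(1); c++)
            {
                object cell = values[r, c];
                // Excel cell errors (#DIV/0!, #REF!, #N/A, ...) come back as
                // Int32 codes in the 0x800Axxxx CVErr range.
                if (cell is int &&
                    ((uint)(int)cell & 0xFFFF0000) == 0x800A0000)
                {
                    values[r, c] = new ErrorWrapper((int)cell);
                }
            }
        }
    }
}
```

This is a sketch under the stated assumptions, not a drop-in for the asker's code; the exact integer representation of errors is worth verifying against your own workbooks before relying on the range check.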

Are you sure about Windows only having 2GB available? I'm on an x64 box with an x64 build - I believe I can address far more than 2GB of memory. I don't quite understand your 4GB total comment, nor your comment about AppDomains; Excel is out-of-process. If private bytes are not increasing, then .NET itself isn't growing in memory, right? What part does Excel play in that memory space?
Miguel Oz, Senior Software Engineer
Top Expert 2009

Commented:
OK, you did not mention before that you are using an x64 box. The answer is "no", because even though your program can address more memory:
- You are stuck with Excel's 32-bit limitation every time you make an Excel call (but Excel 2010 will support 64 bits).
- The CopyPaste method creates an object (pasteValue) that does not go away until the GC runs.
- Excel automation is based on STA COM.
Check:
http://social.msdn.microsoft.com/forums/en-US/sqlintegrationservices/thread/213b90cb-5aea-41be-b935-fa995b6322ca/
http://blogs.msdn.com/excel/archive/2009/08/28/excel-2010-now-with-more-bits.aspx

Author

Commented:
I'm still confused about what "You are stuck with Excel's 32-bit limitation every time you make an Excel call" means. What is the limitation, and do you mean on the Excel process side or the .NET process side? The Excel website doesn't specify how large the Excel process can get in memory:
http://office.microsoft.com/en-us/excel/HP100738491033.aspx
Is it 2GB? If so, then it makes perfect sense that my Excel file is blowing up. Or is that a limitation of STA COM in general?

Author

Commented:
This site says that Excel 2007 is limited to 2GB:
http://www.decisionmodels.com/memlimitsc.htm

I guess that's the answer: my master spreadsheet is getting too big.
