Solved

Memory Performance Question - very technical

Posted on 2002-06-19
Last Modified: 2012-05-04
I am building a server application that uses business objects extensively.  What started out as a "let's see if we can break it" test has turned into a serious performance tuning consideration.

Allocating new objects is very fast.  Until RAM is filled, I get somewhere between 100,000 and 500,000 objects per second created.  Once I start hitting virtual memory I get a flat 33,000 objects created per second.  At just over 4,900,000 objects the allocation loop fails (out of memory).  This is fine - it was the original point of the test - to see what the limits were.

When I go to free the objects, I get a fraction of the performance.  At this high utilization, I can only free 130 objects per second.  Even at lower utilizations I see that releasing objects is far more expensive than creating them.  Persisting this many objects takes 8 minutes. Releasing them ... well, overnight I still had over 4,000,000 objects allocated.

Further testing showed that the performance of "Free" degrades linearly with the number of objects allocated in memory.  This was done by allocating 10,000, 20,000, and 30,000 objects.  When no objects were in memory, creating and freeing a single object 10,000 times required 110 ms.  When 10,000 objects were resident the same operations required 220 ms.  20,000 objects resident required 330 ms, and so on.  Create timing was flat regardless of the number of objects already resident, at 10 ms for 10,000 objects.
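The measurement described above can be reproduced with a small console program along the following lines. This is only a sketch: TBizObject is a hypothetical stand-in for a real business object, and GetTickCount has a resolution of roughly 10 ms or coarser, so small differences should not be over-interpreted.

```pascal
program FreeTiming;
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical stand-in for a business object.
  TBizObject = class
    Id: Integer;
    Value: Double;
  end;

// Creates and frees one object Count times; returns elapsed milliseconds.
function TimeCreateFree(Count: Integer): Cardinal;
var
  Start: Cardinal;
  I: Integer;
  Obj: TBizObject;
begin
  Start := GetTickCount;
  for I := 1 to Count do
  begin
    Obj := TBizObject.Create;
    Obj.Free;
  end;
  Result := GetTickCount - Start;
end;

var
  Resident: TList;
  Step, I: Integer;
begin
  Resident := TList.Create;
  try
    for Step := 0 to 3 do
    begin
      // Measure with 0, 10,000, 20,000 and 30,000 objects resident.
      Writeln(Resident.Count, ' resident: ', TimeCreateFree(10000), ' ms');
      if Step < 3 then
        for I := 1 to 10000 do
          Resident.Add(TBizObject.Create);
    end;
  finally
    for I := Resident.Count - 1 downto 0 do
      TObject(Resident[I]).Free;
    Resident.Free;
  end;
end.
```

If the reported time grows with the resident count, the cost is coming from the memory manager rather than from application code, since the loop itself does the same work in every step.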

Given that this application is intended to service a mission critical enterprise scale problem, the capacity for a clean shutdown is a very necessary thing.  An application that will start up instantly is nice, but if it takes three days to shut it down it makes maintenance difficult.

1. Does anyone know why the performance of "Free" is so much poorer than "Create"?

2. Does anyone know why "Free" degrades in performance while "Create" remains flat?

3. Does anyone have some strategies for handling this performance differential?

To answer in advance some of the obvious questions:

1. I don't know for sure that the end users will have 4,000,000 plus objects resident in memory at one time, but I also don't know for sure that they won't.  The project is still at the early stage where I can make drastic design changes without incurring a lot of overhead, so I am testing the limits early.  I'd rather deal with it now than have a dissatisfied customer.

2. An earlier concept of this project was built with a relational model.  Although the relational model handled much of the core business well, it did not handle many of the common circumstances very well at all.  The polymorphism of the object model provides a much more elegant way to handle the problem.

3. For the persistence layer I am using Interbase, although the business objects are agnostic towards the persistence layer implementation.

Thanks in advance

David Johnson

Question by:swift99
 
LVL 6

Expert Comment

by:DrDelphi
ID: 7094603
I'll take a stab at it:

1. Does anyone know why the performance of "Free" is so much poorer than "Create"?
>> Create allocates new memory off the heap. It doesn't need to know WHERE in memory to look, so long as it is not memory allocated elsewhere. Free, on the other hand, needs to know WHERE in memory to look to free the memory (and/or pointer) allocated for an object. An analogy would be:

 Create: Get a brick off that pile over there... any old brick.  

Free: Go find brick #3902 and remove it from that pile.

Which would you expect to be faster?



2. Does anyone know why "Free" degrades in performance while "Create" remains flat?

--See above


0
 
LVL 14

Expert Comment

by:DragonSlayer
ID: 7095252
interesting comment, dr delphi :)
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 7095435
yep a good answer :)
so there's actually no way of making the freeing of objects faster
what about interfacedobjects performance ?
0
 
LVL 6

Author Comment

by:swift99
ID: 7095984
Good comments.

It looks like this performance bottleneck is a fact of life and I need to find a way to live with it.  I guess I need some strategies for handling the emergency shutdown scenario then.

Some of what I have considered are:

1. A garbage collection thread.  Every getter/setter method will update a "last accessed timestamp" property.  Any object that is not accessed within a <<configured sweep time>> will be freed by the GC thread periodically.  This should keep the number of objects resident in memory at any one time way down.  At a very small performance penalty on each object transaction I gain the capacity to free things up early rather than late.

2. "Persist and crash" to exit.  With 4,900,000 objects this requires 8 minutes to persist and 12 minutes for Windows to free up the memory before the application completely terminates.  A 20 minute shutdown is certainly preferable to 3 days, and is acceptable for an enterprise scale application that is rarely shut down.

I guess it is also time to start considering "hot swap" technologies for application maintenance, so all but the very core functions can be maintained without shutting the application down.

Any other suggestions?
0
 
 
LVL 6

Author Comment

by:swift99
ID: 7096125
Lee:  Do you mean what happens if I establish the objects via ActiveX or Visibroker and release them?  That will be worth a try.  That's a significant shift so it might be next week before I can report back on that.
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 7096187
no I'm thinking about an object like
TMyObject = class(TInterfacedObject, IMyObject)
so instead of obj.Free simply use obj:=nil;
store the object in a TList or some other container
TList would probably be the fastest

but I doubt it will have any performance gains ... more probably it will be slower
but hey .. you could still try it :)

the idea about threads is good
have you thought of "caching" the objects ?
add only the needed objects and reuse the ones that are not being in use anymore
so you actually don't free all of them only mark them as 'notinuse' (some property or sumtin)
adding an object would be slower but you would reduce the create/free counts therefore reducing memory allocations
thus keeping the memory space less fragmented
that should keep it a bit faster
I'm just theorizing now but hope it helps :)
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 7096192
ugh, that "caching" I was talking about is commonly known as "pooling" :)
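A minimal sketch of the pooling idea, assuming a hypothetical TBizObject class with a Reset method that clears state before reuse. Released objects go onto an idle list instead of being destroyed, so the memory manager sees far fewer Create/Free pairs. The sketch is not thread-safe, and because the business objects are polymorphic, a real implementation would probably need one pool per class.

```pascal
unit ObjPool;

interface

uses
  Classes;

type
  // Hypothetical business object; Reset clears state before reuse.
  TBizObject = class
  public
    Id: Integer;
    procedure Reset;
  end;

  // Keeps released objects on an idle list instead of destroying them.
  TBizObjectPool = class
  private
    FIdle: TList;
  public
    constructor Create;
    destructor Destroy; override;
    function Acquire: TBizObject;
    procedure Release(Obj: TBizObject);
  end;

implementation

procedure TBizObject.Reset;
begin
  Id := 0;
end;

constructor TBizObjectPool.Create;
begin
  inherited Create;
  FIdle := TList.Create;
end;

destructor TBizObjectPool.Destroy;
var
  I: Integer;
begin
  // Only here are the pooled objects actually returned to the heap.
  for I := FIdle.Count - 1 downto 0 do
    TObject(FIdle[I]).Free;
  FIdle.Free;
  inherited Destroy;
end;

function TBizObjectPool.Acquire: TBizObject;
begin
  if FIdle.Count > 0 then
  begin
    // Take from the end of the list so no pointers need shifting.
    Result := TBizObject(FIdle[FIdle.Count - 1]);
    FIdle.Delete(FIdle.Count - 1);
  end
  else
    Result := TBizObject.Create;
end;

procedure TBizObjectPool.Release(Obj: TBizObject);
begin
  Obj.Reset;
  FIdle.Add(Obj);
end;

end.
```

Callers would replace TBizObject.Create with Pool.Acquire and Obj.Free with Pool.Release(Obj). Pooling does not make a final shutdown cheaper by itself, since the pool's destructor still frees every object.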
0
 
LVL 6

Expert Comment

by:DrDelphi
ID: 7097314
I too considered a TList, but unfortunately, it seems to be slower if anything. A thought might be to have the "InUse" as a property of the object itself. Imagine this (pseudocoded):

MyObject = class
  UseThread: TThread;   // one watchdog thread per object
  InUse: Boolean;
  TimeOut: Cardinal;    // idle time in milliseconds
  constructor Create;
  procedure Foo;
  procedure Bar;
end;

procedure MyObject.Foo;
begin
  InUse := True;
  TimeOut := 0;         // every access resets the idle counter
  // processing
  InUse := False;
end;

procedure MyObject.Bar;
begin
  InUse := True;
  TimeOut := 0;
  // processing
  InUse := False;
end;

constructor MyObject.Create;
begin
  InUse := True;
  TimeOut := 1;
  // create UseThread here (pseudocode), handing it a reference to this object
end;

UseThread's Execute, where MyObject is the instance being watched:

while not Terminated do
begin
  Sleep(1);                          // tick roughly once per millisecond
  if MyObject.InUse then
    MyObject.TimeOut := 0
  else
  begin
    Inc(MyObject.TimeOut);
    if MyObject.TimeOut >= 5000 then // five seconds of non-use
    begin
      FreeAndNil(MyObject);
      Terminate;
    end;
  end;
end;

This way, every time you access any of the object's methods, the InUse property gets updated and the thread leaves it alone... but on the other hand, if 5 seconds go by and the object has not been used, the thread frees the object, effectively killing itself in the process. Forget keeping track of the objects and freeing them; let the objects handle that for themselves.  Mind you, there are bound to be wrinkles in this plan, since I am doing it off the top of my head, but it looks like it could work.


Good luck!!  
0
 
LVL 14

Expert Comment

by:DragonSlayer
ID: 7097722
I think it should be FreeAndNil(Self) instead? :)
but if you free it in the Thread, what's gonna happen to the thread? Just wondering.

Anyway, if David is planning to create 5 million objects, umm... then there's gonna be 5 million threads...hehe, so much for multithreading ;)

But again, that's a good suggestion Dr Delphi.
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 7097861
5 mil threads ... yeah right
I couldn't get them over 2048 !!!
I simply created each thread and added it to a TThreadList
it crashed every time it reached 2048
Win2k, d6 ent.

DrDelphi:
I have a similar thing in my threads
it's AutoSuspend for nonworking threads :)
http://lee.nover.has.his.evilside.org/isapi/pas2html.dll/pas2html?File=/delphi/MiscFiles/vn_common/lnVidTypes.pas
check out the TlnThread and its descendants :)


swift99:  where do you store your objects references now ?
0
 
LVL 6

Author Comment

by:swift99
ID: 7099119
DragonSlayer:  FreeAndNil (Self) should not work - 'self' is not an assignable variable, but thanks for the try.

Lee Nover:

Object references are guaranteed to be stored in a master TList, currently. The object is responsible for placing itself into the master list on Create and removing itself from the master list on Destroy.  At maximum number of objects it takes 8 minutes to traverse the TList sequentially.

When objects refer to each other they keep their own private references.  The TList is used only when searching for an object that may or may not be loaded, for identifying "orphaned" objects whose sessions are terminated, and for guaranteeing that objects are freed at the end of the session.

I can't predict in advance how many users will be working on different projects on the application server, or the size of the projects.  This obviously limits how useful the garbage collection thread would be.  Past experience suggests that 5 million objects will represent several large projects (say 5 or 6 projects in the $30 to $50 million range).

I never expected memory management to be my biggest bottleneck when I started this project (lol).

Pooling sounds like an interesting thought.  It could be workable too.  I will have to see how the polymorphism of the business objects will hammer memory with fragmentation.  That's going to take some math - feedback to come.

I have seen some interesting timing figures from Java, and I may have time to duplicate the structures for the memory performance test this weekend.  I'll post the test results when I get them.
0
 
LVL 6

Author Comment

by:swift99
ID: 7099172
BTW: I also tried disabling the code that removes the objects from the list and also tried traversing the list in reverse order (most recent to oldest) with no changes in timing.
0
 
LVL 6

Author Comment

by:swift99
ID: 7099333
Dr Delphi: Your suggestion is similar to what I was thinking of in many respects.  The InUse flag is already implemented as actually the handle of the thread that is processing the object.  It is set by the object's "ClaimMutex" method, and cleared by the "ReleaseMutex" method.  

This is (approximately) what I had in mind.  There will be some obvious adjustments that need to be made.  The test scripts to exercise this realistically will be <<fun>> to write.  :o)

MyObject = class
  LastUse: TDateTime;
  procedure Foo;
end;

procedure MyObject.Foo;
begin
  ClaimMutex;      // Existing
  LastUse := Now;
  // processing
  ReleaseMutex;    // Existing
end;

procedure GCThread.Execute;
begin
  while not Terminated do
  begin
    for each active object Obj do    // pseudocode: walk the master list
    begin
      // Elapsed idle time is Now - LastUse (not the other way around)
      if (Obj.OwningThread = 0)
      and ((Now - Obj.LastUse) > <<TimeOutSetting>>) then
        Obj.Free;
    end;
    Sleep(<<ReasonableAmountOfTime>>);
  end;
end;
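The sweep step in GCThread.Execute above might be made concrete roughly as follows. This is a sketch under assumptions: TBizObject, its fields and the SweepIdle function are hypothetical names, the list is a plain TList, and no locking is shown, so a real GC thread would have to guard the list (for example with a TThreadList or a critical section) and coordinate with ClaimMutex.

```pascal
unit IdleSweep;

interface

uses
  SysUtils, Classes;

type
  // Hypothetical business object carrying the fields the sweep needs.
  TBizObject = class
  public
    OwningThread: Cardinal;  // 0 when no thread has claimed the object
    LastUse: TDateTime;
  end;

// Frees every unclaimed object idle for longer than TimeoutSecs.
// Returns the number of objects freed.
function SweepIdle(Objects: TList; TimeoutSecs: Integer): Integer;

implementation

function SweepIdle(Objects: TList; TimeoutSecs: Integer): Integer;
var
  I: Integer;
  Obj: TBizObject;
  Cutoff: TDateTime;
begin
  Result := 0;
  // TDateTime counts days, so the timeout is converted from seconds.
  Cutoff := Now - TimeoutSecs / SecsPerDay;
  // Walk backwards so Delete does not disturb indexes still to be visited.
  for I := Objects.Count - 1 downto 0 do
  begin
    Obj := TBizObject(Objects[I]);
    if (Obj.OwningThread = 0) and (Obj.LastUse < Cutoff) then
    begin
      Objects.Delete(I);
      Obj.Free;
      Inc(Result);
    end;
  end;
end;

end.
```

Computing the cutoff once per sweep avoids calling Now for every object, which matters when the list holds millions of entries.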

I'm also going to try the same test in Java as soon as I can, and I'll post the timings back here.
0
 
LVL 45

Expert Comment

by:aikimark
ID: 7102214
Have you tried removing/deleting objects in the reverse order from which they were created?
0
 
LVL 6

Author Comment

by:swift99
ID: 7106991
aikimark: Yes, that was my first thought.  No observable difference.
0
 
LVL 6

Author Comment

by:swift99
ID: 7107008
I haven't decided for sure what to do yet.  The options appear to be to migrate to a Java platform using IBM JDK 1.3.1 (memory management is 10 times as fast as Delphi's on the "new" and apparently infinitely faster on the "free"), or to write my own memory manager.  If I write my own memory manager I believe that I will dust off my old Macintosh manuals - the Mac OS implemented an elegant garbage collection whose base requirements mirror very closely the buffering design I am considering.

The time has come to award points ...and for me to go back to the calculator for a bit.

Dr. Delphi and Lee Nover both actually tested some code, so I think splitting the points evenly between you would be most fair?
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 7107036
maybe we should write about this to Borland's staff ?
perhaps they will think about optimizing the destructor
hum ... I also found some alternate memory manager for Delphi, said to be lots faster than Delphi's
but unfortunately I don't know where I saw that
I'll look it up :)
0
 
LVL 12

Accepted Solution

by:
Lee_Nover earned 50 total points
ID: 7107065
think I found it :)
check out : http://www.optimalcode.com/memmgr.htm
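For readers unfamiliar with how a replacement memory manager plugs into Delphi: the System unit exposes GetMemoryManager and SetMemoryManager, which swap the three routines behind every GetMem, FreeMem and ReallocMem call (and therefore behind every Create and Free). The sketch below is not the linked memory manager, whose internals are not described in this thread; it is a hypothetical wrapper that only counts calls and forwards them to the previous manager. It assumes the Delphi 6-era TMemoryManager record; later versions use TMemoryManagerEx with different parameter types.

```pascal
unit CountingMM;

interface

var
  AllocCount: Integer = 0;  // number of GetMem calls seen
  FreeCount: Integer = 0;   // number of FreeMem calls seen

implementation

var
  OldMM: TMemoryManager;    // the manager that was active before this unit

function CountingGetMem(Size: Integer): Pointer;
begin
  Inc(AllocCount);          // not thread-safe; adequate for a quick test
  Result := OldMM.GetMem(Size);
end;

function CountingFreeMem(P: Pointer): Integer;
begin
  Inc(FreeCount);
  Result := OldMM.FreeMem(P);
end;

function CountingReallocMem(P: Pointer; Size: Integer): Pointer;
begin
  Result := OldMM.ReallocMem(P, Size);
end;

const
  NewMM: TMemoryManager = (
    GetMem: CountingGetMem;
    FreeMem: CountingFreeMem;
    ReallocMem: CountingReallocMem);

initialization
  GetMemoryManager(OldMM);
  SetMemoryManager(NewMM);

finalization
  SetMemoryManager(OldMM);

end.
```

A unit like this normally has to be the first entry in the project's uses clause, so that no memory is allocated by one manager and released by another.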
0
 
LVL 45

Expert Comment

by:aikimark
ID: 7107383
1. Have you tried allocating blocks of memory for multiple business objects?

2. Can you determine whether the performance bottleneck is in the application or Windows when freeing objects?

3. Since I would have implemented this in a database, I wonder if you might benefit from something akin to a MemoryMappedFile.  At its root, this is probably similar to (1).
0
 
LVL 6

Author Comment

by:swift99
ID: 7115149
aikimark:

Thanks for the feedback

The first implementation was built in a relational database.  The object model is more in line with the actual problem, and has significant performance benefits apart from the question of releasing the memory once you are done with it.  The implementation of the memory object model is, ultimately, exactly what you do to a database to improve its performance.  I don't think any PC databases can sustain the 1,000,000 plus synchronized transactions per second that the memory object model does.  Even the MVS mainframe database only supports a million or so before the system seems to bog down.

The bottleneck is in Windows, as near as I can tell.  I eliminated all application overhead code (so the app crashed rather than exited gracefully), but timing was the same.  Borland's default memory manager maps to Windows' memory management.  C++ programmers learn from day 1 that performance requires that you allocate a huge chunk of memory from Windows and then parcel it out internally in your application, or so I have been told.  This would be exactly what your suggestion #1 would implement.

#3 is quite different and results in much more complex code.
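The "allocate a huge chunk and parcel it out" approach can be sketched in Delphi by overriding NewInstance and FreeInstance on a base class. Everything named here (TArenaObject, ArenaInit, ArenaDone) is hypothetical, and the design is deliberately one-sided: freeing an individual object returns no memory, and the whole block is released in a single call at shutdown. It is a sketch of the technique under those assumptions, not a general-purpose allocator, and it is not thread-safe.

```pascal
unit ArenaObj;

interface

type
  // Hypothetical base class: instances are carved out of one large block.
  TArenaObject = class
  public
    class function NewInstance: TObject; override;
    procedure FreeInstance; override;
  end;

// Reserves the block; call once before creating any TArenaObject.
procedure ArenaInit(Bytes: Integer);
// Releases the whole block at once; all instances become invalid.
procedure ArenaDone;

implementation

uses
  SysUtils;

var
  ArenaBase: PAnsiChar = nil;  // start of the block
  ArenaSize: Integer = 0;      // total bytes reserved
  ArenaUsed: Integer = 0;      // bytes handed out so far

procedure ArenaInit(Bytes: Integer);
begin
  GetMem(ArenaBase, Bytes);
  ArenaSize := Bytes;
  ArenaUsed := 0;
end;

procedure ArenaDone;
begin
  FreeMem(ArenaBase);
  ArenaBase := nil;
  ArenaSize := 0;
  ArenaUsed := 0;
end;

class function TArenaObject.NewInstance: TObject;
var
  Size: Integer;
begin
  // Round the size up to a multiple of 8 to keep offsets in the block aligned.
  Size := (InstanceSize + 7) and not 7;
  if ArenaUsed + Size > ArenaSize then
    raise EOutOfMemory.Create('Arena exhausted');
  // InitInstance zeroes the memory and sets up the class pointer.
  Result := InitInstance(ArenaBase + ArenaUsed);
  Inc(ArenaUsed, Size);
end;

procedure TArenaObject.FreeInstance;
begin
  // Finalize managed fields (strings, interfaces) but return no memory;
  // the block is released as a whole by ArenaDone.
  CleanupInstance;
end;

end.
```

Because InstanceSize is resolved per class, descendants of TArenaObject of different sizes can share the same block, which fits the polymorphic business objects discussed in this thread.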
0
 
LVL 6

Author Comment

by:swift99
ID: 7115151
Lee: I will check this out.  Thanks.
0
 
LVL 6

Author Comment

by:swift99
ID: 7115309
Lee:

Now THAT is memory management!

6 minutes to create 4,000,000 objects, 3 minutes to release them all.  At lower counts (10,000 to 100,000 objects) performance is flat: 60 ms to allocate 10,000 objects and 50 ms to release them.  A 700 MHz PC can support about 9,000,000 integer object business transactions per second and 30,000 floating point transactions per second.

Full points to Lee I think.

Now I have to optimize some other stuff to catch up.  I like this idea!
0
 
LVL 6

Author Comment

by:swift99
ID: 7115312
Borland should be using this memory manager as their default!
0
 
LVL 12

Expert Comment

by:Lee_Nover
ID: 7115337
wow, didn't know it was that much faster
great :)
0
 
LVL 6

Author Comment

by:swift99
ID: 7115655
Final performance of this core framework, for those who are interested is:

10,000 objects created in 60 milliseconds.  Speed is only impacted once memory is full and virtual memory comes into play.

Locating 1 object out of 4,000,000, one million times, required 600 milliseconds

10,000 objects created and destroyed in 220 milliseconds.  Speed to destroy is flat regardless of the number of objects in memory, impacted only by use of virtual memory.

Object transactions involving only integers performed at a speed of 9,500,500 transactions per second.

Maximum count of objects created was just over 4,100,000 in three minutes.  A clean shutdown then required 15 minutes including persisting all objects.  A dirty shutdown required 2 minutes.

My system is a 700 MHz Pentium III running Windows 98 with 256 MB RAM.
0
 
LVL 6

Author Comment

by:swift99
ID: 7115669
aikimark:

The 1,000,000 transactions for the MVS database were per hour, not per second.  The comparable object figure for integer transactions in the object model is 3,600 * 9,500,500 = 34,201,800,000 transactions per hour.

While I am not comparing apples to apples yet, the speed differential of 34,000 times vindicates the object application server approach from a performance standpoint.  

Even if I lose two orders of magnitude worth of performance once the framework is stable and I have sufficient real work going on to provide a good comparison, this model still gives me 340 times the performance of a relational model.  The limits to the application then become the system's limits in communicating with users.

This is why Java application servers are taking the world by storm right now.  We can do the same thing in Delphi, about 3 times as fast as Java, but we keep bogging ourselves down in the relational client/server model.

0
