enkimute

asked on

OpenGL active wait problem (100%cpu usage..)

I am seeing bizarre scheduling behaviour with OpenGL on different types of graphics hardware. There is apparently a big difference between NVIDIA and ATI OpenGL drivers as far as blocking is concerned. On ATI cards, the SwapBuffers call appears to be a blocking command, which caused the original problem: OpenGL applications using less than 5% CPU on a machine with an NVIDIA card were using 100% on a similar machine with an ATI card.

I introduced an active wait in my render loop, measuring the render time and trying to sleep the process until just before the vertical retrace, or, if vsync is disabled, until the requested frame time. This works sometimes, though there are some quirks. For starters, the Windows thread-switching granularity and the precision of the timers matter a great deal, but one can work around that, and so I did. The result was that the same reference app now used about 20 to 25% CPU on systems with ATI cards.
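Roughly, the limiter looks like this (a simplified sketch rather than my actual code; names like g_targetFrameMs and LimitFrameRate are just illustrative):

#include <windows.h>
#include <mmsystem.h>   /* timeBeginPeriod; link with winmm.lib */

static double g_targetFrameMs = 1000.0 / 60.0;  /* refresh rate, or the capped framerate */

/* Call once at startup: timeBeginPeriod(1); and at shutdown: timeEndPeriod(1). */
void LimitFrameRate(LARGE_INTEGER frameStart)
{
    LARGE_INTEGER now, freq;
    QueryPerformanceFrequency(&freq);
    for (;;)
    {
        QueryPerformanceCounter(&now);
        double elapsedMs   = (now.QuadPart - frameStart.QuadPart) * 1000.0 / freq.QuadPart;
        double remainingMs = g_targetFrameMs - elapsedMs;
        if (remainingMs <= 0.0)
            break;
        /* Sleep coarsely while plenty of time remains; Windows only gives
           about 1 ms granularity even with timeBeginPeriod(1), so yield
           via Sleep(0) for the last stretch before the deadline. */
        if (remainingMs > 2.0)
            Sleep((DWORD)(remainingMs - 2.0));
        else
            Sleep(0);
    }
}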

But switching from the reference app to something more intensive demonstrated yet another problem. It appears that many OpenGL commands can block the CPU on an ATI-equipped machine. On NVIDIA the commands return immediately, whereas on ATI the CPU blocks on unpredictable OpenGL commands. NVIDIA has the NV_fence extension, which lets me do fine-grained synchronisation, so I have no problems there, but there's no equivalent on ATI, so I'm out of ideas as to how to get reasonable CPU usage on ATI-equipped systems.
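For reference, the NV_fence pacing looks roughly like this (a simplified sketch, not my production code; the entry points are fetched through wglGetProcAddress and error checking is left out):

#include <windows.h>
#include <GL/gl.h>

#define GL_ALL_COMPLETED_NV 0x84F2

typedef void      (APIENTRY *PFNGLGENFENCESNVPROC)(GLsizei n, GLuint *fences);
typedef void      (APIENTRY *PFNGLSETFENCENVPROC)(GLuint fence, GLenum condition);
typedef GLboolean (APIENTRY *PFNGLTESTFENCENVPROC)(GLuint fence);

static PFNGLGENFENCESNVPROC glGenFencesNV;
static PFNGLSETFENCENVPROC  glSetFenceNV;
static PFNGLTESTFENCENVPROC glTestFenceNV;
static GLuint g_frameFence;

void InitFence(void)  /* requires a current GL context and GL_NV_fence in the extension string */
{
    glGenFencesNV = (PFNGLGENFENCESNVPROC)wglGetProcAddress("glGenFencesNV");
    glSetFenceNV  = (PFNGLSETFENCENVPROC) wglGetProcAddress("glSetFenceNV");
    glTestFenceNV = (PFNGLTESTFENCENVPROC)wglGetProcAddress("glTestFenceNV");
    glGenFencesNV(1, &g_frameFence);
}

void EndFrame(HDC hdc)
{
    glSetFenceNV(g_frameFence, GL_ALL_COMPLETED_NV);  /* marks the end of this frame's commands */
    SwapBuffers(hdc);
    /* Poll the fence and sleep instead of letting the driver spin, so the
       CPU stays available to other threads while the GPU catches up. */
    while (!glTestFenceNV(g_frameFence))
        Sleep(1);
}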

I was also wondering how triple buffering fits into this picture.
davebytes

I can't say I've ever seen what you describe.  Both companies use command-buffering schemes to pass data off to the GPU, and SwapBuffers might be a sync point but should never hold up the CPU.  However, there are other GL commands, like glFinish, that will hard-sync, as will any sort of glReadPixels or equivalent from the backbuffer.

However, it is possible that if you have VSYNC turned on, SwapBuffers becomes a sync point for ATI (and for some reason not for NV).  I can check with the ATI engineering folks whether this is a known problem.  I assume you are using a modern (Radeon) ATI card with the latest drivers?  I also assume you are running double-buffered applications.  Windowed or fullscreen might make a difference, and any other 'special' extensions being used might too.

You should never wait-state yourself.  The drivers should take care of VSYNC if you do in fact want it, and the swap should happen IN HARDWARE, in fullscreen at least.  It's possible that windowed-mode apps work differently with VSYNC enabled.
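For example, something along these lines hands VSYNC to the driver (a sketch assuming WGL_EXT_swap_control is available; check the WGL extension string before relying on it):

#include <windows.h>
#include <GL/gl.h>

typedef BOOL (APIENTRY *PFNWGLSWAPINTERVALEXTPROC)(int interval);

void EnableVSync(int interval)  /* 1 = sync to refresh, 0 = run free */
{
    PFNWGLSWAPINTERVALEXTPROC wglSwapIntervalEXT =
        (PFNWGLSWAPINTERVALEXTPROC)wglGetProcAddress("wglSwapIntervalEXT");
    if (wglSwapIntervalEXT)
        wglSwapIntervalEXT(interval);
}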

d
enkimute

ASKER

I've been testing on the latest Radeon and NVIDIA cards, with the latest drivers.

I am using OpenGL 1.4 with many extensions, but the problems occur even in very simple setups. Drawing a few primitives and calling SwapBuffers yields 100% CPU usage on ATI cards, 0% on NVIDIA .. I could send you samples, both source and binaries, but it's really too trivial ..

I've been snooping around on Google; there's an article somewhere about the Radeon Linux drivers that says they do an active wait until vsync because of timing issues ..
I couldn't find anything about the situation on Windows, though ..

I'm running my apps full screen, but there's no way of telling that to OpenGL, so unless the driver concludes it from the DC size, I guess the situation is the same windowed or full screen .. Maybe ATI always has triple buffering enabled, causing my app to generate too many frames, eventually resulting in a blocking call (if the command buffers are full) ..

This problem has been puzzling me for a couple of weeks now; I just want my simple OpenGL applications not to clog up my CPU ..
It's amazing that the most trivial of cases can show the worst performance.. ;)

Both vendors have some sort of frame-ahead buffering of the rendering queue.  That's so when they start going really fast, they can keep processing things optimally.  It may be that you are rendering something so small, so rapidly, that the ATI driver is doing a 'hard wait', while the NV driver is doing a 'soft wait' -- i.e., the ATI driver does a spinloop of sorts, while the NV driver still has control of the process tree but is giving up cycles to other threads.  Or you have a call that is instigating the issue.
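To put it in code (purely illustrative; gpuCaughtUp() is a made-up stand-in for whatever internal test the driver actually performs):

#include <windows.h>

extern int gpuCaughtUp(void);  /* hypothetical: nonzero once the GPU has drained the queue */

void HardWait(void)   /* the ATI behavior you describe: a spinloop pins the CPU at 100% */
{
    while (!gpuCaughtUp())
        ;
}

void SoftWait(void)   /* the NV behavior: yields its timeslice, so other threads get cycles */
{
    while (!gpuCaughtUp())
        Sleep(1);
}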

I'd be more than happy to look at a small sample -- and if it's really doing something that seems incorrect, I can forward it to my friends at ATI and get a 'proper answer' for why you are getting these particular results.

d
Good, I'll send it to you personally. Please drop me a mail at coding_nospamn_@mutefantasies.com. That would be without the _nospamn_, of course .. That way I have your email, as I can't find it here anywhere ..
I'll try to remember to forward this to ATI at some point, but here's the basic answer.

The GPU isn't doing any work.  All you do every frame is clear the buffer, which is an async flush in the hardware.  The NV driver probably does some sort of sleep internally, sleeping your process, if you get too far ahead of it.  The ATI driver is apparently hard-waiting when you get too far ahead.

The moral of the story is that you are running at something like 100,000 FPS or some other extremely high rate, and at that point the driver is actually getting bottlenecked on the pure speed of requests, and does a blocking wait at some point.  When you start to add in real rendering code (and yes, triple buffering yourself), you shouldn't see any such bottleneck -- you'll always be doing enough work, either in GPU rendering or app-side logic, that you'll be below say 300fps (or, more realistically, on the average machine, below 60fps).

Try adding in some basic amount of rendering into the Draw function.  Grab another few NeHe samples, see what it looks like.
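Even something as trivial as this NeHe-style immediate-mode triangle would do (arbitrary values, just enough to give the GPU real work each frame):

void Draw(void)
{
    glClear(GL_COLOR_BUFFER_BIT | GL_DEPTH_BUFFER_BIT);
    glLoadIdentity();
    glTranslatef(0.0f, 0.0f, -5.0f);
    glBegin(GL_TRIANGLES);
        glColor3f(1.0f, 0.0f, 0.0f); glVertex3f( 0.0f,  1.0f, 0.0f);
        glColor3f(0.0f, 1.0f, 0.0f); glVertex3f(-1.0f, -1.0f, 0.0f);
        glColor3f(0.0f, 0.0f, 1.0f); glVertex3f( 1.0f, -1.0f, 0.0f);
    glEnd();
}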

If with 'real world' rendering submittals you continue to see an issue, you've got my email: ping me, and I'll DEFINITELY make sure that ATI takes a look.  Include a DXDIAG dump, as they'll want to see that.

Hope that helps!

-d
www.chait.net
Of course it goes wrong when I do draw something too; the point is it _even_ goes wrong when you do nothing.
Furthermore, this app only runs free if you have vsync disabled. If VSync is enabled, it should run at exactly
the refresh rate, which it does, BUT it uses 100% CPU while waiting for the sync. So you call SwapBuffers,
and it not only locks the rendering thread (I'm fine with that), but it does not allow _ANY_ thread to execute,
as the driver is performing an active wait (resulting in 100% CPU usage, no matter how little or how much you are
drawing ..).

But I'll send you a somewhat more extensive example that loads some models and does some rendering ...

I really want to get to the bottom of this ...

grtz ..

Okay.  I'll take a look and try pinging ATI.  It will be at 100% CPU if you are basically doing nothing but pushing thousands of frames per second.  If you are running at a framerate LESS than the vsync rate, with vsync enabled, and it waits in the driver, that's bad.  If you are running OVER the vsync rate, which means you are providing an updated frame before the current frame has been flipped, and you aren't triple-buffered, it could stall awaiting the vsync (since it needs the backbuffer unlocked to render to it!).

Make sure to send me a DXDIAG too, just in case they want that.

-d
ASKER CERTIFIED SOLUTION
davebytes
Thanks a bunch. I've been kind of very busy these last couple of days, but I'll try to wrap up a nice little demo this weekend so that we have something to test from.
You can expect it in your mailbox one of these days..

enki