Marty543

Getting the number of CPU cycles (commands per second) a process is using

Hello experts,

Does anyone know how to get the number of CPU cycles a certain process is using in Windows XP SP2? I am looking for a result in commands per second. I don't mind what programming language or application is used (C, C++, C#, etc.).

I have tried looking through MSDN, the Alt+Ctrl+Delete menu, and an application entitled "Process Explorer", but I couldn't find what I was looking for.

I'm not sure if this kind of question is too low level for Windows. I would appreciate it if someone let me know if getting the number of commands per second of an application was possible in Windows.

I'm not looking for an answer in terms of the percentage of the CPU used unless there is no other or more accurate answer. That said, would it be possible to approximate the number of cycles per second by using the percentage of the CPU that a process uses? That is, am I making any invalid assumptions with this math:

commands per second = CPU Percentage * (# of commands per second, max)

Processors are often advertised with a frequency such as 2.8 GHz. I'm not sure exactly what this means, but would it mean commands per second? That is, could I simply do this, for example:

commands per second = 30% * (2,800,000,000 commands / second)
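
Written out as a small Python sketch, the estimate above looks like this. The function name is hypothetical, and the result is best read as clock cycles per second attributed to the process, which only equals commands per second if every command takes exactly one cycle.

```python
def estimated_cycles_per_second(cpu_percent, clock_hz):
    """Scale the advertised clock rate by the share of CPU time the process gets."""
    # Multiply before dividing so whole-number inputs stay exact.
    return cpu_percent * clock_hz / 100


# 30% of a 2.8 GHz processor
print(estimated_cycles_per_second(30, 2_800_000_000))  # prints 840000000.0
```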

Thanks,
Marty
Jase-Coder

Hi,

>> Processors are often advertised with a frequency such as 2.8 GHz. I'm not sure exactly what this means, but would it mean commands per second? That is, could I simply do this, for example:

This is the number of clock cycles the CPU performs per second.
Marty543

ASKER

Thanks. And I presume it may take many cycles, maybe 3-10, to actually perform one command.
SOLUTION
mxjijo

ASKER CERTIFIED SOLUTION
You may want to read MS docs for the functions used in the answer above.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/timers/timerreference/timerfunctions/queryperformancecounter.asp

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/timers/timerreference/timerfunctions/queryperformancefrequency.asp



Note that my answer above is a way of timing the amount of real time (aka "wallclock" time) that something takes. If you want to know what percentage of CPU time your program takes, you can use GetThreadTimes.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getthreadtimes.asp

This function returns, as FILETIMEs, the amount of CPU time used in user mode and kernel mode. (Kernel time is time spent deep inside system calls).
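
To illustrate the difference between wallclock time and CPU time without calling the Win32 API directly, here is a minimal Python sketch. It uses the standard library's time.perf_counter() for elapsed real time and time.process_time() for the user plus kernel CPU time of the whole process (not of a single thread, and not split into user and kernel the way GetThreadTimes reports it). The helper name measure is made up for the example.

```python
import time


def measure(work):
    """Run work() and return (cpu_seconds, wall_seconds) for the call."""
    wall_start = time.perf_counter()   # elapsed real ("wallclock") time
    cpu_start = time.process_time()    # user + kernel CPU time of this process
    work()
    cpu_seconds = time.process_time() - cpu_start
    wall_seconds = time.perf_counter() - wall_start
    return cpu_seconds, wall_seconds


# Sleeping takes real time but almost no CPU time.
cpu, wall = measure(lambda: time.sleep(0.2))
print(f"cpu={cpu:.3f}s wall={wall:.3f}s share={100 * cpu / wall:.1f}%")
```

Dividing the CPU seconds by the wallclock seconds gives the CPU percentage over that interval.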

A FILETIME is simply a 64-bit number that is in increments of 100 nanoseconds (100 billionths of a second, or 0.1 microsecond).
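
As a quick illustration of that unit, the following Python sketch converts a FILETIME-style value to seconds. It assumes the value arrives as the two 32-bit halves of the Win32 FILETIME structure (dwHighDateTime and dwLowDateTime); the function name is made up for the example.

```python
TICKS_PER_SECOND = 10_000_000  # one tick = 100 nanoseconds


def filetime_to_seconds(high, low):
    """Combine the two 32-bit halves into one 64-bit tick count, then scale."""
    ticks = (high << 32) | low
    return ticks / TICKS_PER_SECOND


# 15,000,000 ticks of 100 ns each is 1.5 seconds of CPU time.
print(filetime_to_seconds(0, 15_000_000))  # prints 1.5
```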

Please let me know if this is really what you wanted, and I'll go into some detail...
Thanks for your comments and the code.

By command, I meant instruction.

Now I know that it is too low level and impractical to get the number of instructions that are executing in a certain time frame, but getting the number of cycles and an accurate time difference between code statements is easy.
There is a way to actually record the instruction count, but it is extremely low level. Using performance monitoring counters, you can track the number of instructions executed (as well as a ton of other processor internals). I say it is "extremely" low level because the counters are set up and read through model-specific registers (MSRs) using the RDMSR and WRMSR instructions, which must be executed in kernel mode, so a driver is required to execute them. They are also *very* non-portable: every CPU model (even models from the same manufacturer!) has its own list of MSR meanings.
How would that be done?

Marty543,
     It looks like you ARE working at a low level. I am not a CPU expert, but my understanding is that a given CPU always uses the same number of clock ticks for a given instruction. For example, say a mov instruction requires 10 clock ticks on a Pentium 4; then every mov will use 10 ticks, no matter what the arguments are. So if there are 10 mov instructions in your program, you can just add them up (10x10 = 100 ticks). This would give you the exact number of ticks needed to execute "your" code alone.

     What you would need is the white paper from the CPU manufacturer, which would give you the clock ticks required for each instruction. I don't know whether or where this information is available, but it may be worth considering.

hope that helps
~j

Okay, I just found this link. It gives the number of clocks required for every instruction:

http://home.comcast.net/~fbui/intel/a.html#aaa
>> "a given CPU always uses the same number of clock ticks for a given instruction"

That is not entirely correct. Older processors were quite predictable because they always executed instructions the same way. Newer processors use "out of order execution". There are multiple execution units, and several internal processor resources are shared among them. Instructions are issued based on the availability of the execution units they require. Also, instructions may be issued "out of order" based on the availability of operands. For example, assume the processor is executing instructions for the following code sequence:

mov eax,[_some_memory_operand]
mov ebx,1234
add ebx,[_some_other_memory_operand]
mov [_some_answer_variable],ebx
mov [_some_other_answer_variable],eax

This code:
- reads "some_memory_operand" into the eax register,
- loads 1234 into the ebx register,
- adds some_other_memory_operand to ebx,
- stores the sum in some_answer_variable,
- and stores eax in some_other_answer_variable.

Now let's assume that some_memory_operand is not in the cache (on the CPU core), so the processor must go all the way out to main memory on the motherboard to read it. Let's also assume that some_other_memory_operand (used by the third instruction) IS in the cache. In this case, an older processor would *wait* until the first instruction had pulled its data into the cache before continuing execution, even though it could execute the second instruction immediately. Newer processors use "out of order" execution, so they can get ahead and process the instructions after the first one even though the first one cannot complete yet.

This causes many instructions to have widely varying timings even though they seem to be "simple" instructions. It all depends on the current execution context at the moment that an instruction is issued.
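
The effect can be sketched with a toy model in Python. Everything here is an assumption made for illustration: the latencies (100 ticks for the cache miss, 3 for the cached add, 1 for the rest) are invented, the "old" processor is modelled as finishing each instruction before starting the next, and the "new" one as starting one instruction per tick as soon as its inputs are ready. Real processors are far more complicated.

```python
# (instruction, latency in ticks, indexes of earlier instructions it depends on)
PROGRAM = [
    ("mov eax,[_some_memory_operand]",        100, []),   # cache miss
    ("mov ebx,1234",                            1, []),
    ("add ebx,[_some_other_memory_operand]",    3, [1]),  # cache hit
    ("mov [_some_answer_variable],ebx",         1, [2]),
    ("mov [_some_other_answer_variable],eax",   1, [0]),
]


def in_order_ticks(program):
    """Each instruction waits for the previous one to finish completely."""
    return sum(latency for _, latency, _ in program)


def out_of_order_ticks(program):
    """One instruction enters per tick and starts once its inputs are ready."""
    finish = []
    for issue_tick, (_, latency, deps) in enumerate(program):
        start = max([issue_tick] + [finish[d] for d in deps])
        finish.append(start + latency)
    return max(finish)


print(in_order_ticks(PROGRAM), out_of_order_ticks(PROGRAM))  # prints 106 101
```

In this model the three instructions that do not need eax finish while the first load is still waiting, so only the last store is held up by the cache miss.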

Most x86 processors since the Pentium Pro use out of order execution extensively. This makes them much faster and far less sensitive to the order of the program instructions. It matters especially on x86, which has very few registers: there are barely enough of them to properly "schedule" (put in the best order) the instructions in the program itself.

Once all the data are in the CPU cache, instruction execution is a lot more predictable, but the time taken still varies, especially in complex loops where incorrect branch prediction can cause pipeline flushes.

Thank you for that posting, adg. That was quite a lot of information.
~j
   
>> 10 mov instructions in your program, you can just add them up (10x10 = 100 ticks).

Again, not true anymore. Modern processors have two concepts which must be considered together when analyzing the instruction timing: latency and throughput.

Latency means "how many clock ticks until I get the answer"

Throughput means "how many clock ticks until I can issue another instruction"

For example, say a "mov reg,mem" (read a memory variable into a processor register) has a throughput of 1 and a latency of 8. Because the throughput is 1, you can issue one of those instructions on EVERY clock tick; however, the answer will not be available until 8 clock ticks later.

The reason for this is the "pipelined" nature of processor internals. Think of it like a car assembly line. If you stand at the end of the assembly line, you will see a complete car come out, say, every 30 seconds. This means that the throughput of car construction is 30 seconds (a car is completed every 30 seconds). However, if you followed a car through the assembly line, you would say that it takes an hour to complete a car (latency is one hour). The inside of a processor is just like an assembly line: you can put a new instruction in the pipeline very frequently, but each instruction takes several steps to complete, so the answer does not come out the other end of the pipeline until several cycles later.
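
In numbers, a simplified model of this is: the first result appears after the latency, and each further independent instruction adds only the throughput. The Python sketch below uses that formula; the function name is made up, and it assumes the instructions do not depend on each other and nothing stalls.

```python
def pipelined_ticks(count, latency, throughput):
    """Ticks until the last of `count` independent instructions completes."""
    if count == 0:
        return 0
    return latency + (count - 1) * throughput


# Ten "mov reg,mem" with latency 8 and throughput 1: 17 ticks, not 10 x 8 = 80.
print(pipelined_ticks(10, 8, 1))      # prints 17
# Ten cars, one hour each, one leaving the line every 30 seconds.
print(pipelined_ticks(10, 3600, 30))  # prints 3870
```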

adg,

Excellent example. Thanks for all of the information. mxjijo, thanks for the links.

-Marty