Solved

Getting the number of CPU cycles (commands per second) a process is using

Posted on 2006-11-07
Last Modified: 2013-12-03
Hello experts,

Does anyone know how to get the number of CPU cycles a certain process is using in Windows XP SP2? I am looking for a result in commands per second. I don't mind what programming language or application is used (C, C++, C#, etc.).

I have tried looking through MSDN, the Alt+Ctrl+Delete menu, and an application entitled "Process Explorer", but I couldn't find what I was looking for.

I'm not sure if this kind of question is too low level for Windows. I would appreciate it if someone let me know if getting the number of commands per second of an application was possible in Windows.

I'm not looking for an answer in terms of the percentage of CPU used, unless there is no other or more accurate answer. That said, would it be possible to approximate the number of cycles per second from a process's CPU percentage? That is, am I making any invalid assumptions with this math:

commands per second = CPU Percentage * (# of commands per second, max)

Processors are often advertised with a frequency such as 2.8 GHz. I'm not sure exactly what this means, but would it mean commands per second? That is, could I simply do this, for example:

commands per second = 30% * (2,800,000,000 commands / second)

Thanks,
Marty
Question by:Marty543
15 Comments
 
LVL 11

Expert Comment

by:Jase-Coder
ID: 17905893
Hi,

>> Processors are often advertised with a frequency such as 2.8 GHz. I'm not sure exactly what this means, but would it mean commands per second? That is, could I simply do this, for example:

This is the number of clock cycles the CPU performs per second.
 

Author Comment

by:Marty543
ID: 17906276
Thanks. And I presume it may take many cycles, maybe 3-10, to actually perform one command.
 
LVL 8

Assisted Solution

by:mxjijo
mxjijo earned 100 total points
ID: 17910127

>> to actually perform one command
Maybe you need to explain what you mean by "command".

As you might already know, a CPU works with a set of instructions called its "instruction set".
Depending on the architecture of the chip, instructions can be as simple as a move (mov) or as complex as MMX instructions.

The speed advertised with a CPU usually denotes its internal clock speed.
Every instruction takes one or more of these clock ticks to execute.
Manufacturers sometimes also advertise an IPS (instructions per second) or MIPS (millions of instructions per second) figure.

Back to your question:
On complex operating systems like Windows or Unix, several programs run at the same time (time-shared).
So it will be virtually impossible to trace CPU cycle usage per process, because the CPU keeps switching between processes.
However, you may take a look at http://en.wikipedia.org/wiki/RDTSC to get an idea of other possibilities.
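
To put rough numbers on the relationship between clock speed, CPU share, and instruction rate, here is a minimal C sketch. The function names and the average cycles-per-instruction (CPI) figure are assumptions made up for illustration, not measured values; real processors vary widely, so treat the result as an order-of-magnitude estimate only.

```c
#include <stdio.h>

/* Clock cycles per second that a process is scheduled for, given its
   share of the CPU (0.0 to 1.0) and the clock rate in Hz. */
double busy_cycles_per_second(double cpu_fraction, double clock_hz)
{
    return cpu_fraction * clock_hz;
}

/* Rough instruction rate, given an ASSUMED average number of clock
   cycles per instruction (CPI). */
double approx_instructions_per_second(double cycles_per_second, double avg_cpi)
{
    return cycles_per_second / avg_cpi;
}

/* Hypothetical usage: 30% of a 2.8 GHz processor, assumed CPI of 2. */
void print_estimate(void)
{
    double cycles = busy_cycles_per_second(0.30, 2.8e9);
    printf("about %.0f cycles/s, about %.0f instructions/s\n",
           cycles, approx_instructions_per_second(cycles, 2.0));
}
```

The first function is the asker's formula, which yields cycles per second rather than instructions per second; the second step is where the uncertainty comes in, because the cycles needed per instruction are not constant.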
 
LVL 8

Accepted Solution

by:
adg080898 earned 400 total points
ID: 17918043
It is easy to get the exact number of cycles elapsed. x86 processors have an instruction called RDTSC (read timestamp counter). It reads a 64-bit register which is incremented (increased by one) on every clock tick (meaning that on a 2 GHz processor it increases by 2,000,000,000 per second).

Here is a function to read the timestamp counter:

__int64 RDTSC()
{
  __asm rdtsc
}

(You will get a warning like "must return a value". Ignore it; the compiler can't tell that the assembly instruction leaves its result in the EDX:EAX registers, which is where a 64-bit return value is expected.)
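
Once you have two counter readings, turning them into a cycle count and an approximate time is plain arithmetic. The following is a minimal sketch; the helper names are hypothetical, and the clock rate is assumed to be known and constant, which may not hold on processors that change speed to save power.

```c
#include <stdint.h>

/* Cycles elapsed between two timestamp-counter readings.
   Unsigned subtraction stays correct even if the counter wraps. */
uint64_t cycles_elapsed(uint64_t tsc_start, uint64_t tsc_end)
{
    return tsc_end - tsc_start;
}

/* Convert a cycle count to seconds, given the clock rate in Hz
   (an assumed, fixed value; real clocks may change speed). */
double cycles_to_seconds(uint64_t cycles, double clock_hz)
{
    return (double)cycles / clock_hz;
}
```

Typical use is to call RDTSC() before and after the code of interest and pass both readings in. Keep in mind that the counter runs for the whole processor, so the difference also includes cycles spent in other processes if your thread is switched out during the measurement.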

Or....

If you don't need extreme precision, you can use QueryPerformanceCounter and QueryPerformanceFrequency. These functions usually have a resolution of about a microsecond or better. (The actual frequency depends on the type of computer you have. Older systems count at about 1.19 MHz, which is a resolution of about 0.84 microseconds.)

It is fairly simple: call QueryPerformanceFrequency to get the frequency, then call QueryPerformanceCounter before and after the operation to be timed.

#include <windows.h>
#include <stdio.h>

void TimeMe(void)
{
  // ... do the work to be timed ...
}

void TimeIt(void)
{
  LARGE_INTEGER liFreq, liStart, liEnd, liElap;

  // Counter ticks per second
  QueryPerformanceFrequency(&liFreq);

  QueryPerformanceCounter(&liStart);
  TimeMe();
  QueryPerformanceCounter(&liEnd);

  liElap.QuadPart = liEnd.QuadPart - liStart.QuadPart;

  // Convert ticks to nanoseconds (billionths of a second).
  // Note: this multiplication can overflow 64 bits for long
  // intervals on high-frequency counters.
  liElap.QuadPart *= 1000000000;
  liElap.QuadPart /= liFreq.QuadPart;

  printf("Time was %I64d nanoseconds\n", liElap.QuadPart);
}

I'm away from my development machine so I can't compile and test the code above, but I have done this hundreds of times, it should be right. :)
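
One thing to watch in the conversion above: multiplying the tick count by 1,000,000,000 overflows a signed 64-bit value once the interval exceeds roughly 9.2 billion ticks, which is only a few seconds if the counter runs at CPU speed. A minimal sketch of an overflow-safe conversion follows; the helper name is hypothetical and it works on plain 64-bit integers, so you would pass in the QuadPart values.

```c
#include <stdint.h>

/* Convert performance-counter ticks to nanoseconds without overflowing:
   whole seconds and the leftover ticks are converted separately. */
int64_t ticks_to_ns(int64_t ticks, int64_t freq)
{
    int64_t whole_seconds = ticks / freq;
    int64_t leftover = ticks % freq;   /* always less than freq */
    return whole_seconds * 1000000000LL
         + leftover * 1000000000LL / freq;
}
```

Because the leftover is always smaller than the frequency, the second multiplication stays in range for any counter frequency up to several GHz.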
 
LVL 8

Expert Comment

by:adg080898
ID: 17918099
You may want to read MS docs for the functions used in the answer above.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/timers/timerreference/timerfunctions/queryperformancecounter.asp

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/winui/winui/windowsuserinterface/windowing/timers/timerreference/timerfunctions/queryperformancefrequency.asp



Note that my answer above is a way of timing the amount of real time (aka "wallclock" time) that something takes. If you want to know what percentage of CPU time your program takes, you can use GetThreadTimes.

http://msdn.microsoft.com/library/default.asp?url=/library/en-us/dllproc/base/getthreadtimes.asp

This function returns, as FILETIMEs, the amount of CPU time used in user mode and kernel mode. (Kernel time is time spent deep inside system calls).

A FILETIME is simply a 64-bit number that is in increments of 100 nanoseconds (100 billionths of a second, or 0.1 microsecond).
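
As a rough illustration of the arithmetic involved, here is a minimal sketch that works on plain integers rather than calling GetThreadTimes itself. The helper names are hypothetical; the two 32-bit inputs stand in for the dwLowDateTime and dwHighDateTime members of a FILETIME, and the deltas are assumed to be differences between two samples taken some time apart.

```c
#include <stdint.h>

/* Combine the two 32-bit halves of a FILETIME-style value
   (dwLowDateTime, dwHighDateTime) into one 64-bit count of 100 ns units. */
uint64_t filetime_to_u64(uint32_t low, uint32_t high)
{
    return ((uint64_t)high << 32) | low;
}

/* Convert a count of 100 ns units to microseconds. */
uint64_t units_to_microseconds(uint64_t units_100ns)
{
    return units_100ns / 10;
}

/* Percentage of one CPU used over an interval: CPU time consumed
   (kernel + user) divided by wall-clock time, all in 100 ns units. */
double cpu_percent(uint64_t kernel_delta, uint64_t user_delta,
                   uint64_t wall_delta)
{
    if (wall_delta == 0)
        return 0.0;
    return 100.0 * (double)(kernel_delta + user_delta) / (double)wall_delta;
}
```

For example, 0.1 s of kernel time plus 0.2 s of user time over a 1 s interval comes out as 30 percent.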

Please let me know if this is really what you wanted, and I'll go into some detail...
 

Author Comment

by:Marty543
ID: 17919539
Thanks for your comments and the code.

By command, I meant instruction.

Now I know that it is too low level and impractical to get the number of instructions that are executing in a certain time frame, but getting the number of cycles and an accurate time difference between code statements is easy.
 
LVL 8

Expert Comment

by:adg080898
ID: 17925782
There is a way to actually record the instruction count, but it is extremely low level. Using performance monitoring counters, you can track the number of instructions executed (as well as a ton of other processor internals). I say it is "extremely" low level because the RDMSR and WRMSR instructions must be executed in kernel mode, so they require a driver to execute them. They are also *very* non-portable: every CPU model (even those from the same manufacturer!) has its own list of MSR meanings.

Author Comment

by:Marty543
ID: 17926088
How would that be done?
 
LVL 8

Expert Comment

by:mxjijo
ID: 17931950

Marty543,
     It looks like you ARE working at a low level. I am not a CPU expert, but my understanding is that a given CPU always uses the same number of clock ticks for a given instruction. For example, say a mov instruction requires 10 clock ticks on a Pentium 4; then every mov will use 10 ticks, no matter what the arguments are. So if there are 10 mov instructions in your program, you can just add them up (10x10 = 100 ticks). This would give you the exact number of ticks needed to execute "your" code alone.

     What you would need is the whitepaper from the CPU manufacturer, which would give you the clock ticks required for each instruction. I don't know whether or where this information is available, but it may be worth considering.

hope that helps
~j
 
LVL 8

Expert Comment

by:mxjijo
ID: 17931971

okay.. I just found this link. It gives you the number of clocks required for every instruction

http://home.comcast.net/~fbui/intel/a.html#aaa
 
LVL 8

Expert Comment

by:mxjijo
ID: 17931978
 
LVL 8

Expert Comment

by:adg080898
ID: 17932339
>> "a given CPU always uses same number of clock ticks for a given instruction"

That is not entirely correct. Older processors were quite predictable because they always executed instructions the same way. Newer processors use "out of order execution". This means that there are multiple execution units, and several internal processor resources are shared among them. Instructions are issued based on the availability of the execution units they require. Also, instructions may be issued "out of order" based on the availability of operands. For example, assume the processor is executing instructions for the following code sequence:

mov eax,[_some_memory_operand]
mov ebx,1234
add ebx,[_some_other_memory_operand]
mov [_some_answer_variable],ebx
mov [_some_other_answer_variable],eax

This code:
- reads "some_memory_operand" into the eax register,
- loads 1234 into the ebx register,
- adds some_other_memory_operand to ebx,
- stores the sum in some_answer_variable,
- and stores eax in some_other_answer_variable.

Now let's assume that the memory for some_memory_operand is not in the cache (on the CPU core) and the processor must go all the way out to main memory on the motherboard to read it. Let's also assume that the memory for some_other_memory_operand (used by the third instruction) IS in the cache. In this case, an older processor would *wait* until the first instruction had pulled the data into the cache before continuing execution, even though it could execute the second instruction immediately. Newer processors use "out of order" execution, so they can get ahead and process the instructions after the first one even though the first one cannot complete yet.

This causes many instructions to have widely varying timings even though they seem to be "simple" instructions. It all depends on the execution context at the moment an instruction is issued.

Most x86 processors since the Pentium Pro use out of order execution extensively. This makes them much faster and far less sensitive to the order of the program instructions. It matters especially on x86, which has very few registers: there are barely enough of them to properly "schedule" (put in the best order) the instructions ahead of time, so letting the processor reorder them drastically improves performance.

Once all the data are in the CPU cache, instruction execution is a lot more predictable, but the time taken still varies, especially in complex loops where incorrect branch prediction can cause pipeline flushes.
 
LVL 8

Expert Comment

by:mxjijo
ID: 17932428

Thank you for that posting, adg. That was quite a lot of information.
~j
   
 
LVL 8

Expert Comment

by:adg080898
ID: 17932450
>> 10 mov instructions in your program, you can just add them up (10x10 = 100 ticks).

Again, not true anymore. Modern processors have two concepts which must be considered together when analyzing the instruction timing: latency and throughput.

Latency means "how many clock ticks until I get the answer".

Throughput means "how many clock ticks until I can issue another instruction".

For example, say a "mov reg,mem" (read a memory variable into a processor register) has a throughput of 1 and a latency of 8. Because the throughput is 1, you can issue one of those instructions on EVERY clock tick; however, the answer will not be available until 8 clock ticks later.

The reason for this is the "pipelined" nature of processor internals. Think of it like a car assembly line. If you stand at the end of the assembly line, you will see a complete car come out, say, every 30 seconds. This means that the throughput of a car construction is 30 seconds (a car is completed every 30 seconds). However, if you followed a car through the assembly line, you would say that it takes an hour to complete a car - (latency is one hour). The inside of a processor is just like an assembly line, a new instruction enters the pipeline very frequently, but the instruction takes several steps to complete. You can put a new instruction in the pipeline very frequently, but the answer does not come out the other end of the pipeline until several cycles later.
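
The assembly line picture can be turned into a small calculation. This is a minimal sketch of an idealized pipeline model, with hypothetical function names; it assumes the instructions are independent of each other and ignores cache misses, operand dependencies, and branch mispredictions, so it is not a prediction for any real processor.

```c
#include <stdint.h>

/* Idealized pipeline model: the first result appears after `latency`
   cycles, and each further independent instruction adds `throughput`
   cycles. Ignores cache misses, dependencies, and branch mispredictions. */
uint64_t pipelined_cycles(uint64_t count, uint64_t latency, uint64_t throughput)
{
    if (count == 0)
        return 0;
    return latency + (count - 1) * throughput;
}

/* For comparison: the cost if each instruction had to finish
   before the next one could start. */
uint64_t serialized_cycles(uint64_t count, uint64_t latency)
{
    return count * latency;
}
```

With the numbers from the example (latency 8, throughput 1), ten independent loads finish in 8 + 9 = 17 ticks under this model, compared with 80 ticks if each one had to complete before the next began. The same formula covers the car example: ten cars take 3600 + 9 * 30 = 3870 seconds.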

 

Author Comment

by:Marty543
ID: 17933012
adg,

Excellent example. Thanks for all of the information. mxjijo, thanks for the links.

-Marty