Getting the number of CPU cycles (commands per second) a process is using

Posted on 2006-11-07
Medium Priority
Last Modified: 2013-12-03
Hello experts,

Does anyone know how to get the number of CPU cycles a certain process is using in Windows XP SP2? I am looking for a result in commands per second. I don't mind what programming language or application is used (C, C++, C#, etc.).

I have tried looking through MSDN, the Alt+Ctrl+Delete menu, and an application entitled "Process Explorer", but I couldn't find what I was looking for.

I'm not sure if this kind of question is too low level for Windows. I would appreciate it if someone let me know if getting the number of commands per second of an application was possible in Windows.

I'm not looking for an answer in terms of the percentage of the CPU used up unless there's no other answer or more accurate answer. Notwithstanding, would it be possible to approximate the number of cycles per second by using the percentage of the CPU of a process? That is, am I making any invalid assumptions with this math:

commands per second = CPU Percentage * (# of commands per second, max)

Processors are often advertised with a frequency such as 2.8 GHz. I'm not sure exactly what this means, but would it mean commands per second? That is, could I simply do this, for example:

commands per second = 30% * (2,800,000,000 commands / second)

Question by:Marty543

Expert Comment

ID: 17905893

>> Processors are often advertised with a frequency such as 2.8 Ghz. I'm not sure what this exactly means, but would it mean commands per second? That is, could I simply do this, for example:

This is the number of clock cycles the CPU performs per second.

Author Comment

ID: 17906276
Thanks. And I presume it may take many cycles, maybe 3-10, to actually perform one command.

Assisted Solution

mxjijo earned 400 total points
ID: 17910127

>> to actually perform one command
Maybe you need to explain what you mean by "command".

As you might already know, a CPU works with a set of instructions called its "instruction set".
Depending on the architecture of the chip, instructions can be as simple as a move (mov) or as complex as the MMX instructions.

The speed often advertised with a CPU denotes its internal control clock speed.
Every instruction takes one or more of these clock ticks to execute.
Manufacturers also advertise an IPS (instructions per second) or MIPS (millions of instructions per second) figure.

Back to your question,
        On complex OSes like Windows/Unix, there will be several programs running at the same time (time-shared).
So it is virtually impossible to track CPU cycle usage per process, because the CPU keeps switching between processes.
However, you may take a look at http://en.wikipedia.org/wiki/RDTSC to get an idea of other possibilities.

Accepted Solution

adg080898 earned 1600 total points
ID: 17918043
It is easy to get the exact number of cycles taken. x86 processors have an instruction called RDTSC (read timestamp counter). It reads a 64-bit counter which is incremented (increased by one) every clock tick (meaning, on a 2 GHz processor, it will increase by 2,000,000,000 per second).

Here is a function to read the timestamp counter:

__int64 RDTSC()
{
  __asm rdtsc  // leaves the 64-bit counter in EDX:EAX, the return registers
}

(You will get a warning like "must return a value". Ignore it, the compiler can't understand the assembly language instruction)
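
As a rough sketch of how two readings from the function above might be used: subtract the earlier reading from the later one to get elapsed ticks, then divide by the tick rate to get seconds. The helper names `tsc_elapsed` and `tsc_to_seconds` are hypothetical, and the tick rate is an assumption you would have to measure or look up yourself, since on many newer processors the counter runs at a fixed rate that can differ from the current core clock.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: ticks elapsed between two timestamp-counter readings.
   Unsigned subtraction stays correct even if the counter wrapped once. */
uint64_t tsc_elapsed(uint64_t start, uint64_t end)
{
    return end - start;
}

/* Hypothetical helper: convert elapsed ticks to seconds.
   ticks_per_second is an assumed, separately measured counter rate. */
double tsc_to_seconds(uint64_t ticks, double ticks_per_second)
{
    assert(ticks_per_second > 0.0);
    return (double)ticks / ticks_per_second;
}
```

For example, two readings 1,000,000,000 ticks apart on a counter assumed to run at 2 GHz correspond to 0.5 seconds. Note that this measures every cycle that passed between the readings, including cycles spent running other processes.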


If you don't need extreme precision, you can use QueryPerformanceCounter and QueryPerformanceFrequency. These functions usually have a precision of about 2 microseconds or better. (The actual frequency depends on the type of computer you have. Old systems use a 1.19 MHz timer, which works out to about 0.84 microseconds per tick.)

It is fairly simple, you call QueryPerformanceFrequency to get the frequency. Then, you call QueryPerformanceCounter before and after the operation to be timed.

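#include <windows.h>
#include <stdio.h>

int TimeMe()
{
  // .....do work to be timed....
  return 0;
}

int TimeIt()
{
  LARGE_INTEGER liFreq, liStart, liEnd, liElap;

  QueryPerformanceFrequency(&liFreq);  // counter ticks per second
  QueryPerformanceCounter(&liStart);   // reading before the work
  TimeMe();
  QueryPerformanceCounter(&liEnd);     // reading after the work

  liElap.QuadPart = liEnd.QuadPart - liStart.QuadPart;
  // Convert ticks to nanoseconds (billionths of a second)
  liElap.QuadPart *= 1000000000;
  liElap.QuadPart /= liFreq.QuadPart;

  printf("Time was %I64d nanoseconds\n", liElap.QuadPart);
  return 0;
}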

I'm away from my development machine so I can't compile and test the code above, but I have done this hundreds of times, it should be right. :)
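
One caveat with the nanosecond conversion above: multiplying the elapsed tick count by 1,000,000,000 overflows a signed 64-bit integer once the count exceeds roughly 9.2 billion ticks, which at an assumed counter frequency of 10 MHz is about 15 minutes. A minimal sketch of one way around that is to split the count into whole seconds and a remainder before scaling. The name `ticks_to_ns` is hypothetical, and plain `int64_t` values stand in for the `QuadPart` fields.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: convert performance-counter ticks to nanoseconds
   without overflowing on long intervals. Assumes freq is below about
   9.2 billion ticks per second, so remainder * 1000000000 fits in 64 bits. */
int64_t ticks_to_ns(int64_t ticks, int64_t freq)
{
    assert(freq > 0 && ticks >= 0);
    int64_t whole_seconds = ticks / freq;  /* full seconds elapsed */
    int64_t remainder     = ticks % freq;  /* leftover ticks, always < freq */
    return whole_seconds * 1000000000 + remainder * 1000000000 / freq;
}
```

With this form, a 2,000-second interval at 10 MHz (20,000,000,000 ticks) converts cleanly, where the direct multiply would overflow.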

Expert Comment

ID: 17918099
You may want to read MS docs for the functions used in the answer above.



Note that my answer above is a way of timing the amount of real time (aka "wallclock" time) that something takes. If you want to know what percentage of CPU time your program takes, you can use GetThreadTimes.


This function returns, as FILETIMEs, the amount of CPU time used in user mode and kernel mode. (Kernel time is time spent deep inside system calls).

A FILETIME is simply a 64-bit number that is in increments of 100 nanoseconds (100 billionths of a second, or 0.1 microsecond).
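
To tie this back to the original question, here is a rough sketch of how those 100-nanosecond counts could be turned into CPU seconds and then into an approximate cycle count. The function names are hypothetical, plain integers stand in for the two 32-bit halves of a FILETIME (`dwHighDateTime` and `dwLowDateTime`), and the clock rate is an assumed constant that you supply. The result estimates cycles, not instructions.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the two 32-bit halves of a FILETIME-style value into one count. */
uint64_t filetime_to_u64(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | low;
}

/* CPU seconds used, given kernel and user times in 100 ns units. */
double cpu_seconds(uint64_t kernel_100ns, uint64_t user_100ns)
{
    return (double)(kernel_100ns + user_100ns) / 1e7;  /* 1e7 units per second */
}

/* Approximate cycles consumed, assuming a constant clock rate in Hz. */
double estimate_cycles(double cpu_secs, double clock_hz)
{
    assert(cpu_secs >= 0.0 && clock_hz > 0.0);
    return cpu_secs * clock_hz;
}
```

For instance, a thread that accumulated 0.3 seconds of CPU time during one second of wall-clock time on a 2.8 GHz processor used roughly 840 million cycles, which matches the "30% * 2,800,000,000" arithmetic in the question.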

Please let me know if this is really what you wanted, and I'll go into some detail...

Author Comment

ID: 17919539
Thanks for your comments and the code.

By command, I meant instruction.

Now I know that it is too low level and impractical to get the number of instructions that are executing in a certain time frame, but getting the number of cycles and an accurate time difference between code statements is easy.

Expert Comment

ID: 17925782
There is a way to actually record the instruction count, but it is extremely low level. Using performance monitoring counters, you can track the number of instructions executed (as well as a ton of other processor internals). I say it is "extremely" low level because the RDMSR and WRMSR instructions must be executed in kernel mode, so they require a driver to execute them. They are also *very* non-portable: every CPU model (even those from the same manufacturer!) has its own list of MSR meanings.

Author Comment

ID: 17926088
How would that be done?

Expert Comment

ID: 17931950

     It looks like you ARE working at a low level. I am not a CPU expert, but my understanding is: a given CPU always uses the same number of clock ticks for a given instruction. For example: say a mov instruction requires 10 clock ticks on a Pentium 4; then every mov will take 10 ticks, no matter what the arguments are. So if there are 10 mov instructions in your program, you can just add them up (10x10 = 100 ticks). This would give you the exact number of ticks needed to execute "your" code alone.

     What you would need is the whitepaper from the CPU manufacturer giving the clock ticks required for each instruction. I don't know whether or where this information is available, but it may be worth considering.

hope that helps

Expert Comment

ID: 17931971

Okay, I just found this link. It gives you the number of clocks required for every instruction.


Expert Comment

ID: 17931978

Expert Comment

ID: 17932339
>> "a given CPU always uses same number of clock ticks for a given instaruction"

That is not entirely correct. Older processors were quite predictable because they always executed instructions the same way. Newer processors use "out of order execution". This means that there are multiple execution units, and several internal processor resources are shared among them. Instructions are issued based on the availability of the required execution units. Also, instructions may be issued "out of order" based on the availability of operands. For example, assume the processor is executing instructions for the following code sequence:

mov eax,[_some_memory_operand]
mov ebx,1234
add ebx,[_some_other_memory_operand]
mov [_some_answer_variable],ebx
mov [_some_other_answer_variable],eax

This code:
- reads "some_memory_operand" into the eax register,
- loads 1234 into the ebx register,
- adds some_other_memory_operand to ebx,
- stores the sum in some_answer_variable,
- and stores eax in some_other_answer_variable.

Now let's assume that the memory for some_memory_operand is not in the cache (on the CPU core) and the processor must go all the way to the motherboard to read it. Let's also assume that the memory for some_other_memory_operand (used by the third instruction) IS in the cache. In this case, an older processor would *wait* until the first instruction pulled the data into the cache before continuing execution, even though it could immediately execute the second instruction. Newer processors use "out of order" execution, so they can actually get ahead and process the instructions after the first one even though they cannot complete the first one yet.

This causes many instructions to have widely varying timings even though they seem to be "simple" instructions. It all depends on the current execution context at the moment that an instruction is issued.

Intel's mainstream processors since the Pentium Pro use out-of-order execution extensively. This makes them much faster and far less sensitive to the order of the program instructions. It drastically improves performance on x86 processors in particular, because they have so few registers that there are barely enough to properly "schedule" (put in the best order) the instructions in the code itself.

Once all the data are in the CPU cache, instruction execution is a lot more predictable, but the time taken still varies, especially in complex loops where incorrect branch prediction can cause pipeline flushes.

Expert Comment

ID: 17932428

Thank you for that posting, adg. That was quite a lot of information.

Expert Comment

ID: 17932450
>> 10 mov instructions in your program, you can just add them up (10x10 = 100 ticks).

Again, not true anymore. Modern processors have two concepts which must be considered together when analyzing the instruction timing: latency and throughput.

Latency means "how many clock ticks until I get the answer".

Throughput means "how many clock ticks until I can issue another instruction".

For example, say a "mov reg,mem" (read a memory variable into a processor register) has a throughput of 1 and a latency of 8. Because the throughput is 1, you can issue one of those instructions on EVERY clock tick; however, the answer will not be available until 8 clock ticks later.

The reason for this is the "pipelined" nature of processor internals. Think of it like a car assembly line. If you stand at the end of the assembly line, you will see a complete car come out, say, every 30 seconds. This means that the throughput of a car construction is 30 seconds (a car is completed every 30 seconds). However, if you followed a car through the assembly line, you would say that it takes an hour to complete a car - (latency is one hour). The inside of a processor is just like an assembly line, a new instruction enters the pipeline very frequently, but the instruction takes several steps to complete. You can put a new instruction in the pipeline very frequently, but the answer does not come out the other end of the pipeline until several cycles later.
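
The assembly-line arithmetic can be sketched as a small function. This is a simplified model that assumes a single pipeline, independent instructions, and no stalls; the name `pipeline_cycles` is illustrative only, and "throughput" is used in the sense given above (ticks between issuing one instruction and the next).

```c
#include <assert.h>
#include <stdint.h>

/* Simplified model: ticks until the last of n back-to-back, independent
   instructions has produced its result. The first result appears after
   `latency` ticks; each later one follows `throughput` ticks behind. */
uint64_t pipeline_cycles(uint64_t n, uint64_t latency, uint64_t throughput)
{
    assert(latency > 0 && throughput > 0);
    if (n == 0) {
        return 0;  /* nothing issued, nothing to wait for */
    }
    return latency + (n - 1) * throughput;
}
```

Under this model, ten of the "mov reg,mem" instructions above (latency 8, throughput 1) finish in 8 + 9 = 17 ticks rather than 10 x 8 = 80, and ten cars off the example assembly line take 3600 + 9 x 30 = 3870 seconds rather than ten hours.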


Author Comment

ID: 17933012

Excellent example. Thanks for all of the information. mxjijo, thanks for the links.

