Posted on 2003-12-05
Last Modified: 2006-11-17
I'm starting to learn assembly language with the Art of Assembly book, and I was wondering about Hyper-Threading technology. How does it affect or change the asm instructions? Please give me more information about this subject as it relates to assembly. Thanks.
Question by:j_uan

Accepted Solution

grg99 earned 220 total points
ID: 9884131
Hyper-threading is well named, more hype than anything else.

The basic idea is you make a chip that looks like it's TWO CPUs.

But it's actually just ONE CPU that is overcommitted, i.e. there are two sets of registers, but not much more of anything else.
All the adders, multipliers, shifters, and data paths are the same as on a single Pentium chip.

Now if you run TWO programs or threads, each of which is poorly written, then maybe you'll get more throughput.
If program #1 isn't doing much with the multiplier, then maybe program #2 can take up the slack, IF it happens to do a lot of multiplies.

But if either program is already well written and makes good use of the CPU, then there won't be many spare resources for the other thread or program. And you'll actually get poorer overall performance due to the small but still present overhead of hyper-threading.

And hyper-threading is going to fight with the other parts of the instruction scheduler that are trying to do the opposite: schedule as many of the functional units as possible for the current task.

If you look carefully at some of the benchmarks you'll see this happening.   Hyper-threading can actually be slower than not.  
A clever idea, but basically a band-aid kludge that promises a lot more than it can ever deliver.


As to writing in assembler, well, it's going to be very difficult to write code that is hyper-friendly.
Why write code that doesn't use the CPU efficiently? Sounds like a losing proposition most of the time.


Assisted Solution

terageek earned 150 total points
ID: 9884575
You don't NEED to do anything for a hyper-threaded processor that you normally wouldn't do. You can see some benefits from hyper-threading if you can make your program multi-threaded. If you have two complex tasks which can be done in parallel, running the second task in its own thread or process can show some performance gains. For example, you can have one thread which renders the screen while another works on the AI.

A hyper-threaded processor does not have 2 sets of registers. It simply performs the register renaming function slightly differently. The Pentium processors all have a pool of registers (80 on the PII). Whenever you write to a register (say it is AX), the processor goes into that pool and calls a particular location "AX". If a second instruction reads AX, it needs to wait for that value to be available. If a third instruction also writes to AX, normally it couldn't execute until the earlier instructions were done with AX, but with register renaming the processor can give a different location in its register pool the name "AX". The processor knows that the first AX is needed by the earlier instructions which are waiting to execute, and the second AX is for any later instructions waiting to execute. This is how a processor is able to do out-of-order execution.
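
To make the renaming idea above concrete, here is a toy model in Python. It is only a sketch: the class name RenameTable, the pool size of 8, and the three example instructions in the comments are illustrative assumptions, not a description of any real Pentium, and freeing registers when instructions retire is left out.

```python
# Toy model of register renaming. All names and sizes are assumptions
# made for illustration; real hardware is far more involved.

class RenameTable:
    def __init__(self, pool_size=8):
        self.free = list(range(pool_size))  # unused physical registers
        self.mapping = {}                   # architectural name -> physical index

    def write(self, name):
        """A write takes a fresh physical register, so it never has to
        wait for older readers of the same architectural name."""
        phys = self.free.pop(0)
        self.mapping[name] = phys
        return phys

    def read(self, name):
        """A read uses whichever physical register currently carries the name."""
        return self.mapping[name]


table = RenameTable()
first_ax = table.write("AX")    # e.g. mov ax, 1  -> physical register 0
reader = table.read("AX")       # e.g. add bx, ax -> reads physical register 0
second_ax = table.write("AX")   # e.g. mov ax, 2  -> physical register 1

print(first_ax, reader, second_ax)  # prints: 0 0 1
```

The second write to AX lands in a different physical register than the first, so the earlier reader still sees the value it was meant to see.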

What hyper-threading does is mark each register in the pool with not just a name, but also a thread id. Each instruction that is executed is also tagged with a thread id, and it will only read registers marked with its own thread id. You do need to have two instruction pointers, but for the most part, the register pool is untouched.
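
The thread-id tagging described above can be sketched the same way. This is a toy model under assumptions: the class name TaggedRenameTable, the pool size, and the thread ids 0 and 1 are made up for illustration, and register freeing is omitted.

```python
# Toy model: one shared register pool, with every entry keyed by
# (thread id, architectural name). Names and sizes are assumptions.

class TaggedRenameTable:
    def __init__(self, pool_size=8):
        self.free = list(range(pool_size))  # one pool shared by both threads
        self.mapping = {}                   # (thread_id, name) -> physical index

    def write(self, thread_id, name):
        phys = self.free.pop(0)
        self.mapping[(thread_id, name)] = phys
        return phys

    def read(self, thread_id, name):
        # An instruction only sees registers tagged with its own thread id.
        return self.mapping[(thread_id, name)]


shared = TaggedRenameTable()
shared.write(0, "AX")   # thread 0 writes its AX -> physical register 0
shared.write(1, "AX")   # thread 1 writes its AX -> physical register 1

print(shared.read(0, "AX"), shared.read(1, "AX"))  # prints: 0 1
```

Both threads use the name AX, but each one resolves it to its own physical register out of the same pool.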

As for performance, if you have a memory-intensive task running, it doesn't matter how well it is programmed, you will get cache misses. Each cache miss will leave the CPU idle for a good 100 clock cycles, during which a second program can use the CPU. This is where hyper-threading gets its biggest gains. While one process is waiting for data from memory, the second can execute. There are also a number of times when your program will be sitting around waiting for your code to do a serial calculation. For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before it can add c to it. If your program doesn't have anything better to do while it waits, time is wasted. By having an independent stream of instructions to execute, the processor can do some useful work while the first thread is waiting.
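
The cache-miss argument above can be put into rough numbers with a toy cycle counter. This is a sketch under strong assumptions: one shared core that serves one thread per cycle, a made-up workload of 10 compute cycles, a 100 cycle stall, then 10 more compute cycles, and no cache interference between the threads. The function name run and the step format are hypothetical.

```python
def run(threads):
    """Return the cycles needed to finish all threads on one shared core.

    Each thread is a list of (kind, cycles) steps. A "compute" step needs
    the core; a "stall" step (waiting on memory) does not.
    """
    work = [[list(step) for step in t] for t in threads]
    cycles = 0
    while any(work):
        cycles += 1
        core_busy = False
        for t in work:
            if not t:
                continue                # this thread has finished
            if t[0][0] == "compute":
                if core_busy:
                    continue            # another thread holds the core this cycle
                core_busy = True
            t[0][1] -= 1                # one cycle of this step has elapsed
            if t[0][1] == 0:
                t.pop(0)
    return cycles


memory_bound = [("compute", 10), ("stall", 100), ("compute", 10)]
compute_bound = [("compute", 50)]

one_after_the_other = run([memory_bound]) + run([memory_bound])
interleaved = run([memory_bound, memory_bound])
no_gain = run([compute_bound, compute_bound])

print(one_after_the_other, interleaved, no_gain)  # prints: 240 130 100
```

In this model two memory-bound threads finish in 130 cycles instead of 240, while two purely compute-bound threads take 100 cycles either way. The model deliberately ignores the threads evicting each other's cache lines, which would eat into the gain.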

The instruction scheduler is actually more efficient because it has 2 sets of instructions to choose from.  The reason that a hyper-threaded processor is slower on some benchmarks is actually because some of the CPU resources are dedicated to each thread.  A Hyper-threaded processor reserves 1/2 the FSB queues for each process, so if you only have 1 process which needs all of that bandwidth, you will have only 1/2 as many queues as on a non-hyperthreaded system.  This is why you will see that memory benchmarks seem to be especially slower on a hyper-threaded computer.  There are benchmarks which show that if you have multiple processes running, the total time to complete both tasks will be significantly faster on a hyper-threaded computer than a non-hyper-threaded computer.

Expert Comment

ID: 9888485
"As for performance, if you have a memory-intensive task running, it doesn't matter how well it is programmed, you will get cache misses. Each cache miss will leave the CPU idle for a good 100 clock cycles, during which a second program can use the CPU. This is where hyper-threading gets its biggest gains. While one process is waiting for data from memory, the second can execute."

... but only in the purely-mythical case where ONE thread has apparently maxed-out the cache so it has to wait for a slow memory read, while the other thread somehow just coincidentally is doing register-only operations.  Such a writer could make millions writing children's fairy tales.

The hyper-technologists better have an answer for the question:  what happens to a loop I've carefully written to be optimized for using the cache?  It seems that any other hyper-threaded task is going to screw up my cache.  
Those uber-geeks that have been tweaking their code will probably see a bad performance hit.  Not a nice OOB experience for those folks.

"For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before you can add c to it"

Again, compilers and assembly language programmers have known for about a decade now to overlap operations as much as possible. So if it's a well-optimized program already, it's not going to benefit a lot, and may actually run considerably slower. If it's an old or poorly tuned program, it may give other threads more time, sure, but it's sorta like saying that it's good to be dumb as it makes the smarter folks feel better.
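
Both sides of the a + b + c argument can be checked by counting the dependent additions on the critical path. The sketch below assumes each addition takes one step and that any number of independent additions can run at once; the function names are hypothetical.

```python
def chain_depth(n):
    """Dependent additions when n values are summed left to right:
    ((x0 + x1) + x2) + ..."""
    return n - 1


def tree_depth(n):
    """Dependent additions when the sum is re-associated into a
    balanced tree: (x0 + x1) + (x2 + x3), and so on."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2   # pair up the values; an odd one carries over
        depth += 1
    return depth


for n in (3, 8):
    print(n, chain_depth(n), tree_depth(n))
# prints:
# 3 2 2
# 8 7 3
```

For three operands no reordering shortens the chain, so the wait is real unless other independent work is available. For longer sums, overlapping the additions cuts the chain from 7 dependent steps to 3, which is the kind of overlap a tuned program already exploits. Note that re-associating floating-point sums can change the rounded result.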



Author Comment

ID: 9890379
So, after all this info, is ASM the same for the programmer with or without HT?
What about the OS programmer?

Expert Comment

ID: 9891115
As a programmer, if you want to see the benefits of HT, you should try to break your program into 2 threads which can execute in parallel. Finding parallelism in programs is an entire area of research unto itself. You can try to crunch two different sets of numbers simultaneously, or try to perform unrelated tasks simultaneously. Chances are that you will not want to do this in ASM. Instead, split your program into multiple threads at a higher level, then code the tight loops in the individual threads in ASM.
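
The "crunch two different sets of numbers simultaneously" idea has roughly the following shape. This is only a structural sketch in Python using the standard threading module; the function crunch and the data ranges are made up. In CPython the interpreter lock keeps these two threads from executing bytecode at the same time, so the sketch shows how the work is split, not a speedup. In practice the body of crunch is where a tight loop in C or ASM would go.

```python
import threading


def crunch(numbers, results, slot):
    """Stand-in for a tight inner loop; stores its answer in results[slot]."""
    results[slot] = sum(x * x for x in numbers)


first_set = range(0, 1000)
second_set = range(1000, 2000)
results = [None, None]

workers = [
    threading.Thread(target=crunch, args=(first_set, results, 0)),
    threading.Thread(target=crunch, args=(second_set, results, 1)),
]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()       # wait until both halves are done

total = results[0] + results[1]
print(total)
```

Each thread writes only to its own slot, so the two workers never touch the same result and no locking is needed.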

For the OS programmer, a HT processor looks like 2 separate processors, so if the OS can support 2 processors, it will be able to support HT.
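
From a program's point of view, the visible effect is simply that the OS reports more processors. A minimal sketch using Python's standard os.cpu_count(), which returns the number of logical processors the OS exposes (or None if it cannot tell); whether any of them are hyper-threaded siblings is not visible from this call.

```python
import os

logical = os.cpu_count()   # logical processors as seen by the OS, or None
if logical is None:
    print("logical processor count not available")
else:
    print(f"the OS reports {logical} logical processor(s)")
```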

I really don't feel like getting into any further arguments about the benefits of hyper-threading. There are those who dismiss it as useless. It is a fact that hyper-threading will do nothing to help a single-threaded process run faster. In fact, hyper-threading will usually slow down a single-threaded process. As a result, you will not see the benefits of hyper-threading in benchmarks which are all single-threaded. The benefit comes when a person tries to run two programs simultaneously, or a two-threaded program. Two threads will be able to execute together, and the overall time it takes to do two tasks simultaneously will be reduced.

Expert Comment

ID: 9892048
Programming in ASM shouldn't change much, but you should keep in mind that with hyper-threading, your carefully crafted cache strategies may be interfered with by the other threads, so your code may inexplicably run slower.

If you're programming an OS, hyper-threading gives you an opportunity to do a little more work in the background, maybe do a bit more garbage collection or virus scanning or disk cache tuning. On the other hand, any time-critical loops, such as in audio or video streaming drivers, may not be able to keep up if another thread steals a lot of cache your driver was expecting to have exclusive access to. They probably added some instructions to turn off hyper-threading in critical loops, at least I sure hope so.


Question has a verified solution.
