I'm starting to learn assembly language with the Art of Assembly book. I was wondering about Hyper-Threading technology and how it would affect or change the asm instructions. Please give me more information about this subject as it relates to assembly. Thanks.

Hyper-threading is well named, more hype than anything else.

The basic idea is you make a chip that looks like it's TWO CPUs.

But it's actually just ONE CPU that is overcommitted, i.e. there are two sets of registers, but not much more of anything else.
All the adders, multipliers, shifters, and data paths are the same as on a single Pentium chip.

Now if you run TWO programs or threads, each of which is poorly written, then maybe you'll get more throughput.
If program #1 isn't doing much with the multiplier, then maybe program #2 can take up the slack, IF it happens to do a lot of multiplies.

But if either program is already well written, so that it makes good use of the CPU, then there won't be many spare resources for the other thread or program. And you'll actually get poorer overall performance due to the small but still present overhead of hyper-threading.

And hyper-threading is going to fight with the other parts of the instruction schedulers that are trying to do the opposite-- schedule as many of the functional units as possible for the current task.  

If you look carefully at some of the benchmarks you'll see this happening. Running with hyper-threading can actually be slower than running without it.
A clever idea, but basically a band-aid kludge that promises a lot more than it can ever deliver.


As to writing in assembler, well, it's going to be very difficult to write code that is hyper-friendly.
Why write code that doesn't use the CPU efficiently? Sounds like a losing proposition most of the time.

You don't NEED to do anything for a hyper-threaded processor that you normally wouldn't do. You can see some benefit from hyper-threading if you can make your program multi-threaded. If you have two complex tasks which can be done in parallel, creating a new process or thread for the second task can show some performance gains. For example, you can have one process which renders the screen while another works on the AI.

A hyper-threaded processor does not have two sets of registers. It simply performs the register-renaming function slightly differently. The out-of-order Pentium-family processors (Pentium Pro and later) all have a pool of physical registers (several dozen or more, depending on the model). Whenever you write to a register (say it is AX), the processor goes into that pool and calls a particular location "AX". If a second instruction reads AX, it must wait until that value is available. If a third instruction also writes to AX, normally it could not execute until the second had finished reading, but with register renaming the processor can give a different location in its register pool the name "AX". The processor knows that the first AX is needed by the earlier instructions which are still waiting to execute, and the second AX is for any later instructions. This is how a processor is able to do out-of-order execution.

What hyper-threading does is mark each register in the pool with not just a name, but also a thread id. Each instruction that is executed is also tagged with a thread id, and it will only read registers marked with its own thread id. You do need to have two instruction pointers, but for the most part, the register pool is untouched.
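To make the tagging idea concrete, here is a toy model in C of a shared register pool where every slot carries a name and a thread id. It only follows the description above; it is not how any particular Intel chip is wired. `RegPool`, `pool_write`, `pool_lookup` and the pool size of 8 are made-up names and numbers for illustration, and freeing slots when instructions retire is left out.

```c
#include <assert.h>

enum { AX, BX, CX, DX };      /* architectural register names */
#define POOL_SIZE 8           /* hypothetical; real pools are much larger */
#define UNUSED (-1)

typedef struct {
    int thread_id;            /* UNUSED while the slot is free */
    int name;                 /* architectural register this slot stands for */
    int value;
    unsigned seq;             /* write order, used to find the newest copy */
} PhysReg;

typedef struct {
    PhysReg slot[POOL_SIZE];
    unsigned next_seq;
} RegPool;

void pool_init(RegPool *p) {
    for (int i = 0; i < POOL_SIZE; i++) p->slot[i].thread_id = UNUSED;
    p->next_seq = 1;
}

/* Every write claims a fresh slot, so readers of the older value are not
   disturbed. Returns the slot index, or -1 if the pool is full (a real
   CPU would stall here; this sketch never frees slots). */
int pool_write(RegPool *p, int thread_id, int name, int value) {
    assert(thread_id >= 0);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (p->slot[i].thread_id == UNUSED) {
            p->slot[i] = (PhysReg){ thread_id, name, value, p->next_seq++ };
            return i;
        }
    }
    return -1;
}

/* A read only considers slots tagged with its own thread id and picks the
   newest one. Returns -1 if that thread has never written the register. */
int pool_lookup(const RegPool *p, int thread_id, int name) {
    int best = -1;
    for (int i = 0; i < POOL_SIZE; i++) {
        const PhysReg *r = &p->slot[i];
        if (r->thread_id == thread_id && r->name == name &&
            (best < 0 || r->seq > p->slot[best].seq))
            best = i;
    }
    return best;
}
```

Writing AX twice from thread 0 yields two different slots (the rename), and a write to AX from thread 1 lands in a third slot that thread 0 never sees.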

As for performance, if you have a memory intensive task running, it doesn't matter how well it is programmed, you will get cache misses. Each cache miss will leave the CPU idle for a good 100 clock cycles during which a second program can use the CPU. This is where hyper-threading gets its biggest gains. While one process is waiting for data from memory, the second can execute. There are also a number of times when your program will be sitting around waiting for your code to do a serial calculation. For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before you can add c to it. If your program doesn't have anything better to do while it waits, time is wasted. By having an independent stream of instructions to execute, the processor can do some useful work while the first thread is waiting.

The instruction scheduler is actually more efficient because it has two sets of instructions to choose from. The reason that a hyper-threaded processor is slower on some benchmarks is that some of the CPU resources are dedicated to each thread. A hyper-threaded processor reserves half of the FSB queues for each thread, so if you only have one process which needs all of that bandwidth, it will have only half as many queues as on a non-hyper-threaded system. This is why memory benchmarks in particular seem to be slower on a hyper-threaded computer. There are benchmarks which show that if you have multiple processes running, the total time to complete both tasks will be significantly shorter on a hyper-threaded computer than on a non-hyper-threaded computer.

"As for performance, if you have a memory intensive task running, it doesn't matter how well it is programmed, you will get cache misses. Each cache miss will leave the CPU idle for a good 100 clock cycles during which a second program can use the CPU. This is where hyper-threading gets its biggest gains. While one process is waiting for data from memory, the second can execute."

... but only in the purely-mythical case where ONE thread has apparently maxed-out the cache so it has to wait for a slow memory read, while the other thread somehow just coincidentally is doing register-only operations.  Such a writer could make millions writing children's fairy tales.

The hyper-technologists better have an answer for the question:  what happens to a loop I've carefully written to be optimized for using the cache?  It seems that any other hyper-threaded task is going to screw up my cache.  
Those uber-geeks that have been tweaking their code will probably see a bad performance hit.  Not a nice OOB experience for those folks.

"For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before you can add c to it"

Again, compilers and assembly language programmers have known for about a decade now to overlap operations as much as possible. So if it's a well-optimized program already, it's not going to benefit a lot, and may actually run considerably slower. If it's an old or poorly tuned program, it may give other threads more time, sure, but it's sorta like saying that it's good to be dumb as it makes the smarter folks feel better.
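For anyone unsure what "overlapping operations" looks like in practice, here is a small C sketch. The function names are made up for illustration. The first loop is one long chain where every add waits on the previous one; the second keeps two independent chains that an out-of-order core can, in principle, work on at the same time. An optimizing compiler may already do this for you, so treat it as an illustration rather than a guaranteed speedup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One accumulator: each add needs the result of the previous add,
   so the whole loop is a single serial dependency chain. */
int64_t sum_serial(const int32_t *v, size_t n) {
    assert(v != NULL || n == 0);
    int64_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += v[i];
    return acc;
}

/* Two accumulators: the "even" and "odd" chains do not depend on each
   other, which leaves the core free to overlap them. */
int64_t sum_two_chains(const int32_t *v, size_t n) {
    assert(v != NULL || n == 0);
    int64_t even = 0, odd = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        even += v[i];
        odd  += v[i + 1];
    }
    if (i < n)
        even += v[i];          /* leftover element when n is odd */
    return even + odd;
}
```

Both functions return the same total; the only difference is how much independent work is exposed inside a single thread, which is exactly the slack a second hyper-thread would otherwise be filling.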

j_uan (Author) commented:
So, after all this info, is ASM the same for the programmer with or without HT?
What about the OS programmer?

As a programmer, if you want to see the benefits of HT, you should try to break your program into two threads which can execute in parallel. Finding parallelism in programs is an entire area of research unto itself. You can try to crunch two different sets of numbers simultaneously, or try to perform unrelated tasks simultaneously. Chances are that you will not want to do this in ASM. Instead, split your program into multiple threads at a higher level, then code the tight loops inside the individual threads in ASM.
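A rough sketch of that split in C with POSIX threads, assuming a system that provides `pthread.h`. `SumJob`, `sum_job_run` and `sum_with_two_threads` are made-up names for illustration. The array is cut in half, one half is summed on a second thread and the other on the calling thread. The loop inside `sum_job_run` is the kind of tight loop you might later rewrite in assembly.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unit of work: a slice of the data and a place for its sum. */
typedef struct {
    const int32_t *data;
    size_t count;
    int64_t result;
} SumJob;

/* Thread entry point: sums one slice and stores the total in the job. */
static void *sum_job_run(void *arg) {
    SumJob *job = arg;
    int64_t acc = 0;
    for (size_t i = 0; i < job->count; i++)
        acc += job->data[i];
    job->result = acc;
    return NULL;
}

/* Sum the first half on a worker thread and the second half on the
   calling thread, then combine the two partial sums. */
int64_t sum_with_two_threads(const int32_t *data, size_t n) {
    assert(data != NULL || n == 0);
    size_t half = n / 2;
    SumJob first  = { data, half, 0 };
    SumJob second = { data + half, n - half, 0 };
    pthread_t worker;

    if (pthread_create(&worker, NULL, sum_job_run, &first) != 0) {
        /* Could not start a thread: do both halves here instead. */
        sum_job_run(&first);
        sum_job_run(&second);
        return first.result + second.result;
    }
    sum_job_run(&second);          /* runs while the worker handles "first" */
    pthread_join(worker, NULL);    /* wait until first.result is ready */
    return first.result + second.result;
}
```

On older toolchains this needs to be built with `-pthread`. Whether the two threads actually land on the two logical processors of one hyper-threaded core is up to the OS scheduler, not this code.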

For the OS programmer, an HT processor looks like two separate processors, so if the OS can support two processors, it will be able to support HT.

I really don't feel like getting into any further arguments about the benefits of hyper-threading. There are those who dismiss it as useless. It is a fact that hyper-threading will do nothing to help a single-threaded process run faster. In fact, hyper-threading will usually slow down a single-threaded process. As a result, you will not see the benefits of hyper-threading in benchmarks which are all single-threaded. The benefit comes when a person tries to run two programs simultaneously, or a two-threaded program. Two threads will be able to execute together, and the overall time it takes to do two tasks simultaneously will be reduced.

Programming in asm shouldn't change much. You should keep in mind that with hyper-threading, your carefully crafted cache strategies may be interfered with by the other threads, so your code may inexplicably run slower.

If you're programming an OS, hyper-threading gives you an opportunity to do a little more work in the background, maybe do a bit more garbage collection or virus scanning or disk cache tuning. On the other hand, any time-critical loops, such as in audio or video streaming drivers, may not be able to keep up if another thread steals a lot of cache your driver was expecting to have exclusive access to. They probably added some instructions to turn off hyper-threading in critical loops, at least I sure hope so.
