Solved

Hyper-Threading

Posted on 2003-12-05
Last Modified: 2006-11-17
I'm starting to learn assembly language with the Art of Assembly book, and I was wondering about Hyper-Threading technology: how does it affect or change the asm instructions? Please give me more information about this subject as it relates to assembly. Thanks.
Question by:j_uan
 
LVL 22

Accepted Solution

by:grg99
grg99 earned 220 total points
Hyper-threading is well named, more hype than anything else.

The basic idea is you make a chip that looks like it's TWO CPUs.

But it's actually just ONE CPU that is overcommitted, i.e. there are two sets of registers, but not much more of anything else.
All the adders, multipliers, shifters, and data paths are the same as on a single Pentium chip.

Now if you run TWO programs or threads, each of which is poorly written, then maybe you'll get more throughput.
If program #1 isn't doing much with the multiplier, then maybe program #2 can take up the slack, IF it happens to do a lot of multiplies.

But if either program is already well written, so that it makes good use of the CPU, then there won't be many spare resources for the other thread or program. And you'll actually get poorer overall performance due to the small but still present overhead of hyper-threading.

And hyper-threading is going to fight with the other parts of the instruction scheduler that are trying to do the opposite: schedule as many of the functional units as possible for the current task.

If you look carefully at some of the benchmarks you'll see this happening. Running with hyper-threading can actually be slower than running without it.
A clever idea, but basically a band-aid kludge that promises a lot more than it can ever deliver.


----

As to writing in assembler, well, it's going to be very difficult to write code that is hyper-threading-friendly.
Why write code that doesn't use the CPU efficiently? Sounds like a losing proposition most of the time.





 
LVL 3

Assisted Solution

by:terageek
terageek earned 150 total points
You don't NEED to do anything for a hyper-threaded processor that you normally wouldn't do. You can see some benefits from hyper-threading if you can write a multi-threaded program. If you have two complex tasks which can be done in parallel, creating a new process for the second task can show some performance gains. For example, you can have one process which renders the screen while another works on the AI.

A hyper-threaded processor does not have 2 sets of registers. It simply performs the register-renaming function slightly differently. The Pentium processors all have a pool of registers (80 on the PII). Whenever you write to a register (say it is AX), the processor goes into that pool and calls a particular location "AX". If you have a second instruction which reads AX, it will need to wait for that location to be written. If you have a third instruction which also writes to AX, without renaming it could not execute until the earlier instructions were done with AX, but with register renaming the processor can give a different location in its register pool the name "AX". The processor knows that the first AX will be needed for earlier instructions which are waiting to execute, and the second AX will be for any future instructions waiting to execute. This is how a processor is able to do out-of-order execution.
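To make the renaming idea concrete, here is a minimal toy sketch in Python. It is only a model of the description above, not how any real CPU is wired: the class name `RenameTable`, the pool size of 8, and the choice to never free slots are all assumptions made for illustration.

```python
class RenameTable:
    """Toy model of register renaming (illustrative only)."""

    def __init__(self, pool_size):
        self.free = list(range(pool_size))  # physical slots not yet handed out
        self.current = {}                   # architectural name -> physical slot

    def write(self, name):
        """A write to `name` is given a fresh physical slot from the pool."""
        slot = self.free.pop(0)
        self.current[name] = slot
        return slot

    def read(self, name):
        """A read uses whichever slot most recently received this name."""
        return self.current[name]


table = RenameTable(pool_size=8)
first_ax = table.write("AX")   # instruction 1 writes AX
reader = table.read("AX")      # instruction 2 reads AX: same slot as instruction 1
second_ax = table.write("AX")  # instruction 3 writes AX: gets a different slot
print(first_ax, reader, second_ax)  # prints: 0 0 1
```

Because the third instruction lands in a different slot, it no longer has to wait for the second instruction to finish reading the old AX. A real processor also returns slots to the pool once they are no longer needed, which this sketch leaves out.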

What hyper-threading does is mark each register in the pool with not just a name, but also a thread id. Also, each instruction that is executed is tagged with a thread id. Each instruction which is executed will only read registers marked with its thread id. You do need to have two instruction pointers, but for the most part, the register pool is untouched.
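The thread-id tagging can be sketched the same way. This is again a toy Python model under stated assumptions: the class `ThreadedRenameTable` and its methods are hypothetical names, and the only point is that the lookup key becomes (thread id, register name) while the pool itself stays shared.

```python
class ThreadedRenameTable:
    """Toy model: one shared pool, entries tagged with a thread id."""

    def __init__(self, pool_size):
        self.free = list(range(pool_size))  # one pool shared by both threads
        self.current = {}                   # (thread_id, name) -> physical slot

    def write(self, thread_id, name):
        slot = self.free.pop(0)
        self.current[(thread_id, name)] = slot
        return slot

    def read(self, thread_id, name):
        # An instruction only sees registers tagged with its own thread id.
        return self.current[(thread_id, name)]


shared = ThreadedRenameTable(pool_size=8)
t0_ax = shared.write(0, "AX")  # thread 0 writes its AX
t1_ax = shared.write(1, "AX")  # thread 1 writes its AX: a different slot
print(shared.read(0, "AX"), shared.read(1, "AX"))  # prints: 0 1
```

Both threads use the name "AX", yet neither can read the other's value, and no second register file was needed in the model.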

As for performance, if you have a memory-intensive task running, it doesn't matter how well it is programmed, you will get cache misses. Each cache miss will leave the CPU idle for a good 100 clock cycles, during which a second program can use the CPU. This is where hyper-threading gets its biggest gains. While one process is waiting for data from memory, the second can execute. There are also a number of times when your program will be sitting around waiting for your code to do a serial calculation. For example, if you want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before it can add c to it. If your program doesn't have anything better to do while it waits, time is wasted. By having an independent stream of instructions to execute, the processor can do some useful work while the first thread is waiting.
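A rough way to see the stall-filling argument is a tiny cycle-counting model. The Python sketch below is purely illustrative and rests on several assumptions of mine: the core issues at most one instruction per cycle, every instruction depends on the previous one in its own thread, thread 0 always gets priority, and the latencies (100 cycles for a miss, 1 for everything else) are made-up round numbers.

```python
def cycles_to_finish(threads):
    """threads: list of per-thread latency lists. Returns the cycle at which
    the last result is available, with one shared issue slot per cycle."""
    next_index = [0] * len(threads)  # next instruction to issue, per thread
    ready_at = [0] * len(threads)    # cycle when each thread's last result is ready
    cycle = 0
    finish = 0
    while any(next_index[t] < len(threads[t]) for t in range(len(threads))):
        for t in range(len(threads)):
            if next_index[t] < len(threads[t]) and ready_at[t] <= cycle:
                ready_at[t] = cycle + threads[t][next_index[t]]
                finish = max(finish, ready_at[t])
                next_index[t] += 1
                break  # only one instruction may issue in this cycle
        cycle += 1
    return finish


memory_bound = [100, 1, 100, 1]  # two cache misses, each followed by an add
compute_bound = [1] * 50         # fifty dependent single-cycle operations

back_to_back = cycles_to_finish([memory_bound]) + cycles_to_finish([compute_bound])
together = cycles_to_finish([memory_bound, compute_bound])
print(back_to_back, together)  # prints: 252 202
```

In this model the compute-bound thread runs entirely inside the first thread's miss. Two compute-bound threads gain nothing here (`cycles_to_finish([compute_bound, compute_bound])` gives 100, the same as running them one after the other), so the benefit depends on the mix of work.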

The instruction scheduler is actually more efficient because it has 2 sets of instructions to choose from. The reason that a hyper-threaded processor is slower on some benchmarks is that some of the CPU resources are dedicated to each thread. A hyper-threaded processor reserves 1/2 of the FSB queues for each thread, so if you only have 1 thread which needs all of that bandwidth, it will have only 1/2 as many queues as on a non-hyper-threaded system. This is why you will see that memory benchmarks seem to be especially slower on a hyper-threaded computer. There are benchmarks which show that if you have multiple processes running, the total time to complete both tasks will be significantly shorter on a hyper-threaded computer than on a non-hyper-threaded computer.
 
LVL 22

Expert Comment

by:grg99
"As for performance, if you have a memory-intensive task running, it doesn't matter how well it is programmed, you will get cache misses. Each cache miss will leave the CPU idle for a good 100 clock cycles, during which a second program can use the CPU. This is where hyper-threading gets its biggest gains. While one process is waiting for data from memory, the second can execute."


... but only in the purely-mythical case where ONE thread has apparently maxed-out the cache so it has to wait for a slow memory read, while the other thread somehow just coincidentally is doing register-only operations.  Such a writer could make millions writing children's fairy tales.

The hyper-technologists better have an answer for the question:  what happens to a loop I've carefully written to be optimized for using the cache?  It seems that any other hyper-threaded task is going to screw up my cache.  
Those uber-geeks who have been tweaking their code will probably see a bad performance hit. Not a nice out-of-the-box experience for those folks.
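The cache worry can be illustrated with a toy model. The Python sketch below assumes a fully associative LRU cache of 4 lines and a strict one-for-one interleaving of the two threads' accesses; real caches and real thread scheduling are far more complicated, and the names `count_misses`, `loop_a`, and `loop_b` are made up for the example.

```python
from collections import OrderedDict


def count_misses(accesses, capacity):
    """Count misses for a fully associative LRU cache holding `capacity` lines."""
    cache = OrderedDict()
    misses = 0
    for line in accesses:
        if line in cache:
            cache.move_to_end(line)        # mark as most recently used
        else:
            misses += 1
            cache[line] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used line
    return misses


loop_a = ["a0", "a1", "a2", "a3"] * 10  # tuned loop: fits the 4-line cache exactly
loop_b = ["b0", "b1", "b2", "b3"] * 10  # second thread with its own working set

alone = count_misses(loop_a, capacity=4)
# Interleave the two access streams one by one, as if they shared the cache.
interleaved = [line for pair in zip(loop_a, loop_b) for line in pair]
shared = count_misses(interleaved, capacity=4)
print(alone, shared)  # prints: 4 80
```

Alone, the tuned loop only pays its 4 cold misses. Sharing the cache with a second working set of the same size turns every one of the 80 accesses into a miss in this model, which is the worst case; how close real code comes to it depends on the cache size and the other thread.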

"For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before you can add c to it"

Again, compilers and assembly language programmers have known for about a decade now to overlap operations as much as possible. So if it's a well-optimized program already, it's not going to benefit a lot, and may actually run considerably slower. If it's an old or poorly tuned program, it may give other threads more time, sure, but it's sort of like saying that it's good to be dumb because it makes the smarter folks feel better.
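One way to picture "overlapping operations" is to count how many additions must happen strictly one after another. The Python sketch below is only an illustration with hypothetical helper names: `chain_depth` models summing left to right, and `tree_depth` models regrouping the sum into independent pairs, as in (a + b) + (c + d).

```python
def chain_depth(n):
    """Dependent steps when n values are summed left to right."""
    return max(n - 1, 0)


def tree_depth(n):
    """Dependent steps when n values are summed pairwise, e.g. (a+b)+(c+d)."""
    depth = 0
    while n > 1:
        n = (n + 1) // 2  # each round adds independent pairs side by side
        depth += 1
    return depth


for count in (3, 4, 8):
    print(count, chain_depth(count), tree_depth(count))
# prints:
# 3 2 2
# 4 3 2
# 8 7 3
```

For three values the regrouping does not help (a + b + c needs two dependent steps either way), but for longer sums the pairwise form leaves far fewer steps that have to wait on each other. Note that regrouping floating-point additions can change the rounded result, so this is not always a free transformation.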




 

Author Comment

by:j_uan
So, after all this info, is ASM the same for the programmer with or without HT?
What about the OS programmer?
 
LVL 3

Expert Comment

by:terageek
As a programmer, if you want to see the benefits of HT, you should try to break your program into 2 threads which can execute in parallel. Finding parallelism in programs is an entire area of research unto itself. You can try to crunch two different sets of numbers simultaneously, or try to perform unrelated tasks simultaneously. Chances are that you will not want to do this in ASM. Instead, split your program into multiple threads at a higher level, then code the tight loops in the individual threads in ASM.
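As a sketch of the "split at a higher level" structure, here is a minimal Python example that crunches two different sets of numbers in two threads. The function names are hypothetical, and there is one important caveat: in standard CPython builds the global interpreter lock keeps the two threads from running Python bytecode at the same instant, so this shows the shape of the program rather than a speedup. In a compiled language, each worker would be the place to put the hand-tuned loop.

```python
import threading


def sum_of_squares(numbers, results, slot):
    """Tight loop for one thread; stores its answer in results[slot]."""
    results[slot] = sum(n * n for n in numbers)


def run_two_threads(first, second):
    """Run the same loop over two independent data sets, one thread each."""
    results = [None, None]
    workers = [
        threading.Thread(target=sum_of_squares, args=(first, results, 0)),
        threading.Thread(target=sum_of_squares, args=(second, results, 1)),
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()  # wait until both halves are done
    return results


totals = run_two_threads(range(1000), range(1000, 2000))
print(totals)
```

The two data sets share nothing, so neither thread ever has to wait on the other. That independence is what makes a task a good candidate for a second thread.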

For the OS programmer, a HT processor looks like 2 separate processors, so if the OS can support 2 processors, it will be able to support HT.
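From user code you can see the same view the OS has. In Python, `os.cpu_count()` is documented to return the number of logical CPUs, so on a hyper-threaded machine each physical core is typically counted twice when HT is enabled. The wrapper name and the fallback to 1 when the count is unknown are assumptions of this sketch.

```python
import os


def logical_cpu_count():
    """Number of logical CPUs the OS reports, or 1 if it cannot be determined."""
    count = os.cpu_count()  # may return None on some platforms
    return count if count is not None else 1


print(logical_cpu_count())
```

The number printed depends on the machine, which is why no expected output is shown.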

I really don't feel like getting into any further arguments about the benefits of hyper-threading. There are those who dismiss it as useless. It is a fact that hyper-threading will do nothing to help a single-threaded process run faster. In fact, hyper-threading will usually slow down a single-threaded process. As a result, you will not see the benefits of hyper-threading in benchmarks which are all single-threaded. The benefit comes when a person tries to run two programs simultaneously, or a two-threaded program. Two threads will be able to execute together, and the overall time it takes to do the two tasks will be reduced.
 
LVL 22

Expert Comment

by:grg99
Programming in ASM shouldn't change much. Just keep in mind that with hyper-threading, your carefully crafted cache strategies may be interfered with by the other threads, so your code may inexplicably run slower.

If you're programming an OS, hyper-threading gives you an opportunity to do a little more work in the background, maybe do a bit more garbage collection or virus scanning or disk cache tuning. On the other hand, any time-critical loops, such as in audio or video streaming drivers, may not be able to keep up if another thread steals a lot of cache your driver was expecting to have exclusive access to. They probably added some instructions to turn off hyper-threading in critical loops, at least I sure hope so.

