Posted on 2003-12-05
Medium Priority
Last Modified: 2006-11-17
I'm starting to learn assembly language with the Art of Assembly book. I was wondering about Hyper-Threading technology and how it would affect or change the asm commands. Please give me more information about this subject as it relates to assembly. Thanks.
Question by:j_uan
LVL 22

Accepted Solution

grg99 earned 880 total points
ID: 9884131
Hyper-threading is well named, more hype than anything else.

The basic idea is you make a chip that looks like it's TWO CPUs.

But it's actually just ONE CPU that is overcommitted-- i.e. there are two sets of registers, but not much more of anything else.
All the adders, multipliers, shifters, and data paths are the same as on a single Pentium chip.

Now if you run TWO programs or threads, each of which is poorly written, then maybe you'll get more throughput.
If program #1 isn't doing much with the multiplier, then maybe program #2 can take up the slack, IF it happens to do a lot of multiplies.

But if either program is already well written, so that it makes good use of the CPU, then there won't be many spare resources for the other thread or program.  And you'll actually get poorer overall performance due to the somewhat small but still present overhead of hyper-threading.

And hyper-threading is going to fight with the other parts of the instruction schedulers that are trying to do the opposite-- schedule as many of the functional units as possible for the current task.  

If you look carefully at some of the benchmarks you'll see this happening.   Hyper-threading can actually be slower than not.  
A clever idea, but basically a band-aid kludge that promises a lot more than it can ever deliver.


As to writing in assembler, well, it's going to be very difficult to write code that is hyper-friendly.
Why write code that doesn't use the CPU efficiently?   Sounds like a losing proposition most of the time.


Assisted Solution

terageek earned 600 total points
ID: 9884575
You don't NEED to do anything for a hyper-threaded processor that you normally wouldn't do.  You can see some benefits from hyper-threading if you can make a multi-threaded program.  If you have two complex tasks which can be done in parallel, creating a new process for the second task can show some performance gains.  For example, you can have one process which renders the screen while another works on the AI.

A hyper-threaded processor does not have 2 sets of registers.  It simply performs the register renaming function slightly differently.  The Pentium processors all have a pool of registers (80 on the PII).  Whenever you write to a register (say it is AX), the processor goes into that pool and calls a particular location "AX".  If you have a second instruction which reads AX, it will need to wait for that space to be free.  If you have a third instruction which also writes to AX, normally it can't execute, but with register renaming, the processor can give a different location in its register pool the name "AX".  The processor knows that the first AX will be needed for earlier instructions which are waiting to execute, and the second AX will be for any future instructions waiting to execute.  This is how a processor is able to do out-of-order execution.

What hyper-threading does is mark each register in the pool with not just a name, but also a thread id.  Also, each instruction that is executed is tagged with a thread id.  Each instruction which is executed will only read registers marked with its thread id.  You do need to have two instruction pointers, but for the most part, the register pool is untouched.
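
The renaming-plus-thread-id idea described above can be illustrated with a toy model in C. This is only a sketch under made-up assumptions: the pool size, the `phys_reg` struct, and the functions `rename_write` and `rename_read` are hypothetical names chosen for illustration, and nothing here reflects how any real processor lays out its register pool. Entries are never freed, which a real design would have to handle.

```c
#include <assert.h>

#define POOL_SIZE   8  /* hypothetical; the pool size here is arbitrary */
#define NUM_ARCH    2  /* pretend the ISA only has AX (0) and BX (1) */
#define NUM_THREADS 2

/* One slot in the pool, tagged with a register name and a thread id. */
struct phys_reg {
    int in_use;
    int arch_name;  /* which architectural register this slot stands for */
    int thread_id;  /* which hardware thread the slot belongs to */
    int value;
};

static struct phys_reg pool[POOL_SIZE];

/* latest[t][r]: pool index of the newest copy of register r for thread t. */
static int latest[NUM_THREADS][NUM_ARCH] = {{-1, -1}, {-1, -1}};

/* A write takes a fresh slot instead of overwriting the previous one,
   so older instructions can still read the earlier value. */
int rename_write(int thread_id, int arch_name, int value)
{
    assert(thread_id >= 0 && thread_id < NUM_THREADS);
    assert(arch_name >= 0 && arch_name < NUM_ARCH);
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool[i].in_use) {
            pool[i] = (struct phys_reg){1, arch_name, thread_id, value};
            latest[thread_id][arch_name] = i;
            return i;
        }
    }
    return -1;  /* pool exhausted; this toy model simply reports failure */
}

/* A read only ever sees the newest slot tagged with its own thread id. */
int rename_read(int thread_id, int arch_name)
{
    int idx = latest[thread_id][arch_name];
    assert(idx >= 0);  /* the register must have been written first */
    return pool[idx].value;
}
```

In this model, thread 0 and thread 1 can both write "AX" and each reads back only its own value, while a second write to AX by the same thread lands in a different slot from the first.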

As for performance, if you have a memory intensive task running, it doesn't matter how well it is programmed, you will get cache misses.  Each cache miss will leave the CPU idle for a good 100 clock cycles during which a second program can use the CPU.  This is where hyper-threading gets its biggest gains.  While one process is waiting for data from memory, the second can execute.  There are also a number of times when your program will be sitting around waiting for your code to do a serial calculation.  For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before it can add c to it.  If your program doesn't have anything better to do while it waits, time is wasted.  By having an independent stream of instructions to execute, the processor can do some useful work while the first thread is waiting.
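
The a + b + c point can be made concrete with a small C sketch. The function names `sum_chain` and `sum_two_chains` are made up for this illustration. The first loop is one long dependency chain, where every add needs the previous result. The second splits the work into two chains that do not depend on each other, which is the single-threaded way of giving the core independent work. Whether either version is actually faster depends on the compiler and the CPU, so no timings are claimed here.

```c
#include <assert.h>
#include <stddef.h>

/* One dependency chain: each add must wait for the previous add. */
long sum_chain(const long *v, size_t n)
{
    long s = 0;
    for (size_t i = 0; i < n; i++)
        s += v[i];
    return s;
}

/* Two independent chains: the adds into s0 and s1 do not depend on
   each other, so an out-of-order core is free to overlap them. */
long sum_two_chains(const long *v, size_t n)
{
    assert(v != NULL || n == 0);
    long s0 = 0, s1 = 0;
    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        s0 += v[i];
        s1 += v[i + 1];
    }
    if (i < n)          /* odd element left over */
        s0 += v[i];
    return s0 + s1;     /* the chains only meet at the very end */
}
```

Both functions return the same sum. A second hardware thread is the other source of independent instructions: its work is unrelated to either chain by construction.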

The instruction scheduler is actually more efficient because it has 2 sets of instructions to choose from.  The reason that a hyper-threaded processor is slower on some benchmarks is actually because some of the CPU resources are dedicated to each thread.  A hyper-threaded processor reserves 1/2 the FSB queues for each process, so if you only have 1 process which needs all of that bandwidth, you will have only 1/2 as many queues as on a non-hyper-threaded system.  This is why you will see that memory benchmarks seem to be especially slower on a hyper-threaded computer.  There are benchmarks which show that if you have multiple processes running, the total time to complete both tasks will be significantly shorter on a hyper-threaded computer than on a non-hyper-threaded computer.
LVL 22

Expert Comment

ID: 9888485
"As for performance, if you have a memory intensive task running, it doesn't matter how well it is programmed, you will get cache misses.  Each cache miss will leave the CPU idle for a good 100 clock cycles durring which a second program can use the CPU.  This is where hyper-threading get's its biggest gains.  While one process is waiting for data from memory, the second can execute."

... but only in the purely-mythical case where ONE thread has apparently maxed-out the cache so it has to wait for a slow memory read, while the other thread somehow just coincidentally is doing register-only operations.  Such a writer could make millions writing children's fairy tales.

The hyper-technologists better have an answer for the question:  what happens to a loop I've carefully written to be optimized for using the cache?  It seems that any other hyper-threaded task is going to screw up my cache.  
Those uber-geeks that have been tweaking their code will probably see a bad performance hit.  Not a nice OOB experience for those folks.

"For example, if I want to add a + b + c, no matter how well you program it, the processor will need to wait for the result of a + b before you can add c to it"

Again, compilers and assembly language programmers have known for about a decade now to overlap operations as much as possible.   So if it's a well-optimized program already, it's not going to benefit a lot, and may actually run considerably slower. If it's an old or poorly-tuned program, it may give other threads more time, sure, but it's sorta like saying that it's good to be dumb as it makes the smarter folks feel better.


Author Comment

ID: 9890379
So, after all this info, is ASM the same for the programmer with or without HT?
What about the OS programmer?

Expert Comment

ID: 9891115
As a programmer, if you want to see the benefits of HT, you should try to break your program into 2 threads which can execute in parallel.  Finding parallelism in programs is an entire area of research unto itself.  You can try to crunch two different sets of numbers simultaneously, or try to perform unrelated tasks simultaneously.  Chances are that you will not want to do this in ASM, but at a higher level, split your program into multiple threads, then code your tight loops in the individual threads in ASM.
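
As a rough sketch of that advice, here is what "crunch two different sets of numbers simultaneously" might look like in C with POSIX threads. The `job` struct and the functions `sum_worker` and `sum_in_parallel` are hypothetical names for this example, and it assumes a system where pthreads is available. Nothing in it is specific to HT: the same code runs on one CPU, two CPUs, or one hyper-threaded CPU, and the OS decides where the threads go.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* One independent piece of work: its own input and its own result slot. */
struct job {
    const long *data;
    size_t count;
    long result;
};

/* Thread body. The inner loop is the kind of tight loop one might
   later hand-code in assembly. */
static void *sum_worker(void *arg)
{
    struct job *j = arg;
    long s = 0;
    for (size_t i = 0; i < j->count; i++)
        s += j->data[i];
    j->result = s;
    return NULL;
}

/* Runs two jobs on two threads. Returns 0 on success, -1 on failure. */
int sum_in_parallel(struct job *a, struct job *b)
{
    pthread_t ta, tb;
    assert(a != NULL && b != NULL);
    if (pthread_create(&ta, NULL, sum_worker, a) != 0)
        return -1;
    if (pthread_create(&tb, NULL, sum_worker, b) != 0) {
        pthread_join(ta, NULL);  /* do not leak the first thread */
        return -1;
    }
    pthread_join(ta, NULL);
    pthread_join(tb, NULL);
    return 0;
}
```

The two jobs share no data, so no locking is needed. Each thread writes only to its own `result` field, which the caller reads after both joins.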

For the OS programmer, a HT processor looks like 2 separate processors, so if the OS can support 2 processors, it will be able to support HT.

I really don't feel like getting into any further arguments about the benefits of hyper-threading.  There are those who dismiss it as useless.  It is a fact that hyper-threading will do nothing to help a single-threaded process run faster.  In fact, hyper-threading will usually slow down a single-threaded process.  As a result, you will not see the benefits of hyper-threading in benchmarks which are all single-threaded.  The benefit comes when a person tries to run two programs simultaneously, or a two-threaded program.  Two threads will be able to execute together, and the overall time it takes to do two tasks simultaneously will be reduced.
LVL 22

Expert Comment

ID: 9892048
Programming in ASM shouldn't change much-- you should keep in mind that with hyper-threading, your carefully crafted cache strategies may be interfered with by the other threads, so your code may inexplicably run slower.

If you're programming an OS, hyper-threading gives you an opportunity to do a little more work in the background, maybe do a bit more garbage collection or virus scanning or disk cache tuning.  On the other hand, any time-critical loops, such as in audio or video streaming drivers, may not be able to keep up if another thread steals a lot of cache your driver was expecting to have exclusive access to.  They probably added some instructions to turn off hyper-threading in critical loops; at least I sure hope so.

