Want to win a PS4? Go Premium and enter to win our High-Tech Treats giveaway. Enter to Win

x
?
Solved

Comparing dwords, words, and bytes

Posted on 2004-04-23
4
Medium Priority
?
323 Views
Last Modified: 2012-05-04
I've seen that switching cmp ax, dx with cmp eax, edx can make for huge speed improvements (I measured 40% faster in one of my algorithims).

Of course you need to zero the unused bytes of the larger register before moving anything into them for comparison.

What about cmp al, ah ? Would it be better switch to using the full registers? What about, like in this case there's no free register avaliable? I can save one register to the stack while I make my compariosns (anywhere from 1 to n where the average might be about 10, would this still be worthwhile?

I have some benchmarking code from somone on this board, but I couldn't figure out how to work it:

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
sub [timeLo], eax
sbc [timeHi], edx

I replace the [timeLO/Hi] with other registers, but the compiler doesn't like the last line of code with sbc.

Thanks,
-Sandra
0
Comment
Question by:Sandra-24
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 2
4 Comments
 
LVL 3

Author Comment

by:Sandra-24
ID: 10906000
What about for add/sub/inc/dec ops?
0
 
LVL 22

Accepted Solution

by:
grg99 earned 1000 total points
ID: 10907110
The main thing to remember is:  in 32-bit mode, ANY reference to a 16-bit quantity is going to cost you.  Any 16-bit operation is flagged by an extra prefix byte (0x66).  This prefix byte has serious repercussions:

(1)  It's an extra op-code byte, so it increases the instruction length.

(2)  It prevents the instruction from "pairing" and running concurrently with another instruction (on most Pentiums).

So that can be up to a 50% penalty.

So you're correct, use 32-bit instructions in 32-bit code, 16 in 16-bit code, as much as possible.  

*BUT* there's a whole nother set of rules regarding byte-sized registers.   Accessing these doesnt require a prefix byte.  But that doesnt mean it's cheap either.  It varies with CPU model, but at least for the old Pentiums, accessing a byte part of a register can cause all kinds of strange delays.  For example, there is some bizarre rule that accessing a byte register stalls some CPU actions up to two cycles away!

So there too I'd stay away from accesing byte registers.  But depending on the frequency of access, it may not be worthwhile wasting time clearing or sign-extending bytes to words or dwords.   Each case is different, and it's also different across CPU models, so you'll just have to time the code and see.

I don't see anything obviously wrong with the timing code, perhaps you could give more info?

0
 
LVL 11

Assisted Solution

by:dimitry
dimitry earned 1000 total points
ID: 10908576
1) It should be sbb (sub with borrow)...
--------------------------------------------------------------------------------
mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
sub [timeLo], eax
sbb [timeHi], edx

2) Rick Booth in his "Inner Loops" book recommends next things, for example:
  Replace
    movzx eax, bl
  with
    xor eax, eax
    mov al, bl
So I am 100% agree with grg99 that you need to try to use 32-bit commands with 32-bit registers
and 16-bit with 16-bit and not mess with them together.
0
 
LVL 3

Author Comment

by:Sandra-24
ID: 10910715
Interesting. So using byte ops is iffy, and should be measured in each scenario where it matters. Never would have guessed movzx is inferior to xor/mov combo, I've used that in a few inner loops that I could change.

Thanks also for fixing that benchmark code.

-Sandra
0

Featured Post

Free Tool: Site Down Detector

Helpful to verify reports of your own downtime, or to double check a downed website you are trying to access.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Are you looking for the options available for exporting EDB files to PST? You may be confused as they are different in different Exchange versions. Here, I will discuss some options available.
As much as Microsoft wants to kill off PST file support, just as they tried to do with public folders, there are still times when it is useful or downright necessary to export Exchange mailboxes to PST files. Thankfully, it is still possible to e…
This course is ideal for IT System Administrators working with VMware vSphere and its associated products in their company infrastructure. This course teaches you how to install and maintain this virtualization technology to store data, prevent vuln…
Want to learn how to record your desktop screen without having to use an outside camera. Click on this video and learn how to use the cool google extension called "Screencastify"! Step 1: Open a new google tab Step 2: Go to the left hand upper corn…
Suggested Courses

636 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question