Solved

Comparing dwords, words, and bytes

Posted on 2004-04-23
Last Modified: 2012-05-04
I've seen that replacing cmp ax, dx with cmp eax, edx can give a huge speed improvement (I measured 40% faster in one of my algorithms).

Of course you need to zero the unused bytes of the larger register before moving anything into them for comparison.
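To illustrate why the upper bytes have to be cleared first, here is a minimal sketch in C that models a 32-bit register as a uint32_t. The helper names (load_low16, load_zero_extended, equal32) are made up for this illustration and are not from the thread.

```c
#include <stdint.h>

/* Model of "mov ax, value" on a 32-bit register: only the low 16 bits
   are replaced; the upper 16 bits keep whatever was there before. */
static uint32_t load_low16(uint32_t reg, uint16_t value)
{
    return (reg & 0xFFFF0000u) | value;
}

/* Model of "xor eax, eax" followed by "mov ax, value":
   the upper 16 bits are guaranteed to be zero. */
static uint32_t load_zero_extended(uint16_t value)
{
    return load_low16(0u, value);
}

/* Model of "cmp eax, edx" followed by a je: all 32 bits take part. */
static int equal32(uint32_t a, uint32_t b)
{
    return a == b;
}
```

With stale upper bits, two registers that hold the same 16-bit value can still compare as different in a 32-bit cmp; after zero-extension they compare as equal.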

What about cmp al, ah? Would it be better to switch to using the full registers? What if, as in this case, there's no free register available? I can save one register to the stack while I make my comparisons (anywhere from 1 to n, where the average might be about 10). Would this still be worthwhile?

I have some benchmarking code from someone on this board, but I couldn't figure out how to make it work:

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
sub [timeLo], eax
sbc [timeHi], edx

I replaced [timeLo] and [timeHi] with other registers, but the compiler doesn't like the last line of code, the one with sbc.

Thanks,
-Sandra
Question by:Sandra-24
4 Comments
 
LVL 3

Author Comment

by:Sandra-24
ID: 10906000
What about for add/sub/inc/dec ops?
 
LVL 22

Accepted Solution

by:
grg99 earned 250 total points
ID: 10907110
The main thing to remember is:  in 32-bit mode, ANY reference to a 16-bit quantity is going to cost you.  Any 16-bit operation is flagged by an extra prefix byte (0x66).  This prefix byte has serious repercussions:

(1)  It's an extra op-code byte, so it increases the instruction length.

(2)  It prevents the instruction from "pairing" and running concurrently with another instruction (on most Pentiums).

So that can be up to a 50% penalty.

So you're correct, use 32-bit instructions in 32-bit code, 16 in 16-bit code, as much as possible.  
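As a rough illustration of point (1), the sketch below lists the machine-code bytes for the three compares discussed in this thread when assembled for 32-bit mode. It assumes the assembler picks the "CMP r/m, r" opcode form (an equivalent "CMP r, r/m" form also exists), and the array and function names are invented for the example.

```c
#include <stddef.h>

/* 32-bit mode encodings, assuming the "CMP r/m, r" opcode form
   (39 /r for 16/32-bit operands, 38 /r for 8-bit operands). */
static const unsigned char cmp_eax_edx[] = { 0x39, 0xD0 };       /* cmp eax, edx */
static const unsigned char cmp_ax_dx[]   = { 0x66, 0x39, 0xD0 }; /* cmp ax, dx   */
static const unsigned char cmp_al_ah[]   = { 0x38, 0xE0 };       /* cmp al, ah   */

/* Returns 1 if the instruction starts with the operand-size prefix 0x66. */
static int has_operand_size_prefix(const unsigned char *code, size_t len)
{
    return len > 0 && code[0] == 0x66;
}
```

Only the 16-bit compare carries the 0x66 prefix, which makes it one byte longer than the 32-bit form. The byte-register compare has no prefix and uses a different opcode instead.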

*BUT* there's a whole nother set of rules regarding byte-sized registers. Accessing these doesn't require a prefix byte. But that doesn't mean it's cheap either. It varies with CPU model, but at least for the old Pentiums, accessing a byte part of a register can cause all kinds of strange delays. For example, there is some bizarre rule that accessing a byte register stalls some CPU actions up to two cycles away!

So there too I'd stay away from accessing byte registers. But depending on the frequency of access, it may not be worthwhile wasting time clearing or sign-extending bytes to words or dwords. Each case is different, and it's also different across CPU models, so you'll just have to time the code and see.
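One way to "time the code and see" from C is sketched below. It uses the standard clock() function rather than rdtsc, so it is much coarser than a cycle count, and there is no guarantee that a given compiler turns the 16-bit loop into 16-bit compare instructions. All names here (data16, variant16, seconds_for and so on) are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <time.h>

enum { N = 4096 };
static uint16_t data16[N];
static uint32_t data32[N];
static volatile size_t sink;  /* stores results so the loops are not optimized away */

/* Fills both arrays with the same values, one copy widened to 32 bits. */
static void fill(void)
{
    for (size_t i = 0; i < N; i++) {
        data16[i] = (uint16_t)(i / 3);
        data32[i] = data16[i];
    }
}

/* Counts adjacent equal elements using 16-bit data. */
static void variant16(void)
{
    size_t hits = 0;
    for (size_t i = 1; i < N; i++)
        hits += (data16[i] == data16[i - 1]);
    sink = hits;
}

/* Same count using 32-bit data. */
static void variant32(void)
{
    size_t hits = 0;
    for (size_t i = 1; i < N; i++)
        hits += (data32[i] == data32[i - 1]);
    sink = hits;
}

/* Runs fn reps times and returns the elapsed CPU time in seconds. */
static double seconds_for(void (*fn)(void), int reps)
{
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        fn();
    return (double)(clock() - start) / CLOCKS_PER_SEC;
}
```

Call fill() once, then compare seconds_for(variant16, reps) against seconds_for(variant32, reps) with a large reps value, and repeat the measurement a few times, since single runs are noisy.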

I don't see anything obviously wrong with the timing code, perhaps you could give more info?

 
LVL 11

Assisted Solution

by:dimitry
dimitry earned 250 total points
ID: 10908576
1) It should be sbb (sub with borrow)...
--------------------------------------------------------------------------------
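mov eax, 0
cpuid   ; to serialize the instructions (overwrites eax, ebx, ecx, edx)
rdtsc   ; edx:eax = time-stamp counter
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
sub [timeLo], eax   ; low dword; sets the carry flag on borrow
sbb [timeHi], edx   ; high dword minus the borrow
                    ; [timeHi:timeLo] = start - end, so negate it for the elapsed count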

2) Rick Booth, in his book "Inner Loops", recommends the following, for example:
  Replace
    movzx eax, bl
  with
    xor eax, eax
    mov al, bl
So I agree 100% with grg99: try to use 32-bit instructions with 32-bit registers
and 16-bit instructions with 16-bit registers, and don't mix them.
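For readers unfamiliar with the sub/sbb pair from point 1, here is a small C model of how it subtracts a 64-bit value held in two 32-bit halves. The u64_pair type and the sub_with_borrow function are invented for this sketch.

```c
#include <stdint.h>

/* A 64-bit counter held as two 32-bit halves, like edx:eax after rdtsc. */
typedef struct { uint32_t lo, hi; } u64_pair;

/* Model of:  sub a.lo, b.lo   ; sets the carry flag on borrow
              sbb a.hi, b.hi   ; subtracts b.hi plus the carry flag */
static u64_pair sub_with_borrow(u64_pair a, u64_pair b)
{
    u64_pair r;
    uint32_t borrow = a.lo < b.lo;   /* carry flag after the sub */
    r.lo = a.lo - b.lo;
    r.hi = a.hi - b.hi - borrow;     /* what sbb computes */
    return r;
}
```

Note the operand order: sub_with_borrow(end, start) yields the elapsed count, while subtracting the end time from the stored start time, as the listing in point 1 does, yields its two's-complement negation.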
 
LVL 3

Author Comment

by:Sandra-24
ID: 10910715
Interesting. So using byte ops is iffy, and should be measured in each scenario where it matters. I never would have guessed that movzx is inferior to the xor/mov combo. I've used movzx in a few inner loops that I could change.
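As a sanity check before swapping one for the other, the following C sketch models both sequences and shows that they leave the same value in eax. The function names are made up for the example, and the model says nothing about speed, which still has to be measured per CPU.

```c
#include <stdint.h>

/* Model of "movzx eax, bl": eax becomes the zero-extended low byte of ebx. */
static uint32_t via_movzx(uint32_t eax, uint32_t ebx)
{
    (void)eax;                 /* the old contents of eax are discarded */
    return ebx & 0xFFu;
}

/* Model of "xor eax, eax" followed by "mov al, bl". */
static uint32_t via_xor_mov(uint32_t eax, uint32_t ebx)
{
    eax ^= eax;                                  /* xor eax, eax: eax = 0 */
    eax = (eax & 0xFFFFFF00u) | (ebx & 0xFFu);   /* mov al, bl: low byte only */
    return eax;
}
```

One difference the model does not capture: xor modifies the flags, while movzx leaves them untouched, so the replacement is only safe where the flags are not needed afterwards.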

Thanks also for fixing that benchmark code.

-Sandra
