Comparing dwords, words, and bytes

Posted on 2004-04-23
Last Modified: 2012-05-04
I've seen that replacing cmp ax, dx with cmp eax, edx can make for huge speed improvements (I measured a 40% speedup in one of my algorithms).

Of course you need to zero the unused bytes of the larger register before moving anything into them for comparison.

What about cmp al, ah? Would it be better to switch to the full registers? And what about when, as in this case, there's no free register available? I could save one register to the stack while I make my comparisons (anywhere from 1 to n comparisons, averaging about 10). Would that still be worthwhile?

I have some benchmarking code from someone on this board, but I couldn't get it to work:

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
sub [timeLo], eax
sbc [timeHi], edx

I replaced [timeLo]/[timeHi] with other registers, but the assembler rejects the last line, the one with sbc.

Thanks,
-Sandra
Question by:Sandra-24
4 Comments
 
LVL 3

Author Comment

by:Sandra-24
ID: 10906000
What about for add/sub/inc/dec ops?
 
LVL 22

Accepted Solution

by:
grg99 earned 1000 total points
ID: 10907110
The main thing to remember is:  in 32-bit mode, ANY reference to a 16-bit quantity is going to cost you.  Any 16-bit operation is flagged by an extra prefix byte (0x66).  This prefix byte has serious repercussions:

(1)  It's an extra op-code byte, so it increases the instruction length.

(2)  It prevents the instruction from "pairing" and running concurrently with another instruction (on most Pentiums).

So that can be up to a 50% penalty.

So you're correct, use 32-bit instructions in 32-bit code, 16 in 16-bit code, as much as possible.  

*BUT* there's a whole other set of rules for byte-sized registers. Accessing these doesn't require a prefix byte, but that doesn't mean it's cheap either. It varies with CPU model, but at least on the old Pentiums, accessing the byte part of a register can cause all kinds of strange delays. For example, there is a bizarre rule that accessing a byte register can stall some CPU actions up to two cycles away!

So there too I'd stay away from accessing byte registers. But depending on how often they're accessed, it may not be worth spending time clearing or sign-extending bytes to words or dwords. Each case is different, and it also differs across CPU models, so you'll just have to time the code and see.

I don't see anything obviously wrong with the timing code, perhaps you could give more info?

 
LVL 11

Assisted Solution

by:dimitry
dimitry earned 1000 total points
ID: 10908576
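1) It should be sbb (subtract with borrow); x86 has no sbc instruction. Also subtract the first reading from the second, so the stored result is the elapsed tick count rather than its negative:
--------------------------------------------------------------------------------
mov eax, 0
cpuid              ; serialize the instructions (overwrites EAX, EBX, ECX, EDX)
rdtsc              ; EDX:EAX = time-stamp counter
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid              ; serialize the instructions
rdtsc
sub eax, [timeLo]  ; low dword: end - start, sets CF on borrow
sbb edx, [timeHi]  ; high dword: end - start - borrow
mov [timeLo], eax  ; [timeHi]:[timeLo] = elapsed ticks
mov [timeHi], edx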

2) In his book "Inner Loops", Rick Booth recommends things like the following (Pentium-era advice: on the original Pentium, movzx takes several cycles and cannot pair):
  Replace
    movzx eax, bl
  with
    xor eax, eax
    mov al, bl
So I agree 100% with grg99 that you should try to use 32-bit instructions with 32-bit registers
and 16-bit instructions with 16-bit registers, and not mix them.
 
LVL 3

Author Comment

by:Sandra-24
ID: 10910715
Interesting. So using byte ops is iffy, and should be measured in each scenario where it matters. I never would have guessed that movzx is inferior to the xor/mov combo; I've used it in a few inner loops that I could change.

Thanks also for fixing that benchmark code.

-Sandra