Solved

Comparing dwords, words, and bytes

Posted on 2004-04-23
Last Modified: 2012-05-04
I've seen that replacing cmp ax, dx with cmp eax, edx can yield huge speed improvements (I measured a 40% speedup in one of my algorithms).

Of course you need to zero the unused bytes of the larger register before moving anything into them for comparison.

What about cmp al, ah? Would it be better to switch to using the full registers? What if, as in this case, there's no free register available? I can save one register to the stack while I make my comparisons (anywhere from 1 to n, where the average might be about 10). Would this still be worthwhile?

I have some benchmarking code from someone on this board, but I couldn't figure out how to get it to work:

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
sub [timeLo], eax
sbc [timeHi], edx

I replaced [timeLo] and [timeHi] with other registers, but the compiler doesn't like the last line of code, the one with sbc.

Thanks,
-Sandra
Question by:Sandra-24
LVL 3

Author Comment

by:Sandra-24
ID: 10906000
What about for add/sub/inc/dec ops?
LVL 22

Accepted Solution

by:
grg99 earned 250 total points
ID: 10907110
The main thing to remember is:  in 32-bit mode, ANY reference to a 16-bit quantity is going to cost you.  Any 16-bit operation is flagged by an extra prefix byte (0x66).  This prefix byte has serious repercussions:

(1)  It's an extra op-code byte, so it increases the instruction length.

(2)  It prevents the instruction from "pairing" and running concurrently with another instruction (on most Pentiums).

So that can be up to a 50% penalty.

So you're correct, use 32-bit instructions in 32-bit code, 16 in 16-bit code, as much as possible.  
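To make point (1) concrete, here is a minimal C sketch that tabulates the machine-code bytes for the three comparisons discussed in this thread, as they would be encoded in 32-bit mode. The byte values assume the "cmp r/m, r" opcode forms (38 /r and 39 /r); an assembler may pick the equivalent 3A/3B forms instead, which have the same lengths. The struct and function names are made up for this illustration.

```c
#include <stddef.h>

/* Operand-size override prefix: in 32-bit code it selects 16-bit operands. */
#define OPERAND_SIZE_PREFIX 0x66

/* Hypothetical record for this sketch: a mnemonic and its encoding. */
struct encoding {
    const char *mnemonic;
    unsigned char bytes[4];
    size_t length;
};

/* Encodings in 32-bit mode, using opcodes 38 /r (byte) and 39 /r (word/dword). */
const struct encoding cmp_encodings[3] = {
    { "cmp eax, edx", { 0x39, 0xD0 },       2 },
    { "cmp ax, dx",   { 0x66, 0x39, 0xD0 }, 3 },  /* same opcode, plus prefix */
    { "cmp al, ah",   { 0x38, 0xE0 },       2 },  /* byte form: no prefix */
};

/* Returns 1 if the encoding starts with the 0x66 operand-size prefix. */
int has_operand_size_prefix(const struct encoding *e) {
    return e->length > 0 && e->bytes[0] == OPERAND_SIZE_PREFIX;
}
```

Only the 16-bit form carries the extra byte, which is why the byte-register case follows different rules from the 16-bit case.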

*BUT* there's a whole other set of rules regarding byte-sized registers. Accessing these doesn't require a prefix byte, but that doesn't mean it's cheap either. It varies with CPU model, but at least on the old Pentiums, accessing a byte part of a register can cause all kinds of strange delays. For example, there is some bizarre rule that accessing a byte register stalls some CPU actions up to two cycles away!

So there too I'd stay away from accessing byte registers. But depending on the frequency of access, it may not be worthwhile wasting time clearing or sign-extending bytes to words or dwords. Each case is different, and it also differs across CPU models, so you'll just have to time the code and see.

I don't see anything obviously wrong with the timing code; perhaps you could give more info?

LVL 11

Assisted Solution

by:dimitry
dimitry earned 250 total points
ID: 10908576
1) It should be sbb (subtract with borrow), not sbc:
--------------------------------------------------------------------------------
mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
mov [timeLo], eax
mov [timeHi], edx

...  ; your code

mov eax, 0
cpuid   ; to serialize the instructions
rdtsc
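sub [timeLo], eax   ; low dword: start - end; sets carry on borrow
sbb [timeHi], edx   ; high dword: also subtracts the borrow from the sub above
                    ; note: memory now holds start - end, i.e. the negated cycle count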

2) Rick Booth, in his book "Inner Loops", recommends substitutions such as this one:
  Replace
    movzx eax, bl
  with
    xor eax, eax
    mov al, bl
So I agree 100% with grg99: try to use 32-bit instructions with 32-bit registers
and 16-bit instructions with 16-bit registers, and avoid mixing the two.
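The equivalence in point 2 can be modeled in portable C. This is only a sketch of the register semantics, not of the timing: the function names are invented for the illustration, and the eax parameter stands in for whatever stale value the register held beforehand. It also shows why the xor is needed, since mov al, bl on its own leaves the upper three bytes of eax untouched.

```c
#include <stdint.h>

/* Models "movzx eax, bl": the result depends only on the byte. */
uint32_t movzx_eax_bl(uint8_t bl) {
    return (uint32_t)bl;
}

/* Models "mov al, bl" alone: only the low byte of eax changes. */
uint32_t mov_al_bl(uint32_t eax, uint8_t bl) {
    return (eax & 0xFFFFFF00u) | bl;
}

/* Models "xor eax, eax" followed by "mov al, bl". */
uint32_t xor_then_mov(uint32_t eax, uint8_t bl) {
    eax ^= eax;                 /* xor eax, eax: clears all 32 bits */
    return mov_al_bl(eax, bl);  /* mov al, bl: fills in the low byte */
}
```

Both routes produce the same 32-bit value; which one is faster depends on the CPU model, so it still has to be measured.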
LVL 3

Author Comment

by:Sandra-24
ID: 10910715
Interesting. So using byte ops is iffy, and should be measured in each scenario where it matters. I never would have guessed that movzx is inferior to the xor/mov combo; I've used it in a few inner loops that I could change.

Thanks also for fixing that benchmark code.

-Sandra
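For readers wondering why the benchmark fix needs sbb on the high dword, the sub/sbb pair can be modeled in portable C. This is a rough sketch under stated assumptions: the struct and function names are hypothetical, and the two fields stand in for the edx:eax halves that rdtsc returns. Unlike the snippet in the thread, which subtracts the end reading from the stored start reading and so leaves the negated count in memory, this version computes end minus start.

```c
#include <stdint.h>

/* Hypothetical type for this sketch: a 64-bit counter split into two
   dwords, like edx:eax after rdtsc. */
struct tsc_reading {
    uint32_t lo;
    uint32_t hi;
};

/* Computes end - start one dword at a time, as sub/sbb would. */
struct tsc_reading tsc_elapsed(struct tsc_reading start, struct tsc_reading end) {
    struct tsc_reading diff;
    uint32_t borrow = end.lo < start.lo;   /* carry flag left by "sub" */
    diff.lo = end.lo - start.lo;           /* sub: low dword, wraps mod 2^32 */
    diff.hi = end.hi - start.hi - borrow;  /* sbb: high dword minus the borrow */
    return diff;
}
```

If the low dword wrapped between the two readings, a plain sub on the high dword would be off by one; carrying the borrow across is exactly what sbb does.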