
Implementing horizontal add for SSE2?

Dear all,
 
I know that SSE3 has haddps. However, I am currently implementing a horizontal add for SSE2-only CPUs. My code is:
 
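  __asm pshufd  xmm6, xmm7, 00110001b  ; xmm6 = (a1, a0, a3, a0), given xmm7 = (a0, a1, a2, a3)
  __asm addps   xmm7, xmm6             ; xmm7 = (a0+a1, a0+a1, a2+a3, a0+a3)
  __asm pshufd  xmm6, xmm7, 00000010b  ; xmm6 lane 0 = a2+a3
  __asm addss   xmm6, xmm7             ; xmm6 lane 0 = a0+a1+a2+a3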
 
However, addps and addss are slow instructions with a latency of 5.
Inserting the above code makes my program about 15% slower.
Is there a better way to code a horizontal add?
 
Thank you.
hengck23
1 Solution
 
dimitry commented:
In the following document, the Intel engineers wrote:
http://www.intel.com/technology/itj/2004/volume08issue01/art01_microarchitecture/vol8iss1_art01.pdf

The most common operation performed in a vertex shader is the scalar product, where 3 (or 4) pairs of
single-precision data elements are multiplied and the 3 (or 4) results summed. Due to the AOS organization of
the vertex database, evaluating the scalar product can be challenging with SSE because of the lack of horizontal
instructions. We have added horizontal floating-point addition/subtraction instructions to speed up the evaluation of scalar products.

Code with SSE3:
mulps xmm0, xmm1
haddps xmm0, xmm0
haddps xmm0, xmm0

Code without SSE3:
mulps xmm0, xmm1
movaps xmm1, xmm0
shufps xmm0, xmm1, 0xb1
addps xmm0, xmm1
movaps xmm1, xmm0
shufps xmm0, xmm0, 0x0a
addps xmm0, xmm1
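
If you are writing C rather than inline assembly, the same shuffle-and-add idea can be sketched with SSE/SSE2 intrinsics. This is only a sketch and is not taken from the Intel article: the helper names hsum_ps_sse2 and dot4_sse2 are made up for illustration, and the second step uses movhlps plus addss instead of the second shufps/addps pair, so only lane 0 of the result holds the total.

```c
#include <emmintrin.h>  /* SSE2 intrinsics; also provides the SSE float ops */

/* Hypothetical helper: returns v0 + v1 + v2 + v3 using SSE/SSE2 only. */
static float hsum_ps_sse2(__m128 v)
{
    /* shuf = (v1, v0, v3, v2); _MM_SHUFFLE(2, 3, 0, 1) is the 0xb1 mask */
    __m128 shuf = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 3, 0, 1));
    /* sums = (v0+v1, v0+v1, v2+v3, v2+v3) */
    __m128 sums = _mm_add_ps(v, shuf);
    /* move the upper pair of sums down: lane 0 of shuf becomes v2+v3 */
    shuf = _mm_movehl_ps(shuf, sums);
    /* lane 0 = (v0+v1) + (v2+v3); the other lanes are not meaningful */
    sums = _mm_add_ss(sums, shuf);
    return _mm_cvtss_f32(sums);
}

/* Hypothetical helper: 4-element dot product, as in the mulps example above. */
static float dot4_sse2(const float *a, const float *b)
{
    /* unaligned loads, so the caller does not need 16-byte alignment */
    __m128 prod = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));
    return hsum_ps_sse2(prod);
}
```

Calling dot4_sse2(a, b) on two float[4] arrays returns the scalar product. Whether this is actually faster than your pshufd version depends on the CPU, so measure it in your own program.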

Hope it helps...
 
mbizup commented:
No comment has been added to this question in more than 21 days, so it is now classified as abandoned.

I will leave the following recommendation for this question in the Cleanup topic area:
    Accept: dimitry {http:#12552123}

Any objections should be posted here in the next 4 days. After that time, the question will be closed.

mbizup
EE Cleanup Volunteer