SIMD - Advantages/Disadvantages and the way to go ....

Posted on 2006-04-01
Last Modified: 2013-12-26
hi all :)

recently i got a cpu that supports sse/sse2 .. since my projects are mainly
3D & physics simulations, i started playing around with it a little and tried to
see what advantages/disadvantages come up from implementing sse in my basic
layers. i did lots of benchmarking and found that basically vector normalization
and matrix transformation of vector arrays really show a time improvement. sure, both
are very important for my kind of libraries ..

so i have an important decision to make .. it affects all my libs and apps, since data
must be prepared for those functions and there would no longer be vector3 & vector4 types:
each must be replaced with a homogeneous vector4, and 3x3 rotation matrices must be replaced
with 4x4 matrices

here are the pros & cons i see so far:

advantages
-------------
* prepared for the future ?!?
* time improvement

disadvantages
----------------
* data must be aligned to 16 bytes and must fit into a 128-bit register
* to have a consistent library, i always have to use vectors with 4 components, even
   if i only need three components; the same goes for 3x3 matrices
* code maintenance is more complex at the lower layer, since some functions are
   implemented in 2 ways
* the library needs a runtime check for sse and has to set function pointers to decide
   which function to use, with or without sse
* pure c/c++ code seems to stay valid longer and is cpu-independent, and fpus are
   getting faster
* increased memory size, but that's not really an issue for me these days ..
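
To make the alignment and runtime-check points above concrete, here is a minimal c++ sketch, not the poster's actual code. The names `Vec4`, `add_scalar`, `add_sse`, `cpu_has_sse`, `select_add` and the macro `DEMO_HAVE_SSE` are hypothetical and chosen only for illustration. It assumes a C++11 compiler (for `alignas`) and, on the sse path, an x86 target with `<xmmintrin.h>`.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: the SSE path is only built when the compiler targets x86 with SSE enabled.
#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  if defined(_MSC_VER)
#    include <intrin.h>
#  endif
#  define DEMO_HAVE_SSE 1
#else
#  define DEMO_HAVE_SSE 0
#endif

// Hypothetical 4-component vector. alignas(16) satisfies the
// 16-byte alignment that aligned SSE loads/stores require.
struct alignas(16) Vec4 { float v[4]; };

// Portable fallback: plain c++ on every cpu.
static void add_scalar(const Vec4* a, const Vec4* b, Vec4* out) {
    for (int i = 0; i < 4; ++i) out->v[i] = a->v[i] + b->v[i];
}

#if DEMO_HAVE_SSE
// SSE variant: one 128-bit add handles all four components.
static void add_sse(const Vec4* a, const Vec4* b, Vec4* out) {
    assert(reinterpret_cast<std::uintptr_t>(a) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(b) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    _mm_store_ps(out->v, _mm_add_ps(_mm_load_ps(a->v), _mm_load_ps(b->v)));
}
#endif

// Runtime check whether the cpu reports SSE support.
static bool cpu_has_sse() {
#if !DEMO_HAVE_SSE
    return false;
#elif defined(_MSC_VER)
    int info[4];
    __cpuid(info, 1);
    return (info[3] & (1 << 25)) != 0;  // CPUID leaf 1, EDX bit 25 = SSE
#elif defined(__GNUC__)
    return __builtin_cpu_supports("sse") != 0;
#else
    return true;  // assumption: unknown compiler, trust the build flags
#endif
}

using AddFn = void (*)(const Vec4*, const Vec4*, Vec4*);

// Pick the implementation once, e.g. at library init, and call through the pointer.
static AddFn select_add() {
#if DEMO_HAVE_SSE
    if (cpu_has_sse()) return add_sse;
#endif
    return add_scalar;
}
```

In a real library the sse variants would usually live in their own source file, so that only that file is compiled with sse code generation enabled and the rest of the binary still runs on cpus without it.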

here are my benchmarks on win with vc71 and pentium4

V3_NORMALIZE      23%
V3_LENGTH_SQR    -11%
V3_LENGTH          5%
V3_ADD            -3%
V3_SUB            -1%
V3_MUL            -1%
V3_CROSSPRODUCT   -3%
V3_DOTPRODUCT     -3%
V4_DOTPRODUCT     -0%
M_MUL_V            3%
M_BATCH_MUL_V     22%

M_MUL_V       -> vector4    = matrix44 * vector4
M_BATCH_MUL_V -> vector4[n] = matrix44 * vector4[n]
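
For reference, a batched matrix * vector transform like M_BATCH_MUL_V could be sketched as follows. This is an illustration under assumptions, not the benchmarked code: `Vec4`, `Mat44`, `batch_mul`, `batch_mul_scalar` and `DEMO_HAVE_SSE` are hypothetical names, the matrix is assumed to be stored column-major, and all data is assumed to be 16-byte aligned.

```cpp
#include <cstddef>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define DEMO_HAVE_SSE 1
#else
#  define DEMO_HAVE_SSE 0
#endif

struct alignas(16) Vec4  { float v[4]; };
struct alignas(16) Mat44 { float col[4][4]; };  // assumption: column-major, col[c][r]

// Plain c++ reference: out[i] = m * in[i]
void batch_mul_scalar(const Mat44& m, const Vec4* in, Vec4* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        Vec4 t;  // temporary, so in and out may point to the same array
        for (int r = 0; r < 4; ++r) {
            t.v[r] = m.col[0][r] * in[i].v[0] + m.col[1][r] * in[i].v[1]
                   + m.col[2][r] * in[i].v[2] + m.col[3][r] * in[i].v[3];
        }
        out[i] = t;
    }
}

// Uses SSE when the build supports it, otherwise falls back to the scalar loop.
void batch_mul(const Mat44& m, const Vec4* in, Vec4* out, std::size_t n) {
#if DEMO_HAVE_SSE
    // The four matrix columns are loaded once and reused for all n vectors.
    const __m128 c0 = _mm_load_ps(m.col[0]);
    const __m128 c1 = _mm_load_ps(m.col[1]);
    const __m128 c2 = _mm_load_ps(m.col[2]);
    const __m128 c3 = _mm_load_ps(m.col[3]);
    for (std::size_t i = 0; i < n; ++i) {
        const __m128 v = _mm_load_ps(in[i].v);
        // Broadcast each component of v across a full register.
        const __m128 x = _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0));
        const __m128 y = _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1));
        const __m128 z = _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2));
        const __m128 w = _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3));
        // result = c0*x + c1*y + c2*z + c3*w
        const __m128 r = _mm_add_ps(
            _mm_add_ps(_mm_mul_ps(c0, x), _mm_mul_ps(c1, y)),
            _mm_add_ps(_mm_mul_ps(c2, z), _mm_mul_ps(c3, w)));
        _mm_store_ps(out[i].v, r);
    }
#else
    batch_mul_scalar(m, in, out, n);
#endif
}
```

One plausible reason the batch version benefits more than the single M_MUL_V is that the matrix stays in registers across the whole loop, so the load cost is paid once instead of once per vector.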

environment:
the target processors are mainly intel & athlon, 32-bit & 64-bit
platforms are win & linux


so my questions are:

1. did you face the same question, and how did you decide? what were your pros & cons?

2. i'd like to have a discussion to see some aspects i haven't seen yet or haven't seen that way ..
    not only regarding time improvement
   
   
actually my intuition tells me the cost is too high. but i think it's an important decision,
so i'd like to have as much input as i can get


so thanks for input in advance :)


ike
Question by:ikework
Accepted Solution

by: AlexFM
Try playing with the compiler optimization settings. Compilers can optimize for specific processor types and can generate SSE code themselves, which can improve program performance.
If you want to use SSE, use compiler intrinsics instead of assembly if they are available in your compiler.
In my experience, using assembly gives minimal advantage over optimized C++ code. I think using SSE and other low-level technologies is important for library developers (like OpenGL or Intel's IPL and PPL), and not so important for application developers.
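
As an illustration of the intrinsics route, here is a minimal sketch of a 3-component normalize written with SSE intrinsics instead of assembly. The function name `normalize3` and the macro `DEMO_HAVE_SSE` are hypothetical. The sketch assumes the vector is stored as four 16-byte-aligned floats with w = 0 (w gets scaled along with x, y, z) and that the vector is not zero-length.

```cpp
#include <cmath>

#if defined(__SSE__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 1)
#  include <xmmintrin.h>
#  define DEMO_HAVE_SSE 1
#else
#  define DEMO_HAVE_SSE 0
#endif

// Normalizes the xyz part of v in place.
// Assumptions: v points to 4 floats, is 16-byte aligned, and has nonzero length.
void normalize3(float* v) {
#if DEMO_HAVE_SSE
    const __m128 a  = _mm_load_ps(v);
    const __m128 sq = _mm_mul_ps(a, a);                         // x*x, y*y, z*z, w*w
    // Broadcast each squared component, then sum x, y, z (w is left out).
    const __m128 xx = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 yy = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 zz = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 len2 = _mm_add_ps(_mm_add_ps(xx, yy), zz);     // squared length in all lanes
    // Full-precision sqrt and divide; _mm_rsqrt_ps would be faster but only approximate.
    _mm_store_ps(v, _mm_div_ps(a, _mm_sqrt_ps(len2)));
#else
    // Portable fallback with the same behavior, including scaling w.
    const float len = std::sqrt(v[0] * v[0] + v[1] * v[1] + v[2] * v[2]);
    for (int i = 0; i < 4; ++i) v[i] /= len;
#endif
}
```

For the compiler-settings route, the relevant switches are for example `/arch:SSE` or `/arch:SSE2` in Visual C++ and `-msse2 -mfpmath=sse` in GCC on 32-bit x86. Whether they help for a given code base is something only a benchmark can tell.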