SIMD - Advantages/Disadvantages and the way to go ....

hi all :)

i recently got a cpu that supports sse/sse2. since my projects are mainly
3d & physics simulations, i started playing around with it a little to see
what advantages and disadvantages come with building sse into my base
layers. i did lots of benchmarking and found that basically only vector normalization
and matrix transformation of vector arrays show a real time improvement. granted, both
are very important for my kind of libraries ..

so i have an important decision to make .. it affects all my libs and apps, since the data
must be prepared for those functions: there would no longer be separate vector3 & vector4 types,
each would be replaced by a homogeneous vector4, and 3x3 rotation matrices would be replaced
by 4x4 matrices
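
to make the data layout concrete, here is a minimal sketch of what such a homogeneous vector4 and an sse normalize could look like. the names `Vector4` and `v3_normalize_sse` are made up for illustration, and `alignas` assumes a c++11 compiler (vc71 would need `__declspec(align(16))` instead). this is not the code behind the benchmarks.

```cpp
#include <cassert>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// hypothetical type: x, y, z plus a fourth component w, exactly one 128-bit register
struct alignas(16) Vector4 {
    float x, y, z, w;
};
static_assert(sizeof(Vector4) == 16, "Vector4 must fill exactly one SSE register");

// normalizes the xyz part in place. w is divided by the same length, so a
// direction with w == 0 keeps w == 0. a zero-length input is not handled
// here and would produce NaNs.
inline void v3_normalize_sse(Vector4& v) {
    assert(reinterpret_cast<std::uintptr_t>(&v) % 16 == 0);  // _mm_load_ps needs 16-byte alignment
    const __m128 a  = _mm_load_ps(&v.x);                     // (x, y, z, w)
    const __m128 sq = _mm_mul_ps(a, a);                      // (x*x, y*y, z*z, w*w)
    const __m128 xx = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(0, 0, 0, 0));
    const __m128 yy = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(1, 1, 1, 1));
    const __m128 zz = _mm_shuffle_ps(sq, sq, _MM_SHUFFLE(2, 2, 2, 2));
    const __m128 len = _mm_sqrt_ps(_mm_add_ps(xx, _mm_add_ps(yy, zz)));  // length in all lanes
    _mm_store_ps(&v.x, _mm_div_ps(a, len));
}
```

a three-component vector then simply carries an unused (or zero) w, which is where the extra memory goes.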

here are the pros & cons i see so far:

pro:

* prepared for the future ?!?
* time improvement

contra:

* data must be aligned to 16 bytes and must fit into a 128-bit register
* to have a consistent library, i always have to use vectors with 4 components, even
   if i only need three, and the same goes for 3x3 matrices
* code maintenance is more complex at the lower layer, since some functions are
   implemented in 2 ways
* the library needs a runtime check for sse and has to set function pointers to decide
   which function to use, with or without sse
* pure c/c++ code seems likely to stay valid longer and is cpu-independent, and fpus are
   getting faster
* memory size increases, but that's not really an issue for me these days ..
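
the runtime check with function pointers from the list could be sketched roughly like this. all names (`math_init`, `v4_add`, `MATHLIB_BUILD_SSE`, `cpu_has_sse`) are hypothetical. the detection uses `__builtin_cpu_supports`, which is a gcc/clang builtin; with msvc on 32-bit x86 one would query cpuid instead, which is left out here.

```cpp
#include <cassert>

// the sse path is only compiled where the compiler targets sse (x86/x64)
#if defined(__SSE__) || defined(_M_X64)
#include <xmmintrin.h>
#define MATHLIB_BUILD_SSE 1  // hypothetical macro name
#endif

struct alignas(16) Vector4 { float x, y, z, w; };

// portable reference implementation, always available
static void v4_add_scalar(Vector4& out, const Vector4& a, const Vector4& b) {
    out.x = a.x + b.x;
    out.y = a.y + b.y;
    out.z = a.z + b.z;
    out.w = a.w + b.w;
}

#ifdef MATHLIB_BUILD_SSE
// second implementation of the same function, using one packed add
static void v4_add_sse(Vector4& out, const Vector4& a, const Vector4& b) {
    _mm_store_ps(&out.x, _mm_add_ps(_mm_load_ps(&a.x), _mm_load_ps(&b.x)));
}

static bool cpu_has_sse() {
#if defined(__GNUC__) || defined(__clang__)
    return __builtin_cpu_supports("sse") != 0;  // gcc/clang builtin
#else
    return true;  // only reached on x64 here, where sse is part of the baseline
#endif
}
#endif

typedef void (*V4AddFn)(Vector4&, const Vector4&, const Vector4&);

// every caller goes through this pointer; it defaults to the scalar version
static V4AddFn v4_add = v4_add_scalar;

// call once at library start-up
static void math_init() {
#ifdef MATHLIB_BUILD_SSE
    if (cpu_has_sse()) {
        v4_add = v4_add_sse;
    }
#endif
    assert(v4_add != nullptr);
}
```

one thing to keep in mind with this design: a call through a function pointer usually cannot be inlined, so for tiny functions like an add the indirection itself may cost more than sse saves.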

here are my benchmarks (time improvement with sse) on win with vc71 and a pentium4:

V3_NORMALIZE      23%
V3_LENGTH_SQR    -11%
V3_LENGTH          5%
V3_ADD            -3%
V3_SUB            -1%
V3_MUL            -1%
V3_DOTPRODUCT     -3%
V4_DOTPRODUCT     -0%
M_MUL_V            3%
M_BATCH_MUL_V     22%

M_MUL_V       -> vector4    = matrix44 * vector4
M_BATCH_MUL_V -> vector4[n] = matrix44 * vector4[n]
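
for reference, a batch transform of this shape can be written with sse intrinsics roughly as follows. this is only a sketch, not the benchmarked code: the type names, the function name `m_batch_mul_v` and the column-major matrix layout are assumptions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

struct alignas(16) Vector4  { float x, y, z, w; };
// assumption: column-major storage, col[3] holds the translation
struct alignas(16) Matrix44 { Vector4 col[4]; };

// out[i] = m * in[i] for i in [0, n). out and in may be the same array.
void m_batch_mul_v(Vector4* out, const Matrix44& m, const Vector4* in, std::size_t n) {
    assert(reinterpret_cast<std::uintptr_t>(out) % 16 == 0);
    assert(reinterpret_cast<std::uintptr_t>(in) % 16 == 0);

    // the four columns are loaded once and reused for the whole batch
    const __m128 c0 = _mm_load_ps(&m.col[0].x);
    const __m128 c1 = _mm_load_ps(&m.col[1].x);
    const __m128 c2 = _mm_load_ps(&m.col[2].x);
    const __m128 c3 = _mm_load_ps(&m.col[3].x);

    for (std::size_t i = 0; i < n; ++i) {
        const __m128 v = _mm_load_ps(&in[i].x);
        // broadcast each component and accumulate x*c0 + y*c1 + z*c2 + w*c3
        __m128 r = _mm_mul_ps(c0, _mm_shuffle_ps(v, v, _MM_SHUFFLE(0, 0, 0, 0)));
        r = _mm_add_ps(r, _mm_mul_ps(c1, _mm_shuffle_ps(v, v, _MM_SHUFFLE(1, 1, 1, 1))));
        r = _mm_add_ps(r, _mm_mul_ps(c2, _mm_shuffle_ps(v, v, _MM_SHUFFLE(2, 2, 2, 2))));
        r = _mm_add_ps(r, _mm_mul_ps(c3, _mm_shuffle_ps(v, v, _MM_SHUFFLE(3, 3, 3, 3))));
        _mm_store_ps(&out[i].x, r);
    }
}
```

loading the matrix once per batch instead of once per vector is one plausible reason why the batch variant gains more than the single multiply, though that is a guess and not something the numbers above prove.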

the target processors are mainly intel & athlon, 32-bit & 64-bit
the platforms are win & linux

so my questions are:

1. did you face the same question, and how did you decide? what were your pros & cons?

2. i'd like to have a discussion to see some aspects i haven't seen yet, or haven't seen that way ..
    not only regarding the time improvement

actually my intuition tells me the costs are too high. but i think it's an important decision,
so i'd like to get as much input as i can

so thanks in advance for your input :)

Try playing with the compiler optimization settings. Compilers can optimize for specific processor types and can generate SSE code, which can improve program performance.
If you want to use SSE, use compiler intrinsics instead of assembly if they are available in your compiler.
In my experience, using assembly gives minimal advantage over optimized C++ code. I think using SSE and other low-level technologies is important for library developers (like OpenGL or Intel's IPL and PPL), and not so important for application developers.
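
As a rough illustration of the intrinsics route, here is a four-component dot product written once in plain C++ and once with SSE intrinsics. The function names are made up for this sketch. The plain version is what the compiler may turn into SSE code by itself when the right switches are set (for example `/arch:SSE2` with Visual C++ or `-msse2` with GCC).

```cpp
#include <cassert>
#include <xmmintrin.h>  // SSE intrinsics (x86/x64 only)

// Plain C++ version; an optimizing compiler is free to use SSE here on its own.
inline float v4_dot_plain(const float* a, const float* b) {
    assert(a != nullptr && b != nullptr);
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2] + a[3] * b[3];
}

// Intrinsics version: one packed multiply, then a horizontal sum.
inline float v4_dot_sse(const float* a, const float* b) {
    assert(a != nullptr && b != nullptr);
    const __m128 p = _mm_mul_ps(_mm_loadu_ps(a), _mm_loadu_ps(b));  // unaligned loads
    __m128 shuf = _mm_shuffle_ps(p, p, _MM_SHUFFLE(2, 3, 0, 1));    // (p1, p0, p3, p2)
    __m128 sums = _mm_add_ps(p, shuf);                              // (p0+p1, p0+p1, p2+p3, p2+p3)
    shuf = _mm_movehl_ps(shuf, sums);                               // lane 0 = p2+p3
    sums = _mm_add_ss(sums, shuf);                                  // lane 0 = p0+p1+p2+p3
    return _mm_cvtss_f32(sums);
}
```

The multiply is a single instruction, but the horizontal sum needs several shuffles and adds, so a lone dot product is not an obvious win for SSE. Unlike inline assembly, the intrinsics version also leaves register allocation and scheduling to the compiler.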
