milosr


Fast Array Bitwise OR?

I've got one array holding a bit pattern, and another array that I would like to bitwise OR with that pattern.
Are there instructions (MMX, SSE, ...) that allow this to be done fast (like REP MOVSD for fast array copy)?

This is part of code in C that I want to optimize:

    for (i = 0; i < iSize; i++) {
        pArray[i] |= pattern[i % iPatternSize];
    }

Thanks,
Milos
ASKER CERTIFIED SOLUTION
mzvika

grg99

It's not going to make much difference. ORing is something CPUs do very quickly, much quicker than typical RAM can deliver the data.

There's no point in speeding up code that's already many times faster than the memory bus can handle.

Right, so we want to minimize memory accesses.
Why read 1 byte at a time when we can read 32 bytes?
Most PC memory buses are only 4 or 8 bytes wide. So there's not much gain in reading more than that at one gulp.

It is cool to think of reading 128 bits at a time! Wow!

Well, they didn't invent SSE2 for nothing... 8-)
Processors that support SSE2 have sufficient FSB width (Athlon 64 -> 128 bit, some Pentium 4 -> 256 bit).
There's even the newer SSE3.

On systems with "only" 64-bit-wide buses you can use MMX, SSE or 3DNow!.
It all depends on which processor he wants to optimize for. Obviously, it may not run on older processors.
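
For illustration, here is a minimal sketch of what an SSE2 version of the loop from the question could look like. It is not necessarily the accepted solution. It assumes the arrays hold bytes (the question does not say what the element type is), and the names or_pattern and ext are made up for this example. The pattern is first copied into a slightly longer buffer so that a full 16-byte window can be loaded at any phase of the pattern. Unaligned loads and stores are used for simplicity, and a plain scalar loop is used if SSE2 is not available at compile time.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#if defined(__SSE2__)
#include <emmintrin.h>  /* SSE2 intrinsics */
#endif

/* Hypothetical helper: data[i] |= pattern[i % plen] for all i < size. */
void or_pattern(uint8_t *data, size_t size, const uint8_t *pattern, size_t plen)
{
    size_t i = 0;      /* next element to process */
    size_t phase = 0;  /* always equal to i % plen */

    assert(plen > 0);
#if defined(__SSE2__)
    /* ext[k] == pattern[k % plen] for k < plen + 15, so a 16-byte load
       starting at any phase < plen stays inside the buffer. */
    size_t ext_len = plen + 15;
    uint8_t *ext = malloc(ext_len);
    if (ext != NULL) {
        size_t step = 16 % plen;
        for (size_t k = 0; k < ext_len; k++)
            ext[k] = pattern[k % plen];
        for (; i + 16 <= size; i += 16) {
            __m128i d = _mm_loadu_si128((const __m128i *)(data + i));
            __m128i p = _mm_loadu_si128((const __m128i *)(ext + phase));
            _mm_storeu_si128((__m128i *)(data + i), _mm_or_si128(d, p));
            phase += step;             /* advance by 16 modulo plen */
            if (phase >= plen)
                phase -= plen;
        }
        free(ext);
    }
#endif
    /* Scalar tail (or the whole array when SSE2 is unavailable). */
    for (; i < size; i++) {
        data[i] |= pattern[phase];
        if (++phase == plen)
            phase = 0;
    }
}
```

Called as or_pattern(pArray, iSize, pattern, iPatternSize), this should give the same result as the original loop. Whether it is actually faster on a given machine has to be measured, especially for arrays much larger than the cache.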

ASKER


mzvika is probably right: one bitwise OR on 128 bits at once is faster than four bitwise ORs on 32 bits.
For small arrays (smaller than the processor cache) that would be a 4 times speedup. (I'm doing this operation a lot of times.)

However, pArray is more than 10 MBytes long, so the processor spends a lot of time waiting
for data to load into the cache. So the speedup probably won't be very significant.

I will try to optimize the code in another way. I have several different small patterns that I apply to the array,
so maybe I can precalculate all the patterns and then apply them in parallel, in one iteration through pArray.
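
One possible reading of this idea, sketched under assumptions: since OR is associative and commutative, several repeating patterns can be merged into a single pattern whose length is the least common multiple of the individual lengths, and that merged pattern can then be applied in one pass over the big array. The byte element type and the names combine_patterns, apply_pattern and gcd_size are hypothetical and used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Merge `count` repeating patterns into one. Returns a malloc'd buffer whose
   length (the LCM of all lens[j]) is stored in *out_len, or NULL on failure. */
uint8_t *combine_patterns(const uint8_t *const *patterns, const size_t *lens,
                          size_t count, size_t *out_len)
{
    size_t period = 1;

    assert(count > 0);
    for (size_t j = 0; j < count; j++) {
        assert(lens[j] > 0);
        period = period / gcd_size(period, lens[j]) * lens[j];
    }
    uint8_t *combined = calloc(period, 1);  /* starts as all zero bits */
    if (combined == NULL)
        return NULL;
    for (size_t j = 0; j < count; j++)
        for (size_t k = 0; k < period; k++)
            combined[k] |= patterns[j][k % lens[j]];
    *out_len = period;
    return combined;
}

/* Single pass over the array, without a modulo in the inner loop. */
void apply_pattern(uint8_t *data, size_t size, const uint8_t *pattern, size_t plen)
{
    size_t phase = 0;
    for (size_t i = 0; i < size; i++) {
        data[i] |= pattern[phase];
        if (++phase == plen)
            phase = 0;
    }
}
```

This reads and writes the 10 MB array once instead of once per pattern, which is where the memory-bound cost is. The trade-off is that the merged pattern can get long when the pattern lengths share few common factors.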

Thanks for spending your time.