Fast Array Bitwise OR?

Posted on 2005-04-11
Last Modified: 2012-05-05
I have one array holding a bit pattern and a second array that I would like to bitwise OR with that pattern.
Are there instructions (MMX, SSE, ...) that do this fast, the way REP MOVSD does for array copies?

This is the part of the C code that I want to optimize:

                  for(i=0; i<iSize; i++) {
                        pArray[i] |= pattern[i % iPatternSize];
                  }

Question by:milosr

Accepted Solution

mzvika earned 375 total points
ID: 13754142
You could use the SSE2 instructions, which operate on a 128-bit register at a time.

MOVDQU                loads a double quadword (16 bytes) from memory into a 128-bit register
db 066h POR          performs a bitwise OR between its operands

(NOTE: the db 066h is part of the instruction. Plain POR operates on the 64-bit MMX registers; the 66h prefix selects the 128-bit form, which is what we want.)
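The same idea can be sketched in C with compiler intrinsics instead of hand-encoded opcodes. This is only a sketch under assumptions not stated in the question: the arrays are taken to hold bytes (`uint8_t`), the function name `or_pattern_sse2` is made up for illustration, and the pattern is first tiled into a temporary buffer of 16 * pattern_size bytes so that every 16-byte load stays in phase with the pattern. `_mm_loadu_si128` and `_mm_or_si128` are the intrinsics that correspond to MOVDQU and the 128-bit POR.

```c
#include <assert.h>
#include <emmintrin.h>  /* SSE2 intrinsics */
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* OR a repeating byte pattern into array[0..size).
   Hypothetical helper; returns 0 on success, -1 if allocation fails. */
int or_pattern_sse2(uint8_t *array, size_t size,
                    const uint8_t *pattern, size_t pattern_size)
{
    assert(pattern_size > 0);

    /* 16 * pattern_size is a multiple of both the vector width and the
       pattern period, so wrapping back to offset 0 keeps the phase. */
    size_t tiled_size = 16 * pattern_size;
    uint8_t *tiled = malloc(tiled_size);
    if (tiled == NULL)
        return -1;
    for (size_t k = 0; k < tiled_size; k++)
        tiled[k] = pattern[k % pattern_size];

    size_t i = 0, t = 0;
    for (; i + 16 <= size; i += 16) {
        /* Unaligned 128-bit loads (MOVDQU), OR (POR), unaligned store. */
        __m128i a = _mm_loadu_si128((const __m128i *)(array + i));
        __m128i p = _mm_loadu_si128((const __m128i *)(tiled + t));
        _mm_storeu_si128((__m128i *)(array + i), _mm_or_si128(a, p));
        t += 16;
        if (t == tiled_size)
            t = 0;
    }

    /* Scalar loop for the last size % 16 bytes. */
    for (; i < size; i++)
        array[i] |= pattern[i % pattern_size];

    free(tiled);
    return 0;
}
```

Tiling also removes the `%` from the inner loop, which on its own may matter as much as the wider loads. Whether the vector version is actually faster for a given array size is something to measure rather than assume.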

Expert Comment

ID: 13754327
It's not going to make much difference. ORing is something CPUs do very quickly, much more quickly than typical RAM can deliver the data.

There's no point in speeding up code that's already many times faster than the memory bus can handle.


Expert Comment

ID: 13754350
Right, so we want to minimize memory accesses.
Why read 1 byte at a time when we can read 16 bytes (one 128-bit register) at a time?

Expert Comment

ID: 13755245
Most PC memory buses are only 4 or 8 bytes wide, so there's not much gain in reading more than that in one gulp.

It is cool to think of reading 128 bits at a time! Wow!


Expert Comment

ID: 13755978
Well, they didn't invent SSE2 for nothing... 8-)
Processors that support SSE2 have sufficient FSB width (Athlon 64 -> 128-bit, some Pentium 4 -> 256-bit).
There's even the newer SSE3.

On systems with "only" 64-bit-wide buses you can use MMX, SSE or 3DNow!.
It all depends on which processor he wants to optimize for. Obviously, the code may not run on older processors.

Author Comment

ID: 13762780

mzvika is probably right: one bitwise OR on 128 bits is faster than four bitwise ORs on 32 bits.
For small arrays (smaller than the processor cache) that would be a 4x speedup. (I'm doing this operation a lot of times.)

However, pArray is more than 10 MB long, so the processor spends a lot of time waiting
for data to load into the cache. So the speedup probably won't be very significant.

I will try to optimize the code in another way. I have several different small patterns that I apply to the array,
so maybe I can precalculate the combination of all patterns and then apply them together, with a single iteration through pArray.
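A minimal sketch of that precalculation, assuming byte arrays and assuming every pattern is applied with OR starting at index 0. The function names are hypothetical. Since OR is associative, several repeating patterns can be folded into one pattern whose period is the least common multiple of the individual periods, and the big array is then walked once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

static size_t gcd_size(size_t a, size_t b)
{
    while (b != 0) {
        size_t t = a % b;
        a = b;
        b = t;
    }
    return a;
}

/* Fold `count` repeating patterns into `out`. Returns the combined period
   (the lcm of all sizes), or 0 if it does not fit in out_cap bytes. */
size_t combine_patterns(const uint8_t *const *patterns, const size_t *sizes,
                        size_t count, uint8_t *out, size_t out_cap)
{
    size_t period = 1;
    for (size_t k = 0; k < count; k++) {
        assert(sizes[k] > 0);
        period = period / gcd_size(period, sizes[k]) * sizes[k];
        if (period > out_cap)
            return 0;
    }
    for (size_t i = 0; i < period; i++) {
        uint8_t v = 0;
        for (size_t k = 0; k < count; k++)
            v |= patterns[k][i % sizes[k]];
        out[i] = v;
    }
    return period;
}

/* Single pass over the large array; a wrapping index replaces `%`. */
void or_pattern(uint8_t *array, size_t size,
                const uint8_t *pattern, size_t pattern_size)
{
    size_t j = 0;
    for (size_t i = 0; i < size; i++) {
        array[i] |= pattern[j];
        if (++j == pattern_size)
            j = 0;
    }
}
```

The combined period can grow quickly when the pattern sizes share no common factors, so this only pays off while the combined pattern still fits comfortably in cache.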

Thanks for spending your time.
