Solved

# Non-linear check for a particular integer in an array

Posted on 2007-11-16
265 Views
Hey guys,

I have a function that loops through an int array and returns true and breaks out if it finds a 0.

However, is there any way to improve it and detect the 0 using only ONE comparison, to reduce overhead?
``````int LookForOne (int n , int x[]) {

int i;

for (i = 0; i < n; i++) {

if (x[i] == 0) {

return (TRUE);

}

}

return (FALSE);

}
``````
0
Question by:Koraxen
• 14
• 7
• 7
• +3

LVL 53

Expert Comment

Unless you know where the 0 is to be found, or unless the data is ordered in a certain way, you can't really speed this up.
0

LVL 53

Expert Comment

btw : what is this function supposed to do ? Don't you want to return the index of the 0 ?
0

Author Comment

Just return true or false, if it's there or not.

I was wondering if you could possibly do some math (add / subtract) to improve it.
0

LVL 45

Accepted Solution

Kdo earned 100 total points
Hi Koraxen,

Interesting question.  :)

You're going to have to do at least two compares.  One for the loop iterator, and another for the actual data.

But a bit of cleverness can reduce the number of loop iterators that you need to compare in exchange for slightly more programming.

Assume that you know the data well enough that you expect the loop to iterate over dozens (or even hundreds) of objects.  The conventional loop looks like this:

int FindZero (int   *Array, int *Length)
{
int idx;

for (idx = 0; idx < Length; ++idx)
if (Array[idx] == 0)
return idx;
return -1
}

You may get better performance by checking multiple items per iteration.

int FindZero (int   *Array, int *Length)
{
int idx;
int Limit;
int *Item;

//  Process most of the array in 8 word chunks

Limit = Length && ~3;
Item = Array;
for (idx = 0; idx < Limit; idx += 8)
{
if (*(Item++) == 0)
return Limit;
if (*(Item++) == 0)
return Limit + 1;
if (*(Item++) == 0)
return Limit + 2;
if (*(Item++) == 0)
return Limit + 3;
if (*(Item++) == 0)
return Limit+ 4;
if (*(Item++) == 0)
return Limit + 5;
if (*(Item++) == 0)
return Limit + 6;
if (*(Item++) == 0)
return Limit + 7;
}

//  Check the items after the last 8 word chunk

for (idx = Limit; idx < Length; ++idx)
if (Array[idx] == 0)
return idx;

return -1
}

And there may be minor improvements in this by consistently using array notation or consistently dereferencing the array as a pointer, but this gives you the ida.

Good Luck,
Kent
0

LVL 53

Expert Comment

btw, Kdo's technique is called "loop unrolling". You still have to do a compare for each int in the array though.
0

LVL 45

Expert Comment

Hi Koraxen,

And there's no charge for the typos.  That's called "fat fingers".   :)

Good Luck,
Kent
0

LVL 53

Expert Comment

>> Just return true or false, if it's there or not.

What I meant was : what will this function be used for ? When will you call it ? How often ? For what purpose ?

There might be other ways of optimizing this.
0

LVL 5

Expert Comment

If you AND all values in the array and the result is nonzero, there cannot be a zero in the array. However, this is not true the other way round.

Modern CPUs have a feature called "branch prediction" that, together with pipelining, eliminates much of the overhead of a conditional jump -- which is needed to handle the "if" construct. Try to benchmark your loop vs. a loop that ANDs all elements and checks the result, like in my code snippet. I'm afraid the run time will be dominated by the time needed to fetch the array values, at least for large arrays bigger than the L2 cache size.

Moreover, for example, x86 CPUs have special loop instructions that can implement tight loops that check arrays.. If my memory doesn't fail me, the core of your loop can be implemented using one (!) assembly instruction like

REPNE SCASD

with a consecutive check if the end of the array has been reached. The instruction reads like "REPeat while Not Equal (zero) SCAn the String Double-word-wise"

If you have a good compiler, it will issue an instruction like this for your code.

As Infinity08 already said, without pre-sorting or some other preprocessing, no substantial speed-ups are possible.
``````int LookForOneMaybe(int n , int x[]) {

int i;

int res = -1;

for (i = 0; i < n; i++) {

res &= x[i];

}

if (res != 0) {

return (FALSE);

}

return (MAYBE);

}
``````
0

LVL 84

Expert Comment

If you can multiply all the values in the array without overflow, then the result will be 0 iff there was a 0 in the array.
But that way doesn't reduce overhead.
0

Author Comment

Following ozo's train of thought, I came up with this... the multiplication way:
``````int CheckForOne(int x[], int n) {

long int test = 1;

int i;

int temp = 1;

int retVal = 1;

for (i = 0; i < n; i++) {

temp = test;

test = test * x[i];

while (test > temp) {

test--;

}

printf("test = %d\n",test);

}

if ( test != 0 ) {

retVal = 0;

}

return retVal;

}
``````
0

LVL 53

Expert Comment

I thought you wanted less comparisons ? You've now got MORE comparisons, and a lot more extra code. Even multiplications that are a lot more costly than a comparison.
0

Author Comment

You're right... the while acts as a conditional. Hmm... I'll have to keep looking into it.
0

LVL 84

Expert Comment

while (test > temp) {
test--;
}
that could be a lot of comparisons
And if test ever goes negative, it would stay negative even if it later sees x[i] = 0
0

LVL 84

Assisted Solution

ozo earned 100 total points
You could convert all non-0 values to 1 with something like
unsigned int temp = x[i];
temp |= temp>>1;
temp |= temp>>2;
temp |= temp>>4;
temp |= temp>>8;
temp |= temp>>16;
temp |= temp>>32; // depending on sizeof(int)*CHAR_BIT;
temp &=1;
then
test *= temp is guaranteed not to overflow, or test &= temp can also be used
0

LVL 53

Expert Comment

Do you really think that'll be faster, ozo ?
0

LVL 84

Expert Comment

No. I said it reduces comparisons bit not overhhead
Although it could be faster than
while (test > temp) {
test--;
}
0

LVL 84

Expert Comment

If ! doesn't count as a "comparison"
test |= !x[i];
0

LVL 5

Expert Comment

ozo: ! counts as a comparison on many architectures. However, your trick using five or six shifts is amazing. It could be combined with my suggestion, see below.

The problem with the code is that the data dependencies are tight: Each access to temp depends on the previous line. This kind of makes the CPU's pipeline useless. A good compiler would do some loop unrolling and interleaving, like I did in the second example. Only possible if there are enough CPU registers available.

Anyway, I suggest to do some benchmarks to check if the effort pays off.

Koraxen: Is multithreading an option for you? Multi-core CPUs become more and more common, and it should be easy to parallelize the code. Just need to check which access pattern would be best:

- Thread 1 works on even indexes of [0..N), thread 2 works on odd indexes of [0..N)
``````int LookForOne(int n , int x[]) {

int i;

unsigned int res = -1;

for (i = 0; i < n; i++) {

unsigned int temp = x[i];

temp |= temp>>1;

temp |= temp>>2;

temp |= temp>>4;

temp |= temp>>8;

temp |= temp>>16;

temp |= temp>>32; // depending on sizeof(int)*CHAR_BIT;

// Now, the LSB of temp is set iff x[i] <> 0

// We use a "full" AND for res,

// it's one machine instruction anyway,

// but we check the LSB after the loop

res &= temp;

}

if ( (res & 1) ) {

return (FALSE);

}

return (TRUE);

}

int LookForOneUnrolled(int n , int x[]) {

int i;

unsigned int res = -1;

for (i = 0; i < n; i += 6) {

unsigned int temp1 = x[i];

unsigned int temp2 = x[i+1];

unsigned int temp3 = x[i+2];

unsigned int temp4 = x[i+3];

unsigned int temp5 = x[i+4];

unsigned int temp6 = x[i+5];

temp1 |= temp1>>1;

temp2 |= temp2>>1;

temp3 |= temp3>>1;

temp4 |= temp4>>1;

temp5 |= temp5>>1;

temp6 |= temp6>>1;

temp1 |= temp1>>2;

temp2 |= temp2>>2;

temp3 |= temp3>>2;

temp4 |= temp4>>2;

temp5 |= temp5>>2;

temp6 |= temp6>>2;

temp1 |= temp1>>4;

temp2 |= temp2>>4;

temp3 |= temp3>>4;

temp4 |= temp4>>4;

temp5 |= temp5>>4;

temp6 |= temp6>>4;

temp1 |= temp1>>8;

temp2 |= temp2>>8;

temp3 |= temp3>>8;

temp4 |= temp4>>8;

temp5 |= temp5>>8;

temp6 |= temp6>>8;

temp1 |= temp1>>16;

temp2 |= temp2>>16;

temp3 |= temp3>>16;

temp4 |= temp4>>16;

temp5 |= temp5>>16;

temp6 |= temp6>>16;

temp1 |= temp1>>32;

temp2 |= temp2>>32;

temp3 |= temp3>>32;

temp4 |= temp4>>32;

temp5 |= temp5>>32;

temp6 |= temp6>>32;

temp1 &= temp2;

temp3 &= temp4;

temp5 &= temp6;

res &= temp1;

temp3 &= temp5;

res &= temp3;

}

if ( (res & 1) ) {

return (FALSE);

}

return (TRUE);

}
``````
0

LVL 84

Expert Comment

If you have an extra slot at the end of the array, you can eliminate the  i < n comparison
x[n]=0;
for( int *p=&n[0]; *p++; ){}
return( p <= &x[n] );
If your machine has hardware instuctions that strlen takes advantage of, it could be less overhead to use it to scan for candidate 0's dependng on how the int values are distributed.
0

LVL 5

Expert Comment

ozo: And even if there is no spare slot, you could abuse the last element as watchdog.

Koraxen: What is your target architecture, anyway? :-)
``````if (n == 0) return FALSE;

if (x[n-1] == 0) return TRUE;

int save = x[n-1];

x[n-1] = 0;

// ...

x[n-1] = save;

return RETVAL;
``````
0

LVL 53

Expert Comment

>> Anyway, I suggest to do some benchmarks to check if the effort pays off.

I'm pretty sure it won't. Be serious. The original code was a loop with 1 compare in it. Now you've made it a loop with multiple shifts, ors and movs.

I thought the goal was "to reduce overhead" ?
0

LVL 53

Expert Comment

Koraxen, have you had the time to answer my earlier questions ?

>> What will this function be used for ? When will you call it ? How often ? For what purpose ?
0

LVL 5

Expert Comment

infinity08: Not quite right: The original loop had TWO compares, the watchdog suggestion by ozo reduced it to one compare. And still, conditional jumps _might_ be more expensive than a couple of arithmetic operation, though I doubt it -- branch prediction should do a good job for this particular loop.
0

LVL 53

Assisted Solution

Infinity08 earned 100 total points
>> Not quite right: The original loop had TWO compares, the watchdog suggestion by ozo reduced it to one compare.

Yes, but as I said : it replaced 1 compare with multiple shifts, ors and movs. I doubt that that will be faster on any machine.

Btw, when the compiler does a good job, it will generate code that has only two jumps : 1 for the loop, and one to get out of the loop when a 0 is found. The first will obviously be taken multiple times, but the second will only be taken once. It might look something like this (pseudo assembler) :

``````      sl n, 2            ; n <<= 2

add n, x           ; n += x

lab1: cmp x, n           ;

jge lab2           ; if (x >= n) goto lab2

cmp 0, (x)         ;

je lab3            ; if (0 == *x) goto lab3

jmp lab1           ; goto lab1

lab2: ret 0              ; return 0

lab3: ret 1              ; return 1
``````
0

LVL 53

Expert Comment

It is very difficult to optimize generated assembler like that without having more information about the contents of the x array (for example a sorted array).
0

LVL 5

Expert Comment

infinity08: Again, no. Adding a watchdog and doing the nifty shift tricks do not affect each other.
``````int LookForOneWatchdog(int n, int x[]) {

int i;

if (n == 0) return (FALSE);

int watchdog = x[n-1];

int ret = FALSE;

for (i = 0; ; i++) {

if (x[i] == 0) {

ret = TRUE;

break;

}

}

x[n-1] = watchdog;

return ret;

}
``````
0

LVL 53

Expert Comment

>> infinity08: Again, no. Adding a watchdog and doing the nifty shift tricks do not affect each other.

I'm not sure what you mean by that, but I'd love you to show me how you can optimize assembler like what I posted.

btw, this is not true :

>> the watchdog suggestion by ozo reduced it to one compare.

because you still need a compare for the position at the end. Plus an extra compare at the beginning for n. That's 3 compares.

Btw, the code you posted will not work for several reasons :

1) what if there's no 0 at all in any memory location after the beginning of the array ? You'll have a pretty long loop that might even crash the program.

2) what if there isn't a 0 inside the array, but there is one AFTER the array ? You'll return TRUE anyway.
0

LVL 5

Assisted Solution

KirillMueller earned 100 total points
I'm so sorry for the confusion. I forgot one line in my source code. Hope it is clear now. A "watchdog" is a dummy element appended to the array to make sure that an iteration over the array stops, even if no comparison for the index is performed.

As I mentioned in one of my previous posts, the most efficient code on i386 for this task is

REPNE SCASD

together with some initialization and termination code.

I have posted optimized code that can be compiled directly to code that uses only one conditional jump in the loop (plus three conditional jumps outside the loop)..
``````int LookForOneWatchdog(int n, int x[]) {

int i;

if (n == 0) return (FALSE);

int watchdog = x[n-1];

x[n-1] = 0;            // forgot this line

int ret = FALSE;

for (i = 0; ; i++) {

if (x[i] == 0) {

ret = TRUE;

break;

}

}

x[n-1] = watchdog;

return ret;

}

int LookForOneWatchdogOptimized(int n, int x[]) {

int i;

if (n == 0) return (FALSE);

if (x[n-1] == 0) return (TRUE);

int watchdog = x[n-1];

x[n-1] = 0;

int ret = FALSE;

for (i = 0; ; i++) {

if (x[i] == 0) {

break;

}

}

x[n-1] = watchdog;

return (i < (n - 1)) ? TRUE : FALSE;

}
``````
0

LVL 53

Expert Comment

>> I forgot one line in my source code.

In other words : it will always return TRUE now, unless n == 0. That doesn't meet the requirements either ;) You need to add a few more compares.
And what if the value at n-1 is already 0 ?

No matter how you look at it, adding a watchdog, will get you MORE compares, because you need to check for more special cases.

>> I have posted optimized code that can be compiled directly to code that uses only one conditional jump in the loop (plus three conditional jumps outside the loop)..

But the overall performance will most likely still be worse than the original code.

Anyway ... let's stop here ... I'll be waiting for Koraxen's feedback.
0

LVL 5

Expert Comment

Infinity08: Right, the first version always returns TRUE. Doh... But the second routine should be correct. As for the number of compares, the original code takes 2*n compares (two in each step of the loop) , while my routine takes n+3 compares.

I'll be waiting for the author'S feedback, too.
0

LVL 53

Expert Comment

>> But the second routine should be correct.

Sorry ... missed the second one ;)
0

LVL 39

Assisted Solution

itsmeandnobodyelse earned 100 total points
You can do it with one comparision if the array is sorted and contains only positive integers.

So the statement is trivial, it might give you a way to optimize: add new entries to the array so that it keeps always sorted.

#define BLOCK_SIZE 64
void insertToArray(int ** parr, int * ps, int n, int i)
{
int * p;
int b, e, m;
int s = *ps;
if (n >= s)
{
p = (int*)malloc((s+BLOCK_SIZE)*sizeof(int));
if (s > 0)
{
memcpy(p, *parr, s*sizeof(int));
free(*parr);
}
memset(p+s, 0, BLOCK_SIZE*sizeof(int));
*parr = p;
*ps = s + BLOCK_SIZE;
n = s;
}
p = *parr;
b = 0;
e = n-1;
while (b <= e)
{
m = (b + e)>>1;  // middle
if (i < *(p + m))
e = m - 1;
else
b = m + 1;
}
/* now b is the pos to insert */
if (n > b)  /* use memmove  for overlapping copy */
memmove(p+b+1, p + b, (n - b)*sizeof(int));
*(p + b) = i;
}

Note, if you have negative integers you may do a binary search at the sorted array (look at the while loop) to check for 0 entry.

Regards, Alex
0

Author Comment

Your ideas and work is very much appreciated guys. What I'm going to consolidate these ideas and submit/present them. Thanks.
0

Author Comment

The question doesn't specify when this function is called, or what architecture it will be used on. The only requirement was to get around the conditional. I assume multi-threading was not an option either.

Again, thank you.
0

LVL 53

Expert Comment

>> The only requirement was to get around the conditional.

I still don't get the point though ... Was this an assignment ? If so, you should have said so from the beginning, and we could have helped you a lot better ...
0

LVL 39

Expert Comment

Did you ever heard of qbits? Whenever that term is 'wandering like a ghost' in some newpaper or magazine, they will tell you (nearly with identical words) that with a few qbits you may do an arbitrary number of  computations same time. I actually think it is a fake to get money from the faithful ones as even with quants you can not overcome human logics. But if I am wrong that is the way you should go on.

Regards, Alex
0

LVL 84

Expert Comment

I haven't heard that anyone hat yet built a working quantum computer with more than 28 qbits, not to mention implementing a C compiler on one, but if you had your list in a quantum computer you may be able to search it in O(sqrt(n)) instead of O(n)
http://en.wikipedia.org/wiki/Grover's_algorithm
0

## Featured Post

### Suggested Solutions

Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to useâ€¦
This article will show, step by step, how to integrate R code into a R Sweave document
The viewer will learn how to implement Singleton Design Pattern in Java.
This video teaches viewers about errors in exception handling.