Non-linear check for a particular integer in an array

Hey guys,

I have a function that loops through an int array and returns true and breaks out if it finds a 0.

However, is there any way to improve it and detect the 0 using only ONE comparison, to reduce overhead?
int LookForOne (int n , int x[]) {
	int i;
 
	for (i = 0; i < n; i++) {
		if (x[i] == 0) {
			return (TRUE);
		}
	}
	return (FALSE);
}

Open in new window

KoraxenAsked:
Who is Participating?
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi Koraxen,

Interesting question.  :)

You're going to have to do at least two compares.  One for the loop iterator, and another for the actual data.

But a bit of cleverness can reduce the number of loop iterators that you need to compare in exchange for slightly more programming.

Assume that you know the data well enough that you expect the loop to iterate over dozens (or even hundreds) of objects.  The conventional loop looks like this:

int FindZero (int   *Array, int *Length)
{
  int idx;

  for (idx = 0; idx < Length; ++idx)
    if (Array[idx] == 0)
      return idx;
  return -1
}

You may get better performance by checking multiple items per iteration.

int FindZero (int   *Array, int *Length)
{
  int idx;
  int Limit;
  int *Item;

//  Process most of the array in 8 word chunks  

  Limit = Length && ~3;        
  Item = Array;
  for (idx = 0; idx < Limit; idx += 8)
  {
    if (*(Item++) == 0)
      return Limit;
    if (*(Item++) == 0)
      return Limit + 1;
    if (*(Item++) == 0)
      return Limit + 2;
    if (*(Item++) == 0)
      return Limit + 3;
    if (*(Item++) == 0)
      return Limit+ 4;
    if (*(Item++) == 0)
      return Limit + 5;
    if (*(Item++) == 0)
      return Limit + 6;
    if (*(Item++) == 0)
      return Limit + 7;
  }

//  Check the items after the last 8 word chunk

  for (idx = Limit; idx < Length; ++idx)
    if (Array[idx] == 0)
      return idx;

  return -1
}

And there may be minor improvements in this by consistently using array notation or consistently dereferencing the array as a pointer, but this gives you the ida.



Good Luck,
Kent
0
 
Infinity08Commented:
Unless you know where the 0 is to be found, or unless the data is ordered in a certain way, you can't really speed this up.
0
 
Infinity08Commented:
btw : what is this function supposed to do ? Don't you want to return the index of the 0 ?
0
Get your problem seen by more experts

Be seen. Boost your question’s priority for more expert views and faster solutions

 
KoraxenAuthor Commented:
Just return true or false, if it's there or not.

I was wondering if you could possibly do some math (add / subtract) to improve it.
0
 
Infinity08Commented:
btw, Kdo's technique is called "loop unrolling". You still have to do a compare for each int in the array though.
0
 
Kent OlsenData Warehouse Architect / DBACommented:
Hi Koraxen,

And there's no charge for the typos.  That's called "fat fingers".   :)



Good Luck,
Kent
0
 
Infinity08Commented:
>> Just return true or false, if it's there or not.

What I meant was : what will this function be used for ? When will you call it ? How often ? For what purpose ?

There might be other ways of optimizing this.
0
 
KirillMuellerCommented:
If you AND all values in the array and the result is nonzero, there cannot be a zero in the array. However, this is not true the other way round.

Modern CPUs have a feature called "branch prediction" that, together with pipelining, eliminates much of the overhead of a conditional jump -- which is needed to handle the "if" construct. Try to benchmark your loop vs. a loop that ANDs all elements and checks the result, like in my code snippet. I'm afraid the run time will be dominated by the time needed to fetch the array values, at least for large arrays bigger than the L2 cache size.

Moreover, for example, x86 CPUs have special loop instructions that can implement tight loops that check arrays.. If my memory doesn't fail me, the core of your loop can be implemented using one (!) assembly instruction like

REPNE SCASD

with a consecutive check if the end of the array has been reached. The instruction reads like "REPeat while Not Equal (zero) SCAn the String Double-word-wise"

If you have a good compiler, it will issue an instruction like this for your code.

As Infinity08 already said, without pre-sorting or some other preprocessing, no substantial speed-ups are possible.
int LookForOneMaybe(int n , int x[]) {
        int i;
 
        int res = -1;
 
        for (i = 0; i < n; i++) {
                res &= x[i];
        }
 
        if (res != 0) {
                return (FALSE);
        }
        return (MAYBE);
}

Open in new window

0
 
ozoCommented:
If you can multiply all the values in the array without overflow, then the result will be 0 iff there was a 0 in the array.  
But that way doesn't reduce overhead.
0
 
KoraxenAuthor Commented:
Following ozo's train of thought, I came up with this... the multiplication way:
int CheckForOne(int x[], int n) {
	long int test = 1;
	int i;
	int temp = 1;
	int retVal = 1;
	
	for (i = 0; i < n; i++) {
		temp = test;		
		test = test * x[i];
		while (test > temp) {
			test--;
		}
		printf("test = %d\n",test);
    }
 
	if ( test != 0 ) {
		retVal = 0;
	}
 
	return retVal;
}

Open in new window

0
 
Infinity08Commented:
I thought you wanted less comparisons ? You've now got MORE comparisons, and a lot more extra code. Even multiplications that are a lot more costly than a comparison.
0
 
KoraxenAuthor Commented:
You're right... the while acts as a conditional. Hmm... I'll have to keep looking into it.
0
 
ozoCommented:
               while (test > temp) {
                        test--;
                }
that could be a lot of comparisons
And if test ever goes negative, it would stay negative even if it later sees x[i] = 0
0
 
ozoCommented:
You could convert all non-0 values to 1 with something like
unsigned int temp = x[i];
temp |= temp>>1;
temp |= temp>>2;
temp |= temp>>4;
temp |= temp>>8;
temp |= temp>>16;
temp |= temp>>32; // depending on sizeof(int)*CHAR_BIT;
temp &=1;
then
test *= temp is guaranteed not to overflow, or test &= temp can also be used
0
 
Infinity08Commented:
Do you really think that'll be faster, ozo ?
0
 
ozoCommented:
No. I said it reduces comparisons bit not overhhead
Although it could be faster than
             while (test > temp) {
                        test--;
                }
0
 
ozoCommented:
If ! doesn't count as a "comparison"
test |= !x[i];
0
 
KirillMuellerCommented:
ozo: ! counts as a comparison on many architectures. However, your trick using five or six shifts is amazing. It could be combined with my suggestion, see below.

The problem with the code is that the data dependencies are tight: Each access to temp depends on the previous line. This kind of makes the CPU's pipeline useless. A good compiler would do some loop unrolling and interleaving, like I did in the second example. Only possible if there are enough CPU registers available.

Anyway, I suggest to do some benchmarks to check if the effort pays off.

Koraxen: Is multithreading an option for you? Multi-core CPUs become more and more common, and it should be easy to parallelize the code. Just need to check which access pattern would be best:

- Thread 1 works on [0..N/2), thread 2 works on [N/2..N)
- Thread 1 works on even indexes of [0..N), thread 2 works on odd indexes of [0..N)
int LookForOne(int n , int x[]) {
        int i;
 
        unsigned int res = -1;
 
        for (i = 0; i < n; i++) {
                unsigned int temp = x[i];
                temp |= temp>>1;
                temp |= temp>>2;
                temp |= temp>>4;
                temp |= temp>>8;
                temp |= temp>>16;
                temp |= temp>>32; // depending on sizeof(int)*CHAR_BIT;
 
                // Now, the LSB of temp is set iff x[i] <> 0
                // We use a "full" AND for res,
                // it's one machine instruction anyway,
                // but we check the LSB after the loop
                res &= temp;
        }
 
        if ( (res & 1) ) {
                return (FALSE);
        }
        return (TRUE);
}
 
 
 
 
 
int LookForOneUnrolled(int n , int x[]) {
        int i;
 
        unsigned int res = -1;
 
        for (i = 0; i < n; i += 6) {
                unsigned int temp1 = x[i];
                unsigned int temp2 = x[i+1];
                unsigned int temp3 = x[i+2];
                unsigned int temp4 = x[i+3];
                unsigned int temp5 = x[i+4];
                unsigned int temp6 = x[i+5];
 
                temp1 |= temp1>>1;
                temp2 |= temp2>>1;
                temp3 |= temp3>>1;
                temp4 |= temp4>>1;
                temp5 |= temp5>>1;
                temp6 |= temp6>>1;
 
                temp1 |= temp1>>2;
                temp2 |= temp2>>2;
                temp3 |= temp3>>2;
                temp4 |= temp4>>2;
                temp5 |= temp5>>2;
                temp6 |= temp6>>2;
 
                temp1 |= temp1>>4;
                temp2 |= temp2>>4;
                temp3 |= temp3>>4;
                temp4 |= temp4>>4;
                temp5 |= temp5>>4;
                temp6 |= temp6>>4;
 
                temp1 |= temp1>>8;
                temp2 |= temp2>>8;
                temp3 |= temp3>>8;
                temp4 |= temp4>>8;
                temp5 |= temp5>>8;
                temp6 |= temp6>>8;
 
                temp1 |= temp1>>16;
                temp2 |= temp2>>16;
                temp3 |= temp3>>16;
                temp4 |= temp4>>16;
                temp5 |= temp5>>16;
                temp6 |= temp6>>16;
 
                temp1 |= temp1>>32;
                temp2 |= temp2>>32;
                temp3 |= temp3>>32;
                temp4 |= temp4>>32;
                temp5 |= temp5>>32;
                temp6 |= temp6>>32;
 
                temp1 &= temp2;
                temp3 &= temp4;
                temp5 &= temp6;
                res &= temp1;
                temp3 &= temp5;
                res &= temp3;
        }
 
        if ( (res & 1) ) {
                return (FALSE);
        }
        return (TRUE);
}

Open in new window

0
 
ozoCommented:
If you have an extra slot at the end of the array, you can eliminate the  i < n comparison
x[n]=0;
for( int *p=&n[0]; *p++; ){}
return( p <= &x[n] );
If your machine has hardware instuctions that strlen takes advantage of, it could be less overhead to use it to scan for candidate 0's dependng on how the int values are distributed.
0
 
KirillMuellerCommented:
ozo: And even if there is no spare slot, you could abuse the last element as watchdog.

Koraxen: What is your target architecture, anyway? :-)
if (n == 0) return FALSE;
if (x[n-1] == 0) return TRUE;
int save = x[n-1];
x[n-1] = 0;
// ...
x[n-1] = save;
return RETVAL;

Open in new window

0
 
Infinity08Commented:
>> Anyway, I suggest to do some benchmarks to check if the effort pays off.

I'm pretty sure it won't. Be serious. The original code was a loop with 1 compare in it. Now you've made it a loop with multiple shifts, ors and movs.

I thought the goal was "to reduce overhead" ?
0
 
Infinity08Commented:
Koraxen, have you had the time to answer my earlier questions ?

>> What will this function be used for ? When will you call it ? How often ? For what purpose ?
0
 
KirillMuellerCommented:
infinity08: Not quite right: The original loop had TWO compares, the watchdog suggestion by ozo reduced it to one compare. And still, conditional jumps _might_ be more expensive than a couple of arithmetic operation, though I doubt it -- branch prediction should do a good job for this particular loop.
0
 
Infinity08Commented:
>> Not quite right: The original loop had TWO compares, the watchdog suggestion by ozo reduced it to one compare.

Yes, but as I said : it replaced 1 compare with multiple shifts, ors and movs. I doubt that that will be faster on any machine.

Btw, when the compiler does a good job, it will generate code that has only two jumps : 1 for the loop, and one to get out of the loop when a 0 is found. The first will obviously be taken multiple times, but the second will only be taken once. It might look something like this (pseudo assembler) :


      sl n, 2            ; n <<= 2
      add n, x           ; n += x
lab1: cmp x, n           ; 
      jge lab2           ; if (x >= n) goto lab2
      cmp 0, (x)         ;
      je lab3            ; if (0 == *x) goto lab3
      add x, 4           ; x++
      jmp lab1           ; goto lab1
lab2: ret 0              ; return 0
lab3: ret 1              ; return 1

Open in new window

0
 
Infinity08Commented:
It is very difficult to optimize generated assembler like that without having more information about the contents of the x array (for example a sorted array).
0
 
KirillMuellerCommented:
infinity08: Again, no. Adding a watchdog and doing the nifty shift tricks do not affect each other.
int LookForOneWatchdog(int n, int x[]) {
        int i;
 
        if (n == 0) return (FALSE);
 
        int watchdog = x[n-1];
        int ret = FALSE;
        for (i = 0; ; i++) {
                if (x[i] == 0) {
                        ret = TRUE;
                        break;
                }
        }
        x[n-1] = watchdog;
        return ret;
}

Open in new window

0
 
Infinity08Commented:
>> infinity08: Again, no. Adding a watchdog and doing the nifty shift tricks do not affect each other.

I'm not sure what you mean by that, but I'd love you to show me how you can optimize assembler like what I posted.

btw, this is not true :

>> the watchdog suggestion by ozo reduced it to one compare.

because you still need a compare for the position at the end. Plus an extra compare at the beginning for n. That's 3 compares.

Btw, the code you posted will not work for several reasons :

1) what if there's no 0 at all in any memory location after the beginning of the array ? You'll have a pretty long loop that might even crash the program.

2) what if there isn't a 0 inside the array, but there is one AFTER the array ? You'll return TRUE anyway.
0
 
KirillMuellerCommented:
I'm so sorry for the confusion. I forgot one line in my source code. Hope it is clear now. A "watchdog" is a dummy element appended to the array to make sure that an iteration over the array stops, even if no comparison for the index is performed.

As I mentioned in one of my previous posts, the most efficient code on i386 for this task is

REPNE SCASD

together with some initialization and termination code.

I have posted optimized code that can be compiled directly to code that uses only one conditional jump in the loop (plus three conditional jumps outside the loop)..
int LookForOneWatchdog(int n, int x[]) {
        int i;
 
        if (n == 0) return (FALSE);
 
        int watchdog = x[n-1];
        x[n-1] = 0;            // forgot this line
        int ret = FALSE;
        for (i = 0; ; i++) {
                if (x[i] == 0) {
                        ret = TRUE;
                        break;
                }
        }
        x[n-1] = watchdog;
        return ret;
}
 
int LookForOneWatchdogOptimized(int n, int x[]) {
        int i;
 
        if (n == 0) return (FALSE);
        if (x[n-1] == 0) return (TRUE);
 
        int watchdog = x[n-1];
        x[n-1] = 0;
        int ret = FALSE;
        for (i = 0; ; i++) {
                if (x[i] == 0) {
                        break;
                }
        }
        x[n-1] = watchdog;
        return (i < (n - 1)) ? TRUE : FALSE;
}

Open in new window

0
 
Infinity08Commented:
>> I forgot one line in my source code.

In other words : it will always return TRUE now, unless n == 0. That doesn't meet the requirements either ;) You need to add a few more compares.
And what if the value at n-1 is already 0 ?

No matter how you look at it, adding a watchdog, will get you MORE compares, because you need to check for more special cases.


>> I have posted optimized code that can be compiled directly to code that uses only one conditional jump in the loop (plus three conditional jumps outside the loop)..

But the overall performance will most likely still be worse than the original code.

Anyway ... let's stop here ... I'll be waiting for Koraxen's feedback.
0
 
KirillMuellerCommented:
Infinity08: Right, the first version always returns TRUE. Doh... But the second routine should be correct. As for the number of compares, the original code takes 2*n compares (two in each step of the loop) , while my routine takes n+3 compares.

I'll be waiting for the author'S feedback, too.
0
 
Infinity08Commented:
>> But the second routine should be correct.

Sorry ... missed the second one ;)
0
 
itsmeandnobodyelseCommented:
You can do it with one comparision if the array is sorted and contains only positive integers.

So the statement is trivial, it might give you a way to optimize: add new entries to the array so that it keeps always sorted.

#define BLOCK_SIZE 64
void insertToArray(int ** parr, int * ps, int n, int i)
{
      int * p;
      int b, e, m;
      int s = *ps;
      if (n >= s)
      {
           p = (int*)malloc((s+BLOCK_SIZE)*sizeof(int));
           if (s > 0)
           {
                memcpy(p, *parr, s*sizeof(int));
                free(*parr);
           }
           memset(p+s, 0, BLOCK_SIZE*sizeof(int));
           *parr = p;
           *ps = s + BLOCK_SIZE;
           n = s;
      }
      p = *parr;
      b = 0;
      e = n-1;
      while (b <= e)
      {
          m = (b + e)>>1;  // middle
          if (i < *(p + m))
              e = m - 1;
          else
              b = m + 1;
      }
      /* now b is the pos to insert */
      if (n > b)  /* use memmove  for overlapping copy */
             memmove(p+b+1, p + b, (n - b)*sizeof(int));
      *(p + b) = i;
}

Note, if you have negative integers you may do a binary search at the sorted array (look at the while loop) to check for 0 entry.

Regards, Alex
0
 
KoraxenAuthor Commented:
Your ideas and work is very much appreciated guys. What I'm going to consolidate these ideas and submit/present them. Thanks.
0
 
KoraxenAuthor Commented:
To answer your questions:

The question doesn't specify when this function is called, or what architecture it will be used on. The only requirement was to get around the conditional. I assume multi-threading was not an option either.

Again, thank you.
0
 
Infinity08Commented:
>> The only requirement was to get around the conditional.

I still don't get the point though ... Was this an assignment ? If so, you should have said so from the beginning, and we could have helped you a lot better ...
0
 
itsmeandnobodyelseCommented:
Did you ever heard of qbits? Whenever that term is 'wandering like a ghost' in some newpaper or magazine, they will tell you (nearly with identical words) that with a few qbits you may do an arbitrary number of  computations same time. I actually think it is a fake to get money from the faithful ones as even with quants you can not overcome human logics. But if I am wrong that is the way you should go on.

Regards, Alex
0
 
ozoCommented:
I haven't heard that anyone hat yet built a working quantum computer with more than 28 qbits, not to mention implementing a C compiler on one, but if you had your list in a quantum computer you may be able to search it in O(sqrt(n)) instead of O(n)
http://en.wikipedia.org/wiki/Grover's_algorithm
0
Question has a verified solution.

Are you are experiencing a similar issue? Get a personalized answer when you ask a related question.

Have a better answer? Share it in a comment.

All Courses

From novice to tech pro — start learning today.