Solved

Non-linear check for a particular integer in an array

Posted on 2007-11-16
37
265 Views
Last Modified: 2010-04-15
Hey guys,

I have a function that loops through an int array and returns true and breaks out if it finds a 0.

However, is there any way to improve it and detect the 0 using only ONE comparison, to reduce overhead?
int LookForOne (int n , int x[]) {

	int i;
 

	for (i = 0; i < n; i++) {

		if (x[i] == 0) {

			return (TRUE);

		}

	}

	return (FALSE);

}

Open in new window

0
Comment
Question by:Koraxen
  • 14
  • 7
  • 7
  • +3
37 Comments
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
Unless you know where the 0 is to be found, or unless the data is ordered in a certain way, you can't really speed this up.
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
btw : what is this function supposed to do ? Don't you want to return the index of the 0 ?
0
 

Author Comment

by:Koraxen
Comment Utility
Just return true or false, if it's there or not.

I was wondering if you could possibly do some math (add / subtract) to improve it.
0
 
LVL 45

Accepted Solution

by:
Kdo earned 100 total points
Comment Utility
Hi Koraxen,

Interesting question.  :)

You're going to have to do at least two compares.  One for the loop iterator, and another for the actual data.

But a bit of cleverness can reduce the number of loop iterators that you need to compare in exchange for slightly more programming.

Assume that you know the data well enough that you expect the loop to iterate over dozens (or even hundreds) of objects.  The conventional loop looks like this:

int FindZero (int   *Array, int *Length)
{
  int idx;

  for (idx = 0; idx < Length; ++idx)
    if (Array[idx] == 0)
      return idx;
  return -1
}

You may get better performance by checking multiple items per iteration.

int FindZero (int   *Array, int *Length)
{
  int idx;
  int Limit;
  int *Item;

//  Process most of the array in 8 word chunks  

  Limit = Length && ~3;        
  Item = Array;
  for (idx = 0; idx < Limit; idx += 8)
  {
    if (*(Item++) == 0)
      return Limit;
    if (*(Item++) == 0)
      return Limit + 1;
    if (*(Item++) == 0)
      return Limit + 2;
    if (*(Item++) == 0)
      return Limit + 3;
    if (*(Item++) == 0)
      return Limit+ 4;
    if (*(Item++) == 0)
      return Limit + 5;
    if (*(Item++) == 0)
      return Limit + 6;
    if (*(Item++) == 0)
      return Limit + 7;
  }

//  Check the items after the last 8 word chunk

  for (idx = Limit; idx < Length; ++idx)
    if (Array[idx] == 0)
      return idx;

  return -1
}

And there may be minor improvements in this by consistently using array notation or consistently dereferencing the array as a pointer, but this gives you the ida.



Good Luck,
Kent
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
btw, Kdo's technique is called "loop unrolling". You still have to do a compare for each int in the array though.
0
 
LVL 45

Expert Comment

by:Kdo
Comment Utility
Hi Koraxen,

And there's no charge for the typos.  That's called "fat fingers".   :)



Good Luck,
Kent
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> Just return true or false, if it's there or not.

What I meant was : what will this function be used for ? When will you call it ? How often ? For what purpose ?

There might be other ways of optimizing this.
0
 
LVL 5

Expert Comment

by:KirillMueller
Comment Utility
If you AND all values in the array and the result is nonzero, there cannot be a zero in the array. However, this is not true the other way round.

Modern CPUs have a feature called "branch prediction" that, together with pipelining, eliminates much of the overhead of a conditional jump -- which is needed to handle the "if" construct. Try to benchmark your loop vs. a loop that ANDs all elements and checks the result, like in my code snippet. I'm afraid the run time will be dominated by the time needed to fetch the array values, at least for large arrays bigger than the L2 cache size.

Moreover, for example, x86 CPUs have special loop instructions that can implement tight loops that check arrays.. If my memory doesn't fail me, the core of your loop can be implemented using one (!) assembly instruction like

REPNE SCASD

with a consecutive check if the end of the array has been reached. The instruction reads like "REPeat while Not Equal (zero) SCAn the String Double-word-wise"

If you have a good compiler, it will issue an instruction like this for your code.

As Infinity08 already said, without pre-sorting or some other preprocessing, no substantial speed-ups are possible.
int LookForOneMaybe(int n , int x[]) {

        int i;
 

        int res = -1;

 

        for (i = 0; i < n; i++) {

                res &= x[i];

        }
 

        if (res != 0) {

                return (FALSE);

        }

        return (MAYBE);

}

Open in new window

0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
If you can multiply all the values in the array without overflow, then the result will be 0 iff there was a 0 in the array.  
But that way doesn't reduce overhead.
0
 

Author Comment

by:Koraxen
Comment Utility
Following ozo's train of thought, I came up with this... the multiplication way:
int CheckForOne(int x[], int n) {

	long int test = 1;

	int i;

	int temp = 1;

	int retVal = 1;

	

	for (i = 0; i < n; i++) {

		temp = test;		

		test = test * x[i];

		while (test > temp) {

			test--;

		}

		printf("test = %d\n",test);

    }
 

	if ( test != 0 ) {

		retVal = 0;

	}
 

	return retVal;

}

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
I thought you wanted less comparisons ? You've now got MORE comparisons, and a lot more extra code. Even multiplications that are a lot more costly than a comparison.
0
 

Author Comment

by:Koraxen
Comment Utility
You're right... the while acts as a conditional. Hmm... I'll have to keep looking into it.
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
               while (test > temp) {
                        test--;
                }
that could be a lot of comparisons
And if test ever goes negative, it would stay negative even if it later sees x[i] = 0
0
 
LVL 84

Assisted Solution

by:ozo
ozo earned 100 total points
Comment Utility
You could convert all non-0 values to 1 with something like
unsigned int temp = x[i];
temp |= temp>>1;
temp |= temp>>2;
temp |= temp>>4;
temp |= temp>>8;
temp |= temp>>16;
temp |= temp>>32; // depending on sizeof(int)*CHAR_BIT;
temp &=1;
then
test *= temp is guaranteed not to overflow, or test &= temp can also be used
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
Do you really think that'll be faster, ozo ?
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
No. I said it reduces comparisons bit not overhhead
Although it could be faster than
             while (test > temp) {
                        test--;
                }
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
If ! doesn't count as a "comparison"
test |= !x[i];
0
 
LVL 5

Expert Comment

by:KirillMueller
Comment Utility
ozo: ! counts as a comparison on many architectures. However, your trick using five or six shifts is amazing. It could be combined with my suggestion, see below.

The problem with the code is that the data dependencies are tight: Each access to temp depends on the previous line. This kind of makes the CPU's pipeline useless. A good compiler would do some loop unrolling and interleaving, like I did in the second example. Only possible if there are enough CPU registers available.

Anyway, I suggest to do some benchmarks to check if the effort pays off.

Koraxen: Is multithreading an option for you? Multi-core CPUs become more and more common, and it should be easy to parallelize the code. Just need to check which access pattern would be best:

- Thread 1 works on [0..N/2), thread 2 works on [N/2..N)
- Thread 1 works on even indexes of [0..N), thread 2 works on odd indexes of [0..N)
int LookForOne(int n , int x[]) {

        int i;

 

        unsigned int res = -1;

 

        for (i = 0; i < n; i++) {

                unsigned int temp = x[i];

                temp |= temp>>1;

                temp |= temp>>2;

                temp |= temp>>4;

                temp |= temp>>8;

                temp |= temp>>16;

                temp |= temp>>32; // depending on sizeof(int)*CHAR_BIT;
 

                // Now, the LSB of temp is set iff x[i] <> 0

                // We use a "full" AND for res,

                // it's one machine instruction anyway,

                // but we check the LSB after the loop

                res &= temp;

        }

 

        if ( (res & 1) ) {

                return (FALSE);

        }

        return (TRUE);

}
 
 
 
 
 

int LookForOneUnrolled(int n , int x[]) {

        int i;

 

        unsigned int res = -1;

 

        for (i = 0; i < n; i += 6) {

                unsigned int temp1 = x[i];

                unsigned int temp2 = x[i+1];

                unsigned int temp3 = x[i+2];

                unsigned int temp4 = x[i+3];

                unsigned int temp5 = x[i+4];

                unsigned int temp6 = x[i+5];
 

                temp1 |= temp1>>1;

                temp2 |= temp2>>1;

                temp3 |= temp3>>1;

                temp4 |= temp4>>1;

                temp5 |= temp5>>1;

                temp6 |= temp6>>1;
 

                temp1 |= temp1>>2;

                temp2 |= temp2>>2;

                temp3 |= temp3>>2;

                temp4 |= temp4>>2;

                temp5 |= temp5>>2;

                temp6 |= temp6>>2;
 

                temp1 |= temp1>>4;

                temp2 |= temp2>>4;

                temp3 |= temp3>>4;

                temp4 |= temp4>>4;

                temp5 |= temp5>>4;

                temp6 |= temp6>>4;
 

                temp1 |= temp1>>8;

                temp2 |= temp2>>8;

                temp3 |= temp3>>8;

                temp4 |= temp4>>8;

                temp5 |= temp5>>8;

                temp6 |= temp6>>8;
 

                temp1 |= temp1>>16;

                temp2 |= temp2>>16;

                temp3 |= temp3>>16;

                temp4 |= temp4>>16;

                temp5 |= temp5>>16;

                temp6 |= temp6>>16;
 

                temp1 |= temp1>>32;

                temp2 |= temp2>>32;

                temp3 |= temp3>>32;

                temp4 |= temp4>>32;

                temp5 |= temp5>>32;

                temp6 |= temp6>>32;
 

                temp1 &= temp2;

                temp3 &= temp4;

                temp5 &= temp6;

                res &= temp1;

                temp3 &= temp5;

                res &= temp3;

        }

 

        if ( (res & 1) ) {

                return (FALSE);

        }

        return (TRUE);

}

Open in new window

0
How to improve team productivity

Quip adds documents, spreadsheets, and tasklists to your Slack experience
- Elevate ideas to Quip docs
- Share Quip docs in Slack
- Get notified of changes to your docs
- Available on iOS/Android/Desktop/Web
- Online/Offline

 
LVL 84

Expert Comment

by:ozo
Comment Utility
If you have an extra slot at the end of the array, you can eliminate the  i < n comparison
x[n]=0;
for( int *p=&n[0]; *p++; ){}
return( p <= &x[n] );
If your machine has hardware instuctions that strlen takes advantage of, it could be less overhead to use it to scan for candidate 0's dependng on how the int values are distributed.
0
 
LVL 5

Expert Comment

by:KirillMueller
Comment Utility
ozo: And even if there is no spare slot, you could abuse the last element as watchdog.

Koraxen: What is your target architecture, anyway? :-)
if (n == 0) return FALSE;

if (x[n-1] == 0) return TRUE;

int save = x[n-1];

x[n-1] = 0;

// ...

x[n-1] = save;

return RETVAL;

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> Anyway, I suggest to do some benchmarks to check if the effort pays off.

I'm pretty sure it won't. Be serious. The original code was a loop with 1 compare in it. Now you've made it a loop with multiple shifts, ors and movs.

I thought the goal was "to reduce overhead" ?
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
Koraxen, have you had the time to answer my earlier questions ?

>> What will this function be used for ? When will you call it ? How often ? For what purpose ?
0
 
LVL 5

Expert Comment

by:KirillMueller
Comment Utility
infinity08: Not quite right: The original loop had TWO compares, the watchdog suggestion by ozo reduced it to one compare. And still, conditional jumps _might_ be more expensive than a couple of arithmetic operation, though I doubt it -- branch prediction should do a good job for this particular loop.
0
 
LVL 53

Assisted Solution

by:Infinity08
Infinity08 earned 100 total points
Comment Utility
>> Not quite right: The original loop had TWO compares, the watchdog suggestion by ozo reduced it to one compare.

Yes, but as I said : it replaced 1 compare with multiple shifts, ors and movs. I doubt that that will be faster on any machine.

Btw, when the compiler does a good job, it will generate code that has only two jumps : 1 for the loop, and one to get out of the loop when a 0 is found. The first will obviously be taken multiple times, but the second will only be taken once. It might look something like this (pseudo assembler) :


      sl n, 2            ; n <<= 2

      add n, x           ; n += x

lab1: cmp x, n           ; 

      jge lab2           ; if (x >= n) goto lab2

      cmp 0, (x)         ;

      je lab3            ; if (0 == *x) goto lab3

      add x, 4           ; x++

      jmp lab1           ; goto lab1

lab2: ret 0              ; return 0

lab3: ret 1              ; return 1

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
It is very difficult to optimize generated assembler like that without having more information about the contents of the x array (for example a sorted array).
0
 
LVL 5

Expert Comment

by:KirillMueller
Comment Utility
infinity08: Again, no. Adding a watchdog and doing the nifty shift tricks do not affect each other.
int LookForOneWatchdog(int n, int x[]) {

        int i;

 

        if (n == 0) return (FALSE);

 

        int watchdog = x[n-1];

        int ret = FALSE;

        for (i = 0; ; i++) {

                if (x[i] == 0) {

                        ret = TRUE;

                        break;

                }

        }

        x[n-1] = watchdog;

        return ret;

}

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> infinity08: Again, no. Adding a watchdog and doing the nifty shift tricks do not affect each other.

I'm not sure what you mean by that, but I'd love you to show me how you can optimize assembler like what I posted.

btw, this is not true :

>> the watchdog suggestion by ozo reduced it to one compare.

because you still need a compare for the position at the end. Plus an extra compare at the beginning for n. That's 3 compares.

Btw, the code you posted will not work for several reasons :

1) what if there's no 0 at all in any memory location after the beginning of the array ? You'll have a pretty long loop that might even crash the program.

2) what if there isn't a 0 inside the array, but there is one AFTER the array ? You'll return TRUE anyway.
0
 
LVL 5

Assisted Solution

by:KirillMueller
KirillMueller earned 100 total points
Comment Utility
I'm so sorry for the confusion. I forgot one line in my source code. Hope it is clear now. A "watchdog" is a dummy element appended to the array to make sure that an iteration over the array stops, even if no comparison for the index is performed.

As I mentioned in one of my previous posts, the most efficient code on i386 for this task is

REPNE SCASD

together with some initialization and termination code.

I have posted optimized code that can be compiled directly to code that uses only one conditional jump in the loop (plus three conditional jumps outside the loop)..
int LookForOneWatchdog(int n, int x[]) {

        int i;

 

        if (n == 0) return (FALSE);

 

        int watchdog = x[n-1];

        x[n-1] = 0;            // forgot this line

        int ret = FALSE;

        for (i = 0; ; i++) {

                if (x[i] == 0) {

                        ret = TRUE;

                        break;

                }

        }

        x[n-1] = watchdog;

        return ret;

}
 

int LookForOneWatchdogOptimized(int n, int x[]) {

        int i;

 

        if (n == 0) return (FALSE);

        if (x[n-1] == 0) return (TRUE);

 

        int watchdog = x[n-1];

        x[n-1] = 0;

        int ret = FALSE;

        for (i = 0; ; i++) {

                if (x[i] == 0) {

                        break;

                }

        }

        x[n-1] = watchdog;

        return (i < (n - 1)) ? TRUE : FALSE;

}

Open in new window

0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> I forgot one line in my source code.

In other words : it will always return TRUE now, unless n == 0. That doesn't meet the requirements either ;) You need to add a few more compares.
And what if the value at n-1 is already 0 ?

No matter how you look at it, adding a watchdog, will get you MORE compares, because you need to check for more special cases.


>> I have posted optimized code that can be compiled directly to code that uses only one conditional jump in the loop (plus three conditional jumps outside the loop)..

But the overall performance will most likely still be worse than the original code.

Anyway ... let's stop here ... I'll be waiting for Koraxen's feedback.
0
 
LVL 5

Expert Comment

by:KirillMueller
Comment Utility
Infinity08: Right, the first version always returns TRUE. Doh... But the second routine should be correct. As for the number of compares, the original code takes 2*n compares (two in each step of the loop) , while my routine takes n+3 compares.

I'll be waiting for the author'S feedback, too.
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> But the second routine should be correct.

Sorry ... missed the second one ;)
0
 
LVL 39

Assisted Solution

by:itsmeandnobodyelse
itsmeandnobodyelse earned 100 total points
Comment Utility
You can do it with one comparision if the array is sorted and contains only positive integers.

So the statement is trivial, it might give you a way to optimize: add new entries to the array so that it keeps always sorted.

#define BLOCK_SIZE 64
void insertToArray(int ** parr, int * ps, int n, int i)
{
      int * p;
      int b, e, m;
      int s = *ps;
      if (n >= s)
      {
           p = (int*)malloc((s+BLOCK_SIZE)*sizeof(int));
           if (s > 0)
           {
                memcpy(p, *parr, s*sizeof(int));
                free(*parr);
           }
           memset(p+s, 0, BLOCK_SIZE*sizeof(int));
           *parr = p;
           *ps = s + BLOCK_SIZE;
           n = s;
      }
      p = *parr;
      b = 0;
      e = n-1;
      while (b <= e)
      {
          m = (b + e)>>1;  // middle
          if (i < *(p + m))
              e = m - 1;
          else
              b = m + 1;
      }
      /* now b is the pos to insert */
      if (n > b)  /* use memmove  for overlapping copy */
             memmove(p+b+1, p + b, (n - b)*sizeof(int));
      *(p + b) = i;
}

Note, if you have negative integers you may do a binary search at the sorted array (look at the while loop) to check for 0 entry.

Regards, Alex
0
 

Author Comment

by:Koraxen
Comment Utility
Your ideas and work is very much appreciated guys. What I'm going to consolidate these ideas and submit/present them. Thanks.
0
 

Author Comment

by:Koraxen
Comment Utility
To answer your questions:

The question doesn't specify when this function is called, or what architecture it will be used on. The only requirement was to get around the conditional. I assume multi-threading was not an option either.

Again, thank you.
0
 
LVL 53

Expert Comment

by:Infinity08
Comment Utility
>> The only requirement was to get around the conditional.

I still don't get the point though ... Was this an assignment ? If so, you should have said so from the beginning, and we could have helped you a lot better ...
0
 
LVL 39

Expert Comment

by:itsmeandnobodyelse
Comment Utility
Did you ever heard of qbits? Whenever that term is 'wandering like a ghost' in some newpaper or magazine, they will tell you (nearly with identical words) that with a few qbits you may do an arbitrary number of  computations same time. I actually think it is a fake to get money from the faithful ones as even with quants you can not overcome human logics. But if I am wrong that is the way you should go on.

Regards, Alex
0
 
LVL 84

Expert Comment

by:ozo
Comment Utility
I haven't heard that anyone hat yet built a working quantum computer with more than 28 qbits, not to mention implementing a C compiler on one, but if you had your list in a quantum computer you may be able to search it in O(sqrt(n)) instead of O(n)
http://en.wikipedia.org/wiki/Grover's_algorithm
0

Featured Post

Highfive Gives IT Their Time Back

Highfive is so simple that setting up every meeting room takes just minutes and every employee will be able to start or join a call from any room with ease. Never be called into a meeting just to get it started again. This is how video conferencing should work!

Join & Write a Comment

Windows programmers of the C/C++ variety, how many of you realise that since Window 9x Microsoft has been lying to you about what constitutes Unicode (http://en.wikipedia.org/wiki/Unicode)? They will have you believe that Unicode requires you to useā€¦
This article will show, step by step, how to integrate R code into a R Sweave document
The viewer will learn how to implement Singleton Design Pattern in Java.
This video teaches viewers about errors in exception handling.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

12 Experts available now in Live!

Get 1:1 Help Now