Solved

Another question about optimization

Posted on 2009-05-16
46
2,338 Views
Last Modified: 2012-05-07
Hey,

       I need some advice about optimizing some code.

The smooth function, returns a "smoothed" version of an image, by taking the averages of all the pixels.

My question is... since the smooth function is calling the avg function repeatedly, would my time be better spent trying to speed up the avg function? or the smooth function?

I'm supposed to make it go at least twice as fast by using a technique called blocking... but I'm a little unsure about how to go about that.

Appreciate any help.
static pixel avg(int dim, int i, int j, pixel *src) 

{

    int ii, jj;

    pixel_sum sum;

    pixel current_pixel;

 

    initialize_pixel_sum(&sum);

    for(ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++) 

	for(jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++) 

	    accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);

 

    assign_sum_to_pixel(&current_pixel, sum);

    return current_pixel;

}

 

 

void naive_smooth(int dim, pixel *src, pixel *dst) 

{

    int i, j;

 

    for (i = 0; i < dim; i++)

	for (j = 0; j < dim; j++)

	    dst[RIDX(i, j, dim)] = avg(dim, i, j, src);

}

Open in new window

0
Comment
Question by:errang
  • 23
  • 12
  • 11
46 Comments
 
LVL 84

Expert Comment

by:ozo
ID: 24405119
You might do better by restructuring the program so that the smooth function does not call the average function.
0
 

Author Comment

by:errang
ID: 24405125
hm... ok.

I am somewhat confused by that code though, where is the division taking place here?

 initialize_pixel_sum(&sum);
    for(ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++)
        for(jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++)
            accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);
 
    assign_sum_to_pixel(&current_pixel, sum);
    return current_pixel;
0
 

Author Comment

by:errang
ID: 24405132
I'm thinking the accumulate_sum function collects the values so far.

And the assign_sum_to_pixel function does not take the number of items that were added to make the sum... so where is the division taking place to find out the average?
0
 
LVL 84

Expert Comment

by:ozo
ID: 24405137
could it be in either accumulate_sum or assign_sum_to_pixel?
0
 

Author Comment

by:errang
ID: 24405507
Ok... the average is computed in assign_sum_to_pixel.

There's no way to speed up assign_sum_to_pixel... is there?

And if we can't speed up anything with assign_sum_to_pixel, the only thing we can change is:

    for(ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++)
        for(jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++)
            accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);

correct?
typedef struct {

    int red;

    int green;

    int blue;

    int num;

} pixel_sum;
 

typedef struct {

   unsigned short red;

   unsigned short green;

   unsigned short blue;

} pixel
 

/* 

 * accumulate_sum - Accumulates field values of p in corresponding 

 * fields of sum 

 */

static void accumulate_sum(pixel_sum *sum, pixel p) 

{

    sum->red += (int) p.red;

    sum->green += (int) p.green;

    sum->blue += (int) p.blue;

    sum->num++;

    return;

}
 

/* 

 * assign_sum_to_pixel - Computes averaged pixel value in current_pixel 

 */

static void assign_sum_to_pixel(pixel *current_pixel, pixel_sum sum) 

{

    current_pixel->red = (unsigned short) (sum.red/sum.num);

    current_pixel->green = (unsigned short) (sum.green/sum.num);

    current_pixel->blue = (unsigned short) (sum.blue/sum.num);

    return;

}

Open in new window

0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24405718
If you look at what the code is trying to do, it's averaging all the pixels around any given pixel to produce the new value.   So in most cases, you're averaging 9 values, on edges 6 values, and in corners, only 4 values.

Since there is some data re-use, cache blocking will help a bit.   The biggest gains would come from collapsing all the functions into one.  In that case, the blocking would be more apparent, you wouldn't have all the function calls (a huge win), etc.  

If you go the latter route, realize that as you walk the pixels, the former sum contains most (2/3 in most cases) of the sum needed for the current sum.    Basically if you use a moving window, you can avoid re-adding the same numbers each time.
0
 

Author Comment

by:errang
ID: 24405744
hm... so what you are saying is... to get the value of pixel #2, I would have to do:

(pixel #1 + pixel #2 + pixel #3)/3 ?


0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24405756

No.  Think of the problem as a 2-D array... each pixel is the avg of all the pixels it touches plus itself.

0
 

Author Comment

by:errang
ID: 24405783
oh... now that... complicates things.  So to get the value for pixel[1][2], I would have to do:

(pixel[1][1] + pixel[1][3] + pixel[0][2] + pixel[2][2] + pixel[0][1] + pixel[0][3] + pixel[2][1] + pixel[2][3] + pixel[1][2])/9??
0
 

Author Comment

by:errang
ID: 24405786
Also... could you please explain the concept of cache blocking?

Is it the idea of limiting your statements to the limit of the cache memory? or is it a bit more complicated than that, and involves wording your loops just right?
0
 
LVL 84

Expert Comment

by:ozo
ID: 24406557
maybe you can average in one dimension, then average in the other dimension
0
 

Author Comment

by:errang
ID: 24407688
>>maybe you can average in one dimension, then average in the other dimension

hm? could you please explain that?
0
 

Author Comment

by:errang
ID: 24407737
Hm... so far we've found out that I should try to take out the avg function and combine smooth and avg?

I was just wondering... would it make any difference if we take out procedure calls and replace them with the body of the function?
static pixel avg(int dim, int i, int j, pixel *src) 

{

    int ii, jj;

    pixel_sum sum;

    pixel current_pixel;
 

    initialize_pixel_sum(&sum); <---- Do we need this?

    for(ii = max(i-1, 0); ii <= min(i+1, dim-1); ii++) 

	for(jj = max(j-1, 0); jj <= min(j+1, dim-1); jj++) 

	    accumulate_sum(&sum, src[RIDX(ii, jj, dim)]);
 

    assign_sum_to_pixel(&current_pixel, sum);

    return current_pixel;

}

Open in new window

0
 
LVL 84

Expert Comment

by:ozo
ID: 24408741
It may make a small difference.
How much of your time is spent in procedure calls?
How much of your time is spent in the body of the function?
How much of your time is spent accessing src?
How much of your time is spent updating sum?
How much of your time is spent in cache misses?
How much of your time is spent doing the same thing more often than you need to?
0
 

Author Comment

by:errang
ID: 24409301
I tried looking up cache blocking...

http://www.cs.berkeley.edu/~richie/cs267/mg/report/node35.html

To be honest... after reading that stuff, I think it just confirmed my original idea about cache blocks... which was limiting your statements so they fit in the cache perfectly.

Am I right?
0
 
LVL 84

Expert Comment

by:ozo
ID: 24409337
The same source mentions another kind of blocking.
http://www.cs.berkeley.edu/~richie/cs267/mg/report/node8.html

But why not start with the optimizations that are easiest for you to do first?
0
 
LVL 84

Expert Comment

by:ozo
ID: 24409425
You may notice that each element of src is read 9 times.
It may be faster to cut that down to 6 or 2 times.
0
 

Author Comment

by:errang
ID: 24409558
>>You may notice that each element of src is read 9 times. It may be faster to cut that down to 6 or 2 times.

Hm... do you mean to say like store the elements in the array in another variable and when we move from each variable, we only take what is needed?
0
 

Author Comment

by:errang
ID: 24414947
Hm... I've since loop unrolling worked well on the last one, I tried on this one:

But it doesn't seem to be increasing its performance...

Could I please get some more pointers on this?
void smooth(int dim, pixel *src, pixel *dst)

{

    int i, j;
 

    for (i = 0; i < dim; i++)

        for (j = 0; j < dim; j+=8){

          dst[RIDX(i, j, dim)] = avg(dim, i, j, src);

          dst[RIDX(i, (j+1), dim)] = avg(dim, i, (j+1), src);

          dst[RIDX(i, (j+2), dim)] = avg(dim, i, (j+2), src);

          dst[RIDX(i, (j+3), dim)] = avg(dim, i, (j+3), src);

          dst[RIDX(i, (j+4), dim)] = avg(dim, i, (j+4), src);

          dst[RIDX(i, (j+5), dim)] = avg(dim, i, (j+5), src);

          dst[RIDX(i, (j+6), dim)] = avg(dim, i, (j+6), src);

          dst[RIDX(i, (j+7), dim)] = avg(dim, i, (j+7), src);

         }

}

Open in new window

0
 
LVL 84

Expert Comment

by:ozo
ID: 24415015
can you make avg faster?
where is your program spending most of its time?
Is there any place where you are making the same computation multiple times when you could be doing it just once?
0
 

Author Comment

by:errang
ID: 24415149
>>Is there any place where you are making the same computation multiple times when you could be doing it just once?

Hm... that would be the avg function, right? if it computes the average of every cell around the given cell... if it is reading 2 cells right next to each other... it would read 6 cells twice... right?

But... if we wanted to save the value of these cells... we would have to incorporate the avg function inside the smooth function, right?
0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24415982

The following code does most of what I talked about earlier.   It uses a moving window to calculate series of sums.   I checked that it compiles, but it might have logic errors since I didn't run it against any actual images.

Basically,  for every row compute the sum of the src plus its neighbors.   For the end of the rows, that's only 2 cells, for every other element the sum consists of 3 cells.

Next we need to add the neighbors above and below us to our sum, and compute the average and assign it to the dst.   Now, we don't have to compute the partial sums across the entire array before we start computing averages, we can work (in general) 3 rows at a time.   The top and bottom rows only require 2 rows of sums.   So, we special case the first and rows again.

What does this have to do w/ blocking?   Because we're using a moving window across 3 rows, we only need that to remain in cache, rather than the entire array.  

One further optimization would be to interleave the calculation of the partial sums with the dst calculations, so you only need a moving window that is 3x3.     The trade-off would be that while you would use less memory, you'll have to do more pointer arithmetic to get the right things to happen at the right time.    My guess is the following would fit in cache, even for dim == 1K.

static inline void do_row_sum(pixel_sum *sums, int dim, pixel *row) {

    int i;

    *sums++ = *row++ + *row;

    for (i=1; i<dim-1; i++)

        *sums++ = *(row-1) + *row++ + *row;

    *sums++ = *(row-1) + *row;

}
 

void smooth(int dim, pixel *src, pixel *dst) {

    pixel_sum *row_sums = malloc(3*dim*sizeof(pixel_sum));

    pixel_sum *r0 = row_sums;

    pixel_sum *r1 = r0+dim;

    pixel_sum *r2;

    int i, j;
 

    do_row_sum(r0, dim, src);

    do_row_sum(r1, dim, src+i*dim);

    // calculate row 0

    *dst++ = (*r0++ + *r1++)/4;

    for (i=1; i<dim-1; i++)

        *dst++ = (*r0++ + *r1++)/6;

    *dst++ = (*r0 + *r1)/4;
 

    // calculate all the middle rows

    for (i=1; i<dim-1; i++) {

        r0 = row_sums+((i-1)%3)*dim;

        r1 = row_sums+(i%3)*dim;

        r2 = row_sums+((i+1)%3)*dim;

        do_row_sum(r2, dim, src+(i+1)*dim);

        *dst++ = (*r0++ + *r1++ + *r2++)/6;

        for (j=1; j<dim-1; j++)

            *dst++ = (*r0++ + *r1++ + *r2++)/9;

        *dst++ = (*r0 + *r1 + *r2)/6;

    }
 

    // calculate the last row

    r0 = row_sums+((i-1)%3)*dim;

    r1 = row_sums+(i%3)*dim;

    *dst++ = (*r0++ + *r1++)/4;

    for (i=1; i<dim-1; i++)

        *dst++ = (*r0++ + *r1++)/6;

    *dst = (*r0 + *r1)/4;

}

Open in new window

0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24416095

probably want to throw the following in before that last closing } ... otherwise it leaks memory.

free(row_sums);
0
How your wiki can always stay up-to-date

Quip doubles as a “living” wiki and a project management tool that evolves with your organization. As you finish projects in Quip, the work remains, easily accessible to all team members, new and old.
- Increase transparency
- Onboard new hires faster
- Access from mobile/offline

 
LVL 84

Expert Comment

by:ozo
ID: 24416361
what is the range of values in pixel?
0
 
LVL 84

Expert Comment

by:ozo
ID: 24416378
What is the range of values in pixel?
Does it use less than 1/9th of the range of a short?
0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24416407

It doesn't matter...     if the original code ran, this will run as well.

there are two different type_defs.

pixel  and pixel_sum.

The original code stuffed the sum of all 9 (at most) pixels into pixel_sum, then computed the avg.   My code does exactly the same thing.

0
 
LVL 84

Expert Comment

by:ozo
ID: 24416455
I understand the original code (although I'm not sure how sizeof(unsigned short) compares to sizeof(int))
the question was about optimizations.
0
 
LVL 84

Expert Comment

by:ozo
ID: 24416598
also, how does CHAR_MAX<<(sizeof(int)-sizeof(unsigned short)) compare to dim or dim*dim?
0
 

Author Comment

by:errang
ID: 24418207
first of all, thanks for the code =D

but... I'm getting a few errors, all of them are error: invalid operands to binary +.

I think its because of (*r0++ + *r1++).

Am I right? and if it is, I'd just have to replace that with a call to accumulate_sum, right?
0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24418235

the code I sent compiles fine under gcc ...  can you show your current code?


0
 

Author Comment

by:errang
ID: 24418492
Hm, I just copy pasted the whole thing into my program.

Here it is though.
static inline void do_row_sum(pixel_sum *sums, int dim, pixel *row) {

    int i;

    *sums++ = *row++ + *row;

    for (i=1; i<dim-1; i++)

        *sums++ = *(row-1) + *row++ + *row;

    *sums++ = *(row-1) + *row;

}
 

void smooth(int dim, pixel *src, pixel *dst) 

{

    pixel_sum *row_sums = malloc(3*dim*sizeof(pixel_sum));

    pixel_sum *r0 = row_sums;

    pixel_sum *r1 = r0+dim;

    pixel_sum *r2;

    int i, j;

 

    do_row_sum(r0, dim, src);

    do_row_sum(r1, dim, src+i*dim);

    // calculate row 0

    *dst++ = (*r0++ + *r1++)/4;

    for (i=1; i<dim-1; i++)

        *dst++ = (*r0++ + *r1++)/6;

    *dst++ = (*r0 + *r1)/4;

 

    // calculate all the middle rows

    for (i=1; i<dim-1; i++) {

        r0 = row_sums+((i-1)%3)*dim;

        r1 = row_sums+(i%3)*dim;

        r2 = row_sums+((i+1)%3)*dim;

        do_row_sum(r2, dim, src+(i+1)*dim);

        *dst++ = (*r0++ + *r1++ + *r2++)/6;

        for (j=1; j<dim-1; j++)

            *dst++ = (*r0++ + *r1++ + *r2++)/9;

        *dst++ = (*r0 + *r1 + *r2)/6;

    }

 

    // calculate the last row

    r0 = row_sums+((i-1)%3)*dim;

    r1 = row_sums+(i%3)*dim;

    *dst++ = (*r0++ + *r1++)/4;

    for (i=1; i<dim-1; i++)

        *dst++ = (*r0++ + *r1++)/6;

    *dst = (*r0 + *r1)/4;
 

    free(row_sums);

}

Open in new window

0
 

Author Comment

by:errang
ID: 24418507
Oh... I'm using a gcc compiler too. I'm using a makefile to compile it though, here's what I'm using.
CC = gcc

CFLAGS = -Wall -O2

LIBS = -lm
 

OBJS = driver.o kernels.o fcyc.o clock.o
 

all: driver
 

driver: $(OBJS) fcyc.h clock.h defs.h config.h

	$(CC) $(CFLAGS) $(OBJS) $(LIBS) -o driver
 

handin:

	cp kernels.c $(HANDINDIR)/$(TEAM)-$(VERSION)-kernels.c
 

clean: 

	-rm -f $(OBJS) driver core *~ *.o

Open in new window

0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24418601

Can you show all the gcc error messages?   Because somewhere in that file or an included file pixel_sum and pixel have to be declared.   Otherwise gcc would be giving more error messages than just the one you showed.

In fact, to test gcc    add the following to the top of my code

#include <stdlib.h>
typedef unsigned short pixel;
typedef int pixel_sum;



call the whole thing foo.c

this should then compile cleanly

gcc -c -o foo.o foo.c

If it does, it means you're not including this code w/ the right typedefs   and includes



0
 

Author Comment

by:errang
ID: 24418845
>>In fact, to test gcc    add the following to the top of my code

#include <stdlib.h>
typedef unsigned short pixel;
typedef int pixel_sum;

ahh... here's the problem.

pixel_sum is not an int, at least not in my program =(
typedef struct {

    int red;

    int green;

    int blue;

    int num;

} pixel_sum;
 

static void initialize_pixel_sum(pixel_sum *sum) 

{

    sum->red = sum->green = sum->blue = 0;

    sum->num = 0;

    return;

}
 

static void accumulate_sum(pixel_sum *sum, pixel p) 

{

    sum->red += (int) p.red;

    sum->green += (int) p.green;

    sum->blue += (int) p.blue;

    sum->num++;

    return;

}

Open in new window

0
 

Author Comment

by:errang
ID: 24418883
>>call the whole thing foo.c

>>this should then compile cleanly

Ok, that does compile =)

>>If it does, it means you're not including this code w/ the right typedefs   and includes

But like I said in my previous post, pixel and pixel_sum are sturcts =(.

>>Can you show all the gcc error messages?
fudge[36] [~/perflab-handout/]> gcc -c -o kernels.o kernels.c
kernels.c: In function `do_row_sum':
kernels.c:657: error: invalid operands to binary +
kernels.c:659: error: invalid operands to binary +
kernels.c:660: error: invalid operands to binary +
kernels.c: In function `smooth':
kernels.c:715: error: invalid operands to binary +
kernels.c:717: error: invalid operands to binary +
kernels.c:718: error: invalid operands to binary +
kernels.c:726: error: invalid operands to binary +
kernels.c:728: error: invalid operands to binary +
kernels.c:729: error: invalid operands to binary +
kernels.c:735: error: invalid operands to binary +
kernels.c:737: error: invalid operands to binary +
kernels.c:738: error: invalid operands to binary +

0
 
LVL 11

Accepted Solution

by:
climbgunks earned 500 total points
ID: 24418966
ah.. my bad.

this increases the memory used by a factor of 3.


#include <stdlib.h>
 

typedef struct {

    int red;

    int green;

    int blue;

    int num;

} pixel_sum;
 

typedef struct {

    unsigned short red;

    unsigned short green;

    unsigned short blue;

} pixel;
 

static inline void add_pixel_2(pixel_sum *sum, pixel *a, pixel *b) {

    sum->red = a->red + b->red;

    sum->green = a->green + b->green;

    sum->blue = a->blue + b->blue;

}
 

static inline void add_pixel_3(pixel_sum *sum, pixel *a, pixel *b, pixel *c) {

    sum->red = a->red + b->red + c->red;

    sum->green = a->green + b->green + c->green;

    sum->blue = a->blue + b->blue + c->blue;

}
 

static inline void do_row_sum(pixel_sum *sums, int dim, pixel *row) {

    int i;

    add_pixel_2(sums++, row++, row);

    for (i=1; i<dim-1; i++)

        add_pixel_3(sums++, row-1, row++, row);

    add_pixel_2(sums, row-1, row);

}
 

static inline void avg_pixel_sums_2(pixel *dst, pixel_sum *a, pixel_sum *b, int num) {

    dst->red = (a->red + b->red)/num;

    dst->green = (a->green + b->green)/num;

    dst->blue = (a->blue + b->blue)/num;

}
 

static inline void avg_pixel_sums_3(pixel *dst, pixel_sum *a, pixel_sum *b, pixel_sum *c, int num) {

    dst->red = (a->red + b->red + c->red)/num;

    dst->green = (a->green + b->green + c->green)/num;

    dst->blue = (a->blue + b->blue + c->blue)/num;

}
 

void smooth(int dim, pixel *src, pixel *dst) {

    pixel_sum *row_sums = malloc(3*dim*sizeof(pixel_sum));

    pixel_sum *r0 = row_sums;

    pixel_sum *r1 = r0+dim;

    pixel_sum *r2;

    int i, j;
 

    do_row_sum(r0, dim, src);

    do_row_sum(r1, dim, src+i*dim);

    // calculate row 0

    avg_pixel_sums_2(dst++, r0++, r1++, 4);

    for (i=1; i<dim-1; i++)

        avg_pixel_sums_2(dst++, r0++, r1++, 6);

    avg_pixel_sums_2(dst++, r0, r1, 4);
 

    // calculate all the middle rows

    for (i=1; i<dim-1; i++) {

        r0 = row_sums+((i-1)%3)*dim;

        r1 = row_sums+(i%3)*dim;

        r2 = row_sums+((i+1)%3)*dim;

        do_row_sum(r2, dim, src+(i+1)*dim);

        avg_pixel_sums_3(dst++, r0++, r1++, r2++, 6);

        for (j=1; j<dim-1; j++)

            avg_pixel_sums_3(dst++, r0++, r1++, r2++, 9);

        avg_pixel_sums_3(dst++, r0, r1, r2, 6);

    }
 

    // calculate the last row

    r0 = row_sums+((i-1)%3)*dim;

    r1 = row_sums+(i%3)*dim;

    avg_pixel_sums_2(dst++, r0++, r1++, 4);

    for (i=1; i<dim-1; i++)

        avg_pixel_sums_2(dst++, r0++, r1++, 6);

    avg_pixel_sums_2(dst, r0, r1, 4);

}

Open in new window

0
 

Author Comment

by:errang
ID: 24419198
Hm... got a few more errors =(

kernels.c:669: warning: operation on `row' may be undefined
kernels.c:671: warning: operation on `row' may be undefined
kernels.c:671: warning: operation on `row' may be undefined

I think those are referring to this function:
static inline void do_row_sum(pixel_sum *sums, int dim, pixel *row) {
  int i;
  add_pixel_2(sums++, row++, row);
  for (i=1; i<dim-1; i++)
    add_pixel_3(sums++, row-1, row++, row);
  add_pixel_2(sums, row-1, row);
}

And apparently the program thinks there's one error here:
ERROR: Dimension=96, 288 errors
E.g.,
You have dst[2][95].{red,green,blue} = {20835,18223,11591}
It should be dst[2][95].{red,green,blue} = {18245,16789,16176}
Benchmark "smooth: Current working version" failed correctness check for dimension 96.


0
 
LVL 11

Assisted Solution

by:climbgunks
climbgunks earned 500 total points
ID: 24419319

If you replace the do_row_sum() function with this one, does this make the warning on compilation go away?


static inline void do_row_sum(pixel_sum *sums, int dim, pixel *row) {

    int i;

    add_pixel_2(sums++, row, row+1);

    row++;

    for (i=1; i<dim-1; i++) {

        add_pixel_3(sums++, row-1, row, row+1);

        row++;

    }

    add_pixel_2(sums, row-1, row);

}

Open in new window

0
 
LVL 11

Assisted Solution

by:climbgunks
climbgunks earned 500 total points
ID: 24419356
found one other problem...   changed the last line shown below.  Initially it had src+i*dim.


void smooth(int dim, pixel *src, pixel *dst) {

    pixel_sum *row_sums = malloc(3*dim*sizeof(pixel_sum));

    pixel_sum *r0 = row_sums;

    pixel_sum *r1 = r0+dim;

    pixel_sum *r2;

    int i, j;
 

    do_row_sum(r0, dim, src);

    do_row_sum(r1, dim, src+dim);

Open in new window

0
 

Author Comment

by:errang
ID: 24424875
>>If you replace the do_row_sum() function with this one, does this make the warning on compilation go away?

yes it does.

It works perfectly now =), thanks a lot for your help.

Initially it shows that it increases the performance by a factor of 2:

Smooth: Version = smooth: Current working version:
Dim             32      64      128     256     512     Mean
Your CPEs       165.4   177.0   180.0   179.5   333.5
Baseline CPEs   379.5   401.6   403.2   398.8   397.2
Speedup         2.3     2.3     2.2     2.2     1.2     2.0

But it can go much higher right? I ask this because of the silly random number generator our program is getting tested by. After 10 or so runs its improved its performance a bit more:

Smooth: Version = smooth: Current working version:
Dim             32      64      128     256     512     Mean
Your CPEs       83.8    160.0   171.7   173.4   326.8
Baseline CPEs   379.5   401.6   403.2   398.8   397.2
Speedup         4.5     2.5     2.3     2.3     1.2     2.4

0
 

Author Comment

by:errang
ID: 24424891
Just wanted to know how high it could go =)
0
 

Author Comment

by:errang
ID: 24424948
See, now it increases performance by 3...

Smooth: Version = smooth: Current working version:
Dim             32      64      128     256     512     Mean
Your CPEs       40.9    150.4   164.4   169.0   219.1
Baseline CPEs   379.5   401.6   403.2   398.8   397.2
Speedup         9.3     2.7     2.5     2.4     1.8     3.0

Just wanted to know how high it was designed to improve performance so I know how long I need to recompile and run it =)
0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24424977

I would guess it's just the test server being more or less busy...or cache effects on loading the program.   Re-compiling won't help, but running it multiple times in succession might.

Btw, those number don't make a lot of sense...  there's something wrong about the 512 and Mean columns.   They don't divide properly, and the Mean can't be higher than all of the other CPEs in the row.

0
 

Author Comment

by:errang
ID: 24425047
>>Btw, those number don't make a lot of sense...  there's something wrong about the 512 and Mean columns.   They don't divide properly, and the Mean can't be higher than all of the other CPEs in the row.

Yes, I agree with you there. The mean column doesn't work properly for the rotate function...

>>Re-compiling won't help, but running it multiple times in succession might.
yeah, I meant running it over and over again, so far I've ran it bout 10 times.  I just wanted to get your opinion on how fast it can go =), because I'd probably be running this till midnight to see if it can keep topping itself... lol

But the formatting got messed up here, this is what it looks like:
Smooth: Version = smooth: Current working version:

Dim             32      64      128     256     512     Mean

Your CPEs       40.9    150.4   164.4   169.0   219.1

Baseline CPEs   379.5   401.6   403.2   398.8   397.2

Speedup         9.3     2.7     2.5     2.4     1.8     3.0

Open in new window

0
 
LVL 11

Expert Comment

by:climbgunks
ID: 24425144
ah...ok.. properly formatted, it looks ok.    I had the columns misaligned in my head.    I don't like the drop-off in performance for larger N.    Might look at something else later to try to keep more of it in cache.

0
 

Author Comment

by:errang
ID: 24425168
>>Might look at something else later to try to keep more of it in cache.

Ah, ok, thanks =)
0

Featured Post

Top 6 Sources for Identifying Threat Actor TTPs

Understanding your enemy is essential. These six sources will help you identify the most popular threat actor tactics, techniques, and procedures (TTPs).

Join & Write a Comment

An Outlet in Cocoa is a persistent reference to a GUI control; it connects a property (a variable) to a control.  For example, it is common to create an Outlet for the text field GUI control and change the text that appears in this field via that Ou…
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so…
The goal of this video is to provide viewers with basic examples to understand opening and writing to files in the C programming language.
The goal of this video is to provide viewers with basic examples to understand recursion in the C programming language.

762 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question

Need Help in Real-Time?

Connect with top rated Experts

19 Experts available now in Live!

Get 1:1 Help Now