Solved

# Matrix Blocking Code Optimization in C. HELP!!!

Posted on 2010-11-19

I have to optimize this code by improving locality in order to make my program run in less clock cycles overall. I realize that the cache size and exact standard of clock cycle increase isn't exactly specified yet, but I'm just trying to get started.

int readArray(int* arr, int n, int r, int c)

{

return *(arr + (r * n) + c);

}

void writeArray(int* arr, int n, int r, int c, int value)

{

*(arr + (r * n) + c) = value;

}

void finalMatrix(int n, int* A, int* result)

{

int r, c;

for (r = 0; r < n; r++)

{

for (c = 0; c < n; c++)

{

// compute and write the value for the result array

writeArray( result, n, r, c, ( readArray(A, n, r, c) * readArray(A, n, r, c) ) );`

}

}

}

CODE IDEA: The initial code computes a final matrix whose elements ( row , column ) is the product of an input matrice's (here called "A") similar (row, column) * (column, row).

I realize that I need to use matrix blocking to allow for less cache misses when accessing the elements of A to produce the result matrix.

I'm just wondering how I should go about this.

I will eventually play around with the matrix blocking size, but for starters I wanted to grab like n/4 x n/4 blocks at a time.

obviously the matrix "A" has good spatial locality when it goes in row major order (r,c) but when it goes by column this is terrible. I was initially thinking of transposing the matrix "A" into another matrix in which I can then use row major order again, but I think just using the same way as (r,c)*(c,r) with smaller matrix blocks will produce efficient results.

Please help me get on the road here.

---------------------------------------------------------------------