Solved
Matrix Blocking Code Optimization in C. HELP!!!
Posted on 2010-11-19
I have to optimize this code by improving locality in order to make my program run in less clock cycles overall. I realize that the cache size and exact standard of clock cycle increase isn't exactly specified yet, but I'm just trying to get started.
int readArray(int* arr, int n, int r, int c)
{
return *(arr + (r * n) + c);
}
void writeArray(int* arr, int n, int r, int c, int value)
{
*(arr + (r * n) + c) = value;
}
void finalMatrix(int n, int* A, int* result)
{
int r, c;
for (r = 0; r < n; r++)
{
for (c = 0; c < n; c++)
{
// compute and write the value for the result array
writeArray( result, n, r, c, ( readArray(A, n, r, c) * readArray(A, n, r, c) ) );`
}
}
}
CODE IDEA: The initial code computes a final matrix whose elements ( row , column ) is the product of an input matrice's (here called "A") similar (row, column) * (column, row).
I realize that I need to use matrix blocking to allow for less cache misses when accessing the elements of A to produce the result matrix.
I'm just wondering how I should go about this.
I will eventually play around with the matrix blocking size, but for starters I wanted to grab like n/4 x n/4 blocks at a time.
obviously the matrix "A" has good spatial locality when it goes in row major order (r,c) but when it goes by column this is terrible. I was initially thinking of transposing the matrix "A" into another matrix in which I can then use row major order again, but I think just using the same way as (r,c)*(c,r) with smaller matrix blocks will produce efficient results.
Please help me get on the road here.
---------------------------------------------------------------------