Link to home
Start Free TrialLog in
Avatar of mrwad99
mrwad99Flag for United Kingdom of Great Britain and Northern Ireland

asked on

Cache hits and optimizing 'for' loops: query

Ah hello.

I am reading about spatial locality of reference from http://goo.gl/ZOsHta, and how it affects how we should loop through 2D arrays.  I refer to this article throughout this question, as I think it is wrong, or my understanding is wrong (more likely), which is what this question is all about:)

(Note: I have picked the zones I feel experts may reside in that could help with this; I apologize if I could have made better choices)

Consider the following array:
int arr[4][4] = { {0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15} };

Open in new window

Now, there are two ways this can be looped through.  The first:
for (int row = 0; row < 4; row++)
{
	for (int col = 0; col < 4; col++)
	{
		// Deal with element [row][col]
	}
}

Open in new window

and the second:
for (int col = 0; col < 4; col++)
{
	for (int row = 0; row < 4; row++)
	{
		// Deal with element [row][col]
	}
}

Open in new window

(For the purposes of this question, please ignore the fact that if we wanted to output the array elements in numeric order, we would have to use the first method.)

The linked article refers to a six byte cache which can be read from to save going to RAM, which seems to hold the memory being accessed plus the next five bytes,

Question 1: (This is fundamental to my understanding)

When this cache is exhausted (in my case of it being six bytes, when all six bytes have been accessed) *or* when we attempt to access a memory location that is not stored in the cache, do we then reload it with a new set of six bytes, starting at the memory location which we next access, or which caused the miss, then including the following five?  (I am assuming this is correct in what follows.)


What follows is a simple loop that outputs the array using both loops, with the output annotated to show what I think is happening regarding cache hits and misses:

First loop:
int arr[4][4] = { {0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15} };

for (int row = 0; row < 4; row++)
{
	for (int col = 0; col < 4; col++)
	{
		printf("[%d][%d]\n", row, col);
	}
}

Open in new window

[0][0]       Hit (cache:[0][0], [0][1], [0][2], [0][3], [1][0], [1][1])
[0][1]       Hit
[0][2]       Hit
[0][3]       Hit
[1][0]       Hit
[1][1]       Hit
[1][2]       Miss: reload cache, its now [1][2], [1][3], [2][0], [2][1], [2][2], [2][3]
[1][3]       Hit
[2][0]       Hit
[2][1]       Hit
[2][2]       Hit
[2][3]       Hit
[3][0]       Miss: reload cache, its now [3][1], [3][2], [3][3]
[3][1]       Hit
[3][2]       Hit
[3][3]       Hit

Now, the article states that this method will cause three cache misses; however, I can see only two.  The only other miss that could possibly occur is right at the very start; but I don't see really how this could be a miss, as the cache is not yet filled..,? (Or is it??)

Question 2:
Are there three misses, or am I right in the above with two?

Second loop:
int arr[4][4] = { {0,1,2,3}, {4,5,6,7}, {8,9,10,11}, {12,13,14,15} };

for (int col = 0; col < 4; col++)
{
	for (int row = 0; row < 4; row++)
	{
		printf("[%d][%d] is %d\n", row, col, arr[row][col]);
	}
}

Open in new window

[0][0]       Hit (cache:[0][0], [0][1], [0][2], [0][3], [1][0], [1][1])
[1][0]       Hit
[2][0]       Miss: reload cache, its now [2][0], [2][1], [2][2], [2][3], [3][0], [3][1]
[3][0]       Hit
[0][1]       Miss: reload cache, its now [0][1], [0][2], [0][3], [1][0], [1][1], [1][2]
[1][1]       Hit
[2][1]       Miss: reload cache, its now [2][1], [2][2], [2][3], [3][0], [3][1], [3][2]
[3][1]       Hit
[0][2]       Miss: reload cache, its now [0][2], [0][3], [1][0], [1][1], [1][2], [1][3]
[1][2]       Hit
[2][2]       Miss: reload cache, its now [2][2], [2][3], [3][0], [3][1], [3][2], [3][3]
[3][2]       Hit
[0][3]       Miss: reload cache, its now [0][3], [1][0], [1][1], [1][2], [1][3], [2][0]
[1][3]       Hit
[2][3]       Miss: reload cache, its now [2][3], [3][0], [3][1], [3][2], [3][3]
[3][3]       Hit

Question 3

Now, the article states this looping method will cause 12 cache misses; I only see 7.  Have I gotten this wrong, or is the article wrong?

I would greatly appreciate input on these three queries.

TIA
Avatar of d-glitch
d-glitch
Flag of United States of America image

For Q1, you don't load the cache until you try to read [0][0]

For Q2, the fist cache load is the same as in Q1:
    cache:[0][0], [0][1], [0][2], [0][3], [1][0], [1][1]

The array is stored in memory the same way ineach case.
The point is to know how it is stored and go with the flow.
You are correct about Q1.

My comments for Q1 and Q2 above are for Q2 and Q3 respectively.
On 2nd thought, I agree with your analysis for Q3 except that the first read is a miss too.
Avatar of mrwad99

ASKER

Thanks for coming back to me on this one.

"For Q2, the fist cache load is the same as in Q1:
    cache:[0][0], [0][1], [0][2], [0][3], [1][0], [1][1]"

Now, applying what you said about "comments for Q1 and Q2 above are for Q2 and Q3 respectively" I am not sure what you mean with the first comment: I have checked my original question and I show the caches as the same in both question 2 and question 3: they are both (cache:[0][0], [0][1], [0][2], [0][3], [1][0], [1][1]).  Can you clarify please?

Secondly, you say that the first read is a cache miss too; in this case, the number of misses would go from 2 to 3 in the first instance, and from 7 to eight in the second instance?  Is this correct - you mention only that the first read in Q3 is a miss, with no reference to Q2, so that's why I am checking :)
ASKER CERTIFIED SOLUTION
Avatar of d-glitch
d-glitch
Flag of United States of America image

Link to home
membership
This solution is only available to members.
To access this solution, you must be a member of Experts Exchange.
Start Free Trial
Avatar of mrwad99

ASKER

Thanks very much :)