siakhooi
asked on
Performance of Register variable and Array variable
Hi,
I am using RedHat 8 on a Dell PC with an Intel Pentium III 533 MHz, with GCC 3.2.
I have some code that reads data from a file, processes it, and writes it back to a file, and I record the total time used. It looks like this:
// declaration
register unsigned long A, B, C, D;
unsigned long data[2048];
I read data from the file into data[2048],
then loop over the data, processing it with A, B, C, D (4 unsigned long per iteration), with the result left in A, B, C, D (it is actually a modified MD5 computation),
and lastly I need to XOR the result back into the data:
data[i++]^=A; data[i++]^=B; data[i++]^=C; data[i++]^=D;
After processing all 2048 unsigned long values, I write data[2048] to the output file.
I found that with the following XOR
data[i++]^=A; data[i++]^=B; data[i++]^=C; data[i++]^=D;
the run takes around 3.6 seconds.
But if the statement becomes
A^=data[i++]; B^=data[i++]; C^=data[i++]; D^=data[i++];
the run takes only 1.9 seconds.
But I can't use that version, because I can't write A, B, C, D to the output file directly. (Even when I used putchar, the result was even worse.)
I need your advice on how to solve the problem.
Thanks.
ASKER
Oh, my mistake. :-)
OK. What I actually need is to reduce the time from 3.6 s to less than 2 s.
thanks.
You can probably squeeze a little more speed out of it by making it an array of 512 of struct {unsigned long A, B, C, D;} and thus incrementing the index 1/4 as many times. GCC's -funroll-loops optimization option may help a little too.
But you really can't get around the time it takes to store 2048 words into memory.
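The struct layout suggested above could be sketched roughly as follows. This is only an illustration, not the asker's actual code: the names `struct block`, `NBLOCKS`, and `xor_block` are made up here, and the values passed in stand in for the results of the MD5-like processing.

```c
#include <stddef.h>

#define NBLOCKS 512  /* 2048 words / 4 words per block */

/* Hypothetical type: one 4-word block, matching A, B, C, D in the question. */
struct block {
    unsigned long A, B, C, D;
};

/* XOR the four results into one block. A single index (or pointer)
   now covers four words, so it is incremented 1/4 as many times. */
static void xor_block(struct block *b,
                      unsigned long A, unsigned long B,
                      unsigned long C, unsigned long D)
{
    b->A ^= A;
    b->B ^= B;
    b->C ^= C;
    b->D ^= D;
}
```

A caller would declare `struct block data[NBLOCKS];` and call `xor_block(&data[i], A, B, C, D)` once per iteration. The four stores per block still happen, so this only trims the index arithmetic, as the comment above says.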
hi,
I am not too sure, but would the -O3 flag at compile time help?
ASKER
No, and even adding -mcpu=pentium3 does not help.
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:
PAQ with points refunded
Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!
jmcg
EE Cleanup Volunteer
ASKER CERTIFIED SOLUTION
But you didn't say what the problem is.
Is the problem that you need the job to be done in 1.9 seconds? It is apparently not possible on your system to do it that fast.
Is the problem that you don't understand why the 2nd variation takes less time than the first? Here's why: data[i++] ^= A does all the same things as A ^= data[i++], PLUS stores the result in memory. An exclusive-or operation takes place in registers. When memory locations (such as data[i]) are involved, the program has to load or store in addition to doing the xor.
btw, the "register" keyword is largely ignored by GCC when optimizing. GCC decides for itself which variables should be kept in registers.
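To make the load/store difference concrete, here is a small sketch of the two variants side by side. The function names and the `acc` array are assumptions made for illustration; the asker's real loop also runs the MD5-like processing between XORs.

```c
#include <stddef.h>

/* Variant 1: read-modify-write. Each statement loads data[i], XORs it
   with a value held in a register, and stores the result back to memory. */
static void xor_into_data(unsigned long *data, size_t n,
                          unsigned long A, unsigned long B,
                          unsigned long C, unsigned long D)
{
    size_t i = 0;
    while (i + 4 <= n) {
        data[i++] ^= A; data[i++] ^= B; data[i++] ^= C; data[i++] ^= D;
    }
}

/* Variant 2: load only. Each statement loads data[i] and XORs it into a
   local that the compiler can keep in a register; nothing is stored to
   data[]. acc[] is a hypothetical way to hand the four results back. */
static void xor_into_regs(const unsigned long *data, size_t n,
                          unsigned long acc[4])
{
    unsigned long A = acc[0], B = acc[1], C = acc[2], D = acc[3];
    size_t i = 0;
    while (i + 4 <= n) {
        A ^= data[i++]; B ^= data[i++]; C ^= data[i++]; D ^= data[i++];
    }
    acc[0] = A; acc[1] = B; acc[2] = C; acc[3] = D;
}
```

The second variant touches memory half as often inside the loop, but it leaves data[] unchanged, which is why it cannot simply replace the first when the XORed data has to be written out.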