
Performance of Register variable and Array variable

Hi,
  I am using Red Hat 8 on an Intel Pentium III 533 MHz Dell PC, with GCC 3.2.
  I have some code that reads data from a file, processes it, and writes it back to a file, and I record the total time used. It looks like this:

// declaration
  register unsigned long A, B, C, D;
  unsigned long data[2048];

I read data from the file into data[2048],
then loop over the data, processing 4 unsigned longs per iteration with A, B, C, D and leaving the result in A, B, C, D (it is actually a modified MD5).

  Lastly, I need to XOR the result back into the data:
  data[i++]^=A; data[i++]^=B; data[i++]^=C; data[i++]^=D;

  After processing all 2048 unsigned longs, I write data[2048] to the output file.
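
For readers following along, the loop described above might look roughly like the sketch below. It is not the original program: mix_round is a hypothetical stand-in for the modified MD5 step (the post does not show it, and the real step presumably also reads the data), process_buffer and NWORDS are made-up names, and the file reading and writing are left out.

```c
#include <stddef.h>

#define NWORDS 2048  /* size of the data[] buffer from the post */

/* Hypothetical placeholder for the modified MD5 step, which the post does
 * not show. It only stirs the four state words so the sketch compiles. */
static void mix_round(unsigned long *A, unsigned long *B,
                      unsigned long *C, unsigned long *D)
{
    *A += *B ^ 0x67452301UL;
    *B += *C ^ 0xefcdab89UL;
    *C += *D ^ 0x98badcfeUL;
    *D += *A ^ 0x10325476UL;
}

/* XOR a running four-word state into the buffer, four words per pass.
 * Any trailing words beyond a multiple of 4 are left untouched. */
void process_buffer(unsigned long data[], size_t nwords)
{
    unsigned long A = 0, B = 0, C = 0, D = 0;
    size_t i = 0;

    while (i + 4 <= nwords) {
        mix_round(&A, &B, &C, &D);
        data[i++] ^= A; data[i++] ^= B; data[i++] ^= C; data[i++] ^= D;
    }
}
```

The caller would fill data[] from the input file, call process_buffer(data, NWORDS), and then write the buffer out.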

I found that with the following XOR
  data[i++]^=A; data[i++]^=B; data[i++]^=C; data[i++]^=D;
the run takes around 3.6 seconds.
But if the statement becomes
  A^=data[i++]; B^=data[i++]; C^=data[i++]; D^=data[i++];

the run takes only 1.9 seconds.
I can't use that version, though, because I can't write A, B, C, D to the output file directly. (I even tried putchar, and the result was even worse.)

I need your advice how to solve the problem.

thanks.

Asked by: siakhooi
 
bryanh Commented:
>I need your advice how to solve the problem

But you didn't say what the problem is.

Is the problem that you need for the job to be done in 1.9 seconds?  It is apparently not possible on your system to do it that fast.

Is the problem that you don't understand why the second variation takes less time than the first?  Here's why:  data[i++] ^= A does everything that A ^= data[i++] does, PLUS it stores the result back to memory.  An exclusive-or operation takes place in registers.  When memory locations (such as data[i]) are involved, the program has to load or store in addition to doing the XOR.
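
To make the difference concrete, here is a minimal sketch of the two variants side by side, with the memory traffic noted in comments. The function names and the state[] parameter are made up for illustration, and the comments describe what a compiler will typically emit, not a guarantee for any particular GCC version.

```c
#include <stddef.h>

/* Variant 1: data[i] ^= X. Every element is loaded, XORed, and stored back. */
void xor_into_memory(unsigned long *data, size_t nwords,
                     unsigned long A, unsigned long B,
                     unsigned long C, unsigned long D)
{
    size_t i;
    for (i = 0; i + 4 <= nwords; i += 4) {
        data[i]     ^= A;   /* load, xor, store */
        data[i + 1] ^= B;   /* load, xor, store */
        data[i + 2] ^= C;   /* load, xor, store */
        data[i + 3] ^= D;   /* load, xor, store */
    }
}

/* Variant 2: X ^= data[i]. Every element is only loaded; the results
 * accumulate in four locals and nothing is written to data[]. */
void xor_into_state(const unsigned long *data, size_t nwords,
                    unsigned long state[4])
{
    unsigned long A = state[0], B = state[1], C = state[2], D = state[3];
    size_t i;
    for (i = 0; i + 4 <= nwords; i += 4) {
        A ^= data[i];       /* load, xor; no store */
        B ^= data[i + 1];
        C ^= data[i + 2];
        D ^= data[i + 3];
    }
    state[0] = A; state[1] = B; state[2] = C; state[3] = D;
}
```

Note that the two functions do not compute the same thing: the second never changes data[], which is why it cannot simply replace the first.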

btw, the "register" storage-class specifier is only a hint, and GCC largely ignores it when optimizing.  GCC decides for itself which variables should live in registers.
 
siakhooi (Author) Commented:
Oh, my mistake.  :-)

OK. What I actually need is a way to reduce the time from 3.6 s to less than 2 s.

thanks.
 
bryanh Commented:
You can probably squeeze a little more speed out of it by making it an array of 512 of struct {unsigned long A, B, C, D;} and thus incrementing the index 1/4 as many times.  GCC's -funroll-loops optimization option may help a little too.

But you really can't get around the time it takes to store 2048 words into memory.
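
A minimal sketch of that struct layout, assuming the buffer is viewed as 512 four-word blocks. The names struct block, xor_blocks and NBLOCKS are made up for illustration, and the per-block step that would update A, B, C, D is only marked by a comment.

```c
#include <stddef.h>

#define NBLOCKS 512  /* 2048 words / 4 words per block */

/* One group of four words, so the loop index advances once per group. */
struct block {
    unsigned long A, B, C, D;
};

/* XOR the four state words into every block. In the real program the
 * state would be recomputed for each block before the XOR. */
void xor_blocks(struct block blocks[], size_t nblocks,
                unsigned long A, unsigned long B,
                unsigned long C, unsigned long D)
{
    size_t n;
    for (n = 0; n < nblocks; n++) {
        /* the per-block processing that updates A, B, C, D would go here */
        blocks[n].A ^= A;
        blocks[n].B ^= B;
        blocks[n].C ^= C;
        blocks[n].D ^= D;
    }
}
```

The loop still performs the same 2048 stores; only the index arithmetic shrinks. Building with something like gcc -O2 -funroll-loops is worth timing, but whether it helps on this machine is not something the sketch can promise.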
 
veeruns Commented:
Hi,
I'm not sure, but would the -O3 compiler flag help?
 
siakhooi (Author) Commented:
No. Even adding -mcpu=pentium3 doesn't help.

 
jmcg (Owner) Commented:
No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

PAQ with points refunded

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer
 
modulo Commented:
PAQed, with points refunded (100)

modulo
Community Support Moderator