siakhooi

Performance of Register variable and Array variable

Hi,
  I am using Red Hat 8 on a Dell PC with a 533 MHz Intel Pentium III, with GCC 3.2.
  I have some code that reads data from a file, processes it, and writes it back to a file. The total time taken is recorded. The code is as below:

// declaration
  register unsigned long A, B, C, D;
  unsigned long data[2048];

I read data from the file into data[2048],
then loop over the data, processing it with A, B, C, D (4 unsigned long per iteration) and leaving the result in A, B, C, D (it is actually a modified MD5 process).

  Lastly, I need to XOR the result back into the data:
  data[i++]^=A; data[i++]^=B; data[i++]^=C; data[i++]^=D;

  After processing all 2048 unsigned long values, I write the data array to the output file.

I found that with the following XOR
  data[i++]^=A; data[i++]^=B; data[i++]^=C; data[i++]^=D;
the run takes around 3.6 seconds.
But if the statement becomes
  A^=data[i++]; B^=data[i++]; C^=data[i++]; D^=data[i++];

the run takes only 1.9 seconds.
That does not solve my case, though, because I can't write A, B, C, D to the output file directly. (I even tried putchar, and the result was even worse.)

I need your advice how to solve the problem.

thanks.

bryanh

>I need your advice how to solve the problem

But you didn't say what the problem is.

Is the problem that you need for the job to be done in 1.9 seconds?  It is apparently not possible on your system to do it that fast.

Is the problem that you don't understand why the second variation takes less time than the first?  Here's why:  data[i++] ^= A does all the same things as A ^= data[i++], plus it stores the result in memory.  An exclusive-or operation takes place in registers.  When memory locations (such as data[i]) are involved, the program has to load or store in addition to doing the XOR.
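
To make the difference concrete, here is a minimal sketch of the two variants as separate functions. The function names are made up for illustration, and A, B, C, D are held constant across iterations for simplicity (in the real program they change on every pass). The first variant loads, XORs, and stores each word; the second only loads and XORs, so it never touches memory for writing and leaves data unchanged.

```c
#include <stddef.h>

/* Variant 1 (hypothetical name): read-modify-write.
 * Each word is loaded, XORed with a value, and stored back to memory. */
void xor_into_data(unsigned long *data, size_t n,
                   unsigned long a, unsigned long b,
                   unsigned long c, unsigned long d)
{
    size_t i;
    for (i = 0; i + 3 < n; i += 4) {
        data[i]     ^= a;
        data[i + 1] ^= b;
        data[i + 2] ^= c;
        data[i + 3] ^= d;
    }
}

/* Variant 2 (hypothetical name): read only.
 * Each word is loaded and XORed into a local accumulator; nothing is
 * stored to the array, so data is left unmodified. */
void xor_into_state(const unsigned long *data, size_t n,
                    unsigned long state[4])
{
    unsigned long a = state[0], b = state[1], c = state[2], d = state[3];
    size_t i;
    for (i = 0; i + 3 < n; i += 4) {
        a ^= data[i];
        b ^= data[i + 1];
        c ^= data[i + 2];
        d ^= data[i + 3];
    }
    state[0] = a; state[1] = b; state[2] = c; state[3] = d;
}
```

The two functions do not compute the same thing, which is why the faster one cannot simply replace the slower one when the XORed data has to end up in the output file.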

btw, the "register" storage-class keyword is largely ignored by GCC when it optimizes.  GCC decides for itself which variables should live in registers.

ASKER

Oh, that is my mistake.  :-)

OK. What I actually need is a way to reduce the time from 3.6 s to less than 2 s.

thanks.
You can probably squeeze a little more speed out of it by making it an array of 512 of struct {unsigned long A, B, C, D;} and thus incrementing the index 1/4 as many times.  GCC's -funroll-loops optimization option may help a little too.

But you really can't get around the time it takes to store 2048 words into memory.
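
A rough sketch of the struct layout suggested above, assuming the same 2048 words are viewed as 512 blocks of four. The names struct block, NUM_BLOCKS, and xor_blocks are invented for this example, and the XOR values are passed in as constants rather than recomputed per block as the real code would do.

```c
#include <stddef.h>

#define NUM_BLOCKS 512  /* 2048 words / 4 words per block */

/* Four consecutive words grouped so that one index step covers all four. */
struct block {
    unsigned long a, b, c, d;
};

/* XOR the four values into every block. The index is incremented once
 * per block, i.e. a quarter as often as with four data[i++] steps. */
void xor_blocks(struct block *blocks, size_t count,
                unsigned long a, unsigned long b,
                unsigned long c, unsigned long d)
{
    size_t i;
    for (i = 0; i < count; i++) {
        blocks[i].a ^= a;
        blocks[i].b ^= b;
        blocks[i].c ^= c;
        blocks[i].d ^= d;
    }
}
```

A caller would declare something like struct block data[NUM_BLOCKS] and pass it with a count of NUM_BLOCKS. The four stores per block are still there, so this only trims the index bookkeeping, not the memory traffic.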
Hi,
I am not too sure, but will the -O3 flag at compilation help?
No, and even adding -mcpu=pentium3 does not help.

No comment has been added lately, so it's time to clean up this TA.
I will leave the following recommendation for this question in the Cleanup topic area:

PAQ with points refunded

Please leave any comments here within the next seven days.
PLEASE DO NOT ACCEPT THIS COMMENT AS AN ANSWER!

jmcg
EE Cleanup Volunteer