For Loop Question

What does the second for loop do? Specifically, what does "*p" mean there, and how does that for loop know when to stop?

#include <stdio.h>
int main (int argc, char *argv[])
{
int i,n;
char *p;
printf("argc is: %d\n", argc);

for(i=0;i<argc;i++)
{
for(p=argv[i],n=0; *p;n++,p++);
printf("\n%s %d",argv[i],n);
};
}

ozo

The loop stops when *p is false, i.e. 0, which happens when p reaches the terminating '\0' at the end of the string.
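To make that stopping condition concrete, here is a minimal sketch of the same loop written out as a function. The name count_chars is made up for illustration, and it assumes s points to a valid NUL-terminated string (which every argv[i] is).

```c
#include <stddef.h>

/* Hypothetical helper: counts characters the same way the inner loop does. */
size_t count_chars(const char *s)
{
    const char *p = s;
    size_t n = 0;

    /* *p is the char that p points at. It is nonzero (true) for every
       real character and 0 (false) only at the terminating '\0'. */
    while (*p != '\0') {
        n++;
        p++;
    }
    return n;
}
```

Called as count_chars("hi"), this should walk over 'h' and 'i', stop at the '\0' and return 2.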

lbertacco

The first loop iterates over all arguments passed to the program. The second loop walks over the chars of a given argument, counting them (and then the program prints the argument and its length).
For example, if you run the program (say test.exe) as
test hi
it would print (after the "argc is: 2" line)
test 4
hi 2
In other words it does the same as:

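#include <stdio.h>
#include <string.h> /* strlen */
int main (int argc, char *argv[])
{
int i;
printf("argc is: %d\n", argc);
/* strlen returns size_t, so cast it to match %d */
for(i=0;i<argc;i++) printf("\n%s %d",argv[i],(int)strlen(argv[i]));
return 0;
}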

ozo

for(p=argv[i],n=0; *p;n++,p++);
is equivalent to
p=argv[i],n=0; while( *p ){ n++,p++; }

lbertacco

Well, if you need to save every nanosecond, the code
>>>> for(p=argv[i],n=0; *p;n++,p++);
is surely not the fastest idea to go with. For example
for(p=argv[i];*p;p++);
n=p-argv[i];
is already twice as fast.
And you can get even faster using the x86 string-scan instruction (REPNE SCASB) in assembly.
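As a sketch of the pointer-difference idea in function form (the name len_by_pointer_diff is hypothetical, and the speed claim is not something this snippet demonstrates): only the pointer is advanced inside the loop, and the length is recovered by one subtraction afterwards.

```c
#include <stddef.h>

/* Hypothetical helper: advance only the pointer, then subtract once. */
size_t len_by_pointer_diff(const char *s)
{
    const char *p = s;

    while (*p)      /* stop at the terminating '\0' */
        p++;

    /* p - s has type ptrdiff_t; it cannot be negative here, so the
       conversion to size_t is safe. */
    return (size_t)(p - s);
}
```

Subtracting two pointers into the same array yields the number of elements between them, which for char is the number of bytes.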

itsmeandnobodyelse

>>>> is already twice as fast

Running both loops a billion times (using a 10-byte string value) gives the times below (milliseconds on a W2k, 3.0 P4)

for(p=argv[0],n=0; *p;n++,p++) 10594
for(p=argv[i];*p;p++)n=p-argv[i]; 9312

So it is only a 12% advantage.

lbertacco

Well, if you really tested that code, I'm surprised it is any faster at all.
In fact, you missed a ";" between "...p++)" and "n=...". Try with
for(p=argv[i];*p;p++); n=p-argv[i];
and possibly with longer strings ;-)
I'm just curious. It's not twice as fast, you are right. It does half as many increments, but it still does 1 increment, 1 dereference and a test per iteration versus 2 increments, 1 dereference and a test.

itsmeandnobodyelse

>>>> In fact, you missed a ";" between "...p++)"
That was only in the printout.

Adding strlen gives

for(p=argv[0],n=0; *p;n++,p++); 10610
for(p=argv[0];*p;p++);n=p-argv[0]; 10547
n=strlen(argv[0]); 44437

which means that this time the difference between the two loops was < 0.1 ns per call, and the difference between strlen and either loop was about 34 ns per call.
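The per-call figures follow from dividing the total time by the number of iterations. A small sketch of that arithmetic, using a hypothetical helper name ns_per_call:

```c
/* Hypothetical helper: converts a total time in milliseconds, measured
   over `iterations` calls, into nanoseconds per call. */
double ns_per_call(double total_ms, double iterations)
{
    return total_ms * 1e6 / iterations;  /* 1 ms = 1,000,000 ns */
}
```

With the numbers above, ns_per_call(44437 - 10610, 1e9) comes to roughly 33.8 ns, and ns_per_call(10610 - 10547, 1e9) to roughly 0.06 ns.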

itsmeandnobodyelse

Running it a few times, sometimes the first loop came out best and sometimes the second. The differences ranged from +300 to -1500.

itsmeandnobodyelse

Here is the test code:
#include <windows.h> // GetTickCount();
#include <cstring> // strlen
#include <iostream>
#include <iomanip>
using namespace std;
int main(int argc, char* argv[])
{
int start = 0;
int end = 1000000000; // 1 billion
int c1 = 0;
char* p;
int n;
int sig = 1;
long l0 = GetTickCount();
{
for (int i = start; i < end; ++i)
{
for(p=argv[0],n=0; *p;n++,p++);
c1 += n * sig;
sig = -sig;
}
}
int c2 = 0;
long l1 = GetTickCount();
{
for (int i = start; i < end; ++i)
{
for(p=argv[0];*p;p++);
n=p-argv[0];
c2 += n * sig;
sig = -sig;
}

}
long l2 = GetTickCount();

int c3 = 0;
{
for (int i = start; i < end; ++i)
{
n=strlen(argv[0]);
c3 += n * sig;
sig = -sig;
}

}
long l3 = GetTickCount();

cout << "for(p=argv[0],n=0; *p;n++,p++); " << setw(10) << right << l1 - l0 << setw(10) << right << c1 << endl;
cout << "for(p=argv[0];*p;p++);n=p-argv[0];" << setw(10) << right << l2 - l1 << setw(10) << right << c2 << endl;
cout << "n=strlen(argv[0]); " << setw(10) << right << l3 - l2 << setw(10) << right << c3 << endl;
return 0;
}

The c1, c2, c3 accumulators are there to prevent the compiler from optimizing away the outer loop.
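For readers without windows.h, a rough portable sketch of the same measuring idea is shown here. It swaps in the standard clock() for GetTickCount and a volatile variable for the accumulators; both are substitutions, not what the code above does, and the names sink and time_strlen_ms are made up. An optimizer may still hoist the loop-invariant strlen call out of the loop, so inspecting the generated assembly remains the only way to be sure what is being timed.

```c
#include <stddef.h>
#include <string.h>
#include <time.h>

/* volatile: every store must really be performed, so the compiler
   cannot simply throw the computed lengths away. */
static volatile size_t sink;

/* Hypothetical helper: CPU time in milliseconds spent on `iterations`
   calls of strlen(s), measured with the standard clock(). */
double time_strlen_ms(const char *s, long iterations)
{
    clock_t start = clock();
    for (long i = 0; i < iterations; ++i)
        sink = strlen(s);
    clock_t end = clock();
    return 1000.0 * (double)(end - start) / CLOCKS_PER_SEC;
}
```

Note that clock() measures processor time and usually has coarse resolution, so the iteration count needs to be large for the result to mean anything.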

lbertacco

Even more off topic and time wasting :-)

I modified your program to split the 3 versions into 3 separate functions, so that they get optimized in the same way. I added a test on the output value (otherwise the optimizer can figure out that it doesn't really need to compute the string length, as that value is never used) but removed the other computations on the result, since those can alter the timings. See code below.
Compiling in debug mode and running it on a 30-char string, I get:
for(p=s,n=0; *p;n++,p++); 11062
for(p=s;*p; p++);n=p-s; 11672
n=strlen(s); 2234
so I was wrong. But if I change for(p=s;*p; p++);n=p-s; to for(p=s;*p++;);n=p-s-1;
I get
for(p=s,n=0; *p;n++,p++); 10984
for(p=s;*p++;);n=p-s-1; 9688
n=strlen(s); 2234
and in this case the second version wins over the first one by about 10%. And strlen is much faster.
In release/optimized mode, checking the assembly output I can see that the optimizer has figured out that it doesn't really need to increment both n and p in the first version, so they differ much less:
for(p=s,n=0; *p;n++,p++); 3422
for(p=s;*p++;);n=p-s-1; 3297
n=strlen(s); 3359
The second version wins again (by a smaller margin). However, I haven't found out why strlen is now slower than its debug counterpart (which was the overall winner).

#include <windows.h> // GetTickCount();
#include <cstdlib> // exit
#include <cstring> // strlen
#include <iostream>
#include <iomanip>
using namespace std;

#define ITER 100000000

void f1(char *s)
{
char* p;
int n;
for (int i = 0; i < ITER; ++i) {
for(p=s,n=0; *p;n++,p++);
if(n==0) exit(0);
}
}

void f2(char *s)
{
char* p;
int n;
for (int i = 0; i < ITER; ++i) {
for(p=s; *p++;);
if(p-s-1==0) exit(0);
}
}

void f3(char *s)
{
for (int i = 0; i < ITER; ++i) {
if(strlen(s)==0) exit(0);
}
}

int main(int argc, char* argv[])
{
if (argc < 2) return 1; // the string to measure must be passed as argv[1]
long l0 = GetTickCount();
f1(argv[1]);
long l1 = GetTickCount();
f2(argv[1]);
long l2 = GetTickCount();
f3(argv[1]);
long l3 = GetTickCount();
cout << "for(p=s,n=0; *p;n++,p++); " << setw(10) << right << l1 - l0 << endl;
cout << "for(p=s;*p++;);n=p-s-1;" << setw(10) << right << l2 - l1 << endl;
cout << "n=strlen(s); " << setw(10) << right << l3 - l2 << endl;
return 0;
}
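The "-1" in the post-increment variant is easy to get wrong, so here is a minimal sketch of just that loop as a function (len_post_increment is a hypothetical name). Because *p++ increments p even on the iteration where the test fails, p ends up one position past the terminator.

```c
#include <stddef.h>

/* Hypothetical helper: the test and the increment share one expression. */
size_t len_post_increment(const char *s)
{
    const char *p = s;

    while (*p++)    /* tests the old *p, then always advances p */
        ;

    /* The increment also ran on the failing test, so p is now one past
       the '\0'; subtract 1 to exclude the terminator. */
    return (size_t)(p - s - 1);
}
```

For an empty string the loop body never runs, p ends at s + 1, and the result is 0.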

itsmeandnobodyelse

>>>> Compiling in debug mode
Actually you can't draw any conclusions when running in debug mode, especially when measuring statements that compile to only a few assembler instructions.

I compiled it in release mode only and got these results:

for(p=s,n=0; *p;n++,p++); 4234
for(p=s;*p++;);n=p-s-1; 4219
n=strlen(s); 7172

which shows that my C runtime is slower than yours.

>>>> #define ITER 100000000
That is only 100 million, not a billion, so we have to multiply your times by 10. The remaining difference can be explained by the fact that my sample used a 7-byte string while yours was about 30 chars. Setting ITER to 1,000,000,000 and passing a 7-byte argument I got

for(p=s,n=0; *p;n++,p++); 8891
for(p=s;*p++;);n=p-s-1; 9843
n=strlen(s); 44485

which is pretty much the same as above, though it is not quite clear why the first loop is now always better.

Changing the functions by macros, e.g.

#define f1(s, c) \
{\
char* p; \
int n;\
for (int i = 0; i < ITER; ++i) {\
for(p=s,n=0; *p;n++,p++);\
if(!(c = n)) exit(0);\
}\
}

#define f2(s, c)\
{\
char* p;\
for (int i = 0; i < ITER; ++i) {\
for(p=s; *p++;);\
if(!(c = p-s-1)) exit(0);\
}\
}

#define f3(s, c)\
{\
for (int i = 0; i < ITER; ++i) {\
if(!(c = strlen(s))) exit(0);\
}\
}

Note: I passed a second variable, which I later used in the output statements, but it doesn't make any difference.

itsmeandnobodyelse

>>>> Changing the functions by macros,

... doesn't change anything either.

lbertacco

Macros don't make any difference because the compiler is inlining the functions anyway in release mode (so it achieves the same result as your macros). And by the way, the functions are called just 3 times overall, not 3 times 1 billion, so the call overhead would not be noticeable in any case.

The reason that the first version wins with short strings is that the second version has a faster inner loop but needs a subtraction at the end. So, if the loop is short, the cost of the subtraction can exceed the time saved during the loop. The longer the string, the faster the second version gets (compared to the first version).

Still, can you see if your strlen is faster in debug mode (your strlen in release mode is surprisingly slow)?
I'll stop here because I don't want to annoy everybody else too much...

itsmeandnobodyelse

>>>> The reason that the first version wins with short strings is
It doesn't win. It only won with my code (sometimes).

>>>> can you see if your strlen is faster in debug mode
for(p=s,n=0; *p;n++,p++); 30390
for(p=s;*p++;);n=p-s-1; 27125
n=strlen(s); 16750

Yes. I assume it is because the debugger is actually 'optimizing' by invoking strlen in run-time mode rather than in compiled mode. That way it can use processor and chipset features like the scan instruction you mentioned above, while the compiled assembly doesn't. Another idea is that the debugger is actually running a different C runtime, or puts it somewhere where it is not swapped out.