C / C++ main function prototypes

evilrix, Engineering Manager
CERTIFIED EXPERT
An expert in cross-platform ANSI C/C++ development, specialising in meta-template programming and low latency scalable architecture design.
This is a short and sweet, but (hopefully) to the point article. There seems to be some fundamental misunderstanding about the function prototype for the "main" function in C and C++, more specifically what type this function should return. I see so many programmers use void as the return type. People, I'm sorry to tell you but that's just plain wrong!

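The C/C++ standards are very (VERY) clear about the prototype for the main function. Every hosted implementation must accept the following two forms (or an exact equivalent, such as char **argv in place of char *argv[]), and these are the only two that are portable: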
 
int main(void) { /*...*/ }


and
 
int main(int argc, char *argv[]) { /*...*/ }

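In the second form, argc holds the number of arguments and argv points to argc strings followed by a null pointer (the standards guarantee that argv[argc] is null). argv[0] normally names the program, but it may be an empty string. Here is a minimal sketch that copies the arguments into owning C++ strings. The helper collect_args is hypothetical and is not part of any library:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: copies the arguments handed to
// int main(int argc, char *argv[]) into owning std::string objects.
// argv[0] .. argv[argc - 1] are the arguments; argv[argc] is a null pointer.
std::vector<std::string> collect_args(int argc, char *argv[])
{
    std::vector<std::string> args;
    for (int i = 0; i < argc; ++i)
        args.emplace_back(argv[i]);   // each argv[i] is a NUL-terminated string
    return args;
}
```

You would call it from main as: int main(int argc, char *argv[]) { std::vector<std::string> args = collect_args(argc, argv); /* ... */ return 0; }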


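Any other prototype is non-portable. In C, using a form your implementation does not document results in undefined behaviour; in C++, the return type must be int, full stop. Don't be fooled into thinking that it must be okay to have void as the return type because, if it weren't, the compiler would chuck an error. A C++ compiler is required to issue a diagnostic for void main (g++, for example, rejects it outright), yet some compilers accept it as an extension, and the C standard does not require the compiler to complain at all. All the C standard (C90) states regarding this matter is: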

"If the main function executes a return that specifies no value, the termination status returned to the host environment is undefined."
Now, the chances are that you'll never see the effect of your mistake directly. By the time the brown stuff hits the fan, your program has likely ended. No, it won't be you who gets caught out; it'll be the user of your program who suffers at your hands.

You see, all processes return an exit code, and that code just happens to be the value returned from main (returning from main is equivalent to calling exit() with the same value). What a particular value means is entirely up to the program but, nevertheless, the OS will expect one and the C/C++ runtime will deliver one -- even if you haven't provided it!

By convention, zero means success and any other value means failure; the standards themselves guarantee that returning 0 or EXIT_SUCCESS reports success and that EXIT_FAILURE reports failure. If you define main to return void, what gets returned will normally be whatever value was last stored in the accumulator register (EAX/RAX on x86), because that register is normally used to hold a function's return value. The chances are that, in our case, this value will not be zero: it will be whatever happened to be left there when the program exited.
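Rather than leaving the exit code to whatever is lying around in a register, return it explicitly. The sketch below assumes a hypothetical run_program function that stands in for a program's real work. EXIT_SUCCESS and EXIT_FAILURE come from <cstdlib>. Their exact numeric values are left to the implementation, but main reporting either of them (or 0) is guaranteed to mean success or failure respectively.

```cpp
#include <cstdlib>
#include <string>

// Hypothetical stand-in for a program's real work: it "fails" if it is
// given no input name. Every path returns a well-defined status, so a
// script calling the program always sees a meaningful exit code.
int run_program(const std::string &input_name)
{
    if (input_name.empty())
        return EXIT_FAILURE;   // report failure to the caller

    // ... the real work would go here ...

    return EXIT_SUCCESS;       // commonly 0
}
```

main then simply forwards the status, for example: int main(int argc, char *argv[]) { return run_program(argc > 1 ? argv[1] : ""); }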

Anyone attempting to write a script to use your program is going to have a head-scratchingly hard time trying to work out why your program randomly seems to fail when they test its exit code. They will discover that it appears to be an arbitrary value. Guess what, it is! Your decision to ignore a simple standard will have properly ruined someone's day.

Interestingly, whilst the main function MUST be defined to return an int, in C++ you don't actually have to return anything from it. The main function is treated as a special case: if control reaches its closing brace without executing a return statement, the effect is the same as return 0;. In other words, the following is about the smallest, yet still perfectly valid (if pointless), C++ program one can possibly write.
 
int main() {}


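NB. In C, this implicit return 0 only exists from C99 onwards. Under C90, flowing off the end of main leaves the termination status undefined, so if your code might be compiled as C90, always return a value from main explicitly.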

Whether you consider it good practice to declare a function as returning int and then not return a specific value is entirely up to you, but in this one specific case you are allowed to break what is otherwise a fundamental rule: functions defined as returning values MUST return values!
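Outside main there is no such exemption, so make every control path end in a return statement. Here is a minimal sketch built around a hypothetical digit_value helper:

```cpp
// Hypothetical helper: converts '0'..'9' to its numeric value and returns
// -1 for any other character. Every control path ends in a return.
// Dropping the final "return -1;" would usually compile with only a warning,
// but in C++ calling the function with a non-digit would then be undefined
// behaviour.
int digit_value(char c)
{
    if (c >= '0' && c <= '9')
        return c - '0';   // the digits are guaranteed to be contiguous
    return -1;
}
```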

Comments (2)

tigermatt, Staff Platform Engineer
CERTIFIED EXPERT
Most Valuable Expert 2011

Commented:
A nice article, and clearly a bugbear in software which exhibits these traits. Thanks. If the programmer did not see fit to define a sound interface with their program through attention to detail over return values, I would assume quality issues abound with the rest of the software, and would not hesitate to assume compiling it could result in a program which deletes all my data. [Through programmer incompetence or compiler-writer foolhardiness. They are invoking undefined behaviour, after all. :-)]

Some questions / remarks:

Common case behaviour

In reality, despite the standards, how do compilers behave in the common case? If I wanted to see the effects of this for myself, in code I compile, how would I do so?

I suspect the common case is that a (sane) compiler assumes what the programmer intended and synthesises a return of 0? This is certainly the behaviour I experience with gcc on a generic GNU/Linux x86 box. For example, the below:
#include <iostream>

using namespace std;

int main(void)
{
        int x = 3+4 * 5;
        cout << x << endl;
}


compiles down to:
000000000040082d <main>:
  40082d:       55                      push   %rbp
  40082e:       48 89 e5                mov    %rsp,%rbp
  400831:       48 83 ec 10             sub    $0x10,%rsp
  400835:       c7 45 fc 17 00 00 00    movl   $0x17,-0x4(%rbp)
  40083c:       8b 45 fc                mov    -0x4(%rbp),%eax
  40083f:       89 c6                   mov    %eax,%esi
  400841:       bf 80 10 60 00          mov    $0x601080,%edi
  400846:       e8 75 fe ff ff          callq  4006c0 <_ZNSolsEi@plt>
  40084b:       be 30 07 40 00          mov    $0x400730,%esi
  400850:       48 89 c7                mov    %rax,%rdi
  400853:       e8 c8 fe ff ff          callq  400720 <_ZNSolsEPFRSoS_E@plt>
  400858:       b8 00 00 00 00          mov    $0x0,%eax
  40085d:       c9                      leaveq 
  40085e:       c3                      retq   


In particular, the compiler in this case seems to have synthesised an explicit mov of constant value zero to EAX before the return, presumably by assumption that this is what I intended when flowing off the end without an explicit return statement.

I appreciate the GNU Compiler Collection is a behemoth, so what do I need to do to actually observe random values returned from the accumulator register? Use some other simplistic / esoteric compiler, perhaps a very simple one which receives little attention and generates code for some relatively obscure architecture? Something else?

To be clear, I am not advocating a wilful disregard for the standard; quite the contrary. However, I often find constructive examples can be useful in motivating a change in programmer behaviour, rather than arguments from standards (which often seem to fall on deaf ears or false arguments of "it doesn't affect me" or "my compiler is better than that").

Compiler warnings

Should the compiler warn me if there exists a path in the flow graph of a function returning a non-void type which reaches the end of the function without executing a return statement? Should it do this for main? Details of specific compilers are well outside the scope of the article, but I note that in gcc's man page, the return-type warning explicitly excludes main. Why should this be the case?

For C++, a function without return type always produces a diagnostic message, even when -Wno-return-type is specified.  The only exceptions are main and functions defined in system headers.

Infinite loops

What are the correct semantics here? Do I need to / should I be forced to include a return value? What about if I do this outside main(), where switching on return-type warnings throws an error about a missing return statement which would otherwise never be reached? (Assuming C++)
int main(void)
{
        while (true)
        {
                // do something
        }
}



Other functions

The following compiles but I presume it would be unwise to assume that anything will be well-defined after the call to foo() in main() which excludes a return statement?
int foo(int x, int y)
{
        /* something happens here */
}

int main(void)
{
        int bar = foo(3, 4);
}



These questions / remarks are not intended as a criticism of the article, so please don't misconstrue them as such! There are a fair few nuances in such a seemingly simple topic which are worth further exploration.

So, thoughts?
evilrix, Engineering Manager
CERTIFIED EXPERT

Author

Commented:
>> In reality, despite the standards, how do compilers behave in the common case

It's really hard to say, because it depends on both the compiler and the optimization level the code was compiled with. Generally, compilers use the accumulator to return values, so whatever was last placed in that register is what will be used. There are no guarantees of this, however, and the only sane (and correct) answer I can give you is that the behaviour is undefined.

>> compiles down to:
Remember, it depends on which optimization level you built with. The resulting assembly can be very different depending on whether you optimised for speed or for size, and how aggressively.

>> seems to have synthesised an explicit mov of constant value zero to EAX before the return
If this is C++ then, yes, it will.

"Interestingly, whilst the main function MUST be defined to return an int, in C++ you don't have to actually return anything from main. The main function is treated as a special case; whereas, if you omit a return value the C++ runtime will automatically return zero for you."

I can't say for sure, but I have a feeling the C++ standards committee made this change to account for the fact that people were declaring main to return void. If you were to do the same here you would probably still see zero returned, due to the fact the accumulator is being zeroed out; however, in truth the behaviour is undefined - always was and always will be (at least until the standards committee says otherwise).

>> so what do I need to do to actually observe random values returned from the accumulator register?
Change the return type to void and compile it as C rather than C++.

Or, you could try this...

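#include <ctime>
#include <cstdlib>

int foo()
{
   return std::rand(); // leaves a random value in the return register
}

int bar()
{
   // look ma' no return value! (undefined behaviour: for demonstration only)
}

int main()
{
   std::srand(static_cast<unsigned>(std::time(0)));

   foo();

   // Typically (e.g. an unoptimised x86 build) this returns whatever foo()
   // left in EAX, i.e. a random value. Nothing guarantees it: optimised
   // builds may behave quite differently.
   return bar();
}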




>> Should the compiler warn me if there exists a path in the flow graph of a function returning a non-void type which fails to explicitly call return before the end of the function?
Interestingly enough, failing to return a value from a function is not considered an error; it is undefined behaviour (in C++ as soon as control flows off the end of the function, in C only if the caller uses the missing value). This means the compiler does not have to generate an error, although it will probably generate a warning (as long as the warning level is high enough). I suspect part of the reason is that it is possible to embed assembly into C/C++ code, so the function's stack frame may be modified by the programmer outside the C/C++ compiler's remit.
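It is also hard to judge a "missing" return mechanically. In the hypothetical checked_div below, the closing brace can never be reached because std::abort() does not return, so the function is fine without a final return statement. GCC and Clang know that abort never returns and do not warn here. In general, though, a compiler cannot prove reachability for every path, which makes a blanket hard error impractical.

```cpp
#include <cstdlib>

// Hypothetical helper: integer division that refuses to divide by zero.
// There is no return statement after std::abort(), and none is needed:
// std::abort() never returns, so the end of the function is unreachable.
int checked_div(int numerator, int denominator)
{
    if (denominator != 0)
        return numerator / denominator;
    std::abort();
}
```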

>> I note in gcc's man page, the return-type warning explicitly excludes warning about the main method. Why should this be the case?
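Note that the passage you quote is about functions declared with no return type at all, not about a missing return statement. As for why main is exempt from the missing-return warning, it is most likely because flowing off the end of main is well-defined in C++ (and in C99 and later): it is equivalent to return 0, so there is nothing to warn about.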

>> Infinite loops - What are the correct semantics here? Do I need to / should I be forced to include a return value?
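The compiler doesn't care: control can never reach the closing brace, and in C++ main would return 0 even if it did. Adding an explicit return for the sake of a clear, sane program exit is a matter of taste :)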

>> where switching on return-type warnings throws an error about a missing return statement which would otherwise never be reached?
Well, that's a completely different problem: one of unreachable control paths. Again, the standard does not mandate any warning here, so what you get is compiler specific.
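The hypothetical next_pow2 below shows the outside-main case. It leaves a while (true) loop only through a return statement, so its closing brace is unreachable and no trailing return is required. Compilers such as GCC generally recognise this and stay quiet even with -Wreturn-type enabled. As noted above, however, nothing in the standard obliges a particular compiler to reason this way.

```cpp
// Hypothetical helper: the smallest power of two that is >= n.
// Precondition: n must not exceed the largest power of two that fits in an
// unsigned int; otherwise p wraps to 0 and the loop never ends.
unsigned next_pow2(unsigned n)
{
    unsigned p = 1;
    while (true)
    {
        if (p >= n)
            return p;   // the only way out of the loop
        p <<= 1;
    }
    // No return statement here: control can never reach this point.
}
```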

>> Other functions - The following compiles but I presume it would be unwise to assume that anything will be well-defined after the call to foo() in main() which excludes a return statement?
Strictly speaking, the behaviour is undefined: in C++ flowing off the end of foo() is itself undefined, and in C using the value it failed to return is. In practice, the result is platform specific, because it's really down to how the compiler works, and it may even be intended (see my previous observation about embedding assembly). The code will compile, but it will behave arbitrarily unless the programmer really knows what's going on at the assembly level and is controlling it in some way.

>> These questions / remarks are not intended as a criticism of the article, so please don't misconstrue them as such!
All interesting points, which make me feel a separate article might be worth writing to cover in more detail why there are some things that seem completely erroneous but that the compiler will let you do, and why some things are undefined and others are unspecified (and what the difference is).

Matt, I hope my comments, above, help.
