Solved

printf format specifiers: outputting long as int gives odd result!

Posted on 2014-02-17
Last Modified: 2014-02-17
Ah hello.

Please consider the following code:

printf("unsigned long long int: %llu \n", 0);
printf("unsigned long long int: %llu \n", (unsigned long long int)0);


On 64-bit Windows (VS 2005), this outputs

unsigned long long int: 9109355120595828736
unsigned long long int: 0


On a second run, it outputs

unsigned long long int: 6927490199261806592
unsigned long long int: 0


Which shows it is pretty random.

On 64-bit Linux (Netbeans), the output is as expected:

unsigned long long int: 0 
unsigned long long int: 0 


I am assuming here the printf is exercising its right to exhibit undefined behaviour, since I have broken my "promise" that I am going to give it an unsigned long long integer by passing 0.

1) Am I correct in my assumption; if so, can someone point me at some documentation stating this?
2) In my real code, I am passing variables to printf whose underlying type depends on a typedef; it may be an int or it may be an unsigned long long.  Am I guaranteed safety by always casting to the larger of those two types (unsigned long long here), which seems to work in my code above?
3) Any comments on the difference between Windows and Linux?

TIA
Question by:mrwad99
18 Comments
 
LVL 142

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 125 total points
ID: 39864523
You should check your compiler and enable this flag (GCC):
-Wformat

which should result in the following warning:
format ‘%llu’ expects type ‘long long unsigned int’, but argument 2 has type ‘int’

Arguments narrower than int (char, short) are promoted to int automatically when passed to a variadic function such as printf, so you can be "sloppy" with those types, but not with types wider than int.
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39864526
So the difference between Windows and Linux comes first from the compiler, and second from the hosting OS ...
 
LVL 19

Author Comment

by:mrwad99
ID: 39864556
Thanks: so is casting to the larger type always safe?
 
LVL 142

Expert Comment

by:Guy Hengel [angelIII / a3]
ID: 39864633
It depends; if you know what you are doing (and what data you have), the explicit cast is perfectly OK.
The value "0" can be cast to ANY numerical data type (afaik) without any issue.
 
LVL 40

Assisted Solution

by:evilrix
evilrix earned 125 total points
ID: 39864647
Am I right in thinking you're coding using C99 (since unsigned long long is a C99 type)? If so, C99 provides macros for format specifiers that might prove useful in trying to solve your problem.

Each of the following object-like macros expands to a character string literal
containing a conversion specifier, possibly modified by a length modifier, suitable for use
within the format argument of a formatted input/output function when converting the
corresponding integer type. These macro names have the general form of PRI (character
string literals for the fprintf and fwprintf family) or SCN (character string literals
for the fscanf and fwscanf family), followed by the conversion specifier,
followed by a name corresponding to a similar type name in 7.18.1. In these names, N
represents the width of the type as described in 7.18.1. For example, PRIdFAST32 can
be used in a format string to print the value of an integer of type int_fast32_t.
2 The fprintf macros for signed integers are:

PRIdN PRIdLEASTN PRIdFASTN PRIdMAX PRIdPTR
PRIiN PRIiLEASTN PRIiFASTN PRIiMAX PRIiPTR
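
As a rough sketch of how these macros might be combined with a typedef: the names counter_t, PRI_COUNTER and format_counter below are made up for illustration, and the code assumes a C99 library that provides <inttypes.h> and snprintf (older Visual Studio releases may not).

```c
#include <inttypes.h>  /* PRIu64; also pulls in <stdint.h> */
#include <stdio.h>

/* Hypothetical typedef standing in for the one in the question. */
typedef uint64_t counter_t;

/* Hypothetical companion macro: keep it next to the typedef so the
   format specifier changes whenever the underlying type changes. */
#define PRI_COUNTER PRIu64

/* Formats a counter_t into buf and returns whatever snprintf returns. */
int format_counter(char *buf, size_t size, counter_t value)
{
    /* Adjacent string literals are concatenated, giving e.g. "counter: %llu". */
    return snprintf(buf, size, "counter: %" PRI_COUNTER, value);
}
```

The point of the design is that the argument type and the format specifier are defined in one place, so they cannot drift apart.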

Meanwhile, a naked zero has type signed int, whereas the format specifier %llu expects an unsigned long long (at least 64 bits). I'm not surprised the values you are getting are a little odd. Passing an argument whose type does not match its format specifier to printf is undefined behaviour.


7.19.6.1

9. If a conversion specification is invalid, the behavior is undefined. If any argument is
not the correct type for the corresponding conversion specification, the behavior is
undefined.

unsigned long long is also a C++11 type, and is supported by std::ostream.
 
LVL 40

Accepted Solution

by:evilrix
evilrix earned 125 total points
ID: 39864655
>> the value "0" can be casted to ANY numerical data type (afaik) without any issue.
Unfortunately, casting isn't what happens when you use printf. The printf function is not typesafe: all it knows about the types is what you tell it in the format specifier.

The values are retrieved using the va_arg mechanism, which does little more than read each argument as the type named by the format specifier. Since 0 is a signed int (normally 32 bits) and unsigned long long is an unsigned type of at least 64 bits, passing 0 for %llu results in undefined behaviour.

The solution is to cast explicitly; then it should work, because printf gets the correct type. This is exactly what we see in all the test cases, no?
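
For the typedef case in the question, a minimal sketch of the cast approach might look like this. The names my_id_t and format_id are hypothetical, and snprintf assumes a C99 library.

```c
#include <stdio.h>

/* Hypothetical typedef: int in one build, unsigned long long in another. */
typedef int my_id_t;

/* Formats an id into buf and returns whatever snprintf returns. */
int format_id(char *buf, size_t size, my_id_t id)
{
    /* The cast guarantees the argument really is an unsigned long long,
       so it matches %llu whatever my_id_t happens to be. */
    return snprintf(buf, size, "id: %llu", (unsigned long long)id);
}
```

One caveat: if my_id_t is signed and the value is negative, the cast wraps it to a very large unsigned number, so this is only "safe" in the sense of avoiding undefined behaviour.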
 
LVL 142

Assisted Solution

by:Guy Hengel [angelIII / a3]
Guy Hengel [angelIII / a3] earned 125 total points
ID: 39864684
> going to result in undefined behaviour.
What will eventually happen is that, as printf is expecting 64 bits, it will take 64 bits: your 32 bits (from the signed 0) and the next 32 bits as well, which might be "just anything" that happens to sit next to the argument on the stack or in a register. You might even crash your app with a "bad memory address" ...

and I agree that printf is not casting the types, hence the "issue".
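
To illustrate why no conversion takes place, here is a small sketch of a variadic function. The name sum_ull is made up for this example; the mechanism (va_start, va_arg, va_end from <stdarg.h>) is the standard one that printf-style functions rely on.

```c
#include <stdarg.h>

/* Hypothetical helper: sums `count` arguments. The caller must pass each
   one as unsigned long long; nothing in the language checks that promise. */
unsigned long long sum_ull(int count, ...)
{
    va_list ap;
    unsigned long long total = 0;
    int i;

    va_start(ap, count);
    for (i = 0; i < count; ++i) {
        /* va_arg fetches the next argument *as* the type named here.
           It has no way of knowing what the caller actually passed. */
        total += va_arg(ap, unsigned long long);
    }
    va_end(ap);
    return total;
}
```

Calling sum_ull(2, 1ULL, (unsigned long long)2) is fine, whereas sum_ull(1, 0) would be the same mistake as in the question: an int is passed, but an unsigned long long is read.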
 
LVL 19

Author Comment

by:mrwad99
ID: 39864706
>> what will eventually happen is that as printf is expecting 64 bits, it will take 64 bits... your 32 bits (from the signed 0) and the next 32 bits also... which might be "just anything"

That is very interesting, but also a bit alarming, kind of like a buffer overflow, and even a possibility to execute malicious code I guess...

Hi Rx!  Thanks for popping in on this one :)

>> The solution is to explicitly cast and then it should work because then printf gets the correct type

Excuse my possible dumbness, but isn't that what I am already doing?
 
LVL 40

Expert Comment

by:evilrix
ID: 39864713
>> That is very interesting, but also a bit alarming, kind of like a buffer overflow, and even a possibility to execute malicious code I guess...


BINGO!!! Hence C++ introduced type-safe streams :)

s/printf are evil and are, historically, a source of many exploits. Put simply, if you can avoid using them do so.

>> Excuse my possible dumbness, but isn't that what I am already doing?
Exactly, that's why I said, "This is exactly what we see in all the test cases, no?". As in, the cases where you cast it works as you'd expect, no?

 
LVL 40

Expert Comment

by:evilrix
ID: 39864716
>> Hi Rx!  Thanks for popping in on this one :)
No hay problema, señor wad99 :)
 
LVL 40

Expert Comment

by:evilrix
ID: 39864719
Just thinking you should upgrade your username from mrwad99 to mrwad++ :)
 
LVL 19

Author Comment

by:mrwad99
ID: 39864804
Fantastic as always, many thanks both :)
 
LVL 40

Expert Comment

by:evilrix
ID: 39864822
De nada, amigo!
 
LVL 32

Expert Comment

by:sarabande
ID: 39865145
>> it will take 64 bits... your 32 bits (from the signed 0) and the next 32 bits also... you might even go into a "bad memory address" crash of your app ...
printf reinterprets the memory at the argument's address as the expected type. As that address is valid, the read would not fail except at the very far end of the virtual memory space.

I doubt that Linux would always return the expected output when a wrong 32-bit argument is given. I would guess the results you got are accidental, and perhaps depend on debug or release mode or on how the memory was used before. The VS debug runtime explicitly writes non-zero contents to freed heap storage, which increases the probability that uninitialized memory contains "garbage".

I made a little test and checked the memory contents right "behind" valid integer constants (VS10). It contains 0xcccccccc, which is the pattern that debug builds use to fill uninitialized stack memory.

Sara
 
LVL 19

Author Comment

by:mrwad99
ID: 39865268
Thanks for participating even though the question has been answered Sara!  I don't quite understand what you are saying though with

"as the address is a valid 32-bit address, the cast would not fail beside on the most far end of the virtual memory space. "

Do you mean that the address at which *that zero* is stored is a valid 32 bit memory address, but the next 32 bits in memory are uninitialised, and that casting the 0, combined with value in the next 32 bits (which is uninitialised as far as we know) will fail?
 
LVL 40

Expert Comment

by:evilrix
ID: 39865429
>> the printf makes a cast to the expected type at the address given. as the address is a valid 32-bit address, the cast would not fail beside on the most far end of the virtual memory space.

No one is saying anything about casts failing. What we're saying is that if you have a 32 bit type and try to use it as a 64 bit type, the result is undefined and the value you get back is meaningless.

Semantically, it's the same as creating a union of an int32_t and an int64_t, initialising the 32 bit member and reading the 64 bit member. The value you'll get will probably be garbage, and it is certainly not something you can rely on.

The printf function knows nothing about the type of the original variable, all it knows is what you tell it in the format specifier. If this doesn't match the type expected the C99 standard is clear, "If any argument is not the correct type for the corresponding conversion specification, the behavior is undefined"

>> Do you mean that the address at which *that zero* is stored is a valid 32 bit memory address

If that is what's being said, it's an incorrect assertion. It may hold on the compiler you use, but all the C99 standard guarantees is that an integer constant has type int if int can represent its value. If it can't, the standard has a well defined order of progressively bigger types that it will try; it will use the first one that fits.

The size of an int, however, is platform specific, so you cannot make any assumptions such as this. More specifically, the standard states:

"A ‘‘plain’’ int object has the natural size suggested by the
architecture of the execution environment (large enough to contain any value in the range
INT_MIN to INT_MAX as defined in the header <limits.h>)."

>> but the next 32 bits in memory are uninitialised, and that casting the 0, combined with value in the next 32 bits (which is uninitialised as far as we know) will fail?

If you use an explicit cast when passing the value to printf, the cast produces a temporary value (not an l-value) of the type named by the cast, and that value is what gets passed to printf. It is computed according to C99's well defined rules for integer conversion:

6.3.1.3 Signed and unsigned integers

1 When a value with integer type is converted to another integer type other than _Bool, if
the value can be represented by the new type, it is unchanged.

2 Otherwise, if the new type is unsigned, the value is converted by repeatedly adding or
subtracting one more than the maximum value that can be represented in the new type
until the value is in the range of the new type.

3 Otherwise, the new type is signed and the value cannot be represented in it; either the
result is implementation-defined or an implementation-defined signal is raised.
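
A small sketch of the first two paragraphs in action (to_ull is a hypothetical helper name used only for this illustration):

```c
#include <limits.h>  /* ULLONG_MAX, for comparing results */

/* Converts an int to unsigned long long using the rules quoted above. */
unsigned long long to_ull(int value)
{
    /* Paragraph 1: a non-negative value fits and is unchanged.
       Paragraph 2: a negative value has ULLONG_MAX + 1 added to it,
       so -1 becomes ULLONG_MAX, -2 becomes ULLONG_MAX - 1, and so on. */
    return (unsigned long long)value;
}
```

So to_ull(0) is 0 and to_ull(-1) is ULLONG_MAX. Either way the result is well defined, which is why the cast version of the printf call behaves the same on every platform.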
 
LVL 32

Expert Comment

by:phoffric
ID: 39865443
Whenever you start seeing strange decimal outputs, it is helpful to see what the hex equivalent is:
9109355120595828736 = 0x7E6A EE38 0000 0000
6927490199261806592 = 0x6023 63A2 0000 0000
This often gives a clue by revealing a pattern. In this case the most significant 32 bits are garbage, and the least significant 32 bits are the zero you actually passed.
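
If you want to do this split in code rather than by hand, something like the following could be used. The helper name format_halves is made up, and snprintf assumes a C99 library.

```c
#include <stdio.h>

/* Hypothetical helper: writes the upper and lower 32 bits of value in hex. */
int format_halves(char *buf, size_t size, unsigned long long value)
{
    /* Mask both halves so each fits in 32 bits, then print them as
       unsigned long, which is guaranteed to hold at least 32 bits. */
    unsigned long high = (unsigned long)((value >> 32) & 0xFFFFFFFFULL);
    unsigned long low  = (unsigned long)(value & 0xFFFFFFFFULL);

    return snprintf(buf, size, "high=0x%08lX low=0x%08lX", high, low);
}
```

Feeding it the two values from the question gives high=0x7E6AEE38 and high=0x602363A2 respectively, each with low=0x00000000.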
 
LVL 40

Expert Comment

by:evilrix
ID: 39866022
Good point, well made, Paul :)