writing data in binary form

Posted on 2013-09-04
Medium Priority
Last Modified: 2013-09-04
In my large C++ application, I often see that previous developers have written data in binary form. So basically, when they have to read it, they convert it to ASCII form, since they know the format of the file (for example, the first 4 bytes are a uint32_t, the next byte is a char, etc.).

I am trying to understand the benefit of writing in binary form and then, when it comes to reading, reconstructing the original human-readable form.

Is there any benefit, like size reduction from saving in binary form, faster write processing, or maybe something else?

P.S.: Again, these files are only configuration-related files, like metadata of a file, etc.
Question by:perlperl
LVL 84

Expert Comment

by:Dave Baldwin
ID: 39464888
ASCII form is just for humans to be able to read it.  All arithmetic is done in binary form as is all addressing including things like memory, MAC, and IP addresses.  Binary is the original and necessary format, not ASCII.

Author Comment

ID: 39464899
I am talking about the contents of files stored on the filesystem.
LVL 86

Accepted Solution

jkr earned 1000 total points
ID: 39464907
As Dave wrote, binary is the format that is native to computing and therefore "faster". Also:

>>Is there any benefit like size reduction by saving in binary form, or faster write processing
>>etc or maybe something else?

It's a major size reduction. E.g., to express the number 4294967295 (UINT_MAX) you need 10 bytes in ASCII, whereas the same can be done with just four bytes in binary.

The basic question probably is: do you need the saved data to be human readable? If the answer is "yes", then use ASCII storage; if "no", binary is preferable.

LVL 84

Assisted Solution

by:Dave Baldwin
Dave Baldwin earned 400 total points
ID: 39464972
The binary forms are required. Things like numbers for arithmetic, sizes of arrays, and offsets into memory areas are used in binary form. The ASCII form is useless for that and is only used to make the data readable by humans. Since the binary form is required, it is much more efficient to create it that way to begin with. If you store the data in ASCII form, you are adding a translation step before it can be used by the system.

Author Comment

ID: 39465017
Now I get it.
In my case it was mostly for size reduction. The file mainly stores uint32 values, and it has a limit of 4K. So by saving binary instead of ASCII we can fit more entries in the file.

Thanks a lot.
LVL 40

Assisted Solution

evilrix earned 200 total points
ID: 39465049
Just adding to the information already provided by the other experts...

It's worth noting that binary data isn't portable, whereas text data (generally) is. You have to consider endianness. If you write binary data on a big-endian platform and then read it back on a little-endian platform (or vice versa), you won't get the original data. This is why, when you send data over a network, you have to convert it to network byte order.

You'll always get the original data back when it is written as text, because it is serialised as a sequence of bytes or characters. Note that this is only true when you are writing "narrow" 8-bit ASCII (or extended ASCII) text, since each character is just a single byte.

The same is *not* true when you are writing wide or multi-byte text. Each character is going to be multiple bytes, and so the endianness matters. This is why UTF (Unicode Transformation Format) uses a Byte Order Mark (BOM), to ensure the text can be reconstructed properly regardless of the endianness of the original platform.

Short answer: if you need the data to be platform independent and human readable, and it can be serialised as a byte stream and/or written with a BOM, then use text. If you need the data to be machine readable, and/or size matters, and/or you know you'll be reading and writing on the same platform, then use binary. These are rules of thumb - YMMV :)

That all said, you are better off using a proper data serialisation library (such as Boost.Serialization), as it will take care of all of these "low level" problems and allow you to just get on with the "business logic" of your program.

Author Comment

ID: 39465197
Thanks for the information.
I did see ntohl and htonl in my application while storing/reading data to/from the file.
