

Reading a large file into an Array.

Posted on 2002-07-22
Medium Priority
Last Modified: 2010-04-01
I'm having a problem figuring out how to read a large file from disk into an array. The only way I know is to read the file byte by byte and fill the array as I go. This takes quite a bit of time, as the files I'm reading are generally in the 15 MB range. The bytes are read into the array and then the contents are analyzed. Also, I would like to read a word at a time, but the bytes are reversed when I read them. For example, if I read the word 4E6F hex, it comes out as 6F4E. How do I fix this big-endian, little-endian problem so I can read from the file directly into the array without transposing the digits?

I'm using Visual C++ on a PC.
Question by:GoldStrike

Expert Comment

ID: 7170100
I'd read the whole file into a memory buffer at once, e.g.

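// open the file for reading
HANDLE hFile = CreateFile ( ...);
DWORD dwSize = GetFileSize ( hFile, NULL);

// allocate a buffer for the whole file (release it with delete [] when done)
LPBYTE pb = new BYTE [ dwSize];

// read the entire file in one call; dwRead receives the number of bytes read
DWORD dwRead;
ReadFile ( hFile, pb, dwSize, &dwRead, NULL);

CloseHandle ( hFile);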
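
For comparison, the same "read it all at once" idea can be sketched with the standard library instead of the Win32 calls. This is only an illustration, assuming a C++11 compiler; the helper name ReadWholeFile is made up for this example and does not come from the thread.

```cpp
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): returns the file's bytes,
// or an empty vector if the file cannot be opened or read.
inline std::vector<unsigned char> ReadWholeFile(const std::string& path)
{
    // Open in binary mode, positioned at the end so tellg() gives the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in)
        return {};

    const std::streamsize size = static_cast<std::streamsize>(in.tellg());
    if (size <= 0)
        return {};

    std::vector<unsigned char> data(static_cast<std::size_t>(size));

    // Go back to the start and read everything in a single call.
    in.seekg(0, std::ios::beg);
    if (!in.read(reinterpret_cast<char*>(data.data()), size))
        return {};

    return data;
}
```

The vector frees its memory automatically, so there is no delete [] to forget.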

>>How do I fix this big-endian, little-endian problem so I
>>can read from the file directly into the array without
>>transposing the digits?

Assuming that you don't want to convert the whole array, adjust the endianness when reading the values from the array (in your example, it seems that you're dealing with 'short' values), e.g.:

short* ps = (short*) pb;

short value = ntohs ( ps [ 0]);
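
Note that ntohs() comes from Winsock, so it needs the Winsock header and library. If that dependency is unwanted, a word can also be assembled from the two bytes by hand, which works the same on any host byte order and avoids casting the byte buffer to short*. A minimal sketch, assuming the file stores words high byte first as in the 4E6F example; ReadBE16 is a hypothetical name.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper (not from the thread): builds a 16-bit value from two
// consecutive bytes stored high byte first (big-endian) at buf[offset].
inline std::uint16_t ReadBE16(const unsigned char* buf, std::size_t offset)
{
    return static_cast<std::uint16_t>((buf[offset] << 8) | buf[offset + 1]);
}
```

With the buffer from above, ReadBE16(pb, 0) would give the first word, ReadBE16(pb, 2) the second, and so on.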


Expert Comment

ID: 7170146
If you need to access a large file's data in memory you may also want to check out memory mapped files. You don't need to read the entire file into memory to be able to access it; instead you map it into your process's address space and leave the rest to the OS. If you are developing for a Windows platform, check out the file-mapping functions, e.g. CreateFileMapping() and MapViewOfFile().


Expert Comment

ID: 7170572
I agree with fl0yd's method.

If you're interested in this method, I'll post an example.

I think you're better off using the ntohs() function jkr mentioned as you iterate through the data.
In other words, instead of converting the entire 15 MB of data, just convert each section of the data as you use it.


Expert Comment

ID: 7170577
Is this an MFC project?

Can you explain why you need to read the entire file?

Maybe we can suggest a better method, if you can explain what you're trying to do with the data.

Author Comment

ID: 7170628
thanks for the responses,

"I agree with fl0yd method.
If you're interested in this method, I'll post an example."

Axter, if you can post the example I would appreciate it.
Also, how would I go about converting the entire file if I needed to?

The data is real time radar data that is collected and saved to a zip disk. An analysis program will then read the file from the zip disk and perform various tasks on it, i.e. plotting the data, running computations on the data, formatting and printing a record of the data. We have utilities that do this, I was just looking for a faster way of accessing the information from disk.

Expert Comment

ID: 7170773
When it comes to nitpicking, loading 15MB as one chunk might be faster .o)

But that also depends on the resources available - if memory is an issue, memory mapped files might be better.

Expert Comment

ID: 7170959
hmm .. how about an intermediate approach: not using memory maps, and not loading the whole file at once either, but reading the file in chunks of, say, 8 KB or 64 KB (preferably a multiple of the DWORD size), or whatever value suits your purpose?
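
A rough sketch of that chunked approach, written with the standard library rather than ReadFile() and assuming a C++11 compiler. The name ForEachChunk and the callback signature are invented for this illustration.

```cpp
#include <cstddef>
#include <fstream>
#include <functional>
#include <ios>
#include <string>
#include <vector>

// Hypothetical helper (not from the thread): reads the file in blocks of at
// most chunkSize bytes and passes each block to 'handle'.
// Returns the total number of bytes processed (0 if the file cannot be opened).
inline std::size_t ForEachChunk(
    const std::string& path,
    std::size_t chunkSize,
    const std::function<void(const unsigned char*, std::size_t)>& handle)
{
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> buffer(chunkSize);
    std::size_t total = 0;

    while (in) {
        in.read(reinterpret_cast<char*>(buffer.data()),
                static_cast<std::streamsize>(buffer.size()));

        // gcount() is the number of bytes actually read; the last block
        // is usually shorter than chunkSize.
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;

        handle(buffer.data(), got);
        total += got;
    }
    return total;
}
```

Keeping the chunk size even (64 KB, for instance) means a 16-bit word never straddles two chunks.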

Expert Comment

ID: 7171280
jkr: "When it comes to nitpicking, loading 15MB as one chunk might be faster .o)"

Possibly, if you are a machine. If you are human, using memory mapped files might be slower, but it sure as hell *FEELS* a lot more responsive and thus faster. So if you are developing for a robot, use the fastest approach -- if it's going to be a human using your software, you will have to take other criteria into account when choosing your approach.

Expert Comment

ID: 7174189
Endian Swapping in C++.

This should sort out your endian swapping:

inline void uint8 MaskBits(uint16 f, uint16 m)
{ return (f & m) }

inline void EndianSwap(uint16 &Data)
{
  Data = uint16(MaskBits(Data, 0x00FF) << 8) |
         uint16(MaskBits(Data, 0xFF00) >> 8);
}

Can be easily extended to swap uint8s, uint32s etc.
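
As an illustration of the 32-bit case, one possible shape is sketched below. It uses std::uint32_t from <cstdint> (C++11) rather than the uint16/uint32 typedefs above, and the name ByteSwap32 is made up for this example.

```cpp
#include <cstdint>

// Hypothetical helper (not from the thread): reverses the byte order of a
// 32-bit value, so 0x11223344 becomes 0x44332211.
inline std::uint32_t ByteSwap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

Applying it twice returns the original value, which makes for an easy sanity check.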


Expert Comment

ID: 7174316
    judging from your excessive use of the 'inline'-keyword you seem to be looking for high performance. If that is the case, why spend so much time and memory performing instructions that have no real effect? Check out this revised version of your EndianSwap( uint16 ):

void EndianSwap( uint16& Data ) {
    Data = ( Data << 8 ) | ( Data >> 8 );
}

Moreover, in contrast to your version, this one does work as expected. Take a look at your MaskBits-function which has errors all over the place.

    using 16-bit WORDs on a PC is SLOOOOW. If you use them in 32-bit applications you will end up constantly stalling the pipeline by switching to 16-bit mode. Let's assume you have a WORD in the lower 2 bytes of a DWORD. To swap those, use the following code:

#include <algorithm>  // for std::swap

DWORD dwData = some_16bit_value;
// view the DWORD as bytes; on a PC the low WORD occupies the first two bytes
unsigned char* p = reinterpret_cast<unsigned char*>( &dwData );
std::swap( *p, *( p + 1 ) );

While not much of a gain (you'll lose around 7 clock cycles on a PIII each time you access 16-bit data), it will definitely make a difference when applied to 15 MB of data.
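
If the whole buffer really does need converting in one go, a simple pass over the bytes is enough. The following is only a sketch; SwapWordsInPlace is a hypothetical name, and it assumes the data is a plain sequence of 16-bit words.

```cpp
#include <cstddef>
#include <utility>

// Hypothetical helper (not from the thread): swaps the two bytes of every
// 16-bit word in the buffer. A trailing odd byte, if any, is left untouched.
inline void SwapWordsInPlace(unsigned char* data, std::size_t byteCount)
{
    for (std::size_t i = 0; i + 1 < byteCount; i += 2)
        std::swap(data[i], data[i + 1]);
}
```

This touches every byte once, so for files where only part of the data is ever examined, converting on access is likely the cheaper choice.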

Author Comment

ID: 7174668
I agree with fl0yd on his approach to the big endian issue.
thanks for the responses.

Axter, if you have that example to post I would appreciate it. thks.


Accepted Solution

fl0yd earned 1200 total points
ID: 7174724
HANDLE hFile;    // file handle
HANDLE hFileMapping;    // file mapping object handle
BYTE* pbFile;    // file pointer

// open file
hFile = CreateFile( lpszName, GENERIC_READ, ... );

// create file mapping object for that file
hFileMapping = CreateFileMapping( hFile, ... );

// map the file into process' address space
pbFile = (BYTE*)MapViewOfFile( hFileMapping, ... );

// here you can access the file through memory pointers
// do not alter pbFile, though, we will need it later on

// cleanup
UnmapViewOfFile( pbFile );
CloseHandle( hFileMapping );
CloseHandle( hFile );

You will have to decide what parameters to use wherever I omitted them. There are too many possible combinations to make a choice for you. Also, you will have to check for errors, which I have left out to keep the sample simple.

Author Comment

ID: 7174736
Thanks, that's what I needed.

Author Comment

ID: 7174738
thanks for the help everyone.
