Reading a large file into an Array.

Posted on 2002-07-22
Last Modified: 2010-04-01
I'm having a problem figuring out how to read a large file from disk into an array. The way I know of doing this is to do a byte by byte read of the file and filling the array. This takes quite a bit of time as the file's I'm reading are generally in the 15Mbyte range. The hex bytes are read into the array and then the contents are analyzed. Also, I would like to read a word at a time, but the bytes are reversed when I read them. As an example, if I was reading the word 4E6F hex, the word read comes out 6F4E. How do I fix this big indian, little indian problem so I can read from the file directly into the array without transposing the digits.

I'm using Visual C++ on a PC.
Question by:GoldStrike
  • 4
  • 4
  • 2
  • +3
LVL 86

Expert Comment

ID: 7170100
I'd read the whole file into a memory buffer at once, e.g.

HANDLE hFile = CreateFile ( ...);
DWORD dwSize = GetFileSize ( hFile, NULL);

LPBYTE pb = new BYTE [ dwSize];

DWORD dwRead;
ReadFile ( hFile, pb, dwSize, &dwRead, NULL);

CloseHandle ( hFile);

>>How do I fix this big indian, little indian problem so I
>>can read from the file directly into the array without
>>transposing the digits.

Assuming that you don't want to convert the whole array, adjust the endianity when reading the values from the array (in your example, it seems that you're dealing with 'short' values), e.g.:

short* ps = (short*) pb;

short value = ntohs ( ps [ 0]);


Expert Comment

ID: 7170146
If you need to access a large file's data in memory you may also want to check out memory mapped files. You don't need to read the entire file into memory to be able to access it but rather map it into your process's address space and leave to rest to the OS. If you are developing for a windows platform check out the functions:

LVL 30

Expert Comment

ID: 7170572
I agree with fl0yd method.

If you're interested in this method, I'll post an example.

I think you're better off using the ntohs() function jkr, mention, as you iterate through the data.
In other words, instead of converting the entire 15Mbyte of data, just convert as you use each section of the data.
ScreenConnect 6.0 Free Trial

Check out the updates in one game-changing release, ScreenConnect 6.0, based on partner feedback. New features include a redesigned UI that improves session organization and overall user experience. See the enhancements for yourself!

LVL 30

Expert Comment

ID: 7170577
Is this an MFC project?

Can you explain why you need to read the entire file?

Maybe we can suggest a better method, if you can explain what you're trying to do with the data.

Author Comment

ID: 7170628
thanks for the responses,

"I agree with fl0yd method.
If you're interested in this method, I'll post an example."

Axter, if you can post the example I would appreciate it.
Also, how would I go about converting the entire file if I needed to?

The data is real time radar data that is collected and saved to a zip disk. An analysis program will then read the file from the zip disk and perform various tasks on it, i.e. plotting the data, running computations on the data, formatting and printing a record of the data. We have utilities that do this, I was just looking for a faster way of accessing the information from disk.
LVL 86

Expert Comment

ID: 7170773
When it comes to nitpicking, loading 15MB as one chunk might be faster .o)

But, that also depends on the resource available - if memory is an issue, mamory mapped files might be better.
LVL 22

Expert Comment

ID: 7170959
hmm .. how about an intermediate approach, not using memeory maps and not loading all the file at a time too. But reading the file in chunks of say 8 kb or 64 kb (preferabbly any multiple of DWORD size) or  whatever value suits your purpose ?

Expert Comment

ID: 7171280
jkr: "When it comes to nitpicking, loading 15MB as one chunk might be faster .o)"

Possibly, if you are a machine. If you are human using memory mapped files might be slower but it sure as hell *FEELS* a lot more responsive and thus faster. So if you are developing for a robot use the fastest approach -- if it's going to be a human that is using your software you will have to take to other criteria for choosing your approach.

Expert Comment

ID: 7174189
Endian Swapping in C++.

This should sort out your endian swapping:

inline void uint8 MaskBits(uint16 f, uint16 m)
{ return (f & m) }

inline void EndianSwap(uint16 &Data)
  Data = uint16(MaskBits(Data, 0x00FF) << 8) |
         uint16(MaskBits(Data, 0xFF00) >> 8);

Can be easily extended to swap uint8s, uint32s etc.


Expert Comment

ID: 7174316
    judging from your excessive use of the 'inline'-keyword you seem to be looking for high performance. If that is the case, why spend so much time and memory performing instructions that have no real effect? Check out this revised version of your EndianSwap( uint16 ):

void EndianSwap( uint16& Data ) {
    Data = ( Data << 8 ) | ( Data >> 8 );

Moreover, in contrast to your version, this one does work as expected. Take a look at your MaskBits-function which has errors all over the place.

    using 16-bit WORD's on a PC is SLOOOOW. If you use them in 32bit-applications you will end up with constantly stalling the pipeline by switching to 16bit-mode. Let's assume you have a WORD in the lower 2 bytes of a DWORD. To swap those, use the following code:

#include <xutility>  // for std::swap<class T>

DWORD dwData = some_16bit_value;
unsigned char* p = &dwData;
std::swap<unsigned char>( *p, *( p + 1 ) );

While not being much of a gain (you'll lose around 7 clock cycles on a pIII each time you access 16bit data) it will definately make a difference when applied on 15mb of data.

Author Comment

ID: 7174668
I agree with fl0yd on his approach to the big endian issue.
thanks for the responses.

Axter, if you have that example to post I would appreciate it. thks.


Accepted Solution

fl0yd earned 300 total points
ID: 7174724
HANDLE hFile;    // file handle
HANDLE hFileMapping;    // file mapping object handle
BYTE* pbFile;    // file pointer

// open file
hFile = CreateFile( lpszName, GENERIC_READ, ... );

// create file mapping object for that file
hFileMapping = CreateFileMapping( hFile, ... );

// map the file into process' address space
pbFile = (BYTE*)MapViewOfFile( hFileMapping, ... );

// here you can access the file through memory pointers
// do not alter pbFile, though, we will need it later on

// cleanup
UnmapViewOfFile( pbFile );
CloseHandle( hFileMapping );
CloseHandle( hFile );

You will have to decide what parameters to use whereever I omitted them. There are too many possible combinations to make a choice for you. Also, you will have to check for errors which I have omitted to keep the sample easy.

Author Comment

ID: 7174736
thnks, That's what I needed.

Author Comment

ID: 7174738
thanks for the help everyone.

Featured Post

Netscaler Common Configuration How To guides

If you use NetScaler you will want to see these guides. The NetScaler How To Guides show administrators how to get NetScaler up and configured by providing instructions for common scenarios and some not so common ones.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Unlike C#, C++ doesn't have native support for sealing classes (so they cannot be sub-classed). At the cost of a virtual base class pointer it is possible to implement a pseudo sealing mechanism The trick is to virtually inherit from a base class…
In days of old, returning something by value from a function in C++ was necessarily avoided because it would, invariably, involve one or even two copies of the object being created and potentially costly calls to a copy-constructor and destructor. A…
The viewer will learn how to pass data into a function in C++. This is one step further in using functions. Instead of only printing text onto the console, the function will be able to perform calculations with argumentents given by the user.
The viewer will learn how to user default arguments when defining functions. This method of defining functions will be contrasted with the non-default-argument of defining functions.

770 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question