
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 221

Reading a large file into an Array.

I'm having a problem figuring out how to read a large file from disk into an array. The only way I know of doing this is a byte-by-byte read of the file, filling the array as I go. This takes quite a bit of time, as the files I'm reading are generally in the 15 MB range. The bytes are read into the array and then the contents are analyzed. I would also like to read a word at a time, but the bytes are reversed when I read them. For example, if I read the word 4E6F hex, it comes out as 6F4E. How do I fix this big-endian/little-endian problem so I can read from the file directly into the array without transposing the digits?

I'm using Visual C++ on a PC.
1 Solution
I'd read the whole file into a memory buffer at once, e.g.

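#include <windows.h>

// Open the file; the arguments are elided here.
HANDLE hFile = CreateFile ( ...);
DWORD dwSize = GetFileSize ( hFile, NULL);  // low 32 bits only; fine for files under 4 GB

// Allocate a buffer large enough for the whole file.
LPBYTE pb = new BYTE [ dwSize];

// Read everything in a single call; dwRead receives the number of bytes read.
DWORD dwRead = 0;
ReadFile ( hFile, pb, dwSize, &dwRead, NULL);

CloseHandle ( hFile);

// ... use pb ..., then release it with delete [] pb;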
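
If you'd rather not depend on the Win32 API, the same single bulk read can be sketched with standard C++ streams. This is only a sketch under assumptions: the helper name ReadWholeFile is made up for illustration, and it assumes the file fits comfortably in memory (15 MB should).

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads the whole file at `path` into a byte vector.
// Returns an empty vector if the file cannot be opened or fully read.
std::vector<unsigned char> ReadWholeFile(const std::string& path)
{
    // Open in binary mode, positioned at the end so tellg() yields the size.
    std::ifstream in(path, std::ios::binary | std::ios::ate);
    if (!in) return {};

    const std::streamoff size = in.tellg();
    if (size <= 0) return {};

    std::vector<unsigned char> buffer(static_cast<std::size_t>(size));
    in.seekg(0, std::ios::beg);
    in.read(reinterpret_cast<char*>(buffer.data()), size);  // one bulk read
    if (in.gcount() != size) return {};
    return buffer;
}
```

The returned vector frees its memory automatically, so there is no delete [] to forget.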

>>How do I fix this big-endian/little-endian problem so I
>>can read from the file directly into the array without
>>transposing the digits.

Assuming that you don't want to convert the whole array, adjust the byte order when reading the values from the array (in your example, it seems that you're dealing with 'short' values), e.g.:

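// ntohs() is declared in <winsock2.h>; link with Ws2_32.lib
short* ps = (short*) pb;

// convert from big-endian (network) order to the host's byte order
short value = ntohs ( ps [ 0]);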
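
A variation that doesn't depend on ntohs() or on the host's byte order is to assemble each value from its two bytes yourself. A minimal sketch, assuming the file stores 16-bit words high byte first (as in your 4E6F example); ReadBE16 is a hypothetical helper name, not a library function:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns the 16-bit big-endian value stored at
// data[offset] and data[offset + 1], regardless of the host byte order.
// The caller must make sure both bytes lie inside the buffer.
inline std::uint16_t ReadBE16(const unsigned char* data, std::size_t offset)
{
    return static_cast<std::uint16_t>((data[offset] << 8) | data[offset + 1]);
}
```

With the buffer from above, ReadBE16(pb, 0) would give 0x4E6F when the first two bytes of the file are 4E 6F.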

If you need to access a large file's data in memory, you may also want to check out memory-mapped files. You don't need to read the entire file into memory to be able to access it; instead, you map it into your process's address space and leave the rest to the OS. If you are developing for a Windows platform, check out the functions:

I agree with fl0yd's method.

If you're interested in this method, I'll post an example.

I think you're better off using the ntohs() function jkr mentioned as you iterate through the data.
In other words, instead of converting the entire 15 MB of data, just convert each section as you use it.

Is this an MFC project?

Can you explain why you need to read the entire file?

Maybe we can suggest a better method, if you can explain what you're trying to do with the data.
GoldStrikeAuthor Commented:
thanks for the responses,

"I agree with fl0yd method.
If you're interested in this method, I'll post an example."

Axter, if you can post the example I would appreciate it.
Also, how would I go about converting the entire file if I needed to?

The data is real-time radar data that is collected and saved to a Zip disk. An analysis program then reads the file from the Zip disk and performs various tasks on it, e.g. plotting the data, running computations on it, and formatting and printing a record of it. We have utilities that do this; I was just looking for a faster way of accessing the information from disk.
When it comes to nitpicking, loading 15MB as one chunk might be faster .o)

But that also depends on the resources available. If memory is an issue, memory-mapped files might be better.
Hmm... how about an intermediate approach: neither using memory maps nor loading the whole file at once, but reading the file in chunks of, say, 8 KB or 64 KB (preferably a multiple of the DWORD size), or whatever value suits your purpose?
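
A rough sketch of that chunked approach in standard C++, in case it helps. The name ReadInChunks and the callback signature are assumptions made up for this example; pick whatever chunk size suits you, but keep it even so a 16-bit word never straddles two chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: reads `path` in chunks of `chunkSize` bytes and calls
// process(ptr, count) once per chunk. Returns the total number of bytes read
// (0 if the file could not be opened).
template <typename Fn>
std::size_t ReadInChunks(const std::string& path, std::size_t chunkSize, Fn process)
{
    assert(chunkSize > 0);
    std::ifstream in(path, std::ios::binary);
    std::vector<unsigned char> chunk(chunkSize);
    std::size_t total = 0;

    while (in) {
        in.read(reinterpret_cast<char*>(chunk.data()),
                static_cast<std::streamsize>(chunk.size()));
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0) break;
        process(chunk.data(), got);  // the last chunk may be shorter
        total += got;
    }
    return total;
}
```

Only one chunk is held in memory at a time, so this trades a little bookkeeping for a much smaller footprint than loading all 15 MB.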
jkr: "When it comes to nitpicking, loading 15MB as one chunk might be faster .o)"

Possibly, if you are a machine. If you are human, using memory-mapped files might be slower, but it sure as hell *FEELS* a lot more responsive and thus faster. So if you are developing for a robot, use the fastest approach; if it's going to be a human using your software, you will have to consider other criteria when choosing your approach.
Endian Swapping in C++.

This should sort out your endian swapping:

inline void uint8 MaskBits(uint16 f, uint16 m)
{ return (f & m) }

inline void EndianSwap(uint16 &Data)
{
  Data = uint16(MaskBits(Data, 0x00FF) << 8) |
         uint16(MaskBits(Data, 0xFF00) >> 8);
}

Can be easily extended to swap uint8s, uint32s etc.
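
As a rough illustration of the uint32 case, here is a sketch that reverses all four bytes with masks and shifts. It uses std::uint32_t from <cstdint> rather than the uint32 typedef above, and the name EndianSwap32 is an assumption for this example.

```cpp
#include <cstdint>

// Hypothetical 32-bit counterpart: reverses the four bytes of `data` in place.
inline void EndianSwap32(std::uint32_t& data)
{
    data = ((data & 0x000000FFu) << 24) |
           ((data & 0x0000FF00u) << 8)  |
           ((data & 0x00FF0000u) >> 8)  |
           ((data & 0xFF000000u) >> 24);
}
```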

    Judging from your excessive use of the 'inline' keyword, you seem to be looking for high performance. If that is the case, why spend so much time and memory performing instructions that have no real effect? Check out this revised version of your EndianSwap( uint16 ):

void EndianSwap( uint16& Data ) {
    Data = ( Data << 8 ) | ( Data >> 8 );
}

Moreover, in contrast to your version, this one does work as expected. Take a look at your MaskBits-function which has errors all over the place.
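
To convert a whole buffer in one pass (GoldStrike asked about that earlier), the same shift expression can be applied in a loop. A minimal sketch, assuming the data has already been loaded as 16-bit words; EndianSwapAll is a made-up name, and std::uint16_t stands in for the uint16 typedef:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: swaps the two bytes of every 16-bit word in place.
inline void EndianSwapAll(std::vector<std::uint16_t>& words)
{
    for (std::size_t i = 0; i < words.size(); ++i) {
        const std::uint16_t w = words[i];
        words[i] = static_cast<std::uint16_t>((w << 8) | (w >> 8));
    }
}
```

A word that was read as 6F4E becomes 4E6F after the call. Whether converting everything up front beats converting on access depends on how much of the data you actually touch.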

    Using 16-bit WORDs on a PC is SLOOOOW. If you use them in 32-bit applications, you will end up constantly stalling the pipeline by switching to 16-bit mode. Let's assume you have a WORD in the lower 2 bytes of a DWORD. To swap those, use the following code:

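#include <windows.h>  // for DWORD
#include <utility>    // for std::swap (declared in <algorithm> before C++11)

DWORD dwData = some_16bit_value;
unsigned char* p = reinterpret_cast<unsigned char*>( &dwData );
std::swap( *p, *( p + 1 ) );  // swap the two low-order bytes (little-endian host)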

While not being much of a gain (you'll lose around 7 clock cycles on a PIII each time you access 16-bit data), it will definitely make a difference when applied to 15 MB of data.
GoldStrikeAuthor Commented:
I agree with fl0yd on his approach to the big endian issue.
thanks for the responses.

Axter, if you have that example to post I would appreciate it. thks.

HANDLE hFile;    // file handle
HANDLE hFileMapping;    // file mapping object handle
BYTE* pbFile;    // file pointer

// open file
hFile = CreateFile( lpszName, GENERIC_READ, ... );

// create file mapping object for that file
hFileMapping = CreateFileMapping( hFile, ... );

// map the file into process' address space
pbFile = (BYTE*)MapViewOfFile( hFileMapping, ... );

// here you can access the file through memory pointers
// do not alter pbFile, though, we will need it later on

// cleanup
UnmapViewOfFile( pbFile );
CloseHandle( hFileMapping );
CloseHandle( hFile );

You will have to decide which parameters to use wherever I omitted them. There are too many possible combinations for me to make that choice for you. You will also have to check for errors, which I have omitted to keep the sample simple.
GoldStrikeAuthor Commented:
thnks, That's what I needed.
GoldStrikeAuthor Commented:
thanks for the help everyone.
Question has a verified solution.
