Solved

Reading a large file into an Array.

Posted on 2002-07-22
Last Modified: 2010-04-01
I'm having a problem figuring out how to read a large file from disk into an array. The only way I know is to read the file byte by byte and fill the array. This takes quite a bit of time, as the files I'm reading are generally in the 15 MB range. The bytes are read into the array and then the contents are analyzed. Also, I would like to read a word at a time, but the bytes are reversed when I read them. As an example, if I read the word 4E6F hex, it comes out as 6F4E. How do I fix this big-endian, little-endian problem so I can read from the file directly into the array without transposing the digits?

I'm using Visual C++ on a PC.
thanks
Question by:GoldStrike
14 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 7170100
I'd read the whole file into a memory buffer at once, e.g.

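HANDLE hFile = CreateFile ( ...);
DWORD dwSize = GetFileSize ( hFile, NULL);   // low 32 bits of the size; enough for files under 4 GB

LPBYTE pb = new BYTE [ dwSize];

DWORD dwRead;   // receives the number of bytes actually read
ReadFile ( hFile, pb, dwSize, &dwRead, NULL);

CloseHandle ( hFile);

// ... work with pb, then release it with delete [] pb;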

>>How do I fix this big-endian, little-endian problem so I
>>can read from the file directly into the array without
>>transposing the digits.

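Assuming that you don't want to convert the whole array, adjust the endianness when reading the values from the array (in your example, it seems that you're dealing with 16-bit 'short' values), e.g.:

// ntohs() is declared in <winsock2.h>; link with ws2_32.lib
unsigned short* ps = (unsigned short*) pb;

unsigned short value = ntohs ( ps [ 0]);   // unsigned, so words >= 0x8000 don't turn negative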
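
For comparison, here is a minimal portable sketch of the same idea (read everything in one go, convert each word only when it is used) using standard C++ instead of the Win32 calls. The names ReadWholeFile and ReadBE16 are made up for illustration, and the sketch assumes a C++11 compiler and that the file stores its 16-bit words high byte first, as in the 4E6F example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Read the whole file into memory with a single read call.
// Returns an empty vector if the file cannot be opened or read.
std::vector<std::uint8_t> ReadWholeFile(const std::string& path)
{
    // Open at the end so tellg() reports the file size.
    std::ifstream in(path.c_str(), std::ios::binary | std::ios::ate);
    if (!in)
        return std::vector<std::uint8_t>();

    const std::streamsize size = in.tellg();
    in.seekg(0, std::ios::beg);

    std::vector<std::uint8_t> buffer(static_cast<std::size_t>(size));
    if (size > 0 && !in.read(reinterpret_cast<char*>(buffer.data()), size))
        buffer.clear();
    return buffer;
}

// Assemble the 16-bit word stored high byte first at byte offset `offset`.
// Building the value from individual bytes gives the same result on any
// host byte order and does not need an aligned pointer.
std::uint16_t ReadBE16(const std::vector<std::uint8_t>& buf, std::size_t offset)
{
    assert(offset + 1 < buf.size());
    return static_cast<std::uint16_t>((buf[offset] << 8) | buf[offset + 1]);
}
```

With a buffer holding the bytes 4E 6F, ReadBE16(buffer, 0) yields 0x4E6F regardless of the machine's own byte order.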


 
LVL 8

Expert Comment

by:fl0yd
ID: 7170146
If you need to access a large file's data in memory you may also want to check out memory-mapped files. You don't need to read the entire file into memory to be able to access it; instead you map it into your process's address space and leave the rest to the OS. If you are developing for a Windows platform check out the functions:

CreateFile
CreateFileMapping
MapViewOfFile
CloseHandle
 
LVL 30

Expert Comment

by:Axter
ID: 7170572
I agree with fl0yd's method.

If you're interested in this method, I'll post an example.

I think you're better off using the ntohs() function jkr mentioned as you iterate through the data.
In other words, instead of converting the entire 15Mbyte of data, just convert as you use each section of the data.
 
LVL 30

Expert Comment

by:Axter
ID: 7170577
Is this an MFC project?

Can you explain why you need to read the entire file?

Maybe we can suggest a better method, if you can explain what you're trying to do with the data.
 

Author Comment

by:GoldStrike
ID: 7170628
thanks for the responses,

"I agree with fl0yd method.
If you're interested in this method, I'll post an example."

Axter, if you can post the example I would appreciate it.
Also, how would I go about converting the entire file if I needed to?

The data is real time radar data that is collected and saved to a zip disk. An analysis program will then read the file from the zip disk and perform various tasks on it, i.e. plotting the data, running computations on the data, formatting and printing a record of the data. We have utilities that do this, I was just looking for a faster way of accessing the information from disk.
 
LVL 86

Expert Comment

by:jkr
ID: 7170773
When it comes to nitpicking, loading 15MB as one chunk might be faster .o)

But that also depends on the resources available - if memory is an issue, memory-mapped files might be better.
 
LVL 22

Expert Comment

by:ambience
ID: 7170959
Hmm... how about an intermediate approach: not using memory maps, and not loading the whole file at once either, but reading the file in chunks of, say, 8 KB or 64 KB (preferably a multiple of the DWORD size), or whatever value suits your purpose?
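
A rough sketch of that chunked approach, in portable C++ rather than Win32 calls. ReadInChunks and its callback parameter are hypothetical names chosen for this example, not an existing API, and a C++11 compiler is assumed.

```cpp
#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

// Read `path` in pieces of at most `chunkSize` bytes and hand each piece to
// `process(const char* data, std::size_t count)`.
// Returns the total number of bytes read, or 0 if the file cannot be opened.
template <typename Fn>
std::size_t ReadInChunks(const std::string& path, std::size_t chunkSize, Fn process)
{
    std::ifstream in(path.c_str(), std::ios::binary);
    if (!in || chunkSize == 0)
        return 0;

    std::vector<char> chunk(chunkSize);
    std::size_t total = 0;
    while (in) {
        in.read(chunk.data(), static_cast<std::streamsize>(chunk.size()));
        // gcount() is the number of bytes the last read delivered; it is
        // smaller than chunkSize only for the final piece of the file.
        const std::size_t got = static_cast<std::size_t>(in.gcount());
        if (got == 0)
            break;
        process(chunk.data(), got);
        total += got;
    }
    return total;
}
```

Keeping chunkSize a multiple of the word size, as suggested above, means a 16-bit or 32-bit value is never split across two chunks.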

 
LVL 8

Expert Comment

by:fl0yd
ID: 7171280
jkr: "When it comes to nitpicking, loading 15MB as one chunk might be faster .o)"

Possibly, if you are a machine. If you are human, using memory-mapped files might be slower but it sure as hell *FEELS* a lot more responsive and thus faster. So if you are developing for a robot use the fastest approach -- if it's going to be a human that is using your software you will have to take other criteria into account when choosing your approach.
 

Expert Comment

by:JamesDuggan
ID: 7174189
Endian Swapping in C++.

This should sort out your endian swapping:

inline void uint8 MaskBits(uint16 f, uint16 m)
{ return (f & m) }

inline void EndianSwap(uint16 &Data)
{
  Data = uint16(MaskBits(Data, 0x00FF) << 8) |
         uint16(MaskBits(Data, 0xFF00) >> 8);
}


Can be easily extended to swap uint8s, uint32s etc.

Enjoy.
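
As an illustration of the extension to 32 bits, here is one possible sketch. It uses the fixed-width types from <cstdint> in place of the uint16 typedef above, which is an assumption about the intended types, and it returns the swapped value instead of modifying a reference.

```cpp
#include <cstdint>

// Exchange the two bytes of a 16-bit value.
inline std::uint16_t EndianSwap16(std::uint16_t v)
{
    return static_cast<std::uint16_t>((v << 8) | (v >> 8));
}

// Reverse the four bytes of a 32-bit value.
inline std::uint32_t EndianSwap32(std::uint32_t v)
{
    return ((v & 0x000000FFu) << 24) |
           ((v & 0x0000FF00u) << 8)  |
           ((v & 0x00FF0000u) >> 8)  |
           ((v & 0xFF000000u) >> 24);
}
```

For example, EndianSwap16(0x4E6F) gives 0x6F4E and EndianSwap32(0x11223344) gives 0x44332211.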
 
LVL 8

Expert Comment

by:fl0yd
ID: 7174316
James,
    judging from your excessive use of the 'inline'-keyword you seem to be looking for high performance. If that is the case, why spend so much time and memory performing instructions that have no real effect? Check out this revised version of your EndianSwap( uint16 ):

void EndianSwap( uint16& Data ) {
    Data = ( Data << 8 ) | ( Data >> 8 );
}

Moreover, in contrast to your version, this one does work as expected. Take a look at your MaskBits-function which has errors all over the place.
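
If the whole file really does have to be converted up front, one way to sketch it is a single pass over the byte buffer that exchanges each pair of bytes. SwapAllWords is a made-up name for this example, and the sketch assumes the data consists purely of 16-bit words; a trailing odd byte, if any, is left alone.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Swap the two bytes of every 16-bit word in the buffer, in place.
void SwapAllWords(std::vector<std::uint8_t>& buf)
{
    // Round the length down to an even number of bytes.
    const std::size_t evenLen = buf.size() & ~static_cast<std::size_t>(1);
    for (std::size_t i = 0; i < evenLen; i += 2)
        std::swap(buf[i], buf[i + 1]);
}
```

After this pass the buffer can be read as native 16-bit values on a little-endian PC without any per-access conversion.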


GoldStrike,
    using 16-bit WORDs on a PC is SLOOOOW. If you use them in 32-bit applications you will end up constantly stalling the pipeline by switching to 16-bit mode. Let's assume you have a WORD in the lower 2 bytes of a DWORD. To swap those, use the following code:

#include <algorithm>  // for std::swap

DWORD dwData = some_16bit_value;
unsigned char* p = reinterpret_cast<unsigned char*>( &dwData );  // a cast is required here
std::swap( *p, *( p + 1 ) );  // exchanges the two low-order bytes on a little-endian PC

While not being much of a gain (you'll lose around 7 clock cycles on a PIII each time you access 16-bit data) it will definitely make a difference when applied to 15 MB of data.
 

Author Comment

by:GoldStrike
ID: 7174668
I agree with fl0yd on his approach to the big endian issue.
thanks for the responses.

Axter, if you have that example to post I would appreciate it. thks.

 
LVL 8

Accepted Solution

by:
fl0yd earned 300 total points
ID: 7174724
HANDLE hFile;    // file handle
HANDLE hFileMapping;    // file mapping object handle
BYTE* pbFile;    // file pointer

// open file
hFile = CreateFile( lpszName, GENERIC_READ, ... );

// create file mapping object for that file
hFileMapping = CreateFileMapping( hFile, ... );

// map the file into process' address space
pbFile = (BYTE*)MapViewOfFile( hFileMapping, ... );

// here you can access the file through memory pointers
// do not alter pbFile, though, we will need it later on

// cleanup
UnmapViewOfFile( pbFile );
CloseHandle( hFileMapping );
CloseHandle( hFile );

You will have to decide what parameters to use wherever I omitted them. There are too many possible combinations to make a choice for you. Also, you will have to check for errors, which I have omitted to keep the sample simple.
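
For readers who want to see one concrete set of choices, the sketch below maps a whole file read-only and includes the error checks. The parameter values (read-only access, FILE_SHARE_READ, mapping the entire file) are assumptions picked for the analysis scenario described earlier, not the only sensible ones. FileView, OpenFileView and CloseFileView are hypothetical names. The non-Windows branch using mmap is an addition that was not discussed in this thread; it is there only so the sketch also builds outside Visual C++.

```cpp
#include <cstddef>

#ifdef _WIN32
#include <windows.h>
#else
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>
#endif

// Read-only view of a whole file; data is NULL if the mapping failed.
struct FileView {
    const unsigned char* data;
    std::size_t size;
#ifdef _WIN32
    HANDLE hFile;
    HANDLE hMapping;
#else
    int fd;
#endif
};

FileView OpenFileView(const char* path)
{
    FileView v = FileView();  // all members zeroed
#ifdef _WIN32
    v.hFile = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, NULL,
                          OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, NULL);
    if (v.hFile == INVALID_HANDLE_VALUE)
        return v;
    LARGE_INTEGER size;
    // An empty file cannot be mapped, so treat size 0 as a failure.
    if (GetFileSizeEx(v.hFile, &size) && size.QuadPart > 0) {
        v.hMapping = CreateFileMappingA(v.hFile, NULL, PAGE_READONLY, 0, 0, NULL);
        if (v.hMapping != NULL)
            v.data = static_cast<const unsigned char*>(
                MapViewOfFile(v.hMapping, FILE_MAP_READ, 0, 0, 0));
    }
    if (v.data != NULL) {
        v.size = static_cast<std::size_t>(size.QuadPart);
        return v;
    }
    if (v.hMapping != NULL)
        CloseHandle(v.hMapping);
    CloseHandle(v.hFile);
    return v;
#else
    v.fd = open(path, O_RDONLY);
    if (v.fd < 0)
        return v;
    struct stat st;
    if (fstat(v.fd, &st) == 0 && st.st_size > 0) {
        void* p = mmap(NULL, static_cast<std::size_t>(st.st_size),
                       PROT_READ, MAP_PRIVATE, v.fd, 0);
        if (p != MAP_FAILED) {
            v.data = static_cast<const unsigned char*>(p);
            v.size = static_cast<std::size_t>(st.st_size);
            return v;
        }
    }
    close(v.fd);
    return v;
#endif
}

// Release the view and the underlying handles; safe to call on a failed view.
void CloseFileView(FileView& v)
{
    if (v.data == NULL)
        return;
#ifdef _WIN32
    UnmapViewOfFile(v.data);
    CloseHandle(v.hMapping);
    CloseHandle(v.hFile);
#else
    munmap(const_cast<unsigned char*>(v.data), v.size);
    close(v.fd);
#endif
    v.data = NULL;
    v.size = 0;
}
```

A caller would check that data is not NULL, read bytes through data[0] to data[size - 1], and call CloseFileView when finished.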
 

Author Comment

by:GoldStrike
ID: 7174736
Thanks, that's what I needed.
 

Author Comment

by:GoldStrike
ID: 7174738
thanks for the help everyone.