Still celebrating National IT Professionals Day with 3 months of free Premium Membership. Use Code ITDAY17

x
?
Solved

Reading a large file into an Array.

Posted on 2002-07-22
14
Medium Priority
?
216 Views
Last Modified: 2010-04-01
I'm having a problem figuring out how to read a large file from disk into an array. The way I know of doing this is to do a byte by byte read of the file and filling the array. This takes quite a bit of time as the file's I'm reading are generally in the 15Mbyte range. The hex bytes are read into the array and then the contents are analyzed. Also, I would like to read a word at a time, but the bytes are reversed when I read them. As an example, if I was reading the word 4E6F hex, the word read comes out 6F4E. How do I fix this big indian, little indian problem so I can read from the file directly into the array without transposing the digits.

I'm using Visual C++ on a PC.
thanks
0
Comment
Question by:GoldStrike
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 4
  • 4
  • 2
  • +3
14 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 7170100
I'd read the whole file into a memory buffer at once, e.g.

HANDLE hFile = CreateFile ( ...);
DWORD dwSize = GetFileSize ( hFile, NULL);

LPBYTE pb = new BYTE [ dwSize];

DWORD dwRead;
ReadFile ( hFile, pb, dwSize, &dwRead, NULL);

CloseHandle ( hFile);

>>How do I fix this big indian, little indian problem so I
>>can read from the file directly into the array without
>>transposing the digits.

Assuming that you don't want to convert the whole array, adjust the endianity when reading the values from the array (in your example, it seems that you're dealing with 'short' values), e.g.:

short* ps = (short*) pb;

short value = ntohs ( ps [ 0]);


0
 
LVL 8

Expert Comment

by:fl0yd
ID: 7170146
If you need to access a large file's data in memory you may also want to check out memory mapped files. You don't need to read the entire file into memory to be able to access it but rather map it into your process's address space and leave to rest to the OS. If you are developing for a windows platform check out the functions:

CreateFile
CreateFileMapping
MapViewOfFile
CloseHandle
0
 
LVL 30

Expert Comment

by:Axter
ID: 7170572
I agree with fl0yd method.

If you're interested in this method, I'll post an example.

I think you're better off using the ntohs() function jkr, mention, as you iterate through the data.
In other words, instead of converting the entire 15Mbyte of data, just convert as you use each section of the data.
0
Independent Software Vendors: We Want Your Opinion

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

 
LVL 30

Expert Comment

by:Axter
ID: 7170577
Is this an MFC project?

Can you explain why you need to read the entire file?

Maybe we can suggest a better method, if you can explain what you're trying to do with the data.
0
 

Author Comment

by:GoldStrike
ID: 7170628
thanks for the responses,

"I agree with fl0yd method.
If you're interested in this method, I'll post an example."

Axter, if you can post the example I would appreciate it.
Also, how would I go about converting the entire file if I needed to?

The data is real time radar data that is collected and saved to a zip disk. An analysis program will then read the file from the zip disk and perform various tasks on it, i.e. plotting the data, running computations on the data, formatting and printing a record of the data. We have utilities that do this, I was just looking for a faster way of accessing the information from disk.
0
 
LVL 86

Expert Comment

by:jkr
ID: 7170773
When it comes to nitpicking, loading 15MB as one chunk might be faster .o)

But, that also depends on the resource available - if memory is an issue, mamory mapped files might be better.
0
 
LVL 22

Expert Comment

by:ambience
ID: 7170959
hmm .. how about an intermediate approach, not using memeory maps and not loading all the file at a time too. But reading the file in chunks of say 8 kb or 64 kb (preferabbly any multiple of DWORD size) or  whatever value suits your purpose ?
0
 
LVL 8

Expert Comment

by:fl0yd
ID: 7171280
jkr: "When it comes to nitpicking, loading 15MB as one chunk might be faster .o)"

Possibly, if you are a machine. If you are human using memory mapped files might be slower but it sure as hell *FEELS* a lot more responsive and thus faster. So if you are developing for a robot use the fastest approach -- if it's going to be a human that is using your software you will have to take to other criteria for choosing your approach.
0
 

Expert Comment

by:JamesDuggan
ID: 7174189
Endian Swapping in C++.

This should sort out your endian swapping:

inline void uint8 MaskBits(uint16 f, uint16 m)
{ return (f & m) }

inline void EndianSwap(uint16 &Data)
{
  Data = uint16(MaskBits(Data, 0x00FF) << 8) |
         uint16(MaskBits(Data, 0xFF00) >> 8);
}


Can be easily extended to swap uint8s, uint32s etc.

Enjoy.
0
 
LVL 8

Expert Comment

by:fl0yd
ID: 7174316
James,
    judging from your excessive use of the 'inline'-keyword you seem to be looking for high performance. If that is the case, why spend so much time and memory performing instructions that have no real effect? Check out this revised version of your EndianSwap( uint16 ):

void EndianSwap( uint16& Data ) {
    Data = ( Data << 8 ) | ( Data >> 8 );
}

Moreover, in contrast to your version, this one does work as expected. Take a look at your MaskBits-function which has errors all over the place.


GoldStrike,
    using 16-bit WORD's on a PC is SLOOOOW. If you use them in 32bit-applications you will end up with constantly stalling the pipeline by switching to 16bit-mode. Let's assume you have a WORD in the lower 2 bytes of a DWORD. To swap those, use the following code:

#include <xutility>  // for std::swap<class T>

DWORD dwData = some_16bit_value;
unsigned char* p = &dwData;
std::swap<unsigned char>( *p, *( p + 1 ) );

While not being much of a gain (you'll lose around 7 clock cycles on a pIII each time you access 16bit data) it will definately make a difference when applied on 15mb of data.
0
 

Author Comment

by:GoldStrike
ID: 7174668
I agree with fl0yd on his approach to the big endian issue.
thanks for the responses.

Axter, if you have that example to post I would appreciate it. thks.

0
 
LVL 8

Accepted Solution

by:
fl0yd earned 1200 total points
ID: 7174724
HANDLE hFile;    // file handle
HANDLE hFileMapping;    // file mapping object handle
BYTE* pbFile;    // file pointer

// open file
hFile = CreateFile( lpszName, GENERIC_READ, ... );

// create file mapping object for that file
hFileMapping = CreateFileMapping( hFile, ... );

// map the file into process' address space
pbFile = (BYTE*)MapViewOfFile( hFileMapping, ... );

// here you can access the file through memory pointers
// do not alter pbFile, though, we will need it later on

// cleanup
UnmapViewOfFile( pbFile );
CloseHandle( hFileMapping );
CloseHandle( hFile );

You will have to decide what parameters to use whereever I omitted them. There are too many possible combinations to make a choice for you. Also, you will have to check for errors which I have omitted to keep the sample easy.
0
 

Author Comment

by:GoldStrike
ID: 7174736
thnks, That's what I needed.
0
 

Author Comment

by:GoldStrike
ID: 7174738
thanks for the help everyone.
0

Featured Post

Free Tool: Port Scanner

Check which ports are open to the outside world. Helps make sure that your firewall rules are working as intended.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

Article by: SunnyDark
This article's goal is to present you with an easy to use XML wrapper for C++ and also present some interesting techniques that you might use with MS C++. The reason I built this class is to ease the pain of using XML files with C++, since there is…
This article shows you how to optimize memory allocations in C++ using placement new. Applicable especially to usecases dealing with creation of large number of objects. A brief on problem: Lets take example problem for simplicity: - I have a G…
The goal of the tutorial is to teach the user how to use functions in C++. The video will cover how to define functions, how to call functions and how to create functions prototypes. Microsoft Visual C++ 2010 Express will be used as a text editor an…
The goal of the video will be to teach the user the difference and consequence of passing data by value vs passing data by reference in C++. An example of passing data by value as well as an example of passing data by reference will be be given. Bot…

670 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question