?
Solved

C++ Random Access Files

Posted on 2003-03-28
5
Medium Priority
?
1,409 Views
Last Modified: 2009-07-29
I'm a VB man trying to create a poor man's database (txt file based), using the randomn access file concept and C++.

I've already done something similar in VB using UDTs; is there anything similar to that in C++? Using structs perhaps? Or do I have to use vectors? When I first did a random access file db in VB, there was microsoft document telling you how to create and use random access, does this exist in C++? or does it not work that way with C++.

Thanks in advance.
0
Comment
Question by:Sade
[X]
Welcome to Experts Exchange

Add your voice to the tech community where 5M+ people just like you are talking about what matters.

  • Help others & share knowledge
  • Earn cash & points
  • Learn & ask questions
  • 3
  • 2
5 Comments
 
LVL 12

Expert Comment

by:Salte
ID: 8224606
First off,

VB's UDT is just a poor man's struct compared to C++ where you have full and complete both struct and class mechanisms available to you, so the "UDT" approach will definitely be both easier and work better for you in C++ than in VB.

Secondly, the random access of files works beautifully in C++ and it is extremely simple:

In basic you have a file that is a simple sequence of bytes. These bytes can be interpreted any way you want, as binary date or as text. Be aware that for a database you probably want to open the file as binary file even if you want to read text from it - you just have to take care of any CR LF translation yourself instead of having the system do it for you.

In the basic I/O you never specify a position in the file and sequential access is assumed. You read n bytes from the file and interpret it as some data type that require n bytes. The act of reading those n bytes will have the side effect that a file pointer will advance by n bytes. This file pointer start at position 0 when you open a file so if you never touch that file pointer you will read a file from beginning until end of file in a sequential manner. Every read will also have the side effect that the file pointer advances by exactly as many bytes as you read from the file.

Random access is handled by manipulating that file pointer value. You can both read it and set it. The file pointer is probably best explained if you view the file as one big byte array with index 0 being the first byte of the file and index MAX-1 being the last byte of the file. The file has a total of MAX bytes. The file pointer is then simply the index in that array that indicate where you want to start reading next time you do a read or where you want to start writing next time you do a write. Note that there is only one file pointer and it is used for both purposes. Writing also causes the file pointer to advance by as many bytes as you write.

To set the file pointer you have 3 types of operations.

1. You can set the pointer to an absolute value in the range 0..MAX-1. In fact you can also set it to be >= MAX, if you do and then do a write to the file, the file will grow to the new size and the bytes inbetween will be all 0.

2. You can set the pointer to a value relative to the current position. So setting the value to -3 will then move the pointer 3 bytes backwards and setting the value to +8 will move the pointer 8 bytes forward and effectively skip the 8 bytes between the current position and the new position. Of course, any value set must be valid so if the file pointer is currently at position 5 and you attempt to set it to -8 relatively it will not work and you will get an error return from the function that attempted to change the position.

3. You can set the pointer to a value relative to the end of the file, setting the value to 0 relative to the end will move the file pointer to the end. If you then start to write you will append the data to the file. If you start to read you will immediately get an end of file. Setting the position to -4 relative to the end and then read 4 bytes will cause you to read the 4 last bytes of the file.

The actual functions to set and get the file pointer depends on at what level you want to do this - i.e. what kind of files are you operating on.

If you're working on Win32 files and you have the HANDLE to the file you can use SetFilePointer to set and get the file pointer. The SetFilePointer function has a parameeter MoveMethod that can be set to the values FILE_BEGIN (relative to the start of the file - or absolute), FILE_CURRENT (relative to the current position) and FILE_END (relative to the end of the file).

This function is able to work on files that are very large (uses 64 bit file pointers) but can also work on old style 32 bit files (files with 32 bit file pointers). The function also return the resulting filepointer so you can call SetFilePointer with FILE_CURRENT and a relative position of 0. This will not change the file position but it will return the value.

You can also use SetFilePointerEx which is more convenient for 64 bit files but otherwise do the same function. In particular it can also have a FILE_CURRENT and relative position of 0 to read the current position.

If you're working on file descriptors and you have a file descriptor to the file you can use lseek() to both get and set the file pointer value. Like the SetFilePosition it has an argument to specify where the position is relative to.

lseek(fd,filepos,how);

Note that this function only works for 32 bit files, if you want to use 64 bit files you need to use a newer version of that function which is lseeki64(). The same arguments but the type for the offset and return value are 64 bit signed integers instead of 32 bit signed integers.

The how argument can be SEEK_SET (from beginning of the file - absolute - same as FILE_BEGIN in Win32), SEEK_POS (from current position - same as FILE_CURRENT in Win32) and SEEK_END (from the end of the file - same as FILE_END in Win32).

IF you have FILE * fp access to the file, you need to use fseek() instead. fseek() takes much the same arguments:

fseek(fp,filepos,how);

But fp is a FILE * pointer. One difference is that fseek() does not return the new position. To get the value of the file pointer you need to call ftell(fp).

The how argument is exactly as for lseek().

It appears that MS "forgot" to support 64 bit files for FILE * random access. There is no fseeki64 function documented. It is possible the function is there it's just the documentation division that's not doing their job - I don't have VC++ 6.0 installed on my Linux machine so I can't tell :-)

However, in C++ the most natural thing is to use fstream and if you have an fstream you have two seek functions available and two tell functions available. The point is that streams have two file pointers, one for reading and another for writing.

fstream file("foo.dat", ios::in|ios::out|ios::binary);

file.seekg(offset,how);

Will set the 'file pointer' for 'get' operations (reading the file) while seekp() is the function for 'put' operations (writing the file). The how argument is one of:

ios::beg    - same function as FILE_BEGIN in Win32.
ios::cur    - same function as FILE_CURRENT in Win32.
ios::end    - same function as FILE_END in Win32.

There are also a functions to read each of the file pointers:

fpos_t val = file.tellg();
fpos_t val2 = file.tellp();

will get the values of each of the file pointers.

To illustrate, say you want to read 5 bytes from the file at position 10 and then write 3 of them to position 20 and then skip one byte and write the remaining 2. This is how you do it in each of the interfaces. Note that this is silly to do in those that doesn't uses buffering, since it would be very inefficient to read/write bytes like that using the systme functions. This means HANDLE and file descriptors. The FILE * and streams uses buffering and they will handle this inside their internal buffers and you won't get any penalty for inefficiency.

char by5[5]; // storage for the five bytes.

Win32:
void do_it(HANDLE file)
{
   int len;

   SetFilePointerEx(file,10,0,FILE_BEGIN);
   ReadFile(file,by5,5,&len,0);
   SetFilePointerEx(file,20,0,FILE_BEGIN);
   WriteFile(file,by5,3,&len,0);
   SetFilePointerEx(file,1,0,FILE_CURRENT); // skip one
   WriteFile(file,by5+3,2,&len,0);
}

file descriptors:
void do_it(int fd)
{
   lseek(fd,10,SEEK_SET);
   read(fd,by5,5);
   lseek(fd,20,SEEK_SET);
   write(fd,by5,3);
   lseek(fd,1,SEEK_POS); // skip one
   write(fd,by5+3,2);
}

FILE * pointer:
void do_it(FILE * fp)
{
   fseek(fp,10,SEEK_SET);
   fread(by5,1,5,fp);
   fseek(fp,20,SEEK_SET);
   fwrite(by5,1,3,fp);
   fseek(fp,1,SEEK_POS); // skip one
   fwrite(by5+3,1,2,fp);
}

streams:
void do_it(fstream & file)
{
   file.seekg(10,ios::beg);
   file.read(by5,5);
   file.seekp(20,ios::beg);
   file.write(by5,3);
   file.seekp(1,ios::cur); // skip one
   file.write(by5+3,2);
}

Hope this explains random file access in C++.

Alf
0
 
LVL 12

Expert Comment

by:Salte
ID: 8224910
One thing I should warn you about.

If you plan on random access and text files that is usually a very bad mix.

The reason is that pure text files are usually separated in records where one records is a line and the records have different length.

This doesn't work very well if you want to remove a record in the middle of the table or move one record from one position to another or insert a record at a random place in the file.

I will urge you therefore to consider the following:

1. Let every record have a fixed size. If a line is shorter than the fixed size, let the rest of the record hold garbage. Of course, you must have some way of detecting the end of the record but a \n character might work well for that purpose. You should probably also have a small header holding the length and some other info anyway so this is a minor problem.

If a line is longer than the fixed size you must allow the line to continue on another record. This means that a line may occupy more than one record.

If you want to insert lines at random places in a particular sort order you should let the sort order be at some place outside the actual data file. The reason is that typically the data are large (the record size is big) and you want to minimize moving around such big data. Instead having a small index file where you have small records where the data of the index file is simply the file position or record number in the data file where the data is found can allow the smaller index file to be kept in memory for fast sorting, searching and whatever else you want to do. When you actually want to read any data you use the smaller index table as a lookup to find the record number in the data file and do a seek to that position and read the record. Typically the record should then hold the position of the 'next' record if the record hold a line that is very long and won't fit in one single record.

Deleting a record is then simply marking a record as deleted (removing it from the index table also) and then you can use that 'next record' field to insert it into a list of free records. When you need a free record you first check that list of free records and if non-empty you remove the record from the free list and fill it with data and create an entry for it in the index table if necessary.

This can give a very flexible system for textual data. However, raw text data is seldom a good idea for random access, what are you going to do if you want to insert a line of 50 bytes at a position X? If you have to move all the bytes already at position X through end to positions X+50 through end+50 (and extend the file) just to make room for that line insertion it can be a very costly operation to do so.

Alf
0
 

Author Comment

by:Sade
ID: 8225524
Thanks for your input, this is by far the most detailed answer I have received, ever; however, is there any way of using randomn access files in ascii, and not in binary?

I will give points next time, even if no further answer is posted.

thanks Alf.

0
 
LVL 12

Accepted Solution

by:
Salte earned 500 total points
ID: 8225744
Well, yes and no. As I explained above the problem is not in the random access, the problem is if you want to insert records in the middle of a file etc. Text files are typically 'unstructured files' which only structure is that they have a 'end of line' marker to mark end of each line. So inserting a line between two lines implies that you must physically move all the data after the line to a new location and then insert the line once you have made room for it. This is obviously too much work than its worth and the easier method is therefore to read the file line by line and make a new file line by line based on the original file's data and the modification you want to make.

Binary files can be created with a structure that allows you to let one record have a pointer to where the related data is and so you can use random access to get those related data. Thus inserting a record between two records in a binary file can mean that you simply store the new record at the end of the file and then update pointers so that the new record comes logically between the two.

Such update on text files isn't really possible unless you are very sure that the data you want to modify and the modified data have exactly the same number of bytes. In that case you can just write the data as you would for a binary file.

In a binary file such updates are possible because every file pointer is 4 bytes or every file pointer is 2 bytes or whatever size - but the size is determined beforehand and so replacing some data with other data is easier to do because all sizes are fixed - well, more or less.

A text file have one line of 3 characters and end of line and next line can be 58 characters and end of line and if you then want to change that 3 char line into a 7 char line there's no room to do so.

A binary file typically has a header and then a bunch of records - all of which are the same size and it is much easier to insert and remove records from it.

Of course, it is possible to make binary files with variable sized records too, there are in fact several ways to do that, but they all rely on the fact that the logical ordering isn't necessarily the same as the physical ordering.

For example you can have a table in the binary file with pointers (file offsets) to each line and their length. The line data can come immediately after the table if you want to make it all in one file and simple.

Now if you want to insert a string between table entry [4] and table entry [5] you simply move the data in the table (they are small entries and easy to move) to make room for a new table entry. You append the string at the end of the file (where there is room) and then you set the file pointer of the new entry to point to the new string. The point is that the string data itself is appended at the end of the string but if you list the contents you will see it is stored in entry [5].

If you want to avoid that extra move you can simply turn the table into a list, have each entry contain the index of the next entry, if so you just insert the new entry at the end of the table (where there is room) and then you change the entry [4]'s next index to be the index of the new entry and set the new entry's next index to be entry [5] (which now becomes entry [6]).

Things like this is possible because a binary file contain data structures and not plain text.

Alf
0
 

Author Comment

by:Sade
ID: 8263970
Thanks again.
0

Featured Post

Industry Leaders: We Want Your Opinion!

We value your feedback.

Take our survey and automatically be enter to win anyone of the following:
Yeti Cooler, Amazon eGift Card, and Movie eGift Card!

Question has a verified solution.

If you are experiencing a similar issue, please ask a related question

When writing generic code, using template meta-programming techniques, it is sometimes useful to know if a type is convertible to another type. A good example of when this might be is if you are writing diagnostic instrumentation for code to generat…
Written by John Humphreys C++ Threading and the POSIX Library This article will cover the basic information that you need to know in order to make use of the POSIX threading library available for C and C++ on UNIX and most Linux systems.   [s…
The goal of the video will be to teach the user the concept of local variables and scope. An example of a locally defined variable will be given as well as an explanation of what scope is in C++. The local variable and concept of scope will be relat…
The viewer will learn how to use the return statement in functions in C++. The video will also teach the user how to pass data to a function and have the function return data back for further processing.
Suggested Courses

765 members asked questions and received personalized solutions in the past 7 days.

Join the community of 500,000 technology professionals and ask your questions.

Join & Ask a Question