
save/restore project file with mixed/multiple classes

hi guys,

I think that I'm doing it again - giving away points when I should just RTFM & hack the code. Ah well, that's laziness for you :-)

Basically, I want my users to be able to save/restore the current state of their project. This - obviously - means writing the program's internal data to a file & reading it back at a later time.

My data consists of 3 or 4 different classes, with the data of each class being in an STL <vector> of objects. I'd like to store a few data items as header first (4 bytes to identify my save file, program version of saver, maybe date/time, etc). Then I'll write out the program data.

So, the old Pascal problem - since the file isn't a neat collection of records all of the same type/length ...

Do I have to precede each collection of objects with a few identifying bytes for object type & number of objects, so that I know how many objects, of what type, to read back? I'm thinking that the answer is yes.

Do I have to overload the << operator in each class which I have defined? Again, I am thinking that the answer is yes.

Since I am using <vector> objects I don't have to worry about linkages etc. However, I am using Borland C++ Builder. Who knows about VCL components? If my class has a member which is of type TPanel, can it save & restore itself, or do I have to write the code (maybe some extra points for this part)?

What I think that I am saying here is that I could probably have coded this in straight C in the time that I have taken to write this post. However, I would like to do things 'properly' in C++ ... what's the generic/preferred/not too clever way of saving restoring program data of mixed types?

nietod

>> Do I have to precede each collection of objects
>> by a few identifying bytes for object type &
>> number of objects so that I will know how many,
>> of what type, to read back? I'm thinking that the
>> answer is yes.
That depends. (always helpful.)

If the format is always going to be the same, like an object of class A, then an object of class B, then an object of class C, then no, you don't have to include this formatting information.

However, if the format changes, say there are 3 objects but they can be any combination of objects from classes A, B, and C, then yes, you definitely need this sort of information to help you "understand" the file.

continues.

KangaRoo

>> ... do things 'properly' in C++ ...

"What is 'properly'?", one might ask. Is it using an overloaded << of some sort for IO? I think not; 'properly' would mean 'reusable and flexible'. The use of operator << does not automatically make it 'proper'.

Common class libraries (like MFC, OWL and VCL(?)) which offer 'streamable classes' require you to derive from their streamable base class. That is hardly flexible or reusable.

An effective reusable start would be to simply add read and write functions for those classes involved, like:

void write(const MyClass&, FILE*);
void read(MyClass&, FILE*);

Or maybe you prefer C++ streams:
void write(const MyClass&, ostream&);
void read(MyClass&, istream&);
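
A minimal sketch of what the stream version of such a pair might look like. The contents of MyClass (id, weight, name) are invented for illustration, and the raw byte copies assume the file is read back by a build with the same type sizes and byte order.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical example class; the field names are illustrative only.
struct MyClass {
    std::int32_t id;
    double       weight;
    std::string  name;
};

// Fixed-size values are copied byte for byte.
void write(std::int32_t v, std::ostream& os) { os.write(reinterpret_cast<const char*>(&v), sizeof v); }
void read(std::int32_t& v, std::istream& is) { is.read(reinterpret_cast<char*>(&v), sizeof v); }
void write(double v, std::ostream& os)       { os.write(reinterpret_cast<const char*>(&v), sizeof v); }
void read(double& v, std::istream& is)       { is.read(reinterpret_cast<char*>(&v), sizeof v); }

// A string is stored as its length followed by its characters.
void write(const std::string& s, std::ostream& os) {
    assert(s.size() <= static_cast<std::size_t>(std::numeric_limits<std::int32_t>::max()));
    write(static_cast<std::int32_t>(s.size()), os);
    os.write(s.data(), static_cast<std::streamsize>(s.size()));
}
void read(std::string& s, std::istream& is) {
    std::int32_t n = 0;
    read(n, is);
    if (!is || n < 0) { is.setstate(std::ios::failbit); s.clear(); return; }
    s.resize(static_cast<std::size_t>(n));
    if (n > 0) is.read(&s[0], n);
}

// The class-level functions just call the member-level ones in a fixed order.
void write(const MyClass& m, std::ostream& os) {
    write(m.id, os);
    write(m.weight, os);
    write(m.name, os);
}
void read(MyClass& m, std::istream& is) {
    read(m.id, is);
    read(m.weight, is);
    read(m.name, is);
}

// Round-trips one object through memory; a std::fstream opened with
// std::ios::binary would be used the same way.
MyClass roundTrip(const MyClass& in) {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write(in, ss);
    MyClass out = MyClass();
    read(out, ss);
    assert(!ss.fail());
    return out;
}
```

Nothing here requires MyClass to derive from a streamable base class, which is the point being made above.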

nietod

>> Do I have to overload the << operator in
>> each class which I have defined? Again, I
>> am thinking that the answer is yes
"Have to?" No. "Should I?" Maybe. helpful again.

You need to read and write this information, but how you do that is up to you. If the information will be read/written in only one place in the program, you could write a single procedure that reads/writes all the objects and other data. However, if it needs to be read/written from several different places, you might want to consider adding member procedures to the classes that allow them to be read/written. This is also a good idea if the classes contain private data members or if their implementation is expected to change. (It is always expected to change.) So there is a good argument for writing read/write member procedures for each class. On the other side of the coin, if you do change the class and therefore the read/write procedures, any stored data will become invalid (unless you store version numbers and include features for reading older versions). So there is an argument for letting the non-member code do the read/write, since this code should not change "quietly".
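
To make the version-number point concrete, here is a rough sketch of a file header plus a reader that still accepts an older layout. The signature bytes, the version numbers and the Settings class are all made up for the example; only the idea (identify the file, record the saver's version, branch on it when reading) comes from the discussion.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <istream>
#include <ostream>
#include <sstream>   // only needed for in-memory streams such as std::stringstream

// Hypothetical 4-byte file signature and current format version.
const char          kMagic[4]       = {'P', 'R', 'J', 'F'};
const std::uint32_t kCurrentVersion = 2;

// Hypothetical class whose stored layout grew a field in version 2.
struct Settings {
    std::int32_t width;
    std::int32_t height;   // not present in version 1 files
};

void writeHeader(std::ostream& os) {
    os.write(kMagic, sizeof kMagic);
    os.write(reinterpret_cast<const char*>(&kCurrentVersion), sizeof kCurrentVersion);
}

// Returns false if the signature is wrong or the file is newer than this code.
bool readHeader(std::istream& is, std::uint32_t& version) {
    char magic[4] = {0, 0, 0, 0};
    is.read(magic, sizeof magic);
    is.read(reinterpret_cast<char*>(&version), sizeof version);
    return !is.fail()
        && std::memcmp(magic, kMagic, sizeof kMagic) == 0
        && version >= 1 && version <= kCurrentVersion;
}

// Always writes the current layout.
void write(const Settings& s, std::ostream& os) {
    os.write(reinterpret_cast<const char*>(&s.width), sizeof s.width);
    os.write(reinterpret_cast<const char*>(&s.height), sizeof s.height);
}

// Reads whichever layout the header announced.
void read(Settings& s, std::istream& is, std::uint32_t version) {
    assert(version >= 1 && version <= kCurrentVersion);   // caller checked the header
    is.read(reinterpret_cast<char*>(&s.width), sizeof s.width);
    if (version >= 2) {
        is.read(reinterpret_cast<char*>(&s.height), sizeof s.height);
    } else {
        s.height = s.width;   // hypothetical default for old files
    }
}
```

The same version argument can be threaded through member functions instead, if the read/write code lives inside the classes.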

>> If my class has a member which is of type
>> TPanel, can it save & restore itself, or do
>> I have to write the code (maybe some
>> extra points for this part)?
Are you familiar with the term "object persistence"?

I don't know if VCL supports object persistence, or in what ways it does, but it is a set of techniques by which objects can be saved and restored.

The general idea is that, if you do it right, the sort of issues you are worrying about aren't issues. But in order to do this, you can't necessarily write out an object "wholesale". In C you can write out a structure using write(pointertostructure,sizeof(struct)). That is not safe in C++. First of all, the class may contain data like virtual function table pointers that cannot be written. (Furthermore, the class may not even be contiguous in memory!) The class may also contain pointer data members. Writing pointers out makes no sense; when they are read back in they will be invalid. So the idea is that you must write each data member out individually and with consideration for how it will be read back in. If a data member is a number, that is no big deal. If the data member is a pointer to data, you probably need to write out that data, not the pointer. Then when you read the data back in, you need to allocate memory for the data and read into that memory, then store the pointer to that memory in the data member. When you write out a member that is an object of a class, you may have these same issues in that class. So it is best if that class provides its own "safe" procedures for reading and writing. (And if it has members that are classes....)

If you are not familiar with object persistence, I recommend you get a book or two that covers it. It is not hard, but there are a lot of details and ideas that are worth considering.
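
As a small illustration of the pointer case described above, here is one possible sketch. Node and Shape are invented names, and std::unique_ptr is used for ownership as a modern convenience rather than anything the thread prescribes; a raw owning pointer with new/delete would follow the same steps.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>
#include <utility>

struct Shape { double x; double y; };   // hypothetical pointed-to data

struct Node {                            // hypothetical owning class
    int id = 0;
    std::unique_ptr<Shape> shape;        // may be null
};

// Helpers that copy one fixed-size value byte for byte.
template <class T> void putRaw(const T& v, std::ostream& os) {
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
}
template <class T> void getRaw(T& v, std::istream& is) {
    is.read(reinterpret_cast<char*>(&v), sizeof v);
}

void write(const Node& n, std::ostream& os) {
    putRaw(n.id, os);
    // Never write the pointer value itself. Write a presence flag,
    // then the members of the object it points to, one by one.
    os.put(n.shape ? 1 : 0);
    if (n.shape) {
        putRaw(n.shape->x, os);
        putRaw(n.shape->y, os);
    }
}

void read(Node& n, std::istream& is) {
    getRaw(n.id, is);
    char present = 0;
    is.get(present);
    n.shape.reset();
    if (is && present) {
        // Allocate fresh memory, read into it, then store the new pointer.
        std::unique_ptr<Shape> s(new Shape());
        getRaw(s->x, is);
        getRaw(s->y, is);
        n.shape = std::move(s);
    }
}

// Round-trips a node through an in-memory binary stream.
Node roundTrip(const Node& in) {
    std::stringstream ss(std::ios::in | std::ios::out | std::ios::binary);
    write(in, ss);
    Node out;
    read(out, ss);
    assert(!ss.fail());
    return out;
}
```

The restored node owns a new Shape at a different address with the same contents, which is all that can be expected of a saved pointer.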

Let me know if you have any questions.

KangaRoo

This approach allows you to incorporate things like vectors:

template<class T>
void write(const vector<T>&, FILE*)
{
// write vector header
// write each object in the vector
// optionally write an end marker
}

Or, alternatively

template<class ForwardIterator>
void write(ForwardIterator first, ForwardIterator last, FILE*);

etc.
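
For what it is worth, the iterator-range variant could be filled in roughly like this. This is only a sketch: it uses std::ostream rather than FILE* so it can be exercised in memory, it assumes a count header, and toBytes is a hypothetical helper added for demonstration.

```cpp
#include <cassert>
#include <cstdint>
#include <iterator>
#include <list>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

// Element writer for 32-bit integers; one such overload is needed per element type.
void write(std::int32_t v, std::ostream& os) {
    os.write(reinterpret_cast<const char*>(&v), sizeof v);
}

// Writes a count header, then every element in [first, last).
// Works for any container, since it only needs forward iterators.
template <class ForwardIterator>
void write(ForwardIterator first, ForwardIterator last, std::ostream& os) {
    const auto n = std::distance(first, last);
    assert(n >= 0);
    write(static_cast<std::int32_t>(n), os);
    for (; first != last; ++first) {
        write(*first, os);
    }
}

// Hypothetical helper: serializes a whole container into a byte string.
template <class Container>
std::string toBytes(const Container& c) {
    std::ostringstream os(std::ios::out | std::ios::binary);
    write(c.begin(), c.end(), os);
    assert(!os.fail());
    return os.str();
}
```

A vector and a list holding the same values produce the same bytes, so the reader does not need to know which container was used when saving.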

nietod

Just to elaborate on what Kangaroo said, there are two main approaches to writing vectors. You can either write out a count of the number of items in the vector, then write out each item in the vector. Or you can write out something that indicates the start of the vector (that may or may not be needed, depending on the circumstances, but in general it is needed), then write out each item in the vector, then write out something that indicates the end of the vector. The problem with the 2nd approach is ensuring that the sentinel that indicates the end of the vector is somehow distinct from the data in the vector, so the two cannot be mistaken.

My preference is to use the first approach when writing data in binary form and the 2nd approach when writing it in ASCII form. The reason is that the first approach is a little safer and easier (especially because you don't have to worry about that unique sentinel issue), so I prefer it. But when I write in ASCII, I like the ability to edit the data in a word processor, so I prefer the 2nd approach in this case, as I don't have to keep counting items in an array.
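
The two approaches can be sketched side by side for a vector of ints. The function names and the bracketed text layout are assumptions made for this example.

```cpp
#include <cassert>
#include <cstdint>
#include <istream>
#include <limits>
#include <ostream>
#include <sstream>   // only needed for in-memory streams such as std::stringstream
#include <vector>

// Approach 1 (binary): a count, then the items.
void writeCounted(const std::vector<int>& v, std::ostream& os) {
    assert(v.size() <= std::numeric_limits<std::uint32_t>::max());
    const std::uint32_t n = static_cast<std::uint32_t>(v.size());
    os.write(reinterpret_cast<const char*>(&n), sizeof n);
    for (int x : v) os.write(reinterpret_cast<const char*>(&x), sizeof x);
}

std::vector<int> readCounted(std::istream& is) {
    std::uint32_t n = 0;
    is.read(reinterpret_cast<char*>(&n), sizeof n);
    std::vector<int> v;
    for (std::uint32_t k = 0; k < n && is; ++k) {
        int x = 0;
        if (is.read(reinterpret_cast<char*>(&x), sizeof x)) v.push_back(x);
    }
    return v;
}

// Approach 2 (text): start marker, items, end marker, e.g. "[ 3 -1 4 ]".
// ']' can never be part of a number, so it cannot be mistaken for data.
void writeDelimited(const std::vector<int>& v, std::ostream& os) {
    os << '[';
    for (int x : v) os << ' ' << x;
    os << " ]";
}

std::vector<int> readDelimited(std::istream& is) {
    std::vector<int> v;
    char c = 0;
    if (!(is >> c) || c != '[') { is.setstate(std::ios::failbit); return v; }
    while (is >> c && c != ']') {   // operator>> skips whitespace for us
        is.putback(c);              // not the end marker, so it starts a number
        int x = 0;
        if (!(is >> x)) break;
        v.push_back(x);
    }
    return v;
}
```

With the text form, an item can be added in an editor without touching anything else; with the counted form, the count would have to be updated by hand as well.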

graham_k

ASKER

well, nothing new learned & 50 points probably thrown away :-) Don't worry, nietod, you'll probably end up getting the points. I just rejected to see if anyone comes up with a better answer.

I have about 6 or 7 different vector structures. There will be a different number of objects in each. So, either I precede each individual entry in the save file with a type, or each group of such entries with a type & number of entries.
Hmm, I will probably have to precede each individual entry with a type/length header. After all, if I have a class which contains an int, a float & a string, how do I know how to read them back?

I won't have problems with pointers, I already thought of that. The <vector> takes care of the linked list aspects of pointers. Objects of one class can contain a <vector> of objects of another class, I will simply store a series of indices to the pointed to thingies (either a unique integer or a string) and when restoring I will have to recreate this vector too. (Thinks, I have a sort of class hierarchy here, so I will have to save the 'minor' stuff before the major. That way, when I restore the major stuff, I can always search the 'minor' stuff ... if you see what I mean).

You see, this sort of nightmare with a binary file & every field preceded with a type/length (which may be longer than the field itself) is the pain in the a*s sort of thing that I've always done with C and wanted to avoid somehow with C++.

graham_k

ASKER

"Just to elaborate on what Kangaroo said, there are two main approaches to writing vectors. You can either write out a count of the number of items in the vector, then write out each item in the vector. Or you can write out something that indicates the start of the vector (that may or may not be needed, depending on the circumstances, but in general it is needed), then write out each item in the vector, then write out something that indicates the end of the vector. The problem with the 2nd approach is ensuring that the sentinel that indicates the end of the vector is somehow distinct from the data in the vector, so the two cannot be mistaken."

that's why I normally end up preceding *every* datum with a type/length field :-(

nietod

For a given set of circumstances, what you need to do is pretty clear. Unfortunately I don't know your circumstances well enough.

>> I have about 6 or 7 different vector structures.
>> There will be a different number of objects in
>> each. So, either I precede each individual entry
>> in the save file with a type

But if the vectors always occur, and in the same order, you only need to record the number of items, not the types. I.e. you would have a count of the number of items in vector 1, followed by the data for each item in the vector, then the count of the number of items in vector 2, followed by the data for each of its items, etc.

If the vectors may change in number or in the data types stored, then you need to preserve the "type" of each object. (That is the case I have to deal with: in my software object persistence is used to great lengths and objects are extremely interchangeable, so a particular array may contain objects of different types, and I must write out type information whenever I write out an object.)
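
A rough sketch of the second case, where each object is preceded by a type tag and the reader picks the class from the tag. The Item hierarchy, the tag values and the simple text layout are all hypothetical; the sketch also assumes a Label's text is non-empty and fits on one line.

```cpp
#include <cassert>
#include <istream>
#include <memory>
#include <ostream>
#include <sstream>   // only needed for in-memory streams such as std::stringstream
#include <string>
#include <utility>
#include <vector>

// Hypothetical type tags; any stable, unique values will do.
enum Tag { kCircle = 1, kLabel = 2 };

struct Item {
    virtual ~Item() {}
    virtual Tag tag() const = 0;
    virtual void writeBody(std::ostream& os) const = 0;
    virtual void readBody(std::istream& is) = 0;
};

struct Circle : Item {
    double radius = 0;
    Tag tag() const override { return kCircle; }
    void writeBody(std::ostream& os) const override { os << radius << '\n'; }
    void readBody(std::istream& is) override { is >> radius; }
};

struct Label : Item {
    std::string text;   // assumed non-empty and single-line
    Tag tag() const override { return kLabel; }
    void writeBody(std::ostream& os) const override { os << text << '\n'; }
    void readBody(std::istream& is) override { std::getline(is >> std::ws, text); }
};

// Layout: item count, then "<tag> <body>" once per item.
void writeAll(const std::vector<std::unique_ptr<Item>>& items, std::ostream& os) {
    os << items.size() << '\n';
    for (const auto& p : items) {
        assert(p);   // null entries are not representable in this layout
        os << static_cast<int>(p->tag()) << ' ';
        p->writeBody(os);
    }
}

std::vector<std::unique_ptr<Item>> readAll(std::istream& is) {
    std::vector<std::unique_ptr<Item>> items;
    std::size_t n = 0;
    is >> n;
    for (std::size_t k = 0; k < n && is; ++k) {
        int t = 0;
        is >> t;
        std::unique_ptr<Item> p;
        if (t == kCircle)     p.reset(new Circle);
        else if (t == kLabel) p.reset(new Label);
        else { is.setstate(std::ios::failbit); break; }   // unknown tag
        p->readBody(is);
        items.push_back(std::move(p));
    }
    return items;
}
```

When the order and types are fixed, the tag can be dropped and only the count kept, as in the first case above.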

KangaRoo

I agree with nietod, writing the size up front is easiest.

If it's only serialization, a complete 'structure' of read and write functions would do all your work. The template version to write vectors would make it even less work...

nietod

>> Objects of one class can contain a
>> <vector> of objects of another class, I will simply
>> store a series of indices to the pointed to thingies
>> (either a unique integer or a string) and when restoring I
>> will have to recreate this vector too.
I'm not sure I understand your approach here. It might work, I can't tell. What I do is to store the "embedded" data in series with the other data. So for example

class X
{
  double D;
  vector<int> IAry;
};

vector<X> XAry;

XAry.resize(2);
* * *

would be written like

int - (2) - size of the XAry;
double - First X's D.
int - First X's IAry length
int - First X's 1st IAry entry
int - First X's 2nd IAry entry
* * *
int - First X's nth IAry entry.
double - Second X's D.
int - Second X's IAry length
int - Second X's 1st IAry entry
* * *

This sort of approach is best accomplished by having read/write procedures for each class, so that the read/write procedure of one class can call upon the read/write procedure for each of its members. So you would have a read/write procedure for vector<> that reads/writes the length followed by each entry.
Then X's read/write procedures read/write the D data member, then read/write the IAry data member (which involves calling the vector read/write, which takes care of the length and the entries).

KangaRoo

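template<class T>
void write(const vector<T>& v, FILE* f)
{
  // Write the element count first; the cast selects write(int, FILE*).
  write(static_cast<int>(v.size()), f);
  // Then write each element with the overload for T.
  for (typename vector<T>::const_iterator i = v.begin(); i != v.end(); ++i)
    write(*i, f);
}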

Now write(int,FILE*) and write(const T&,FILE*) should be defined.
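
As an illustration of what those definitions might look like, here is a sketch that supplies write(int, FILE*), write(double, FILE*) and a write for the class X from nietod's example, together with matching read functions so the result can be checked. X is made a struct so the free functions can reach its members, and roundTrip is a hypothetical helper that uses a temporary file.

```cpp
#include <cassert>
#include <cstdio>
#include <vector>

// Primitive overloads: raw bytes, readable only by a compatible build.
void write(int v, std::FILE* f)    { std::fwrite(&v, sizeof v, 1, f); }
void write(double v, std::FILE* f) { std::fwrite(&v, sizeof v, 1, f); }
void read(int& v, std::FILE* f)    { if (std::fread(&v, sizeof v, 1, f) != 1) v = 0; }
void read(double& v, std::FILE* f) { if (std::fread(&v, sizeof v, 1, f) != 1) v = 0; }

// Vector: the length, then each entry via the overload for T.
template <class T>
void write(const std::vector<T>& v, std::FILE* f) {
    write(static_cast<int>(v.size()), f);
    for (typename std::vector<T>::const_iterator i = v.begin(); i != v.end(); ++i)
        write(*i, f);
}

template <class T>
void read(std::vector<T>& v, std::FILE* f) {
    int n = 0;
    read(n, f);
    v.clear();
    while (n-- > 0) {
        T temp = T();
        read(temp, f);
        v.push_back(temp);
    }
}

// The example class: a double followed by an embedded vector.
struct X {
    double D;
    std::vector<int> IAry;
};

void write(const X& x, std::FILE* f) { write(x.D, f); write(x.IAry, f); }
void read(X& x, std::FILE* f)        { read(x.D, f);  read(x.IAry, f); }

// Hypothetical helper: writes to a temporary file and reads it back.
std::vector<X> roundTrip(const std::vector<X>& in) {
    std::FILE* f = std::tmpfile();
    assert(f != nullptr);
    write(in, f);
    std::rewind(f);
    std::vector<X> out;
    read(out, f);
    std::fclose(f);
    return out;
}
```

Writing a vector of X produces the layout listed earlier: the XAry size, then for each X its D, its IAry length, and its IAry entries.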

nietod

>> The problem with the 2nd approach is
>> ensuring that the sentinel that indicates
>> the end of the vector is somehow distinct
>> from the data in the vector, so the two
>> cannot be mistaken. "
>>
>> that's why I normally end up preceding *every*
>> datum with a type/length field :-(
And in a binary format, that can be very hard to do. (Actually there are ways, it's just not worth it.) But in ASCII there are easier ways.

My ASCII format, for what it is worth, consists of the object's type name, followed by parentheses containing the values for the data members within the class, like

class X
{
  double D;
  int i;
};

ASCII version would be
X(5.5,3)

If a data member is a class, then it gets embedded, like

Y(11,X(5.0,4),"a string")

If a data member is an array, it is enclosed in [...] and lists the items in the array, like

Z("a string",[X(1.2,1),X(1.3,7),X(5,0) ])

So you can see that the "]" can only be interpreted as the end of the array. (It can only occur at the end of an array or in a string, so there is no room for mistakes there.)

That's just the synopsis. I have a lot of other features that may be needed for complex cases, like allowing objects to refer to objects elsewhere in the data stream.
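
A writer for a format of this shape might look roughly like the following. It covers only the writing side; the classes X and Z mirror the examples above, the backslash escaping inside strings is an added assumption, and toText is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>
#include <vector>

struct X { double D; int i; };                    // as in the example above
struct Z { std::string s; std::vector<X> xs; };   // a string plus an array of X

// Type name, then the members in parentheses.
void write(const X& x, std::ostream& os) {
    os << "X(" << x.D << ',' << x.i << ')';
}

// Strings are quoted; quotes and backslashes inside are escaped
// (the escaping rule is an assumption, not part of the description above).
void write(const std::string& s, std::ostream& os) {
    os << '"';
    for (char c : s) {
        if (c == '"' || c == '\\') os << '\\';
        os << c;
    }
    os << '"';
}

// Arrays are enclosed in [...] with comma-separated items.
template <class T>
void write(const std::vector<T>& v, std::ostream& os) {
    os << '[';
    for (std::size_t k = 0; k < v.size(); ++k) {
        if (k != 0) os << ',';
        write(v[k], os);
    }
    os << ']';
}

// Members that are objects or arrays are embedded by calling their writers.
void write(const Z& z, std::ostream& os) {
    os << "Z(";
    write(z.s, os);
    os << ',';
    write(z.xs, os);
    os << ')';
}

// Hypothetical helper: renders one Z as a string.
std::string toText(const Z& z) {
    std::ostringstream os;
    write(z, os);
    assert(!os.fail());
    return os.str();
}
```

Note that the default stream formatting prints 5.0 as 5, so the output is valid input for a reader that accepts either form.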

Are you writing in binary, ASCII, or both?

KangaRoo

template<class T>
void read(vector<T>& v, FILE* f)  // non-const: the vector is filled in
{
  int s; read(s,f); // Yes I read sf occasionally
  v.clear();        // or leave this out to append to the existing contents
  if (s > 0) v.reserve(s); // if implemented in your STL

  T temp;
  while(s-- > 0)
  {
    read(temp, f);
    v.push_back(temp);
  }
}

graham_k

ASKER

nietod :

"I'm not sure I understand your approach here. It might work, I can't tell. What I do is to store the "embedded" data in series with the other data. So for example ". I suppose that I could, but it might get messy as most 'outer' items won't have an inner item and a few will have many.

"
>> that's why I normally end up preceding *every*
>> datum with a type/length field :-(
And in a binary format, that can be very hard to do. (Actually there are ways, it's just not worth it.) But in ASCII there are easier ways. "
IMHO, it's a *lot* easier to do this sort of thing as a binary file.

Take a look at http://gleam.tsx.org if you wish, there you can d/l some code which does this and the documentation which explains the file format. It's not required reading, though, and it's always tough following someone else's code, so you might think twice before investing the time :-)

kangaroo:

I guess that what I was looking for was persistent objects (cheaply & easily). Thanks for all of the sample code.

both:

I guess that I'll just stick to what I know & store it as I would if coding in C (described at length in the documentation at the URL mentioned above). Thanks for an interesting discussion. Since you took the time & trouble to reply, I'll award you both 50 points.

ASKER CERTIFIED SOLUTION

KangaRoo

graham_k

ASKER

yes, I can see how it would be done. I'll give it some thought. I'm an old school C & Assembler programmer who uses C++ at home, just to keep up. As such, I have the choice of how to implement it.

Maybe I'll just plough ahead with what I know best, to get the project finished, then tidy it up later.

Thanks again.

nietod

Just to be clear, when I said it was harder in binary, I meant coming up with a scheme that allows you to differentiate between an object's data and the sentinel that ends the array, not that it is harder to embed type information.

>> look at http://gleam.tsx.org
Is that the right site? I didn't see anything there.

>> Implementing a system for serialization
>> or 'persistent objects' is not so hard at all
It depends on the nature of the classes and the number of classes that need to be converted to support persistence, but even at its most difficult, it probably really is not that "hard". (It can be time consuming; I had to convert about 300 classes once....) The hardest case I can think of is when you have objects that "refer" (e.g. have pointers) to other objects, those referred-to objects must be written to the data stream, and it is possible that they are referred to in multiple places. When you read the data back in, you want all those different pointers to refer (point) back to one object, not separate objects with the same data. That is probably the worst issue; usually it is not a concern, and it probably isn't for you.
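
One common way to handle that hardest case is to give each referred-to object an index, write the referred-to objects once, and store indices where the pointers were. The sketch below assumes hypothetical Shape and Style classes, std::shared_ptr for the shared references, and a simple text layout; none of these come from the thread.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <map>
#include <memory>
#include <ostream>
#include <sstream>   // only needed for in-memory streams such as std::stringstream
#include <utility>
#include <vector>

struct Style { int color; };                                   // hypothetical shared object
struct Shape { double size; std::shared_ptr<Style> style; };   // several shapes may share one Style

void write(const std::vector<Shape>& shapes, std::ostream& os) {
    // Pass 1: number each distinct Style in order of first use.
    std::map<const Style*, int> ids;
    std::vector<const Style*> order;
    for (const Shape& s : shapes) {
        assert(s.style);   // this sketch does not handle null references
        const int next = static_cast<int>(order.size());
        if (ids.insert(std::make_pair(s.style.get(), next)).second)
            order.push_back(s.style.get());
    }
    // Pass 2: the styles once each, then the shapes with an index instead of a pointer.
    os << order.size() << '\n';
    for (const Style* st : order) os << st->color << '\n';
    os << shapes.size() << '\n';
    for (const Shape& s : shapes) os << s.size << ' ' << ids.at(s.style.get()) << '\n';
}

std::vector<Shape> read(std::istream& is) {
    // Rebuild the styles first so the indices can be resolved afterwards.
    std::vector<std::shared_ptr<Style>> styles;
    std::size_t n = 0;
    is >> n;
    for (std::size_t k = 0; k < n && is; ++k) {
        std::shared_ptr<Style> st = std::make_shared<Style>();
        is >> st->color;
        styles.push_back(st);
    }
    std::vector<Shape> shapes;
    n = 0;
    is >> n;
    for (std::size_t k = 0; k < n && is; ++k) {
        Shape s = Shape();
        std::size_t id = 0;
        is >> s.size >> id;
        if (!is || id >= styles.size()) { is.setstate(std::ios::failbit); break; }
        s.style = styles[id];   // equal indices become the same object again
        shapes.push_back(s);
    }
    return shapes;
}
```

Shapes that shared a Style before saving share a single restored Style afterwards, rather than each getting its own copy with the same data.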