asked on

Is this using or abusing polymorphism?

Say that I have a bunch of cars and I want to
store two types of information about each car:
style attributes and performance attributes.

I want to keep these types of information separate
to maintain data locality in my code.
But, I also want to be able to share information
between the two.

Is this a reasonable use of polymorphism?

Class Car {
public:
virtual ~Car();
}

Class CarStyle: public Car {
ColorType color;

bool IsRed() { return color==red } ;
}

Class CarPerformance: public Car {
TimeType ZeroToSixty;

bool IsRedSportsCar () { return ZeroToSixty < 7.0 && IsRed() } ;

ASKER CERTIFIED SOLUTION

nietod

membership

This solution is only available to members.

To access this solution, you must be a member of Experts Exchange.

Start Free Trial

nietod

Public inheritance is often called a "is a" relation ship. This means that the derived class "is a" whatever the base class is, but it also is "more" than the base class. For example, you might have somethin like

class vehicle
{
int TopSpeed;
};

class car : public vihicle
{
int NumberOfCylinders
};

here car is a type of vehicle so it "is a" vehincle and is publicly derived from vehicle. Car is also more than a vehicle. Byt that I mean it has attributes that the vehicle does not, like the NumberOfCylinders.

The ica go on an on, like

class SportsCar : public Car
{
};

class StationWagon : public Car;
{
};

Again SportsCar "is a" Car. StationWagon "is a" car.

continues.

nietod

Now look at what happens when you mess up the "is a" rule. If you do

class Jet : public Car
{
};

A jet declared like this has a NumberOfCylinders (inherited from Car). But what does the NumberOfCylinders stored in a jet mean? It doesn't mean anything, jets don't have cilinders. The problem is that a jet is not a car, it is a vehicle, though, you can do

class Jet : public Vehicle
{
};

and it will make sense. A jet is a vehicle it will inherit the topspeed from the vehicle class, which does make sense for the jet.

continues

klopter

ASKER

My problem is that I want to have my
cake and eat it too. I want the power
that C++ gives me, but I want the
control over data placement (for
performance reasons) that C gives me.

In the example above, I want all the
attributes in CarPerformance to be
stored close to each other because
I will be accessing them a lot and
storing them close to each other
will improve execution time.
(Execution time is what I actually
do know something about.)

I am currently leaning toward making
CarPerformance a derived class of Car.

All the cars will be stored as a
vector. And the performance attributes
will be stored as a separate vector.

I know that this is not the purpose
of polymorphism, but I don't see
a better way to achieve what I want.

Ken
P.S. I was delighted to et e-mail
indicating that you are on-line today.
I am trying to completely rewrite my
code (which is itself a rewrite) in
an attempt to do things closer to right.

nietod

From this it shoudl be obvious that "CarStyle" is not (i.e. isn't "is a") a "Car" That would be an attribute of a car, i.e. that should be a data member inside the car class (probably, it does depend on you needs, shich I have to guess at somewhat)

Given the information you originally presented, i can sort of see this sort of design emerging, like

enum ColerType = {red,blue,green,white...};

class Car
{
colorType Color;
public:
virtual bool IsSportsCar()
{
return false; // Car is not necessarily a sports car.
};
bool IsredSportsCar()
{
return IsSportsCar() && Color == Red;
};
}

class SportsCar : public Car
{
public:
virtual bool IsSportsCar()
{
return true; // Indicate this is a sports car.
};
};

In this case, the base class has defined an "inquiry" function--IsSportsCar(). This is a virtual function that allows the derived class to "answer" the "inquiry" differently than the base class. This is a useful approach for thses sorts of situations, but should really be used only in cases where you know that class heirachy is very well defined and will not have later additions in the future. The problem with later additions is that they may not fit will into the scheme defined by the base class. I.e. in this case the base class tried to clasify everythign the in hierarchy as either a regular car or a sports car. What if you add a anitique roadster to the class hierachry? Weill when it was made it was a sports car, but by today's standards it is not. Perhaps there are times when it should "answer" true to IsSportsCar() and times when it should be false. That is a problem.

In these cases the better solution is to try to let the derived classes handle any sort of task in which this might matter. i.e. any task that would have called IssportsCar() shoudl really be a virtual function and the derived class can then choose to handle the task based on what it "knows" about it self, rather than based on some atribtrary decision of the base class. Like if you had a function to enter the cars int a race, you don't want the sportster entered as a regular car or as a sportscar, the only two choices avaialbe in the desing above. Instead you have a virtual EnterRace() function and the Roadster class will enter the car as an antique.

Does that help?

nietod

>> In the example above, I want all the
>> attributes in CarPerformance to be
>> stored close to each other because
>> I will be accessing them a lot and
>> storing them close to each other
>> will improve execution time.
>> (Execution time is what I actually
>> do know something about.)
First of all. I am a DIE HARD assembly programmer. I have been programming in assembly for 15+ years and have generated increadibly FAST code that way. So understand where you are coming from. But If you switch to C++ you can still make increadible gains in speed, but usually by different approaches. C++ is so much more expresive so much more reliable than Assembly (or C) that you can employee much more efficient algorithms that would be unreasonably difficult to do in C or assembly. Things like lazy I/O, reference counting, pre-calculating, etc can often be done very easily under C++ and produce huge beneifts that small tweaks with registers and memory can't begin to account for. You might want to consider reading the "Zen of Code Optimization" where he gives soem very good real-life examples of this. Things like someone takes an algorithm and spends 6 weeks optmizing it in assembly to get 15% more speed. Then another person rewrites it in an interpreted language, like basic and uses a different aproach and in 2 hours gets 99% more speed.

The first rule of optmization, is to profile your design and consider the 80-20 rule. It is extemely extremely unlikely that the location of the data (in this sort of example) will ever slow things down enough to be measurable much less to make it warranted by the 80-20 rule.

However, if, you want the data to be stored together, store it a data member of the class, as I suggested above.

>> I know that this is not the purpose
>> of polymorphism, but I don't see
>> a better way to achieve what I want.
Tjere is a better way. I don't know what it is yet, but there is a MUCH better way.

Can you provide some more details about the design?

klopter

ASKER

I am afraid that I chose a poor example, or a poor description and got this
discussion on the wrong track.

I understnd that CarStyle and CarPerformance
are not cars and hence ought not to be
derived from a class named Car.

What I want is the C++ equivalent of:

typedef struct stylestuff ...

stylestuff carstyles[ MAXCARS ] ;

typedef struct performancestuff ...

performancestuff carperformance[ MAXCARS ];

This forces the carperformance information
to be stored together and the carstyle
information to be stored together.
In C I would just have cartsyles[ i ]
refer to the same car as carperformance[ i ].

I want to do that within C++ but
I want to be object-oriented and not
array-index oriented. In other words,
I want an appropriate linnkage between
carperformance and carstyles.

Ken

nietod

>> In C I would just have cartsyles[ i ]
>> refer to the same car as carperformance[ i ].

If I remember correctly, you are currently storing the data in a STL list. If you switched to a vector then this would be the same. i.e. you can use a single index value to refer to an item in either vectopr (array) (You can't do that with a list, because you have to use iterators, not indexe values.)

Another alternative is to group the style and performace together into one class/struct, like

class CarAttributes
{
CarStyles Style;
CarPerformance Performance;
};

Then store this class/struct in ONE list (or vector)

Another option would be to do away with the list/vector intirely. It seems to me like this data should be stored inside of a car class somewhere. But I really don't know enough about your needs to know for sure.

klopter

ASKER

OK. I will try to be more specific.

My job is to create a parallel code.
(Originally it was to parallelize an
existing code, but then we decided that
the best way to get there wasto rewrite
the serial code).

Most of the code will be parallel and
run on a shared memory machine with
say 256 Gigabytes of RAM. But, there
will be a small part of the code that
will remain serial because it takes
much less time than the rest and doesn't
lend itself easily to parallelization.
(We could parallelize it, but it isn't
worth the effort.)

My real reason for keeping one part of
the data together is so that the serial
part of the code (which only touches a
small subset of the total data) doesn't
have to run all over memory to collect
the pieces that it needs.

We have two choices for the serial part
of the code, we might run it on a
single processor machine (in which
case we will probably have only 16
Gigabytes) or we will just use one
processor (and leave the others idle).
If we switch to a single processor
machine we have no choice but to
reduce memory usage. But, even if we
run on a big multi-processor
shared memory machine, data locality
will be key to getting respectable
performance.

Ken

nietod

What platform is this for? 256 Gig of RAM!

>> only 16 Gigabytes)
How can you use "only".?

> so that the serial
> part of the code (which only touches a
> small subset of the total data) doesn't
> have to run all over memory to collect
> the pieces that it needs.
What worries you about that? Is it that you have to use multe-step "search" code that finds one thing, that then is used to find another and then another, that type of thing? Or is that the things that are being found are coming from widely seperated portions of memory and thus not using a memory cache effectively. (I assume you are not even considering virtual memory, as you shouldn't need it!)

Do either of the suggestions I made above help your? You either have to use an array class, which is similiar to your C solution. This gives fast aceess to items, but may be show when you do additions and deleteions (probably no worse than C was). Or you need to use a singe linked list. This gives you fast additions and deletetions, but groups together the two items into one.

Depending on this "search" or whatever it is that is collecting the peices does, you may be able to do some pre-calculation or lazy I/O to help make that more efficient.

klopter

ASKER

The platform is not yet specified.
We may be able to get away with only
128 or even 64 Gigabytes. There are
only a few 256 Gigabyte shared memory
machines in the world, but more are on
their way this summer.

I do plan to store them as vectors.
I will know the size upfront.

The parallel step requires pretty
much all of the data structure.
Hence, virtual memory is not realistic.
But, the serial step requires only
a subset of the data structure.

Yes, I want reasonable cache performance,
especially in the serial step.
The serial step is small enough
that we can afford to let it be serial,
but not so small that we can afford to
let it have miserable cahce performance
as well.

Ken

nietod

>> i do plan to store them as vectors.
Then you shoudl have the same performace and the same design with two vectors as in the C case when you use two arrays.

>> but not so small that we can afford to
>> let it have miserable cahce performance
>> as well
Why do you think the cache performance will be low? I don't see anything that would necesarily indicate this. It certainly shouldn't be appreciably worse than the C design. A vector will store items contiguosly (your aren't actually guaranteed this, but it will. If you want that guarantee, you can use a basic_string class instead, but I doubt it is worth it.) so you shoudl get about the same cashe performance as with a normal array.

klopter

ASKER

Yes. If I store them as vectors I
will get the performance behaviour
that I want (same in C as in C++).

But, if I store them as vectors
I don't get the OO behaviour that I
want - or at least I don't see
how to do so.

Copying from above:
What I want is the C++ equivalent of:

typedef struct stylestuff ...

stylestuff carstyles[ MAXCARS ] ;

typedef struct performancestuff ...

performancestuff carperformance[ MAXCARS ];

I plan to store carstlyes and carperformance as vectors,
but I want to treat them as objects
and I want carperformance[i] to
know about its relationship to
carstyles[i].

Ken

nietod

I'm not understanding what your problem is--or what you think it is.

>> I plan to store carstlyes and carperformance as vectors,
>> but I want to treat them as objects
I don't see any hinderance to this. I don't know if you mean each individial carstyle and performance shoud be an object, or if the arrays themselves should be an object, but either way there doesn't seem to be a problem

>> I want carperformance[i] to
>> know about its relationship to
>> carstyles[i]
the relationshsiip is pretty simple, so that doesn't seem to be a big deal, but by simply enclosing the two vectors in a single class, you can enforce the relationship. i.e you can be sure that when an item is added to 1 vector a corresponding one is added to the other etc..

klopter

ASKER

I obvious have a major mind block about
this. There is clearly something that
is so obvious to you that you can't
imagine what I don't understand.
Let me try again. I'll go back to
my example, changing structs to
classes and array to vectors.

Class stylestuff {
ColorType color ;
} ;

vector<stylestuff> carstyles[ MAXCARS ] ;

Class performancestuff {
TimeType ZeroToSixty ;

Bool IsRedSportsCar();
} ;

vector<performancestuff> carperformance[ MAXCARS ];

How do I implement IsRedSportsCar()?

I want to store the objects as vectors,
but only because I don't want to
create them one at a time.

I still want to be able to treat them
as objects and I want to the
performancestuff objects to know
about their stylestuff counterparts.

Can I? If so , how?

Ken

nietod

I think we may be homing in on our problem

>> How do I implement IsRedSportsCar()?
What class is this a member of?

It should not be a member of carstyles right? its not an attribute of a stle . Nor should it be a member of car performance, its not an attribute of a car's performance. So what class is it an attribute of?

Probably some sort of Car class. But I don't see that one mentioned.

klopter

ASKER

I agree that IsRedSportsCar() is not
a member of carstyles or of carperformance.

Maybe if I explain my app a lit bit
better. Forget cars. What I have
is a big (and messy data structure)
out of which (after applying a
bunch of heuristics) I get a simple
graph. All of this can be done in
parallel. Then in serial, I do what
amounts to finding connected components.

So what I really have is:

Class Nodes {
Lots of stuff

list<Nodes *> links ;
}

The nodes are stored as a vector:
vector<Nodes> this_can_be_200Gigabytes;

But, in the serial code what I want is:

Class Nodes {
very little stuff

list<Nodes *> links ;
} // This will be less than 16 Gigs in all

One possibility is that I can have
two class BigNodes and LittleNodes
and just copy from BigNodes to LittleNodes.
I'll do that if I can figure out
anything better. The copy
won't
cost much in space or time.
But, you can't call it elegant.

Ken

nietod

Is the little node class an exact sub-set of the big node class. i.e. it has some of the data member and some of the member functions of the bignode class, and has no data or functions that the bignode doesn't have.

Also, is there an exact coorespondance between objects in the little and big node classes. i.e. for every object of the bignode class, is there one of the little node class and vice versa. (In which case the goal is to see it that they share a single copy of the information.)

klopter

ASKER

The data members in LittleNode are a pure subset
of the data members in BigNode.

The member functions in LittleNode could be
considered a subset of the member functions in
BigNode. But, the member functions in LittleNode
that are used by the serial program
to do the pseudo-connected components
algorithm would not be used in BigNode.

Every object in BigNode corresponds to
one and only object in LittleNode.

Ken

nietod

Then from an OOP standpoint, the ideal solution woudl be to have the Bignode derived from the littlenode. Then on a large memory (multi-processor) machine you can store bignode objects (one time i.e both "parts" of the program could use the bignode objects) and on a small memory machine (single processor) you could just use littlenode objects to reduce memory usage. As long as the serial algorityhm works with references to littlenode objects (either LittleNode * or LittleNode &) then the code can actually work with BigNode objects and not even "know" it.
The problem then is how to store the objects. The serial code can't have an array of its own objects, that would mean duplication of data when the parallel algorithm is running. What I would do is create 3 containers (lists?):

A list<BigNode> that is used by the parallel algarithm and is populated only on a big memory machine.

A list<LittleNode> that is used (only at times) by the serial algorithm and is popilated only on a small memory machine.

A list<LittleNode *> (not this is a list of pointers, not LittleNodes. that is used by the serial algorithm for its main work. i.e. when it needs to do calculations, this is where it goes to get data. This list is populated from either the Bignode or littlenode list, which ever is appropriate.

The reason for the List<litleNode> is to store little node objects in the case where there is no list of bignode objects. This way all the objects (either small or larger) are stored directly in a list, so you don't have to delete them directly and you still have a list of pointers. (usually when you have a list of pointer, you have to worry about deleting the items being pointed to, but that gets messy if those items might not be dunamically allocated, like if they come from another list at some times and not others. This avoides that issue.)

Now this approach definitely does not address the caching issue. But honestly, I don't think you will ever notice it. Unless the operations on the objects themselves are extremely short and fast, the fact that the data may be widely seperated in memory and the fact that I added an extra indirection is just not going to be noticable.

(I was just doing some profiling a few weeks ago to measure the affect of an extra indirection on code that is VERY short and fast and found it almost impossible to measure. In fact In one test (and this was consistently reporoduced) the code with the extra indirection was about 0.7% faster. (and in another test about 1.5% slower.) So this stuff tends to be meaningless, especially if there is "real" work beign done as the time for that tends to dominate.

KangaRoo

struct LittleNodeData;
struct BigNodeData;

typedef int index_type;
typedef int link_type;

static LittleNodeData little_data[MAXNODES];
static BigNodeData big_data[MAXNODES];
static List<link_type> nodelinks[MAXNODES];

class CarProxy
{
private:
index_type me;

public:
Color GetColor()
{ return BigNodeData[me].GetColor();}

};

klopter

ASKER

That makes sense now.

I do have one last question on a
related issue. This code will operate
on both real and simulated data. When
we operate on simulated data we know
what the final answer will be and
hence we have a sister data structure.

Again we have a case where the program
will sometimes have this sister data
and sometimes won't. Here are the
possibilities that I can see:

1) Include a pointer to the sister
data structure into the main
data structure. On real data runs it
will be NULL. [ I think I will do this -
I can have member functions that follow
this pointer transparently. ]

2) Have the sister data structure be
derived from the main data structure
(as you, nietod, recommended above).
This seems like it would work fine,
but it may conflict with using this
technique for my parallel/serial
solution.

3) #ifdef's
This would be a last resort.

nietod

>> 3) #ifdef's
>> This would be a last resort.
I don't know if that is so terrible, although I woudl probably combine it with the other solutions, like in 1) include the pointer only on some compilations (unless you want the final program to be able to do both types of "runs".) As long as the number of places that the #if would appear is reasonable small and well defined, I don't think it is such a bad thing. I would try to avoid them if they occur in lots of places, though.

>> 2) Have the sister data structure be
>> derived from the main data structure
That seems fine too, but now you have to extend the technique I mentioned You will have to ahve a list of these data structures that may or may not get populated. And the calculations will be done using a list (or array) of pointers to either this list/array or the one without the extra data. Its adds a little extra overhead, but again it should be no big deal.

>> 1) Include a pointer to the sister
>> data structure into the main
>> data structure.
This would be fine too, but you do need to take steps to unsure the proper use of the memory it points to. i.e destructors need to be sure to delete the memory, copy and assignment need to manage the pointer correctly etc. So like solution 2 there will be some extra overhead and some extra programming to be done. I think 2) is the safest in terms of not have as much room for potential mistakes, but both 1 and 2 can be made to be perfectly safe.