why "." vs "->"

I'm an old, emphasis on old, K&R guy, so I am very well versed in accessing members of structures, unions, and now classes with "." notation vs. "->" notation. I was asked this morning, "Why the difference?" My answer started with "That's easy," but for every instance I could come up with, I ended up deciding that a modern compiler could easily overcome the issue.

So, given a structure or class foo with a single member "a"

I understand that
       struct foo *pfoo, afoo;
       pfoo=&afoo;
       pfoo->a would be the appropriate call

OR
      struct foo afoo;
      afoo.a would be the appropriate call

but why are they separate.  In case 1, why can't the compiler sort out pfoo.a or in the second afoo->a.  There must be a case where this behavior would be unacceptable, but I am trying to fathom what it is.
rickhill11 Asked:

Karrtik Iyer, Software Architect, Commented:
In your example, -> tells the compiler to perform one level of indirection before accessing the member.
Since pfoo is a pointer, it only contains the address of afoo.
Say afoo was created at address 0x1000, so the memory for the foo object is allocated at that address. When you write afoo., the compiler knows there is no need to jump to another address to reach the members of foo; they are stored right there in afoo.
But pfoo contains the address of afoo. Say pfoo itself is stored at 0x2000 and contains the value 0x1000. If you wrote pfoo., the compiler would not find the members of afoo starting at 0x2000; instead it has to jump (indirect) to 0x1000 to find them, hence -> is required for pfoo. When you debug the program, look at the value of pfoo: it will equal the address of afoo. Also try printing the size of pfoo versus the size of afoo, and you will understand what I am trying to explain.
Kent Olsen, DBA, Commented:
Hi Rick,

C wasn't designed from the ground up. It evolved from a now-dead language called B, and B used the asterisk for indirection through a pointer. C just carried on that practice.

I've long maintained that modern compilers don't need separate operators for a struct and a pointer to a struct. For a single object that's true. But a single operator can get really messy when you're mixing pointers and structures in an array. Every reference would have to be explicitly cast to (struct) or (pointer), unless operator precedence were established to designate a default. And even that could wind up producing some very ugly source code!


Good Luck!
Kent
rickhill11 (Author) Commented:
Kent,

I'm trying to understand your array example. Can you be more specific? Assuming that the array contained some sort of mishmash of data types, a union or an explicit cast would have to be used.

For instance, ((mystruct *)db)->element is required, but why couldn't the compiler sort out ((mystruct *)db).element? It seems to me that the keepers of the keys either wanted to keep these separate simply for historical reasons, or there is some place where a pointer dereferenced with ".", or a structure member accessed with "->", leads to unwanted side effects. Since the syntax for structures, unions, and classes is so intertwined, the reason could easily be related to any one of the three.

This is not a burning issue, but I just like to understand the "why" of things.

Rick

phoffric Commented:
Since pointers are a source of many code errors, I like being able to tell that a variable is a pointer by its use of p->a or (*p).a. (I also like the idea of having some prefix to indicate pointers or references.)
Kent Olsen, DBA, Commented:
Start with the basics. Does a dynamic array contain an array of structs or an array of pointers? Is data within a struct in an array another struct, or is it a pointer to a struct?

It gets ugly...
rickhill11 (Author) Commented:
Phoffric,

I get your point, and agree to a large extent, but it still doesn't answer why the language enforces this.

Rick
rickhill11 (Author) Commented:
Kent,

Ugly it may be, but regardless of the type, whether an array of pointers or an array of structures, the compiler will throw an error if "->" is used on an element of an array of structures, and also if "." is used on an element of an array of pointers.

Take a linked list where you want access to mystru->ptr->ptr->ptr->element. I have no problem with doing it; I've been doing so for over 30 years, but I still wonder why the compiler can't understand mystru.ptr.ptr.ptr.element. Again, I'm not arguing the syntax; I am simply trying to explain the "why" to a colleague.
Kent Olsen, DBA, Commented:
Let's break the question into two parts.

Why did C originally have different syntax for structures and pointers? Because Dennis Ritchie designed the language that way. (Though they are closely related, referencing an object by name and referencing it by address really are different things, and C was intended to be the implementation language for an operating system, Unix. As a former O/S designer/developer, trust me when I say that you don't want that kind of ambiguity in the implementation layer.)

Why does the language continue to maintain these items "as is"? C is perhaps the most widely known and used programming language in the world. Such a drastic change would have to get past all of the C committees, and that certainly hasn't happened yet.

Another area to consider is parameter passing. Pass by address and pass by value are two entirely different things. From the beginning, C required that parameter access be an atomic operation. That is, the parameter had to be accessible via a single instruction, so addresses, integers, floats, and char values were valid parameter types. Structures, unions, strings, etc. were not (ANSI C later allowed structures and unions to be passed by value). Automatically converting a struct to pass the correct type would violate the pass-by-value / pass-by-address rules.
sarabande Commented:
To add to the above comments:

"why can't the compiler sort out pfoo.a"
I think it could. It is not a question of ability; it is simply that the syntax, rules, and grammar of the C language require . for structure members and -> for members accessed through a pointer.

A similar case is that variable names may not begin with a digit, and are case-sensitive, though the compiler could easily handle either.

If you look at reference variables in C++, you see that they are not so different from pointer variables, except that you can't reassign them to a new address. You also find this kind of variable in Fortran and in newer languages like Java and C#. I would assume that Bjarne Stroustrup wanted to create a variable type that could be passed by address but was less error-prone and dangerous than pointers. So I would like to support phoffric's point that the difference is good both for readability and for error reduction.

Sara
pepr Commented:
+1 for sarabande's reference remark.

From another point of view, a programming language should reflect both the way the programmer thinks and the way things are implemented. The program (the source code) is static; the execution is dynamic. You also want to reuse the source code. So every part of the source code (syntax) must have a clear meaning (semantics).

One can give the compiler more power to decide how something should be implemented. However, that brings trade-offs: you are telling your compiler that you do not want to make those decisions yourself. In short, if you do not distinguish . and -> syntactically, you tend to (have to) end up with a single implementation, and you get Java or C# or Python or something similar. That is fine when it is fine. But, as a programmer, you may want to express the difference.

Wanting a unified syntax for pointers and non-pointers really means wanting generic source code (think of library functions that do not know the type in advance). In C++, that is the job of templates. Playing with templates makes it clear that the compiled generic approach is possible, but it is not the easy part of the language. So a unified approach would make all source text as "easy" as working with templates.

Petr

P.S. I like the theme -- interesting discussion ;)
Subrat (C++ windows/Linux), Software Engineer, Commented:
Adding to the other experts' answers...

Let's start with an example:

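#include <iostream>

class Ex {
public:
    void fun() {
        std::cout << "I am fun()\n";
    }
};

int main() {
    Ex ob;              // an object
    ob.fun();           // object: use '.'

    Ex* ptrEx = new Ex; // a pointer to a heap-allocated object
    ptrEx->fun();       // One way: '->' on the pointer

    (*ptrEx).fun();     // Another way: dereference, then '.'

    delete ptrEx;       // release the object allocated with new
    return 0;
}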

Observe that in each function call you can easily tell which is ptrEx and which is ob, i.e., which is a pointer and which is an object.
When calling the function through an address, we need to use ->; when calling it on an object, we use the . operator. If we have a pointer, we can also extract the object from it with *ptrEx and then call the function on that using the . operator.
A language (and compiler) designer could also eliminate the distinction between . and ->; it all depends on the design, as already discussed above for C# and Java.
But the C++ language rules tell us to do it this way.

Hope this makes it clear!
phoffric Commented:
In the beginning there was C. Well, actually, I think it was B, and before B, there was something else, and before that, there was assembly and before that there was machine code.

Now, Ritchie made up C as a convenience for himself and his crew at Bell Labs (part of AT&T), and it was good. I mean that it was so good that many of the (then relatively few) programmers in the world adopted it and said it was good. Some may have even been zealots. After all, it was almost assembly language, but a lot easier to manage.

ANSI C was standardized in 1989. I know that some ANSI/ISO standards are formulated by committees consisting of compiler vendor representatives and other people. Adding new features costs these compiler vendors lots of money, not only to implement, but also to write test specifications and to test.

Members of the committees can make requests for new features and probably there are forums where ordinary computer scientists and application developers can make requests. If enough people make the request, then maybe, if there isn't too much opposition in terms of complexity and costs, the request may go through.

I guess your idea just never got approved, or it was not suggested. After all, part of the K&R philosophy was to keep C simple (and let C++ be the elephant).
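ozo Commented:
In C++, you could define a.a, (*a).a, and a->a to all return something different, if you wished to be so perverse.
#include <iostream>
class foo{
  public:
  int a;
  foo(int a):a(a){}
  // (*x) yields a brand-new foo whose a is 2
  foo operator *(){
    return foo(2);
  }
  // x->member is forwarded to a different foo whose a is 3
  foo *operator ->(){
    static foo three(3);  // static, so no new object is leaked on each call
    return &three;
  }
  // streaming x converts it to int, which gives 4
  operator int(){
    return 4;
  }
};
int main(){
  foo a(1);
  std::cout << a.a << std::endl;     // prints 1
  std::cout << (*a).a << std::endl;  // prints 2
  std::cout << a->a << std::endl;    // prints 3
  std::cout << a << std::endl;       // prints 4
}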
Subrat (C++ windows/Linux), Software Engineer, Commented:
Ozo,
What about my simple solution?
phoffric Commented:
I think my last post answers your question as to the "why" the compiler rejects your alternative form.

But looking back at kdo's post, http:#a41373505 , I realize that I must have only read his last paragraph. I think I duplicated much of what kdo said. So, unless you find some additional information in my last post, you can give points for my last post to kdo.
ozo Commented:
"What about my simple soln"
In your simple solution, which would often be preferable to a perverse solution, ptrEx->fun() and (*ptrEx).fun() are synonymous, while ptrEx.fun() would be a syntax error.
rickhill11 (Author) Commented:
Just to be clear, my original question was neither a request for a change, nor a suggestion.

I am simply attempting to answer a question that was put to me. The question was "why does the language enforce the difference between '.' and '->'?"

I don't mean to be overly picky, but none of the answers resonate with me. This doesn't make them wrong; they simply don't resonate, and I will have trouble passing along information that I am in doubt about. In K&R C, where I started 30+ years ago, it made a lot of sense; however, even then, a person could argue that a compiler faced with a pointer to a structure could easily handle something like foo.bar in lieu of foo->bar. For instance, in the old compilers, passing an argument like char foo[] and then addressing it as *(foo+10) would throw an error. Now some, maybe all, compilers seem to be comfortable with this.

To me the answer must be simple and be one of:
     1. The various committees either didn't consider this, or wanted to maintain a strict notion for the programmer of whether he/she was using a pointer or not. It may very well be as simple as this, but if so, it would be interesting to read the notes from the meeting(s) to understand their logic.
     2. It was simply too hard to accomplish. I really doubt this. There have been many other changes to the language that seem marginal to me and must have been much harder to implement.
     3. There is some case, without redefining operators, where interpreting foo.bar as if the programmer had written foo->bar, or vice versa, would lead to unexpected or invalid results. To me this is the most likely option, but I haven't seen a good example.

Sarabande had a good example when she mentioned variables not beginning with a digit. However, that might be more related to compiler speed. Her suggestion, echoed by others, that it is simply one of the requirements of the syntax may be true. Usually, though, if you dig deep enough into an issue like this, the underlying reason stands out. That is what I am trying to discover.

It has been an interesting conversation though.  I'll leave it open for a few more days just to see if anything pops.
Kent Olsen, DBA, Commented:
Hi Rick,

We've all kind of danced around it, but the answer is simply that the language has a different operator for each operation, just as it has different operators for addition, division, logical difference, etc.

As someone who has written and maintained compilers and operating systems, let me offer that when Ritchie designed the language, it made good sense to have separate operators for each operation. There was no real code base to enhance, so the operations had to be implemented from scratch. The fact that they are strikingly similar in logical usage doesn't change the fact that they are different operations, and the compiler will generate significantly different executable code for each.

If this were 1978 and I were designing C from the ground up, there are some things I'd be tempted to do differently. For example, logical operators could be the doubled form of their algebraic equivalents: (+) is still addition, but (++) is logical AND; (-) is still subtraction, and (--) is logical DIFFERENCE, etc.

But I'm not writing C today and don't get to make that decision.

Ritchie was writing C 37 years ago, the decision he made was one of implementation simplicity, and nobody has since changed the language to make the enhancement you've asked about.


Kent
pepr Commented:
I will reverse the question...

Having a.b technically means that you have an address related to a, plus the offset and length of the item related to b.

Having p->b means that p is stored at some address and contains another address; only then is b applied as an offset to that other address.

Given library source code, how, in your opinion, should the two syntactic/semantic variants be unified?
phoffric Commented:
There does not have to be an underlying reason to not do something. The "something" may simply have never been considered. One could look at C++ features (and other questions) and ask why some of those features were not included in C. I am guessing that not all of these features have been considered and then rejected.
Kent Olsen, DBA, Commented:
One more comment before I consider this horse dead and beaten.

C was developed as the implementation language for Unix, replacing assembly language. In that context, the structure feature would have been implemented to organize key kernel parameters before the need to pass structures (or structure pointers) existed. The development of task management, device drivers, etc. soon made structure pointers a necessity, but structures as static entities were already defined. Ritchie made the decision to use different operators for accessing static structures and dynamic structures, probably because they are, in fact, different operations.
ozo Commented:
In C++, you can define a class such that a.b, (*a).b, and a->b are all synonymous.

In C, you can find some of the committee's reasoning here: http://www.open-std.org/jtc1/sc22/wg14/www/docs/C99RationaleV5.10.pdf
They mention that for pointers to functions, using them as (*pf)() or as pf() is unambiguous.
I don't see a mention of a specific reason for disallowing a.b on a structure pointer as a synonym for (*a).b,
but since using a or (*a) interchangeably would be ambiguous, the same reasoning would not apply.

In C,  a->b is a synonym for (*a).b, so making a.b and a->b the same would imply either breaking that correspondence, which seems severe, or making a and (*a) the same, which would introduce ambiguities.
When a, *a, **a, and ***a all mean different things, making a.b, (*a).b, (**a).b, and (***a).b all mean the same thing may get confusing.

rickhill11 (Author) Commented:
Again, thanks for your thoughtful answers.