GrayGh0st

C++ Standard question: guarantees on string's iterators and c_str

First off, if you look at the address of the return value for the begin() iterator for a string, what is that address actually pointing to? Is it the physical location of the beginning of the string in memory? Or is it the address of the iterator object itself?

I'm on a Solaris machine, and I have a situation where a CONST reference to a string is being passed to a fcn. Occasionally, immediately upon return to the calling code, the return value of begin() has a different address.

My question is, what guarantee does the standard make w/ regard to what iterators physically point to?

As an example:

#include <iostream>
#include <string>
using namespace std;

void someOtherFcn(const string& s);

//some fcn...
void someFcn()
{
       string a = "lasjfaskd";

       string::const_iterator itr = a.begin();
       cout << &itr << endl;

       someOtherFcn(a);

       string::const_iterator itr2 = a.begin();
       cout << &itr2 << endl;
//only sometimes, these two statements will output different addresses
}

void someOtherFcn(const string& s) {
//essentially nothing is happening here, nothing that can have any effect on s - since it's const
       return;
}

What guarantees does the standard make w/ regard to this situation?
Thanks!

Justin
jkr

Thou shalt not rely on what STL implementations do internally.

>>what guarantee does the standard make w/ regard to what iterators physically point to?

None. They are iterators. One implementation might use a char* directly for that; another might use an opaque data structure.

>>nothing that can have any effect on s - since it's const

What about rearranging memory b/c of reorganizing the heap?
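
To make the distinction in the question concrete, here is a minimal sketch of the difference between the address of an iterator object (`&itr`) and the address of the character it refers to (`&*itr`). The helper names `same_target` and `demo_addresses` are made up for illustration, and the string is declared const so that only the const overload of begin() is involved.

```cpp
#include <cassert>
#include <iostream>
#include <string>

// Hypothetical helper: true when both iterators refer to the same character.
// Assumes both iterators are dereferenceable.
bool same_target(std::string::const_iterator x, std::string::const_iterator y) {
    return &*x == &*y;
}

// Hypothetical demo: prints both kinds of address and reports what it observed.
bool demo_addresses() {
    const std::string a = "lasjfaskd";
    std::string::const_iterator itr  = a.begin();
    std::string::const_iterator itr2 = a.begin();

    // Addresses of the two iterator objects themselves (two distinct locals).
    const void* obj1 = &itr;
    const void* obj2 = &itr2;
    // Addresses of the character each iterator refers to.
    const void* tgt1 = &*itr;
    const void* tgt2 = &*itr2;

    std::cout << "iterator objects: " << obj1 << ' ' << obj2 << '\n';
    std::cout << "referenced chars: " << tgt1 << ' ' << tgt2 << '\n';

    assert(itr == itr2);  // the iterator values compare equal
    return obj1 != obj2 && tgt1 == tgt2;
}
```

The two iterator objects always live at different addresses, while the iterators themselves compare equal and refer to the same character. Whether an iterator is a plain char* or a class type does not change this.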
GrayGh0st

ASKER

True... but what then can invalidate an iterator?
Correction to my above post: I know that normal things like erasing for example, will invalidate it. But I can guarantee none of that is happening in my code.

Justin
ASKER CERTIFIED SOLUTION
jkr
This solution is only available to members.
Sorry, I should have clarified. I'm not really using those cout statements in my code. That was just in the example. I was just trying to point out that the addy's are different when I look at them under the debugger - but didn't want people reading this to go through all of that.

>> One thing that for sure invalidates iterators are non-const operations on a container.

If I'm passing that string into that function as a const&, it can't possibly be allowing non-const operations on it.

Justin
SOLUTION
This solution is only available to members.
I'd say the question is answered...
Thanks, but why a "C"? I am fully aware that this is not answered "A" from your point of view, but...
"C" is not appropriate here.  You have originally posed three questions, all of which have been answered.

Q 1:
GrayGh0st> [...]  if you look at the address of the return value for the begin() iterator for a string, what is that address actually pointing to? [...] Or is it the address of the iterator object itself?
A:
jkr> [...] the two iterators are stored in two different locations and you are outputting their *addresses*.

Q 2:
GrayGh0st> [...]what guarantee does the standard make w/ regard to what iterators physically point to?
A:
jkr> None.

Q 3:
GrayGh0st> What guarantees does the standard make w/ regard to this situation?

The "situation" here appears to be defined by your two comments, which follow:

Q 3.a):
GrayGh0st> //only sometimes, these two statements will output different addresses
A:
sonstkeiner> Are you sure that the addresses of itr and itr2 have ever been the same?

I bet that without optimization, the addresses are always different.

Q 3.b)
GrayGh0st> //essentially nothing is happening here, nothing that can have any effect on s - since it's const
A:
jkr> What about rearranging memory b/c of reorganizing the heap?
sonstkeiner> [...] someOtherFcn might internally use const_cast<string &>(s) to manipulate s [...]. The standard does not forbid this.
Honestly, not meaning to offend anyone, but I didn't feel like my question was completely answered.

W/ your help I do realize that outputting &itr and &itr2 will give different addresses - and rightly so. This was established in jkr's second post and I didn't think it needed further clarification. This arose from me printing the iterator in dbx. The output was a memory address and also the string to which the iterator was pointing. I mistakenly thought the address was that of the iterator, and not the underlying memory.

>>You may however expect them to have the same value.

I would.

>>In the dbx debugger, something like 'print (int)itr' and 'print (int)itr2' should therefore yield the same values.

They didn't (sometimes) - the root of the problem.

I then clarified my question:

>> True... but what then can invalidate an iterator?

I researched on my own, and found that I was not performing any operation that could even possibly invalidate an iterator (plus having used const), as I mentioned in earlier posts.

>> Using const is a promise.  Promises can be broken.  Especially in C++ (;-).
>> For example, someOtherFcn might internally use const_cast<string &>(s) to manipulate s, for whatever reason.
>> The standard does not forbid this (but it also may not guarantee that this would work, not sure here).

That's kind of fuzzy, but as jkr said:

>> Thou shalt not rely on what STL implementations do internally.

Not that it should have any bearing on this question, but I found the bug whilst subtracting iterators. Iterator subtraction is perfectly valid according to the standard, as long as you have two valid iterators (one reachable from the other). At the end of the day, regardless of what is happening underneath and having done nothing to invalidate either iterator (according to the standard), I should have two valid iterators that I can perform subtraction on (or any other operation for that matter).

Honestly, I'm not trying to offend anyone. Some, not all, of my questions were answered. Hence the grade of C. I just didn't feel satisfied as to why this didn't work. According to the standard, as well as jkr's comment about calling non-const operations on a container, I could not have possibly invalidated the iterators in this way. As per the comments regarding reorganizing the heap, or using const_cast, they seemed uncertain.

If any more light could be shed on this topic, I'd be happy to put in a request to change the grade. Also, my apologies for letting this question get away from me.
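
As a side note on the iterator subtraction mentioned above, here is a rough sketch of taking the difference while both iterators are known to be valid and keeping the resulting offset rather than the iterator. The names `offset_of` and `brace_offset_after_growth` are invented for this example and are not part of the code discussed in this thread. The point is only that an index stays meaningful after the string's storage moves, whereas a stored iterator may not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: offset of `pos` from the start of `s`.
// Only meaningful while `pos` is a valid iterator into `s`.
std::size_t offset_of(const std::string& s, std::string::const_iterator pos) {
    return static_cast<std::size_t>(pos - s.begin());
}

// Hypothetical demo: record a position as an offset, then grow the string.
std::size_t brace_offset_after_growth() {
    std::string buf = "   } rest";
    const std::string::size_type found = buf.find('}');
    assert(found != std::string::npos);

    // Take the offset right away, while the iterator is certainly valid.
    const std::size_t brace_pos = offset_of(buf, buf.begin() + found);

    // Appending may reallocate, so iterators taken earlier must not be reused.
    buf.append(1000, ' ');

    // The offset still identifies the same character.
    return buf[brace_pos] == '}' ? brace_pos : std::string::npos;
}
```

Storing offsets costs one subtraction at the point where the match is recorded, and in exchange the stored position no longer depends on the string's buffer staying put.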
Thanks for your feedback.

Since I have already outed myself as a nitpicker, the damage is already done and I'm going to follow through.  No, I'm not trying to offend anyone either ;-).  (This is the first time I'm not satisfied with a grade. )

Quote from the EE help section "What's the right grade to give?":
> [...] a "C" is the lowest grade you can give[...]
> You may not like the answer you get, and in some cases you may not like the way it is delivered, but if it is deemed to be accurate, no less than a B is an acceptable grade.

Please be specific. Which questions have not been answered?

The question "what can invalidate an iterator?" is not part of your original question.  Answering it may be grounds for awarding an A.  However, not answering it should not lead to a C.

>> Using const is a promise.
>> The standard does not forbid this (but it also may not guarantee that this would work, not sure here).
> That's kind of fuzzy, [...]

Fuzzy in what way?  By the way, "kind of" is also fuzzy (could not resist that one ;-).
The standard simply does not define what happens when you do this in the general case.  
When you use const, the standard requires ("promises") that the compiler does not let you call non-const operations -- unless you use const_cast, which is a standard-compliant way to drop that requirement.  With "not sure", I meant that I do not know which if any guarantees the standard gives as to when casting away constness and calling non-const must work.
Unless the string is stored in the program's text segment (read-only), I expect using a const_cast and calling a non-const function to work.  However, that may invalidate iterators (depends on the non-const function called).

By the way, in your question title, you use "c_str", never to use it again.  In what way is your question about c_str?
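
The const_cast scenario discussed above can be sketched roughly as follows. The functions `sneaky_append` and `buffer_moved_by_call` are hypothetical. The sketch relies on the referenced string not having been declared const itself: modifying an object that was declared const is undefined behaviour, while casting away const on a reference to a non-const object and then modifying it is allowed. Addresses are compared as integers (std::uintptr_t, from C++11's <cstdint>) so that no stale pointer is ever dereferenced.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical function: the signature promises not to modify s,
// but the body casts the const away and modifies it anyway.
void sneaky_append(const std::string& s) {
    std::string& writable = const_cast<std::string&>(s);
    // Growing past the current capacity forces the string to get new storage.
    writable.append(writable.capacity() + 1, 'x');
}

// Returns true if the call moved the character buffer, in which case
// iterators taken before the call no longer refer to live storage.
bool buffer_moved_by_call() {
    std::string a = "lasjfaskd";  // not declared const, so the cast is well-defined
    const std::uintptr_t before = reinterpret_cast<std::uintptr_t>(a.data());

    sneaky_append(a);  // looks harmless at the call site

    const std::uintptr_t after = reinterpret_cast<std::uintptr_t>(a.data());
    assert(a.size() > 9);  // the "const" argument was modified
    return before != after;
}
```

On the common implementations I'm aware of the buffer address changes here, so an iterator obtained before the call would be invalid afterwards even though the function took a const reference.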
 
>>The question "what can invalidate an iterator?" is not part of your original question.  Answering it may be grounds for awarding an A.  However, not answering it should not lead to a C.

That's fair. You have a point. I'll put in a request for change.

>>Fuzzy in what way?  By the way, "kind of" is also fuzzy (could not resist that one ;-).

Hehehe. By fuzzy I meant your confidence in your answer didn't sound 100%. I agree w/ what you're saying. The standard says that calling non-const operations COULD invalidate an iterator (not must). On my end at least, I can guarantee that I'm not doing this. As for the specific implementation using const_cast internally, I can't be responsible for that (as jkr pointed out). I would think that the (hypothetical) coder who decided to throw caution to the wind and use const_cast whilst implementing the stl, would go to whatever lengths one had to in order to ensure iterators are not invalidated. Otherwise, his/her implementation would not be standard compliant, correct?

This discussion also seems to imply that iterators (even const_iterators) are highly volatile and should never be trusted (X-Files style)!

Sorry for the c_str in the topic, I forget now why I included it. I think as a way to refer to the underlying memory, I can't remember :)
> use const_cast whilst implementing the stl, would [...] ensure iterators are not invalidated. Otherwise, his/her implementation would not be standard compliant, correct?

Yes.  Not that this helps in practice.  There is always one more bug left.  AFAIK, there is no implementation yet that even claims 100% compliance.  If in doubt, I write a test program to isolate and understand the problem.  Same goes for the C library, even such functions as printf.
Note also that probably not all of the code you are calling is from the standard library.

> iterators (even const_iterators) are highly volatile
const_iters are exactly as volatile as iters, they just don't allow non-const operations on the referred-to objects.
"exactly as volatile" means that the same set of operations on the container invalidates the iterator.

Since we've now spent considerable time on this question, I would be interested to hear what your bug was and how you found it.
Sure thing. I'll post later on today or tomorrow regarding the specific bug. It is pretty interesting.

Justin
Dan - Thanks, I'll post in the proper place next time!

sonstkeiner - Here's the specifics on the bug. I think I've included all the relevant code. Basically the context is, we're linking with tcl libraries in our program and I wanted to use their regular expression calls - but in an object oriented way. So I wrote wrappers around them. I know there are already libraries that do this, i.e. Boost, but suffice it to say, we can't use them. So I wrote my wrapper to look/behave like boost. The function definition for regex_search and part of the class definitions for sub_match and matches are shown below. I've also indicated the point at which the bug was discovered. You would think that it would be ok as to the "const-ness" of how everything was being passed around - or at least we did, thus spurring this discussion :).

...
string subBuffer("");
...

XSRegEx<TclRegEx> __exp("^\\s*(\\})");
XSRegEx<TclRegEx>::result_type matches;

//matches is filled by regex_search
if(!__exp.regex_search(subBuffer, matches))
{
    bool done = false, __isCmd = true;
    ...
}
else
{
    buf += '}';
    size_t brace_pos = matches[1].first - subBuffer.begin();        // <---- was returning bogus pos b/c matches[1].first was no longer valid
    stream.seekg(brace_pos - subBuffer.length(), ios::cur);
}

bool TclRegEx::regex_search(const string& s, result_type& matches, int flags)
{
    return regex_search(s.begin(), s.end(), matches, flags);
}

bool TclRegEx::regex_search(string::const_iterator beg, string::const_iterator end,
                                      result_type& matches, int flags)
{
    ...
    sub_match sub;
    ...
    if(sub.matched)
    {
      sub.first  = beg + info.matches[i].start;
      sub.second = beg + info.matches[i].end;
    }
}

class sub_match {
private:
    sub_match();
public:
    string str() const;

    bool matched;
    string::const_iterator first;
    string::const_iterator second;
    ...
};

class match_results {
 public:
    sub_match operator[] (int index);        //subscripts m_subs
    void subs(const vector<sub_match> & match_info);
    size_t size() const;
    void clear();
   
 private:
    vector<sub_match> m_subs;
};
Thanks for the bug description (and the grade change, anyway).
Looks like that is an FMM with regexp libraries; I once ran into it too, using the Java ORO library.
What's FMM?

Justin
Frequently Made Mistake (:-)
Ahhhh. I see. I wouldn't have gotten that :)

Thanks to you and everyone else for the assistance.

Justin