Solved

C++ Standard question: guarantees on string's iterators and c_str

Posted on 2004-08-04
Last Modified: 2013-12-14
First off, if you look at the address of the return value for the begin() iterator for a string, what is that address actually pointing to? Is it the physical location of the beginning of the string in memory? Or is it the address of the iterator object itself?

I'm on a Solaris machine, and I have a situation where a CONST reference to a string is being passed to a fcn. Occasionally, upon immediate return to the calling code, the return value of begin() has a different address.

My question is, what guarantee does the standard make w/ regard to what iterators physically point to?

As an example:

//some fcn...
{
       string a = "lasjfaskd";

       string::const_iterator itr = a.begin();
       cout << &itr << endl;

       someOtherFcn(a);

       string::const_iterator itr2 = a.begin();
       cout << &itr2 << endl;
//only sometimes, these two statements will output different addresses
}

void someOtherFcn(const string& s) {
//essentially nothing is happening here, nothing that can have any effect on s - since it's const
       return;
}

What guarantees does the standard make w/ regard to this situation?
Thanks!

Justin
0
Comment
Question by:GrayGh0st
21 Comments
 
LVL 86

Expert Comment

by:jkr
ID: 11720530
Thou shalt not rely on what STL implementations do internally.

>>what guarantee does the standard make w/ regard to what iterators physically point to?

None. They are iterators. One implementation might use a char* directly for that; another might use an opaque data structure.

>>nothing that can have any effect on s - since it's const

What about rearranging memory b/c of reorganizing the heap?
0
 

Author Comment

by:GrayGh0st
ID: 11720666
True... but what then can invalidate an iterator?
0
 

Author Comment

by:GrayGh0st
ID: 11720721
Correction to my above post: I know that normal things like erasing for example, will invalidate it. But I can guarantee none of that is happening in my code.

Justin
0
 
LVL 86

Accepted Solution

by:
jkr earned 125 total points
ID: 11720747
>>but what then can invalidate an iterator?

One thing that for sure invalidates iterators are non-const operations on a container.

BTW, since you are using

      cout << &itr << endl;
      cout << &itr2 << endl;

you should not be surprised about that, since the two iterators are stored in two different locations and you are outputting their *addresses*. Try

      cout << (const void*)&*itr << endl;
      cout << (const void*)&*itr2 << endl;

instead...
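
To make the distinction concrete, here is a minimal sketch (the helper names are made up for illustration and are not from the original code). One helper reports where the iterator object itself lives, the other reports the address of the character the iterator refers to:

```cpp
#include <string>

// Address of the iterator object itself (typically a local variable).
const void* iterator_object_address(const std::string::const_iterator& it) {
    return static_cast<const void*>(&it);
}

// Address of the character the iterator refers to.
// Only valid for a dereferenceable iterator (so not for end()).
const void* pointee_address(std::string::const_iterator it) {
    return static_cast<const void*>(&*it);
}
```

For two iterators obtained from a.begin() on the same unmodified string, the first helper should give two different addresses and the second should give the same address twice.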


0
 

Author Comment

by:GrayGh0st
ID: 11720798
Sorry, I should have clarified. I'm not really using those cout statements in my code. That was just in the example. I was just trying to point out that the addy's are different when I look at them under the debugger - but didn't want people reading this to go through all of that.

>> One thing that for sure invalidates iterators are non-const operations on a container.

If I'm passing that string into that function as a const&, it can't possibly be allowing non-const operations on it.

Justin
0
 
LVL 2

Assisted Solution

by:sonstkeiner
sonstkeiner earned 125 total points
ID: 11733956
> What guarantees does the standard make w/ regard to this situation?

I guess the standard makes no guarantee here.
Using const is a promise.  Promises can be broken.  Especially in C++ (;-).
For example, someOtherFcn might internally use const_cast<string &>(s) to manipulate s, for whatever reason.
The standard does not forbid this (but it also may not guarantee that this would work, not sure here).
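
A minimal sketch of how such a broken promise could look (looks_harmless is a hypothetical function, not anything from the poster's code). Modifying through const_cast is well defined only when the object the reference refers to was not itself declared const; if it was, the behavior is undefined:

```cpp
#include <string>

// Looks harmless from the caller's side: the parameter is a const reference.
// Hypothetical function, used only to illustrate the point.
void looks_harmless(const std::string& s) {
    // Well defined only because the caller's object is not itself declared
    // const; modifying a truly const object this way is undefined behavior.
    std::string& writable = const_cast<std::string&>(s);
    // Growing the string may reallocate its buffer, which invalidates
    // iterators the caller obtained earlier.
    writable.append(1000, 'x');
}
```

After a call like looks_harmless(a), any iterator taken from a before the call should be treated as invalid, even though the signature promised not to touch a.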

> the addy's are different when I look at them under the debugger
And they should be, as jkr already said.
The two iterators are different objects, and therefore should have different addresses, that much is clear.

> //only sometimes, these two statements will output different addresses
Are you sure that the addresses of itr and itr2 have ever been the same?

You may however expect them to have the same value.
In the dbx debugger, something like 'print (int)itr' and 'print (int)itr2' should therefore yield the same values.
0
 
LVL 86

Expert Comment

by:jkr
ID: 11946306
I'd say the question is answered...
0
 
LVL 86

Expert Comment

by:jkr
ID: 11948132
Thanks, but why a "C"? I am fully aware that this is not answered "A" from your point of view, but...
0
 
LVL 2

Expert Comment

by:sonstkeiner
ID: 11949949
"C" is not appropriate here.  You have originally posed three questions, all of which have been answered.

Q 1:
GrayGh0st> [...]  if you look at the address of the return value for the begin() iterator for a string, what is that address actually pointing to? [...] Or is it the address of the iterator object itself?
A:
jkr> [...] the two iterators are stored in two different locations and you are outputting their *addresses*.

Q 2:
GrayGh0st> [...]what guarantee does the standard make w/ regard to what iterators physically point to?
A:
jkr> None.

Q 3:
GrayGh0st> What guarantees does the standard make w/ regard to this situation?

The "situation" here appears to be defined by your two comments, which follow:

Q 3.a):
GrayGh0st> //only sometimes, these two statements will output different addresses
A:
sonstkeiner> Are you sure that the addresses of itr and itr2 have ever been the same?

I bet that without optimization, the addresses are always different.

Q 3.b)
GrayGh0st> //essentially nothing is happening here, nothing that can have any effect on s - since it's const
A:
jkr> What about rearranging memory b/c of reorganizing the heap?
sonstkeiner> [...] someOtherFcn might internally use const_cast<string &>(s) to manipulate s [...]. The standard does not forbid this.
0
 

Author Comment

by:GrayGh0st
ID: 11953374
Honestly, not meaning to offend anyone, but I didn't feel like my question was completely answered.

W/ your help I do realize that outputting &itr and &itr2, will return different address - and rightly so. This was established in jkr's second post and I didn't think it needed further clarification. This arose from me printing the iterator in dbx. The output was a memory address and also the string to which the iterator was pointing. I mistakenly thought the addy was that of the iterator, and not the underlying memory.

>>You may however expect them to have the same value.

I would.

>>In the dbx debugger, something like 'print (int)itr' and 'print (int)itr2' should therefore yield the same values.

They didn't (sometimes) - the root of the problem.

I then clarified my question:

>> True... but what then can invalidate an iterator?

I researched on my own, and found that I was not performing any operation that could even possibly invalidate an iterator (plus having used const), as I mentioned in earlier posts.

>> Using const is a promise.  Promises can be broken.  Especially in C++ (;-).
>> For example, someOtherFcn might internally use const_cast<string &>(s) to manipulate s, for whatever reason.
>> The standard does not forbid this (but it also may not guarantee that this would work, not sure here).

That's kind of fuzzy, but as jkr said:

>> Thou shalt not rely on what STL implementations do internally.

Not that it should have any bearing on this question, but I found the bug whilst subtracting iterators. Iterator subtraction is perfectly valid according to the standard, as long as you have two valid iterators (one reachable from the other). At the end of the day, regardless of what is happening underneath and having done nothing to invalidate either iterator (according to the standard), I should have two valid iterators that I can perform subtraction on (or any other operation for that matter).
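
For reference, a small sketch of the subtraction in question (offset_in is a made-up helper name). The result is meaningful only if the iterator came from the same string and nothing has invalidated it in the meantime:

```cpp
#include <cstddef>
#include <string>

// Offset of `pos` from the start of `s`. Meaningful only if `pos` was
// obtained from `s` and has not been invalidated since.
std::size_t offset_in(const std::string& s, std::string::const_iterator pos) {
    return static_cast<std::size_t>(pos - s.begin());
}
```

With a string such as "  }rest", an iterator two positions past begin() gives an offset of 2, and end() gives the string's length.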

Honestly, I'm not trying to offend anyone. Some, not all, of my questions were answered. Hence the grade of C. I just didn't feel satisfied as to why this didn't work. According to the standard, as well as jkr's comment about calling non-const operations on a container, I could not have possibly invalidated the iterators in this way. As per the comments regarding reorganizing the heap, or using const_cast, they seemed uncertain.

If any more light could be shed on this topic, I'd be happy to put in a request to change the grade. Also, my apologies for letting this question get away from me.
0
 
LVL 2

Expert Comment

by:sonstkeiner
ID: 11957858
Thanks for your feedback.

Since I have already outed myself as a nitpicker, the damage is already done and I'm going to follow through.  No, I'm not trying to offend anyone either ;-).  (This is the first time I'm not satisfied with a grade. )

Quote from the EE help section "What's the right grade to give?":
> [...] a "C" is the lowest grade you can give[...]
> You may not like the answer you get, and in some cases you may not like the way it is delivered, but if it is deemed to be accurate, no less than a B is an acceptable grade.

Please be specific. Which questions have not been answered?

The question "what can invalidate an iterator?" is not part of your original question.  Answering it may be grounds for awarding an A.  However, not answering it should not lead to a C.

>> Using const is a promise.
>> The standard does not forbid this (but it also may not guarantee that this would work, not sure here).
> That's kind of fuzzy, [...]

Fuzzy in what way?  By the way, "kind of" is also fuzzy (could not resist that one ;-).
The standard simply does not define what happens when you do this in the general case.  
When you use const, the standard requires ("promises") that the compiler does not let you call non-const operations -- unless you use const_cast, which is a standard-compliant way to drop that requirement.  With "not sure", I meant that I do not know which if any guarantees the standard gives as to when casting away constness and calling non-const must work.
Unless the string is stored in the program's text segment (read-only), I expect using a const_cast and calling a non-const function to work.  However, that may invalidate iterators (depends on the non-const function called).

By the way, in your question title, you use "c_str", never to use it again.  In what way is your question about c_str?
 
0
 

Author Comment

by:GrayGh0st
ID: 11958146
>>The question "what can invalidate an iterator?" is not part of your original question.  Answering it may be grounds for awarding an A.  However, not answering it should not lead to a C.

That's fair. You have a point. I'll put in a request for change.

>>Fuzzy in what way?  By the way, "kind of" is also fuzzy (could not resist that one ;-).

Hehehe. By fuzzy I meant your confidence in your answer didn't sound 100%. I agree w/ what you're saying. The standard says that calling non-const operations COULD invalidate an iterator (not must). On my end at least, I can guarantee that I'm not doing this. As for the specific implementation using const_cast internally, I can't be responsible for that (as jkr pointed out). I would think that the (hypothetical) coder who decided to throw caution to the wind and use const_cast whilst implementing the stl, would go to whatever lengths one had to in order to ensure iterators are not invalidated. Otherwise, his/her implementation would not be standard compliant, correct?

This discussion also seems to imply that iterators (even const_iterators) are highly volatile and should never be trusted (X-Files style)!

Sorry for the c_str in the topic, I forget now why I included it. I think as a way to refer to the underlying memory, I can't remember :)
0
 
LVL 2

Expert Comment

by:sonstkeiner
ID: 11961120
> use const_cast whilst implementing the stl, would [...] ensure iterators are not invalidated. Otherwise, his/her implementation would not be standard compliant, correct?

Yes.  Not that this helps in practice.  There is always one more bug left.  AFAIK, there is no implementation yet that even claims 100% compliance.  If in doubt, I write a test program to isolate and understand the problem.  Same goes for the C library, even such functions as printf.
Note also that probably not all of the code you are calling is from the standard library.

> iterators (even const_iterators) are highly volatile
const_iters are exactly as volatile as iters, they just don't allow non-const operations on the referred-to objects.
"exactly as volatile" means that the same set of operations on the container invalidates the iterator.

Since we've now spent considerable time on this question, I would be interested to hear what your bug was and how you found it.
0
 

Author Comment

by:GrayGh0st
ID: 11998408
Sure thing. I'll post later on today or tomorrow regarding the specific bug. It is pretty interesting.

Justin
0
 

Author Comment

by:GrayGh0st
ID: 12017028
Dan - Thanks, I'll post in the proper place next time!

sonstkeiner - Here's the specifics on the bug. I think I've included all the relevant code. Basically the context is, we're linking with tcl libraries in our program and I wanted to use their regular expression calls - but in an object oriented way. So I wrote wrappers around them. I know there are already libraries that do this, i.e. Boost, but suffice it to say, we can't use them. So I wrote my wrapper to look/behave like boost. The function definition for regex_search and part of the class definitions for sub_match and matches are shown below. I've also indicated the point at which the bug was discovered. You would think that it would be ok as to the "const-ness" of how everything was being passed around - or at least we did, thus spurring this discussion :).

...
string subBuffer("");
...

XSRegEx<TclRegEx> __exp("^\\s*(\\})");
XSRegEx<TclRegEx>::result_type matches;

//matches is filled by regex_search
if(!__exp.regex_search(subBuffer, matches))
{
    bool done = false, __isCmd = true;
    ...
}
else
{
    buf += '}';
    size_t brace_pos = matches[1].first - subBuffer.begin();        // <---- was returning bogus pos b/c matches[1].first was no longer valid
    stream.seekg(brace_pos - subBuffer.length(), ios::cur);
}

bool TclRegEx::regex_search(const string& s, result_type& matches, int flags)
{
    return regex_search(s.begin(), s.end(), matches, flags);
}

bool TclRegEx::regex_search(string::const_iterator beg, string::const_iterator end,
                                      result_type& matches, int flags)
{
    ...
    sub_match sub;
    ...
    if(sub.matched)
    {
      sub.first  = beg + info.matches[i].start;
      sub.second = beg + info.matches[i].end;
    }
}

class sub_match {
private:
    sub_match();
public:
    string str() const;

    bool matched;
    string::const_iterator first;
    string::const_iterator second;
    ...
};

class match_results {
 public:
    sub_match operator[] (int index);        //subscripts m_subs
    void subs(const vector<sub_match> & match_info);
    size_t size() const;
    void clear();
   
 private:
    vector<sub_match> m_subs;
};
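
The thread does not pin down what actually invalidated matches[1].first. Whatever the cause, one defensive option is to record match positions as offsets rather than iterators, since an offset stays meaningful even if the string's buffer moves. A rough sketch, where offset_match is a hypothetical stand-in for sub_match and not part of the code above:

```cpp
#include <cstddef>
#include <string>

// Hypothetical alternative to an iterator-based sub_match: positions are
// stored as offsets into the searched string instead of iterators.
struct offset_match {
    bool matched;
    std::size_t first;   // offset of the first matched character
    std::size_t second;  // offset one past the last matched character

    // Extract the matched text from the string that was searched.
    std::string str(const std::string& subject) const {
        return matched ? subject.substr(first, second - first) : std::string();
    }
};
```

With this layout, brace_pos in the calling code would simply be the stored first offset, with no iterator subtraction needed. The trade-off is that the caller has to supply the searched string again whenever the matched text is wanted.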
0
 
LVL 2

Expert Comment

by:sonstkeiner
ID: 12220509
Thanks for the bug description (and the grade change, anyway).
Looks like that is an FMM with regexp libraries. Using the Java ORO library, I once ran into it, too.
0
 

Author Comment

by:GrayGh0st
ID: 12220769
What's FMM?

Justin
0
 
LVL 2

Expert Comment

by:sonstkeiner
ID: 12235824
Frequently Made Mistake (:-)
0
 

Author Comment

by:GrayGh0st
ID: 12236801
Ahhhh. I see. I wouldn't have gotten that :)

Thanks to you and everyone else for the assistance.

Justin
0
