Learn how to a build a cloud-first strategyRegister Now

x
  • Status: Solved
  • Priority: Medium
  • Security: Public
  • Views: 3608
  • Last Modified:

segmentation fault getline and string

I am trying to read zcat output through popen in c++.  It works perfectly Ok, if I don't use C++ string.  As soon as I declare "string a;", I get segmentation fault at getline.  Segmentation fault goes away if I change getline with fgets.  I need both.  I am using g++4.0.  

Second problem is that I get the last line twice when I make it work with cstring.

#include<iostream>
#include<string>
#include<cstdio>

//using namespace std;
using std::string;
const string CMD = "gunzip -c ";

const int MAX_BUFFER = 2048;
int main()
{
    string filename, cmd;
    FILE *fp;
    char *line;
    //char line[MAX_BUFFER];
    size_t len = 2048;
    filename = "/path/file.log.gz";

    cmd = CMD + filename;

    fp = popen( cmd.c_str() , "r");
    
    while(!feof(fp))
    {
        if(::getline(&line, &len, fp))
        //if(fgets(line, MAX_BUFFER, fp) != NULL)
        {
            printf("%s", line);
        }
    }

    return 0;

}

Open in new window

0
farzanj
Asked:
farzanj
  • 9
  • 8
  • 4
4 Solutions
 
evilrixSenior Software Engineer (Avast)Commented:
line is just a pointer and you are passing the address of line to getline. That will most certainly cause failure. The cannonical for using getline with string is...

#include <string>
#include <iostream>

string s;
while(getline(cin, s)) // note this is not the same as the char * version of getline
{
   // do stuff
}

Open in new window


Note: Do NOT prefix getline with :: as this prevents the compiler from using ADL (Argument Dependent Lookup) to find the correct version of getline. You should never need to do this.

There are two different versions of getline. One for char * and one for string. You MUST use the correct version. If you follow my previous note you can leave it up to the compiler to figue it out.

string version: http://www.cplusplus.com/reference/string/getline/
char * version: http://www.cplusplus.com/reference/iostream/istream/getline/

Also, line 7 is dangerous. A string can throw on construction. If it does you can't catch it and your application will just terminate for reasons that may not be apparent.
0
 
farzanjAuthor Commented:
Thanks for your response but I could not make it work and I do not understand the reason either.

First I have to read from popen  pipe and I don't think there is a c++ version of it.  It can only read into a c string.  I am not reading from STDIN, so the version of getline I used is the C - getline.  C++ getline version doesn't read from FILE* (as far as I know).

Also you can see that there are two versions of line, one is commented out.  This is because C- getline only appears to be accepting pointer and works fine with it but fgets takes a buffer so I have that as well.

Thing that I don't understand is that I DO NOT get a segmentation fault when I don't declare a C++ string.  The moment I just declare the string, I get segmentation fault  even if I don't use it.

Thanks for your time.
0
 
farzanjAuthor Commented:
TO illustrate what I mean

The following code works perfectly (almost except for repeating last line).
#include<iostream>
#include<string>
#include<cstdio>

//using namespace std;
using std::string;

const int MAX_BUFFER = 2048;
int main()
{
    FILE *fp;
    char *line;
    //string a;
    size_t len = 2048;

    fp = popen("gunzip -c /path/file.txt.gz" , "r");
    while(!feof(fp))
    {
        if(getline(&line, &len, fp))
        {
            printf("%s", line);
        }
    }
    return 0;
}

Open in new window

0
VIDEO: THE CONCERTO CLOUD FOR HEALTHCARE

Modern healthcare requires a modern cloud. View this brief video to understand how the Concerto Cloud for Healthcare can help your organization.

 
evilrixSenior Software Engineer (Avast)Commented:
>> First I have to read from popen  pipe and I don't think there is a c++ version of it.
No, there's not.

>> C++ getline version doesn't read from FILE* (as far as I know).
There isn't a C version of getline -- you should use fgets.
http://www.cplusplus.com/reference/clibrary/cstdio/fgets/

>> This is because C- getline only appears to be accepting pointer
You are probably calling the GNU function getline, which isn't part of the C standard.
http://www.gnu.org/software/libc/manual/html_node/Line-Input.html

This does require a char ** for line but line must refer to either a buffer or a pointer that points to a buffer. If it points to null the buffer will be allocated for you. You are responsible for freeing that buffer.

This is NOT a standard C function - it is a GNU extention. It is not portable.

>> Thing that I don't understand is that I DO NOT get a segmentation fault when I don't declare a C++ string

Well, that's what happens when you corrupt memory. The results are undefined. There is nothing to say the program will crash or not. It might even run as you expect. Defining (when memory is being allocated it is defining, declaring is when you just let the compiler know somewhere this variable exists) a new variable changes the memory layout (it is adding more stuff to the stack-frame) and so introduces different behaviour. Random behaviour like this is a sure sign of memory corruption.
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> The following code works perfectly
See my previous comment - it works by chance, not by design. The C/C++ standards state that the result of corrupting memory in this way is "undefined" - meaning anything can happen... it might even work as you expected it to but that is by pure luck.
0
 
farzanjAuthor Commented:
Why in http:#a38328896, ONLY when I uncomment line 13 I get segmentation fault.
0
 
evilrixSenior Software Engineer (Avast)Commented:
Because you are declaring a local variable on the stack. The memory corruption you are creating is corrupting the stack. The definition of an additional variable changes the stack and, thus, the behaviour.
0
 
farzanjAuthor Commented:
Ok.  I did use this:
line = new char[2048];

And segmentation fault disappeared, confirming your comments.  I did understand this but was not using it because of this example:
http://www.tin.org/bin/man.cgi?section=3&topic=getline

How do I do this without putting it on heap?  I want to declare a fixed array but then I would need to cast it to char*?

Second, I need one line at a time for my application.  I need getline to avoid manual work
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> line = new char[2048];
What's wrong with line[2048] or line = NULL and letting GNU getline allocate the required buffer? Don't forget if you let getline allocate the buffer you must use free() to release it once you are done.

>> I want to declare a fixed array but then I would need to cast it to char*?
No a char array will automatically decompose to a char *

char line[2048];
char *pline = line; // no cast needed
0
 
farzanjAuthor Commented:
The problem was that I was expecting getline to allocate (malloc) memory but the getline I got was not what I was expecting.  I really got very rusty.  Hope to relearn C++ pretty fast.
0
 
Hugh McCurdyCommented:
farzanj, if you are still having problems, please post the current code.

I very much agree with evilrix about using fgets() over getline() but it's your program.
0
 
farzanjAuthor Commented:
hmccurdy: Thank you for your offer to help.  I do not doubt on fgets.  But I am writing a fast parser that has to parse hundreds of TB a day and I need to parse one line at a time and getline does exactly that.

If you have a little time, I would be honored if you could look at the other question I posted several days ago and did not get any response.  It is about gunzip and non of the C++ wrappers to zlib compile anymore.

http://www.experts-exchange.com/Programming/Languages/C/Q_27833199.html
0
 
Hugh McCurdyCommented:
I looked at the other thread.  Didn't answer it but I did comment.

As for this thread, shouldn't you be initializing line?  My understanding is that getline() only calls malloc() if the the first arg to getline points to null.  If line isn't initialized or pointing to valid memory you could get segmentation violations that don't seem to make any sense at all.

char *line = NULL;
0
 
Hugh McCurdyCommented:
Evilrix makes a good point about freeing memory.

Functions that invisibily use malloc() should be use cautiously in non-trivial programs.  (Trivial programs don't live long enough to have a memory leak be harmful as you are calling exit at the time you'd otherwise call free.)  Memory leaks can happen in C++ as well.

How far along are you in your C++ studies?
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> But I am writing a fast parser
Then you most certainly want to avoid heap allocations. Stick to using a local stack based array. Heap allocations are very costly.

>> I need to parse one line at a time and getline does exactly that.
And why isn't fgets suitable for this purpose?

FWIW, I probably wouldn't use either. I'd probably use fread to read the data into blocks and then process those blocks in memory. Quite possible I'd do this using a multi-threaded solution that implements something along the lines of the producer/consumer pattern. I'd most likely use Boost asio to simplify the implementation.

If performance was critical I'd probably prototype solutions using all three (and maybe some other mechanisms) and then use a performance profiler to decide which is the best to use.
0
 
farzanjAuthor Commented:
Thank you so much for both of you and your precious time and valuable insight.

 >> And why isn't fgets suitable for this purpose?
I am not telling you rather asking your.  I though I will be manually doing the same process of finding new line characters manually.  Plus my bench marking results did not show too much cost of using getline as opposed to fgets.  Well, in attempting various things, I had even written some code just to get lines myself but I felt I was reinventing the wheel.

Right now I am doing a work around of not being able to read gz files.  I am opening pipe so the data is read from the drive and is already in the memory.  Then I am doing Boost threads to distribute the task of parsing to many threads and am using signals mechanism to pass data to threads in a round robin fashion.

Which performance profiler on Linux platform do you recommend?

Perhaps I can open another question for this discussion so that I can value experts' precious time.
0
 
Hugh McCurdyCommented:
You are right that a new thread should be used for a new question but one profiler is
valgrind
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> I though I will be manually doing the same process of finding new line characters manually
No, it reads up to and including the newline.
http://www.cplusplus.com/reference/clibrary/cstdio/fgets/

"Reads characters from stream and stores them as a C string into str until (num-1) characters have been read or either a newline or the End-of-File is reached, whichever comes first.
A newline character makes fgets stop reading, but it is considered a valid character and therefore it is included in the string copied to str.
A null character is automatically appended in str after the characters read to signal the end of the C string."

>> Then I am doing Boost threads to distribute the task of parsing to many threads
That is quite possible counter productive. Multiple thread context switches combined with the possibility of invalidating read-ahead caching means you may not be gaining much, if anything. One thread for reading and one thread for processing is quite probably the optimal solution (implementing a producer/consumer pattern).

>> Which performance profiler on Linux platform do you recommend?
http://www.cs.utah.edu/dept/old/texinfo/as/gprof.html

>> Perhaps I can open another question for this discussion
I don't really mind either way as long as it's still related to the original question :)
0
 
farzanjAuthor Commented:
0
 
farzanjAuthor Commented:
>> How far along are you in your C++ studies?
I did BS and MS in Computer Science, MS in Mathematics from US and MEng in Networking from Canada.  I started programming in 1980s on XT but then programmed C++ in mid 90s to 2000.  In 2000 or so there were hardly any C++ programming jobs so I first tried becoming Oracle DBA and then finally spent few years as Linux admin.  I did some complicated programming projects in between with many different languages but no one knew I wasn't fluent with my languages but I built great algorithms so I am able to beat every one else's speed/ideas and I have already improved parsing by about 100 times in terms of speed and am aiming for more.
0
 
evilrixSenior Software Engineer (Avast)Commented:
>> I built great algorithms
Definately the most important skill. For example. Stephen Hawking would still be as brilliant as he is today even if he couldn't speak English. The language is just a means to an end. The language of choice doesn't really matter as long as you choose the right language for the job at hand. It certainly helps if you know a language well but it's not necessarily essential. For example, I am a complete Python novice and yet I probably write as much python as I do C++ where I work (although, when I can get away with it I opt for Perl, instead - much to the chargrin of my co-workers)
0

Featured Post

Free Tool: SSL Checker

Scans your site and returns information about your SSL implementation and certificate. Helpful for debugging and validating your SSL configuration.

One of a set of tools we are providing to everyone as a way of saying thank you for being a part of the community.

  • 9
  • 8
  • 4
Tackle projects and never again get stuck behind a technical roadblock.
Join Now