JRPrakash

pthread: thread prints the same process ID as the main thread

Hello,
I have the following code snippet

#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
void *t_function(void *arg)
{
    (void)arg;
    fprintf(stdout, "child->  thread pid is %d\n", (int)getpid());

    while (1);   /* spin so the thread stays alive */
    return NULL;
}

int main(void)
{
    pthread_t thread_id;
    fprintf(stdout, "main->  thread pid is %d\n", (int)getpid());
    pthread_create(&thread_id, NULL, &t_function, NULL);

    while (1);   /* spin so the process stays alive */
    return 0;
}


Here the main thread and the child thread both print the same pid value. How is that possible? As far as I know,
in Linux a thread is like an individual process that only shares the process address space.
I'm running this on Ubuntu Linux with a 2.6 kernel. Any suggestions?
ASKER CERTIFIED SOLUTION
Infinity08
JRPrakash

ASKER

Infinity08, thanks for the reply. But what I don't understand is that in Linux, a new process is supposedly created whenever a thread is created. I was reading Advanced Linux Programming, where the same code snippet is said to produce different pids for all the threads. Unfortunately my sample code doesn't produce different pids. How is that possible?
You can use the command:
ps -aLf
to see the thread ID. The 4th column of the listing shows the thread ID, under the heading LWP (Light Weight Process).
There is also a gettid() system call, but when I tried it on my system, my libc didn't export a wrapper for it.

If gettid() isn't available on your system either, you can invoke it directly through syscall().
Here is your modified code:
#include <pthread.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/syscall.h>

pthread_t thread_id;

void* t_function(void* arg)
{
    printf("thread_id = %ld\n", syscall(SYS_gettid));   /* syscall() returns long */

    while (1);
    return NULL;
}

int main ()
{
    printf("parent->  thread pid is %ld\n", syscall(SYS_gettid));   /* syscall() returns long */
    pthread_create (&thread_id, NULL, &t_function, NULL);

    while (1);
    return 0;
}

On my system, this prints:
~/test/ # ./a.out
parent->  thread pid is 4687
thread_id = 4688

And the ps command prints:
~/ # ps -aLf | grep a.out
root      4687  3423  4687 47    2 12:07 pts/3    00:00:12 ./a.out
root      4687  3423  4688 44    2 12:07 pts/3    00:00:12 ./a.out

So, in the 4th column (LWP) you can see the thread ID, which is the same as the PID for the main thread.
These are the same values that ./a.out printed.
>> Infinity08, thanks for the reply. But what I don't understand is that in Linux, a new process is supposedly created whenever a thread is created.

Technically, pthread_create does indeed start a new process. However, since the child process shares so much with the parent process, you can no longer really call it a process without causing confusion, and calling it a thread is more accurate.

The old LinuxThreads implementation of pthreads did indeed have separate process IDs for the threads. However, the more recent NPTL implementation (the currently prevalent pthreads implementation on Linux) has improved a lot in that respect, and gives pthread_create a behavior that more closely matches what you'd expect from a thread (in terms of overhead etc.).
Still, the thread implementation in Linux is called LWP (Light Weight Process).
It uses the clone() system call to create a thread.
clone() actually creates a process, but one that shares some things with its parent process, such as the memory space, the table of file descriptors, and the table of signal handlers.

The main difference between clone() and fork() is where the child starts executing:
With clone() (as exposed by the glibc wrapper), the child runs a function that the caller passes as a function pointer, and it terminates when that function returns.
With fork(), the child continues execution from the point where fork() was called.
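To make the clone()/fork() difference concrete, here is a minimal sketch, assuming Linux with glibc. The names child_entry, run_clone_child and run_fork_child are made up for this example. It uses CLONE_VM only (not the full flag set a pthreads library would pass), so the clone() child is still a separate process, just one that shares memory with its parent.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <signal.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#define CHILD_STACK_SIZE (64 * 1024)

/* The clone() child starts executing here, not after the clone() call. */
static int child_entry(void *arg)
{
    int *shared = arg;
    *shared = 42;   /* visible to the parent because of CLONE_VM */
    return 0;       /* the child terminates when this function returns */
}

/* Returns the value the clone() child wrote, or -1 on error. */
int run_clone_child(void)
{
    int shared = 0;
    char *stack = malloc(CHILD_STACK_SIZE);
    if (stack == NULL)
        return -1;

    /* The stack grows downward on most architectures, so pass its top. */
    pid_t pid = clone(child_entry, stack + CHILD_STACK_SIZE,
                      CLONE_VM | SIGCHLD, &shared);
    if (pid == -1 || waitpid(pid, NULL, 0) == -1) {
        free(stack);
        return -1;
    }
    free(stack);
    return shared;
}

/* Returns the parent's copy of the value after a fork() child changed its own. */
int run_fork_child(void)
{
    int value = 0;
    pid_t pid = fork();
    if (pid == -1)
        return -1;
    if (pid == 0) {
        /* The child continues from the fork() call with a private copy. */
        value = 42;
        _exit(0);
    }
    if (waitpid(pid, NULL, 0) == -1)
        return -1;
    return value;   /* the parent's copy is untouched */
}
```

Called from main(), run_clone_child() should return 42 (the child's write lands in shared memory) and run_fork_child() should return 0 (the child only changed its own copy).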
Are you sure each thread has its own task_struct? As I understand it, a thread without one would not get scheduled. If it has a task_struct, why doesn't it have its own pid?
Yes. Since threads are also processes in Linux, they have their own pid.
But they are implemented as light weight processes, so the default ps output does not show the pids of threads.
You can see them with the command ps -aLf; I have already discussed how to see the pid of a thread in comment# 35783400. Since threads are generally not called processes, these IDs are called thread IDs, or tid.
Whatever name you give to threads, Linux implements them as "Light Weight Processes".
>> Are you sure, each thread has its own task_struct?

Indeed they have. But that's an implementation detail, and we're moving away from the conceptual model now.

On the kernel level, threads indeed act very much like processes, because that allows easy sharing of resources, and re-use of the existing scheduler, etc. But that's just to make the implementation more convenient.

On the user level however, the behavior of threads is quite different from that of processes, and unless you specifically need to, I would ignore the low-level implementation details, and rather focus on the conceptual idea of a thread versus a process.


>> If it has a task_struct, why doesn't it have its own pid?

There is indeed a separate task_struct for every thread, BUT all task_structs for a given process are linked together with one single process id.


So, to get back to the original question (and repeat what I've said initially), getpid will return the process id, which will be the same for all threads of a given process, and pthread_self will return the thread id, which will be different for all threads.
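As a rough illustration of that summary, the sketch below records the IDs seen from the main thread and from one worker thread. The struct thread_ids and both function names are invented for this example, and the kernel thread ID is obtained through syscall(SYS_gettid), which is Linux-specific.

```c
#include <pthread.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <unistd.h>

/* IDs observed from inside one thread. */
struct thread_ids {
    pid_t pid;        /* getpid(): process ID, shared by all threads */
    long tid;         /* kernel thread ID from the gettid system call */
    pthread_t self;   /* pthread_self(): POSIX thread ID */
};

/* Thread start routine: fills in the IDs of the calling thread. */
static void *record_ids(void *arg)
{
    struct thread_ids *ids = arg;
    ids->pid = getpid();
    ids->tid = syscall(SYS_gettid);
    ids->self = pthread_self();
    return NULL;
}

/* Records IDs for the calling thread and for a new worker thread.
 * Returns 0 on success, -1 if the worker could not be created or joined. */
int compare_thread_ids(struct thread_ids *main_ids, struct thread_ids *worker_ids)
{
    pthread_t worker;

    record_ids(main_ids);
    if (pthread_create(&worker, NULL, record_ids, worker_ids) != 0)
        return -1;
    if (pthread_join(worker, NULL) != 0)
        return -1;
    return 0;
}
```

With NPTL, the two pid fields should be equal, while the tid fields differ and pthread_equal() on the two self fields returns 0. Note that pthread_t is an opaque type, so compare it with pthread_equal() rather than with ==.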


The behavior you described in http:#35783277 can be reproduced by running the code on a system that still uses the old LinuxThreads implementation of pthreads.
Oh, and just for clarity :

>> But that's an implementation detail, and we're moving away from the conceptual model now.

When I was talking about the "conceptual model", I was of course referring to what the POSIX standard describes. Because in the end, that's all that matters. How they decided to implement the POSIX specifications in Linux (or on any other platform) is only relevant to a programmer using pthreads in very rare situations.
Infinity08,
Nice explanation, you got the logic of my question. Let me summarize your discussion points:
1) Every thread has its own task_struct, sharing the pid of the main thread but with a different thread ID.
2) A thread can be scheduled because it has its own task_struct.

Let me know if my understanding is correct.
SSKumar, Infinity08, thanks for the explanations! This triggered another question: NPTL and LinuxThreads both create a unique task_struct for each thread, and the only difference is that in LinuxThreads a thread has a different pid from the main thread, whereas in NPTL the threads have the same pid as the main thread. But is that the only difference between these two?
>> Let me know if my understanding is correct.

That's pretty much what it comes down to, yes. Note however that the pid member in the task_struct of a thread is NOT the process id - it is re-used as the thread id. In a task_struct of a thread, the process id is confusingly stored in the tgid member heh (and that's what getpid returns).

But that's again an implementation detail, and just happens to be the way it's implemented today. That can all change in a next version of Linux of course heh.
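If you want to look at the tgid/pid split from user space, one option is the per-thread status file under /proc. The sketch below is only an illustration: it assumes a Linux system with /proc mounted, and the struct kernel_ids and function names are invented here. It reads the Tgid: and Pid: lines for the calling thread.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Values the kernel reports for one thread in its /proc status file. */
struct kernel_ids {
    long tgid;   /* thread group ID: what getpid() returns */
    long pid;    /* per-thread ID: what the gettid system call returns */
};

/* Reads Tgid: and Pid: for the calling thread. Returns 0 on success. */
int read_kernel_ids(struct kernel_ids *out)
{
    char path[64];
    char line[256];
    long tid = syscall(SYS_gettid);

    snprintf(path, sizeof path, "/proc/self/task/%ld/status", tid);
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;

    out->tgid = -1;
    out->pid = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "Tgid:", 5) == 0)
            sscanf(line + 5, "%ld", &out->tgid);
        else if (strncmp(line, "Pid:", 4) == 0)
            sscanf(line + 4, "%ld", &out->pid);
    }
    fclose(f);
    return (out->tgid >= 0 && out->pid >= 0) ? 0 : -1;
}

/* Thread start routine: stores the worker's own IDs, or -1 on failure. */
static void *worker_main(void *arg)
{
    struct kernel_ids *ids = arg;
    if (read_kernel_ids(ids) != 0) {
        ids->tgid = -1;
        ids->pid = -1;
    }
    return NULL;
}

/* Runs read_kernel_ids() inside a new thread. Returns 0 on success. */
int read_worker_kernel_ids(struct kernel_ids *out)
{
    pthread_t worker;
    if (pthread_create(&worker, NULL, worker_main, out) != 0)
        return -1;
    if (pthread_join(worker, NULL) != 0)
        return -1;
    return (out->pid >= 0) ? 0 : -1;
}
```

For a worker thread, the Tgid value should match getpid() in the main thread, while the Pid value is the worker's own thread ID.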


>> But is that the only difference between these two?

No, there are quite a bit more differences/improvements between the old LinuxThreads implementation of pthreads, and the new NPTL implementation of pthreads. The man page for pthreads contains a quick overview of the main differences :

        http://linux.die.net/man/7/pthreads
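If you are not sure which implementation a given system uses, glibc lets you query it at run time through confstr(). A minimal sketch, assuming glibc (the helper name get_pthread_impl is made up for this example; on a C library that does not define _CS_GNU_LIBPTHREAD_VERSION it simply reports failure):

```c
#include <stddef.h>
#include <unistd.h>

/* Copies the threading implementation string (for example one that starts
 * with "NPTL" or "linuxthreads") into buf.
 * Returns 0 on success, -1 if it is unavailable or buf is too small. */
int get_pthread_impl(char *buf, size_t len)
{
#ifdef _CS_GNU_LIBPTHREAD_VERSION
    size_t needed = confstr(_CS_GNU_LIBPTHREAD_VERSION, buf, len);
    if (needed == 0 || needed > len)
        return -1;
    return 0;
#else
    (void)buf;
    (void)len;
    return -1;
#endif
}
```

From main(), call it with a small buffer such as char impl[64] and print the result with printf("%s\n", impl). The same information is available from the shell with: getconf GNU_LIBPTHREAD_VERSION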
This question has been classified as abandoned and is closed as part of the Cleanup Program. See the recommendation for more details.