Solved

Python proc vs thread

Posted on 2013-12-28
Last Modified: 2013-12-29
In Python, what is the difference between a process, a child process (that has been forked), the output of a fork-exec, and a thread? Actually, I think I get the difference between the first two: when forking, you wind up with a new process that is a virtual copy of the existing parent process, only running separately with a new PID. But I lost the trail when they got into fork-exec and then the thread concept.
Question by:amigan_99

Accepted Solution

by:
gelonida earned 500 total points
ID: 39744551
In the first part I give the non-Python-specific answer (assuming you run on a Linux-like OS). In the second part I'll try to tackle Python-specific details (GIL, ...).

1.)
Creating a subprocess with identical code/data is done by calling fork().
- fork() duplicates the current process (the Python interpreter) with all its
   code / data (variables, ...).
   The only difference between the new and the old process is the return value of the
   fork() call, which allows the two processes to be distinguished. All variables have the
   same value, but if one is changed in one process, it will not be changed in the other.

- fork() / exec() is the Unix way of creating a new child process that runs a different program.
   The newly forked child process immediately execs another executable (it loads and
   executes some completely different code, which could again be a Python program
   or something completely different), so basically all code / data of the child process
   is overwritten.

- Creating a thread is like creating a new process with fork(), BUT the new thread runs
   inside the same process and refers to the SAME data, so if one thread modifies some
   data, it is changed for the other thread as well.
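
To make the fork() behaviour above concrete, here is a minimal sketch, assuming a Unix-like OS where os.fork() is available. The function name fork_demo and the pipe used to report the child's value back are only illustrative choices, not part of the explanation above.

```python
import os

def fork_demo():
    """Fork once; return (value seen by the parent, value reported by the child)."""
    counter = 10                      # copied into the child at fork time
    read_fd, write_fd = os.pipe()     # lets the child report its value back

    pid = os.fork()                   # returns 0 in the child, the child's PID in the parent
    if pid == 0:
        # Child process: change its private copy of counter and report it.
        try:
            os.close(read_fd)
            counter += 1
            os.write(write_fd, str(counter).encode())
        finally:
            os._exit(0)               # leave at once, never fall through into parent code

    # Parent process: its own copy of counter is untouched by the child.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)                # reap the child so no zombie is left behind
    return counter, child_value

parent_value, child_value = fork_demo()
```

The parent still sees 10 while the child reports 11, which is the "same values at first, independent afterwards" behaviour described for fork().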


2.) Now to Python:
- fork()/exec(): If you exec another Python program, you re-initialize a completely new Python interpreter, reload/import all the .py (or .pyc) files and run the code from the beginning.
 The Python module to use for creating child processes is the module subprocess,
 and the call would be subprocess.Popen() or one of the helpers that simplify it.

- fork() duplicates the process; nothing has to be re-initialized and both processes can now live their own lives.
 A Python module that roughly uses fork() to create new processes is the module multiprocessing. multiprocessing also works under Windows, though creating another process there costs roughly as much as a fork()/exec(), since Windows doesn't really implement fork(). For you as a developer the code stays platform independent, and creating a subprocess is as fast as the platform allows.

- With threading:
Threading is very powerful and nice, but has some issues. As two threads access the same data and not all modifications to Python variables are atomic, you have to use tools such as mutexes ( threading.Lock() ) to protect your code, and you must avoid writing to the same file from different threads or using other exclusive shared resources at the same time.

If you are inexperienced, it's easy to write code that behaves differently than expected.
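
As a minimal sketch of the mutex idea mentioned above: several threads increment one shared counter, and a threading.Lock makes sure only one thread at a time performs the read-modify-write step. The function name count_with_lock and the thread and iteration counts are arbitrary choices for illustration.

```python
import threading

def count_with_lock(num_threads=4, increments=10000):
    """Increment a shared counter from several threads, guarded by a lock."""
    counter = {"value": 0}            # shared, mutable state visible to all threads
    lock = threading.Lock()

    def worker():
        for _ in range(increments):
            with lock:                # only one thread at a time runs the next line
                counter["value"] += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()                      # wait until every worker has finished
    return counter["value"]

total = count_with_lock()
```

With the lock in place the result is always num_threads * increments. Without it, updates can be lost, depending on the interpreter version and on timing.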

Another drawback, specific to Python, is the GIL (global interpreter lock):
due to an implementation limitation of Python (of CPython, the standard interpreter), multiple threads cannot execute Python bytecode at the same time.

So if you have a purely CPU-bound Python program on a machine with multiple cores (CPUs), you will see that with multiprocessing both Python processes run in parallel and both CPUs are loaded, whereas with threading you are unable to benefit from the other CPU.

Depending on what you want to do in your threads, this may not be a problem, as:
- Python has a lot of modules that call code from shared libraries (NumPy, PIL, ...) and release the GIL before calling it, so they do benefit from a multicore machine.
In many of my programs most of the CPU time is consumed in C libraries called by Python.

- Often threads are not used to perform calculations in parallel but rather to structure code; most of the time most threads are sleeping and are just woken up briefly to perform a short task.
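
A rough sketch of the "mostly sleeping threads" case: each thread below just waits, using time.sleep() as a stand-in for waiting on a network or disk, and the waits overlap because a sleeping thread does not hold the GIL. The names fake_io and run_concurrently and the delays are made up for illustration.

```python
import threading
import time

def fake_io(delay, results, index):
    """Stand-in for an I/O wait; sleeping does not hold the GIL."""
    time.sleep(delay)
    results[index] = delay

def run_concurrently(delays):
    """Start one thread per delay and measure the total wall-clock time."""
    results = [None] * len(delays)
    threads = [
        threading.Thread(target=fake_io, args=(delay, results, i))
        for i, delay in enumerate(delays)
    ]
    start = time.monotonic()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, time.monotonic() - start

results, elapsed = run_concurrently([0.2, 0.2, 0.2, 0.2])
```

The four waits add up to 0.8 seconds, but the elapsed time should stay close to a single wait, since the threads sleep at the same time.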


So to summarize:
Three Python modules:
- subprocess for fork()/exec()
- multiprocessing for fork()
- threading for threads
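
For the first of these modules, here is a small sketch of what "a completely new interpreter" means in practice. It assumes Python 3.7 or newer (for subprocess.run() with text=True); the helper name run_child_python and the variable marker are hypothetical.

```python
import subprocess
import sys

def run_child_python(code):
    """Run `code` in a brand-new Python interpreter and return what it printed."""
    result = subprocess.run(
        [sys.executable, "-c", code],   # sys.executable is the current interpreter's path
        stdout=subprocess.PIPE,
        text=True,
        check=True,                     # raise if the child exits with an error
    )
    return result.stdout

marker = "set in parent"                # exists only in the parent's memory
output = run_child_python("print('marker' in globals())")
```

The child prints False: it starts from scratch and knows nothing about variables defined in the parent, unlike a forked copy.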

- To benefit from a multicore machine and its CPUs, you should use multiprocessing ( fork() ).
- To write code that needs to access the same data, you should use threading.
  A typical example would be a GUI, where a task started by clicking a button (transferring data over the net, ...) should not freeze the GUI,
  or a program with one thread receiving data, another one processing it and another one sending it (though such code could also be implemented without threading, using select() or the like).
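
A minimal sketch of the multicore advice, assuming a Unix-like OS: the "fork" start method is requested explicitly here so the example matches the fork() description, which is an assumption of this sketch and not something multiprocessing requires. The names sum_squares and run_in_processes are illustrative.

```python
import multiprocessing
import os

def sum_squares(n, out_queue):
    """CPU-bound work done in a child process; reports (pid, result)."""
    out_queue.put((os.getpid(), sum(i * i for i in range(n))))

def run_in_processes(inputs):
    """Run one worker process per input and collect the results."""
    ctx = multiprocessing.get_context("fork")   # assumption: fork() is available
    out_queue = ctx.Queue()
    workers = [
        ctx.Process(target=sum_squares, args=(n, out_queue)) for n in inputs
    ]
    for w in workers:
        w.start()
    results = [out_queue.get() for _ in workers]  # read results before joining
    for w in workers:
        w.join()
    return results

results = run_in_processes([1000, 2000])
```

Each result carries the PID of the process that computed it, and none of them is the parent's PID. Results arrive in completion order, so they may not match the input order.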


Interthread communication:
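- you can use locks (threading.Lock), wait() calls (e.g. on a threading.Event or threading.Condition), queues (queue.Queue; the module is named Queue in Python 2) and shared variables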
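
As a sketch of queue-based communication between threads: one thread produces items and another consumes them through a queue.Queue, which does its own locking. The None sentinel used to signal "no more data", and the names producer, consumer and run_pipeline, are conventions chosen for this example.

```python
import queue
import threading

def producer(out_q, items):
    """Put every item on the queue, then a None sentinel to signal the end."""
    for item in items:
        out_q.put(item)
    out_q.put(None)

def consumer(in_q, results):
    """Take items off the queue until the sentinel arrives."""
    while True:
        item = in_q.get()             # blocks until an item is available
        if item is None:
            break
        results.append(item * 2)

def run_pipeline(items):
    q = queue.Queue()
    results = []
    threads = [
        threading.Thread(target=producer, args=(q, items)),
        threading.Thread(target=consumer, args=(q, results)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

doubled = run_pipeline([1, 2, 3])
```

The results list needs no lock of its own here only because a single consumer thread writes to it and it is read after both threads have been joined.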

Interprocess Communication
- common files (if protected with file locking)
- pipes
- sockets
- a database that supports access from multiple processes (SQLite, SQL servers, ...)
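
To illustrate the pipe option from this list, here is a small sketch in which the parent sends text to a child Python process over its stdin pipe and reads the reply from its stdout pipe. The child program in CHILD_CODE and the helper name shout_via_child are made up for the example.

```python
import subprocess
import sys

# Tiny child program: read everything from stdin, write it back upper-cased.
CHILD_CODE = "import sys; sys.stdout.write(sys.stdin.read().upper())"

def shout_via_child(text):
    """Send `text` to a child process through a pipe and return its reply."""
    child = subprocess.Popen(
        [sys.executable, "-c", CHILD_CODE],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        universal_newlines=True,      # exchange str instead of bytes
    )
    reply, _ = child.communicate(text)  # write, close stdin, read until EOF
    return reply

reply = shout_via_child("hello from the parent")
```

communicate() handles the writing, closing and reading in one call, which avoids the deadlocks that can occur when both sides wait on a full pipe.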

Hope this is the level of detail that you were looking for.

Author Closing Comment

by:amigan_99
ID: 39744847
Thank you very much! A point of clarification, if you have a moment: can you explain "not all modifications to Python variables are atomic"? I think of atomic, from Democritus, as a small indivisible particle. Not sure how that applies to a Python variable.

Expert Comment

by:gelonida
ID: 39744988
OK: an atomic operation in computer science means something like
"not interruptible by another thread", so an atomic operation is either not executed at all or executed completely.
Under no circumstances can another thread interrupt an atomic operation while 'half' of the work is done.

You can refer to http://en.wikipedia.org/wiki/Atomic_operation

By the way, it seems that my statement above is wrong. Some googling seems to indicate that Python variable assignments are atomic operations.

However, commands like
a += 1
a, b = b, a

would not be, and neither would modifying several members of an object. Thus locks should normally
be used to prevent an object from being in an inconsistent state when it is accessed by another thread.
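
One way to see why a += 1 is not a single step is to look at the bytecode CPython generates for it, using the standard dis module. This is only a sketch: the exact opcode names vary between Python versions, and increment and a are just example names.

```python
import dis

a = 0

def increment():
    global a
    a += 1    # looks like one step, but compiles to several bytecode instructions

# Collect the names of the bytecode instructions that make up increment().
opnames = [ins.opname for ins in dis.get_instructions(increment)]
print(opnames)
```

The list contains a separate load of a, an add, and a store back to a. Another thread may be scheduled between those instructions, which is how an update can get lost when no lock is used.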

Author Comment

by:amigan_99
ID: 39745128
Thank you again very much.