Python proc vs thread

Posted on 2013-12-28
Medium Priority
Last Modified: 2013-12-29
In Python, what is the difference between a process, a child process (one that has been forked), the result of a fork-exec, and a thread? I think I understand the difference between the first two: forking gives you a new process that is a virtual copy of the existing parent process, running separately under a new PID. But I lost the trail at fork-exec and then at the thread concept.
Question by:amigan_99
LVL 17

Accepted Solution

gelonida earned 2000 total points
ID: 39744551
In the first part I give the non-Python-specific answer (assuming you run on a Linux-like OS); in the second part I'll try to tackle the Python-specific details (GIL, . . .).

Creating a subprocess with identical code/data is done by calling fork().
- fork() duplicates the current process (the Python interpreter) with all its
   code / data (variables, . . .).
   The only difference between the new and the old process is the return value of the
   fork() call (0 in the child, the child's PID in the parent), which allows the two
   processes to be distinguished. All variables have the same values, but a change made
   in one process is not seen in the other.

- fork() / exec() is the Unix way of creating a new child process that runs a different program.
   The newly forked child process immediately exec()s another executable (it loads and
   executes some completely different code, which could again be a Python program
   or something completely different), so basically all code / data of the child process
   is overwritten.

- Creating a thread is like creating a new process with fork(), BUT a thread runs inside
  the same process as its creator, so both threads refer to the SAME data. If one thread
  modifies some data, it is changed for the other thread as well.
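
The three cases above can be sketched roughly as follows. This is only an illustration, assuming a Unix-like OS where os.fork() is available; the names counter, bump, run_in_thread, run_in_forked_child and fork_exec are made up for this example.

```python
import os
import sys
import threading

counter = {"value": 0}  # hypothetical piece of program state


def bump():
    counter["value"] += 1


def run_in_thread():
    """Run bump() in a thread: the caller sees the change (shared data)."""
    worker = threading.Thread(target=bump)
    worker.start()
    worker.join()
    return counter["value"]


def run_in_forked_child():
    """Run bump() in a forked child: the parent's copy stays untouched."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: fork() returned 0. Change the child's private copy and
        # report the new value to the parent through the pipe.
        os.close(read_fd)
        bump()
        os.write(write_fd, str(counter["value"]).encode())
        os.close(write_fd)
        os._exit(0)  # exit at once, skipping the parent's cleanup code
    # Parent: fork() returned the child's PID.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return child_value, counter["value"]


def fork_exec(argv):
    """fork(), then replace the child with another program via exec()."""
    pid = os.fork()
    if pid == 0:
        try:
            os.execvp(argv[0], argv)  # does not return if it succeeds
        finally:
            os._exit(127)  # only reached when exec() failed
    _, status = os.waitpid(pid, 0)
    return os.WEXITSTATUS(status)
```

Calling run_in_forked_child() returns the value seen by the child and the value still held by the parent, run_in_thread() returns the value after the thread has changed it in place, and fork_exec([sys.executable, "-c", "..."]) returns the exit code of the freshly started interpreter.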

2.) Now to Python:
- fork()/exec(): If you exec another Python program, you initialize a completely new Python interpreter, reload/import all the .py (or .pyc) files and run the code from the beginning.
 The Python module to use for creating child processes this way is subprocess,
 and the call would be subprocess.Popen() or one of the helpers that simplify it.

- fork() duplicates the process; nothing has to be reinitialized, and both processes can now live their own lives.
 A Python module that roughly uses fork() to create new processes is multiprocessing. multiprocessing also works under Windows, though there creating another process costs more or less as much as a fork()/exec(), because Windows doesn't really implement fork(). For you as a developer the code stays platform independent, and creating a subprocess will be as fast as the platform allows.

- With threading:
Threading is very powerful and nice, but has some issues. Because several threads access the same data, and not all modifications to Python variables are atomic, you have to use tools such as mutexes ( threading.Lock()s ) to protect your code, and you must avoid writing to the same file from different threads or using other exclusive shared resources at the same time.
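
A minimal sketch of protecting shared data with a threading.Lock. The Counter class and count_with_threads function are hypothetical names chosen for this example, not something from a library.

```python
import threading


class Counter:
    """A counter that several threads may increment safely."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def count_with_threads(n_threads=4, n_increments=10_000):
    counter = Counter()

    def worker():
        for _ in range(n_increments):
            counter.increment()

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return counter.value
```

With the lock in place the result is always n_threads * n_increments. Without it, increments from different threads can overwrite each other and the total may come out too small.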

If you are inexperienced, it's easy to write code that behaves differently than initially expected.

Another drawback, specific to Python, is the GIL (global interpreter lock),
which means that, due to an implementation limitation of Python (more precisely of CPython, the standard interpreter), multiple threads cannot execute Python bytecode at the same time.

So if you have a purely CPU-bound Python program on a machine with multiple cores (CPUs), you will see that with multiprocessing both Python processes run in parallel and both CPUs are loaded, whereas with threading you'll be unable to benefit from the other CPU.
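
A rough sketch of spreading CPU-bound pure-Python work over several processes with multiprocessing. It assumes a Unix-like OS and explicitly asks for the "fork" start method to match the description above; count_primes, _prime_worker and count_primes_parallel are hypothetical names for this example.

```python
import multiprocessing


def count_primes(limit):
    """CPU-bound pure-Python work: count the primes below limit."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def _prime_worker(index, limit, results):
    # Runs in a child process; sends its result back through the queue.
    results.put((index, count_primes(limit)))


def count_primes_parallel(limits):
    """Compute count_primes() for every limit, one process per limit."""
    ctx = multiprocessing.get_context("fork")  # assumption: fork() exists
    results = ctx.Queue()
    processes = [
        ctx.Process(target=_prime_worker, args=(index, limit, results))
        for index, limit in enumerate(limits)
    ]
    for process in processes:
        process.start()
    # Collect the results before joining so no child blocks on a full queue.
    counts = dict(results.get() for _ in processes)
    for process in processes:
        process.join()
    return [counts[index] for index in range(len(limits))]
```

Each child is its own interpreter with its own GIL, so on a multicore machine the calls can run at the same time. For real workloads multiprocessing.Pool is often more convenient than starting the processes by hand.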

Depending on what you want to do in your threads, this may not be a problem, as:
- Python has a lot of modules that call code from shared libraries (NumPy, PIL, . . .), many of which release the GIL before calling it, so they do benefit from a multicore machine.
In many of my programs most of the CPU time is consumed in C libraries called from Python.

- Often threads are not used to perform calculations in parallel but rather to structure code; most of the time most threads are sleeping and are only woken up briefly to perform a short task.

So to summarize:
Three Python modules:
- subprocess for fork()/exec()
- multiprocessing for fork()
- threading for threads
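
As a small illustration of the first module in this list, here is a sketch that starts a completely new Python interpreter with subprocess. The helper name run_python_snippet is made up for this example.

```python
import os
import subprocess
import sys


def run_python_snippet(code):
    """Run code in a freshly started Python interpreter and return its output."""
    completed = subprocess.run(
        [sys.executable, "-c", code],  # fork()/exec() of a new interpreter on Unix
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return completed.stdout.strip()


if __name__ == "__main__":
    print("parent pid:", os.getpid())
    print("child pid: ", run_python_snippet("import os; print(os.getpid())"))
```

The child shares nothing with the parent except what is passed explicitly (arguments, environment, pipes), which is why its PID and all of its variables are its own.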

- To benefit from a multicore machine and its CPUs, you should use multiprocessing ( fork() ).
- To write code that needs to access the same data, you should use threading.
  A typical example would be a GUI where a task started by clicking a button (transferring data over the net, . . .) should not freeze the GUI,
  or a program with one thread receiving data, another one processing it and another one sending it (though such code could also be implemented without threading, using select() or the like).

Interthread communication:
- you can use locks ( threading.Lock ), wait() functions (for example on threading.Event or threading.Condition), queues ( queue.Queue ) and shared variables
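
A minimal sketch of the receive / process / send structure mentioned earlier, with the threads talking to each other through queue.Queue objects. The function name pipeline, the doubling step and the _DONE marker are assumptions made only for this example.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


def pipeline(items):
    """Pass items through three threads: receive -> process -> send."""
    to_process = queue.Queue()
    to_send = queue.Queue()
    sent = []

    def receiver():
        for item in items:
            to_process.put(item)
        to_process.put(_DONE)

    def processor():
        while (item := to_process.get()) is not _DONE:
            to_send.put(item * 2)  # stand-in for real processing
        to_send.put(_DONE)

    def sender():
        while (item := to_send.get()) is not _DONE:
            sent.append(item)  # stand-in for sending the data somewhere

    threads = [threading.Thread(target=f) for f in (receiver, processor, sender)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return sent
```

queue.Queue does its own locking, so the threads need no explicit Lock here, and get() simply blocks (the thread sleeps) until something arrives.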

Interprocess communication:
- common files (if protected with file locking)
- pipes
- sockets
- a database that supports access from multiple processes (SQLite, SQL servers, . . .)
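
As an illustration of the pipe option, here is a sketch of a parent and a child process exchanging messages over a multiprocessing Pipe. It assumes a Unix-like OS (the "fork" start method is requested explicitly); _upper_child and ask_child are hypothetical names.

```python
import multiprocessing


def _upper_child(conn):
    # Runs in the child process: read one request, answer it, and exit.
    request = conn.recv()
    conn.send(request.upper())
    conn.close()


def ask_child(message):
    """Send message to a child process over a pipe and return its reply."""
    ctx = multiprocessing.get_context("fork")  # assumption: fork() exists
    parent_conn, child_conn = ctx.Pipe()
    process = ctx.Process(target=_upper_child, args=(child_conn,))
    process.start()
    child_conn.close()  # the parent only uses its own end of the pipe
    parent_conn.send(message)
    reply = parent_conn.recv()
    process.join()
    parent_conn.close()
    return reply
```

Unlike with threads, nothing is shared implicitly: every object sent through the pipe is serialized in one process and rebuilt in the other.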

Hope this is the level of detail that you were looking for.

Author Closing Comment

ID: 39744847
Thank you very much!  A point of clarification, if you have a moment: can you explain "not all modifications to Python variables are atomic"?  I think of atomic, from Democritus, as a small indivisible particle.  Not sure how that applies to a Python variable.
LVL 17

Expert Comment

ID: 39744988
OK: an atomic operation in computer science means something like
"not interruptible by another thread", so an atomic operation is either not executed at all or executed completely.
Under no circumstances can another thread interrupt an atomic operation while 'half' of the work is done.

you can refer to http://en.wikipedia.org/wiki/Atomic_operation

By the way, it seems that my statement above is wrong. Some googling seems to indicate that Python variable assignments are atomic operations.

However commands like
a += 1
a, b = b, a

would not be, and neither would modifying several members of an object. So normally
locks should be used to prevent an object from being accessed by another thread while it is in an inconsistent state.
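
One way to see why a += 1 is not atomic is to look at the bytecode CPython compiles it to, using the standard dis module. This is just a sketch; bytecode_ops is a made-up helper name, and the exact instruction names differ between Python versions.

```python
import dis


def bytecode_ops(source):
    """Return the names of the bytecode instructions source compiles to."""
    code = compile(source, "<example>", "exec")
    return [instruction.opname for instruction in dis.get_instructions(code)]


# No instruction names are hard-coded here because they vary by version;
# print the lists to see what your interpreter produces.
increment_ops = bytecode_ops("a += 1")
swap_ops = bytecode_ops("a, b = b, a")

if __name__ == "__main__":
    print(increment_ops)
    print(swap_ops)
```

The increment shows up as separate load, add and store steps. Another thread can be scheduled between the load and the store, and that gap is exactly what a lock closes.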

Author Comment

ID: 39745128
Thank you again very much.
