Solved

Python proc vs thread

Posted on 2013-12-28
Last Modified: 2013-12-29
In Python, what is the difference between a process, a child process (one that has been forked), the result of a fork-exec, and a thread?  Actually, I get the difference between the first two (I think): when forking you wind up with a new process that is a virtual copy of the existing parent process, only running separately with a new PID.  But I lost the trail when they got into fork-exec and then the thread concept.
Question by:amigan_99
LVL 17

Accepted Solution

by:
gelonida earned 500 total points
ID: 39744551
In the first part I give the non-Python-specific answer (assuming you run on a Linux-like OS); in the second part I'll try to tackle Python-specific details (GIL, ...).

1.)
Creating a subprocess with identical code/data is done by calling fork().
- fork() duplicates the current process (the Python interpreter) with all its
   code / data (variables, ...).
   The only difference between the new and the old process is the return value of the
   fork() call (0 in the child, the child's PID in the parent), which allows the two
   processes to be distinguished. All variables have the same value at that moment, but
   if they are changed in one process, they will not be changed in the other.

- fork() / exec() is the Unix way of starting a different program as a child process.
   The newly forked child process immediately execs another executable (it loads and
   executes some completely different code, which could again be a Python program
   or something completely different), so basically all code / data of the child process
   is overwritten.

- Creating a thread is a bit like creating a new process with fork(), BUT a thread is not
  a separate process: both threads run inside the same process and refer to the SAME data,
  so if one thread modifies some data, it is changed for the other thread as well.
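
To make the fork() behaviour concrete, here is a minimal sketch, assuming a Unix-like OS where os.fork() is available. The function name fork_demo and the pipe used to report the child's value back are my own illustration, not part of the original answer.

```python
import os


def fork_demo():
    """Fork once; the child changes a variable, the parent's copy is unaffected."""
    counter = 0
    read_fd, write_fd = os.pipe()    # pipe so the child can report back
    pid = os.fork()                  # returns 0 in the child, the child's PID in the parent
    if pid == 0:
        # Child: works on its own copy of `counter`.
        os.close(read_fd)
        counter += 100
        os.write(write_fd, str(counter).encode())
        os.close(write_fd)
        os._exit(0)                  # leave the child without running the parent's cleanup
    # Parent: read what the child saw, then reap the child.
    os.close(write_fd)
    child_value = int(os.read(read_fd, 32).decode())
    os.close(read_fd)
    os.waitpid(pid, 0)
    return counter, child_value


if __name__ == "__main__":
    parent_value, child_value = fork_demo()
    print("parent sees", parent_value, "- child saw", child_value)
```

The parent still sees its own unchanged counter, while the child saw the modified copy; with a thread instead of a forked process, both would have seen the same variable.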


2.) Now to Python:
- fork()/exec(): If you exec another Python program, you start a completely new Python interpreter, which reloads/imports all the .py (or .pyc) files and runs the code from the beginning.
 The Python module to use for creating such child processes is subprocess,
 and the call would be subprocess.Popen() or one of the helpers that simplify it.
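
As a rough illustration of the fork()/exec() style from Python, the sketch below starts a brand-new interpreter with subprocess.Popen() and collects its output. The helper name run_child is hypothetical; sys.executable is used only so the example does not depend on what "python" means on your PATH.

```python
import subprocess
import sys


def run_child(code):
    """Start a fresh Python interpreter for `code`; return (exit code, stdout)."""
    proc = subprocess.Popen(
        [sys.executable, "-c", code],   # a new interpreter, started from scratch
        stdout=subprocess.PIPE,
    )
    out, _ = proc.communicate()         # wait for the child and read its output
    return proc.returncode, out.decode().strip()


if __name__ == "__main__":
    # The child has its own PID and none of the parent's variables.
    print(run_child("import os; print(os.getpid())"))
```

Nothing from the parent's memory is visible in the child; anything it needs has to be passed in explicitly (arguments, environment, stdin, files).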

- fork() duplicates the process; nothing has to be reinitialized, and both processes can now live their own lives.
 A Python module that roughly uses fork() to create new processes is multiprocessing. multiprocessing also works under Windows, though there creating another process costs more or less as much as a fork()/exec(), because Windows doesn't really implement fork(). For you as a developer the code stays platform independent, and creating a subprocess will be as fast as the platform allows. (Newer Python versions let you choose the start method, and fork() is not the default on every Unix-like platform.)
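
A small sketch of the multiprocessing style, under the assumption of a Unix-like OS: it asks explicitly for the "fork" start method so that it matches the description above. The names worker and parallel_squares are made up for the example.

```python
import multiprocessing


def worker(index, n, results):
    # Runs in the child process; the result travels back through the queue.
    results.put((index, n * n))


def parallel_squares(numbers):
    """Square each number in its own forked child process."""
    ctx = multiprocessing.get_context("fork")   # assumption: Unix-like OS
    results = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(i, n, results))
        for i, n in enumerate(numbers)
    ]
    for p in procs:
        p.start()
    # Collect before join() so no child blocks while writing to the queue.
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    return [collected[i] for i in range(len(numbers))]


if __name__ == "__main__":
    print(parallel_squares([1, 2, 3, 4]))
```

If you use plain multiprocessing.Process and multiprocessing.Queue instead of the explicit context, the platform's default start method is used, which is what keeps the same code working on Windows.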

- with threading:
Threading is very powerful and nice, but has some issues. As two threads access the same data and not all modifications to Python variables are atomic, you have to use tools like mutexes (threading.Lock()) to protect your code, and you must avoid writing to the same file from different threads or using other shared exclusive resources at the same time.

If you are inexperienced, it's easy to write code that behaves differently than initially expected.
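
Here is a minimal sketch of protecting shared data with threading.Lock(). The Counter class and count_with_threads function are hypothetical names chosen for the illustration.

```python
import threading


class Counter:
    """A counter that several threads can increment safely."""

    def __init__(self):
        self.value = 0
        self._lock = threading.Lock()

    def increment(self):
        # Only one thread at a time may run the read-modify-write below.
        with self._lock:
            self.value += 1


def count_with_threads(num_threads=4, per_thread=10000):
    counter = Counter()

    def work():
        for _ in range(per_thread):
            counter.increment()

    threads = [threading.Thread(target=work) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter.value


if __name__ == "__main__":
    print(count_with_threads())
```

Because every increment happens while the lock is held, the final value is always num_threads * per_thread. Without the lock, increments from different threads can overwrite each other and some may get lost.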

Another drawback specific to Python (more precisely, to the standard CPython interpreter) is the GIL (global interpreter lock),
which means that, due to an implementation limitation, multiple threads cannot execute Python bytecode at the same time.

So if you have a purely CPU-bound Python program on a machine with multiple cores (CPUs), you will see that with multiprocessing both Python processes run in parallel and both CPUs are loaded, whereas with threading you'll be unable to benefit from the other CPU.
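
If you want to try this on your own machine, the sketch below runs the same CPU-bound function once in threads and once in forked processes and reports the elapsed time for each. It assumes a Unix-like OS and a standard CPython build with the GIL; burn, timed_threads and timed_processes are hypothetical names, and the actual timings depend entirely on your hardware and Python version.

```python
import multiprocessing
import threading
import time


def burn(n):
    """Pure-Python CPU work: no I/O, no C library doing the heavy lifting."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def _thread_job(n, out, idx):
    out[idx] = burn(n)


def _process_job(n, results, idx):
    results.put((idx, burn(n)))


def timed_threads(n, workers=2):
    out = [None] * workers
    threads = [
        threading.Thread(target=_thread_job, args=(n, out, i))
        for i in range(workers)
    ]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out, time.perf_counter() - start


def timed_processes(n, workers=2):
    ctx = multiprocessing.get_context("fork")   # assumption: Unix-like OS
    results = ctx.Queue()
    procs = [
        ctx.Process(target=_process_job, args=(n, results, i))
        for i in range(workers)
    ]
    start = time.perf_counter()
    for p in procs:
        p.start()
    collected = dict(results.get() for _ in procs)
    for p in procs:
        p.join()
    elapsed = time.perf_counter() - start
    return [collected[i] for i in range(workers)], elapsed


if __name__ == "__main__":
    n = 2000000
    print("threads:   %.2f s" % timed_threads(n)[1])
    print("processes: %.2f s" % timed_processes(n)[1])
```

Both variants compute the same results; only the wall-clock time differs. Watching a CPU monitor while it runs shows whether one or several cores are busy.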

Depending on what you want to do in your threads, this may not be a problem, as:
- Python has a lot of modules which call code from shared libraries (NumPy, PIL, ...), many of which release the GIL before calling it, so they do benefit from a multicore machine.
In many of my programs most of the CPU is consumed in C libraries being called by Python.

- Often threads are not used to perform calculations in parallel, but rather to structure code; most of the time most threads are sleeping and are just woken up briefly to perform a short task.


So to summarize:
Three Python modules:
- subprocess for fork()/exec()
- multiprocessing for fork()
- threading for threads

- To benefit from a multicore machine and its CPUs you should use multiprocessing ( fork() ).
- To write code that needs to access the same data you should use threading.
  A typical example would be a GUI, where a task started by clicking a button (transferring data over the net, ...) should not freeze the GUI,
  or a program with one thread receiving data, another one processing it and another one sending it (though such code could also be implemented without threading, using select() or the like).


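Interthread communication:
- You can use locks (threading.Lock), wait() on threading.Event / threading.Condition objects, queues (the Queue module, named queue in Python 3) and shared variables.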
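
A short sketch of threads talking to each other through queues, written for Python 3 (where the module is called queue). It models the receive / process hand-off mentioned earlier; the names producer, processor, run_pipeline and the _STOP marker are invented for the example.

```python
import queue
import threading

_STOP = object()   # sentinel telling the next stage that no more items follow


def producer(items, out_q):
    for item in items:
        out_q.put(item)
    out_q.put(_STOP)


def processor(in_q, out_q):
    while True:
        item = in_q.get()          # blocks (sleeps) until something arrives
        if item is _STOP:
            out_q.put(_STOP)
            break
        out_q.put(item.upper())


def run_pipeline(items):
    raw, done = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=producer, args=(items, raw)),
        threading.Thread(target=processor, args=(raw, done)),
    ]
    for t in threads:
        t.start()
    results = []
    while True:
        item = done.get()
        if item is _STOP:
            break
        results.append(item)
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    print(run_pipeline(["spam", "eggs"]))
```

The queue does its own locking internally, so the threads never touch a shared list directly and no explicit Lock is needed here.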

Interprocess communication:
- common files (if protected with file locking)
- pipes
- sockets
- a database which supports access from multiple processes (SQLite, SQL servers, ...)
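
As one example of the pipe option, here is a sketch using the pipe helper that ships with multiprocessing. It again assumes a Unix-like OS and the "fork" start method; child and sum_in_child are hypothetical names.

```python
import multiprocessing


def child(conn):
    # Runs in the child process: receive a request, send back a reply.
    request = conn.recv()
    conn.send({"total": sum(request["numbers"])})
    conn.close()


def sum_in_child(numbers):
    """Send numbers to a child process over a pipe and get their sum back."""
    ctx = multiprocessing.get_context("fork")   # assumption: Unix-like OS
    parent_end, child_end = ctx.Pipe()          # two connected endpoints
    proc = ctx.Process(target=child, args=(child_end,))
    proc.start()
    parent_end.send({"numbers": list(numbers)})  # objects are pickled for transport
    reply = parent_end.recv()
    proc.join()
    parent_end.close()
    return reply["total"]


if __name__ == "__main__":
    print(sum_in_child([1, 2, 3]))
```

Unlike threads, the two processes share no variables; everything that crosses the pipe is copied.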

Hope this is the level of detail that you were looking for.
 
LVL 1

Author Closing Comment

by:amigan_99
ID: 39744847
Thank you very much!  A point of clarification, if you have a moment: can you explain "not all modifications to Python variables are atomic"?  I think of atomic, from Democritus, as a small indivisible particle.  Not sure how that applies to a Python variable.
 
LVL 17

Expert Comment

by:gelonida
ID: 39744988
OK: an atomic operation in computer science means something like
"not interruptible by another thread", so an atomic operation is either not executed at all or executed completely.
Under no circumstances can another thread interrupt an atomic operation while 'half' of the work is done.

you can refer to http://en.wikipedia.org/wiki/Atomic_operation

By the way, it seems that my above statement is not quite right. Some googling indicates that plain Python variable assignments are atomic operations.

However, statements like
a += 1
a, b = b, a

are not, and neither is modifying several members of an object. So normally
locks should be used to prevent another thread from accessing an object while it is in an inconsistent state.
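
One way to see why an increment is not atomic is to look at the bytecode CPython generates for it, and then to guard a multi-member update with a lock. The sketch below is only an illustration: bump, opcode_names, Pair and swap_while_reading are made-up names, and the exact instruction names vary between Python versions.

```python
import dis
import threading

counter = 0


def bump():
    global counter
    counter += 1      # load, add and store are separate bytecode steps


def opcode_names(func):
    """Return the names of the bytecode instructions of `func`."""
    return [ins.opname for ins in dis.get_instructions(func)]


class Pair:
    """Two members that must always be changed together."""

    def __init__(self, a, b):
        self.a, self.b = a, b
        self._lock = threading.Lock()

    def swap(self):
        with self._lock:
            self.a, self.b = self.b, self.a

    def snapshot(self):
        with self._lock:          # never observe a half-finished swap
            return (self.a, self.b)


def swap_while_reading(rounds=1000):
    """Swap in one thread while reading in another; return the states seen."""
    pair = Pair(1, 2)

    def swapper():
        for _ in range(rounds):
            pair.swap()

    thread = threading.Thread(target=swapper)
    thread.start()
    seen = {pair.snapshot() for _ in range(rounds)}
    thread.join()
    return seen


if __name__ == "__main__":
    print(opcode_names(bump))
    print(swap_while_reading())
```

The instruction list shows the global being loaded and stored in separate steps, and another thread may run in between. With the lock in place, a reader only ever sees (1, 2) or (2, 1), never a mixed state.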
 
LVL 1

Author Comment

by:amigan_99
ID: 39745128
Thank you again very much.
