
I have a program (say, "prog") written in C that performs many numerical operations. I want to write a "driver" utility in Python that runs "prog" with different configurations in parallel, reads its output, and logs it. There are several issues to take into account:

  1. All sorts of things can go wrong at any time, so logging has to happen as soon as possible after any prog instance finishes.
  2. Several progs can finish simultaneously, so logging should be centralized.
  3. Workers may be killed somehow, and the driver has to handle that situation properly.
  4. All workers and the logger must be terminated correctly, without tons of backtraces, when a KeyboardInterrupt is handled.

The first two points suggest that all workers should send their results to a centralized logger worker, for example through a multiprocessing.Queue. But the third point seems to make this a bad solution, because if a worker is killed the queue may become corrupted. So the Queue is not suitable. Instead I can use one process-to-process pipe per worker (i.e. every worker is connected to the logger through its own pipe). But then other problems arise:

  1. Reading from a pipe is a blocking operation, so a single logger can't read asynchronously from several workers (use threads?).
  2. If a worker is killed and its pipe is broken, how can the logger diagnose this? (A sketch addressing both points follows this list.)
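
Here is a minimal Python 2.7 sketch of the pipe-per-worker design, addressing both points. Everything in it is illustrative (`./prog`, the config names, and the one-result-per-worker assumption are made up): the logger polls each pipe with `Connection.poll()` and a short timeout instead of blocking on any single one, and a killed worker shows up as `EOFError` on `recv()`, provided the parent has closed its copy of the send end.

```python
import subprocess
from multiprocessing import Process, Pipe

def worker(conn, config_path):
    # Run prog (placeholder path) and ship its output back through the pipe.
    output = subprocess.check_output(["./prog", config_path])
    conn.send(output)
    conn.close()

def logger(pipes):
    # pipes is a list of (config_path, read_end) pairs.  Poll each pipe
    # with a short timeout instead of blocking on any single one.
    pending = list(pipes)
    while pending:
        for config_path, conn in pending[:]:
            if not conn.poll(0.1):  # wait at most 100 ms on this pipe
                continue
            try:
                output = conn.recv()
            except EOFError:
                # The worker died (e.g. was killed): its end of the pipe
                # was closed, so recv() raises EOFError instead of blocking.
                print "worker for %s died" % config_path
            else:
                print "result for %s: %r" % (config_path, output)
            pending.remove((config_path, conn))

if __name__ == "__main__":
    pipes = []
    for cfg in ["a.cfg", "b.cfg"]:  # placeholder configuration files
        recv_end, send_end = Pipe(duplex=False)
        Process(target=worker, args=(send_end, cfg)).start()
        send_end.close()  # the parent must close its copy of the send end,
                          # otherwise recv() can never raise EOFError
        pipes.append((cfg, recv_end))
    logger(pipes)
```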

P.S. Point #4 seems to be solvable -- I have to

  1. disable the default SIGINT handling in all workers and in the logger;

  2. add a try/except block to the main process that calls pool.terminate(); pool.join() when a KeyboardInterrupt is caught (sketched below).
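
One way this can look in code (a sketch for Python 2.7, with made-up pool size and config names): note the known Python 2.7 quirk that a plain pool.map() can swallow KeyboardInterrupt, which is why the sketch calls map_async(...).get() with a long explicit timeout.

```python
import signal
import subprocess
from multiprocessing import Pool

def init_worker():
    # Step 1: children ignore SIGINT, so Ctrl-C is delivered only
    # to the main process.
    signal.signal(signal.SIGINT, signal.SIG_IGN)

def run_one(config_path):
    # Placeholder for the real work: run prog and return its output.
    return subprocess.check_output(["./prog", config_path])

if __name__ == "__main__":
    pool = Pool(processes=4, initializer=init_worker)
    try:
        # Step 2: catch KeyboardInterrupt only in the main process.
        # In Python 2.7 a plain pool.map() can swallow Ctrl-C, so use
        # map_async() with a (very long) explicit timeout instead.
        results = pool.map_async(run_one, ["a.cfg", "b.cfg"]).get(99999)
        pool.close()
        pool.join()
    except KeyboardInterrupt:
        pool.terminate()
        pool.join()
```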

Could you please suggest a better design approach if possible, and if not, how to tackle the problems described above?

P.S. Python 2.7

  • My advice is to read these two articles first: (1) http://www.jeffknupp.com/blog/2012/03/31/pythons-hardest-problem/ and (2) https://www.jeffknupp.com/blog/2013/06/30/pythons-hardest-problem-revisited/ . The first one explains why Python might not be the best option for concurrency, and the second suggests some ways of mitigating this drawback. – rbaleksandar Sep 16 '15 at 09:11
  • @rbaleksandar Ha! You know, I was thinking that using threads was not the way to go, BUT the GIL may actually solve the problem! Each worker starts `prog` with subprocess.call(...) and then saves the output to some shared variable. Owing to the GIL, writing to the shared variable will be absolutely safe. Despite the thread locking, the spawned `prog`s will run in parallel. And there will be no need for all those bulky inter-process messaging tools. What do you think? (See the sketch after these comments.) – DimG Sep 16 '15 at 09:20
  • Are you able to modify `prog` or is it fixed and we can't touch it? What sort of input and output does it do? – John Zwinck Sep 16 '15 at 09:39
  • @JohnZwinck It's fixed. It takes a configuration file as input and outputs some formatted metrics to stdout. `driver` is responsible for configuration file preparation, spawning `prog`s and fetching their output. – DimG Sep 16 '15 at 10:17
  • The behaviour you're describing reminds me a LOT of how ROS (robot operating system) works. Especially the logging part is done pretty neatly there. May I suggest that you first look into it (that is, look at solutions that are already implemented and well tested)? Also the interpreter choice in this case is crucial. PyPy or Jython are a good pick (you can read a short overview of both in the second article I posted in my first comment). – rbaleksandar Sep 16 '15 at 10:22
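
For reference, a minimal sketch of the thread-per-prog idea from DimG's comment above (Python 2.7; `./prog` and the config names are placeholders). The point is that subprocess.check_output() releases the GIL while waiting on the child, so the spawned progs run in parallel even though the Python threads themselves don't; the explicit Lock is not strictly needed for list.append, which is atomic under the GIL, but it makes the intent clear.

```python
import subprocess
import threading

results = []
results_lock = threading.Lock()

def run_and_log(config_path):
    # check_output() blocks this thread only: the GIL is released while
    # waiting on the child, so the spawned progs run in parallel.
    output = subprocess.check_output(["./prog", config_path])
    with results_lock:  # append is atomic under the GIL anyway,
        results.append((config_path, output))  # but the lock is explicit

def main():
    threads = [threading.Thread(target=run_and_log, args=(cfg,))
               for cfg in ["a.cfg", "b.cfg", "c.cfg"]]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    for config_path, output in results:
        print "%s -> %r" % (config_path, output)

if __name__ == "__main__":
    main()
```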

1 Answer


You can start from the answer given here: https://stackoverflow.com/a/23369802/4323

The idea is to avoid subprocess.call(), which is blocking, and instead use subprocess.Popen, which is non-blocking. Set stdout of each instance to subprocess.PIPE (a StringIO object won't work here, since Popen needs a real file descriptor). Spawn all the progs, wait for them, and write their output. It should not be far off from the code shown in the linked answer.
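
A minimal sketch of that approach (Python 2.7; the config names and the `./prog` path are placeholders):

```python
import subprocess

configs = ["a.cfg", "b.cfg", "c.cfg"]  # placeholder configuration files

# Spawn every prog up front; Popen returns immediately.
procs = [(cfg, subprocess.Popen(["./prog", cfg], stdout=subprocess.PIPE))
         for cfg in configs]

# Reap them one by one; communicate() waits for that child and returns
# everything it wrote to stdout.
for cfg, proc in procs:
    output, _ = proc.communicate()
    print "%s (exit code %d): %r" % (cfg, proc.returncode, output)
```

One caveat with subprocess.PIPE: if the outputs are large, a child further down the list can fill its pipe buffer and stall while communicate() is still blocked on an earlier child; redirecting each child's stdout to a temporary file avoids that.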
