
Sorry if the question is too open-ended or otherwise unsuitable, but this is due to my lack of understanding of several pieces of technology/software, and I'm quite lost. I have an existing Java Swing GUI that runs MPI jobs on the local machine, and I now need to support running those jobs on HPC clusters (let's assume a Linux cluster with SSH access). To be more specific, the main backend executable (Linux and Windows builds) that I need to run uses a very simple master-slave scheme in which all relevant output is produced by the master node only. Currently, to run the backend on multiple machines, I simply copy all necessary files to them (assuming no shared filesystem) and call "mpiexec" or "mpirun", as is usual practice. The output produced by the master then needs to be read (or partially read) by my GUI.
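For concreteness, this is roughly how the GUI assembles the local launch today. It's a minimal sketch: the launcher name, flags, and backend path are assumptions, and some MPI distributions use "-np" instead of "-n".

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of assembling the local mpiexec/mpirun command line that a
// ProcessBuilder would then launch. Launcher name and flags are assumptions.
public class MpiCommand {
    public static List<String> build(String launcher, int nProcs,
                                     String exe, String... args) {
        List<String> cmd = new ArrayList<>();
        cmd.add(launcher);                    // e.g. "mpiexec" or "mpirun"
        cmd.add("-n");                        // process count flag ("-np" on some MPIs)
        cmd.add(Integer.toString(nProcs));
        cmd.add(exe);                         // the backend executable
        cmd.addAll(Arrays.asList(args));      // backend's own arguments
        return cmd;
    }

    public static void main(String[] args) {
        List<String> cmd = build("mpiexec", 4, "./backend", "input.dat");
        System.out.println(String.join(" ", cmd));
        // new ProcessBuilder(cmd).start() would then run this locally.
    }
}
```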

The main problem, as I see it, is where to run the GUI. Several options:

  • Local machine - the potential problem is needing to read data from the cluster back to the local machine (and also the stdout/stderr of the cluster processes) to display current progress to the user.
  • Login node - the obvious problem of hogging precious resources; in many cases this will be banned.
  • Compute node - sounds pretty dodgy, especially if the cluster has a queuing system (Slurm, Sun Grid Engine, etc.), and is also possibly banned.

Of these three options, the first seems the most reasonable, and also the least likely to upset any HPC admins, but it is also the hardest to implement. There are several problems with that setup:

  • Passing data from the cluster to the local machine - because we're using a cluster, by definition we will probably generate large amounts of data, at least part of which the user wants to see. How should this be done? I can see how to execute commands on a remote machine over SSH using JSch or similar, but once a command is running on the remote machine, how do I communicate information back to the local one?
  • Displaying the backend's stdout/stderr on the local machine - similar to the above.
  • Dealing with the peculiarities of individual clusters - the only way around this that I can see is to let the user write custom Slurm scripts or suchlike.
  • Detecting whether the backend computation has finished or failed - this problem interacts with any custom Slurm scripts the user writes.
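One hedged approach to the last point, which also tolerates user-written job scripts: have the script touch a sentinel file ("DONE" or "FAILED") as its final step, and have the GUI poll for it. The sketch below polls a local directory for clarity; over SSH the same check would become something like `ssh user@host 'test -f /path/to/DONE'` and an exit code of 0 would mean finished. All file names and paths here are illustrative assumptions, not an established convention.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Sentinel-file polling sketch: the job script's last line touches DONE
// (or FAILED in an error branch), and the GUI periodically checks for it.
public class JobStatus {
    public enum State { RUNNING, DONE, FAILED }

    public static State poll(Path workDir) {
        if (Files.exists(workDir.resolve("DONE")))   return State.DONE;
        if (Files.exists(workDir.resolve("FAILED"))) return State.FAILED;
        return State.RUNNING;               // no sentinel yet: still queued/running
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("job");
        System.out.println(poll(dir));      // no sentinel file yet
        Files.createFile(dir.resolve("DONE"));
        System.out.println(poll(dir));      // sentinel present
    }
}
```

In the GUI this poll would run on a javax.swing.Timer or a SwingWorker, so the check never blocks the event dispatch thread.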

Hopefully it is clear from the above that I'm quite confused. I've had a look at Apache Camel, JSch, Ganymed SSH-2, Apache MINA, Netty, Slurm, Sun Grid Engine, Open MPI, MPICH, and PMI, but there's so much information that I think I need to ask for some help and advice. I would greatly appreciate any comments regarding these problems!

Thanks

================================

Edit

Actually, I just came across this: link, which seems to suggest that if the cluster allows an "interactive"-mode job, then you can run a GUI from a compute node. However, I don't know much about this, nor do I know whether it is common. I would be grateful for comments on this aspect too.

queenbee

1 Answer


You may be able to leverage the approach shown here: a ProcessBuilder is used to execute a command in the background of a SwingWorker, while the command's output is displayed in a suitable component. In the example, ls -l would become ssh username@host 'ls -l'. Use JPasswordField as required.
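A minimal sketch of that pattern: ProcessBuilder runs the command and each output line is handed to a callback. In the GUI, this would sit inside a SwingWorker's doInBackground(), with the callback calling publish() so that process() can append lines to a JTextArea. The `ssh user@host ...` command shown in the comment is an assumption about your setup; the runnable demonstration uses a plain local command.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Run a command and stream its output line-by-line to a callback.
// In a SwingWorker, onLine.accept(line) would be publish(line).
public class CommandRunner {
    public static int run(List<String> cmd, Consumer<String> onLine)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(cmd);
        pb.redirectErrorStream(true);          // merge stderr into stdout
        Process p = pb.start();
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                onLine.accept(line);
            }
        }
        return p.waitFor();                    // exit code of the command
    }

    public static void main(String[] args) throws Exception {
        // Against a cluster this might be, e.g. (hypothetical host/command):
        //   run(List.of("ssh", "user@host", "squeue -u user"), textArea::append);
        List<String> out = new ArrayList<>();
        int exit = run(List.of("sh", "-c", "echo hello"), out::add);
        System.out.println(exit + " " + out);
    }
}
```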

trashgod
  • Hi, thanks for your answer. However, I already know how to use ProcessBuilder to execute system commands - this is what I already do to execute MPI jobs on the local machine. I also know that I can log in using ssh in the way that you describe, or using a library such as JSch. The difficulty is working out a reliable and robust system for extending this to run MPI jobs on HPC clusters. – queenbee Jul 04 '14 at 08:57
  • I wondered about that; I'm guessing that there is no general solution. See also potential [MPI-Java bindings](http://en.wikipedia.org/wiki/Message_Passing_Interface#Java). If the cluster exposes a well-known service on an established TCP port, e.g. JDBC, you can forward the port via SSH. – trashgod Jul 04 '14 at 10:41
  • Unfortunately, the MPI stuff is already implemented in the C++ backend. My Java GUI only deals with _execution_ of the MPI backend as opposed to performing MPI communication itself. You are probably right about there being no general solution - perhaps I need to define my scope carefully - however, that means knowing enough to define the scope! :( – queenbee Jul 04 '14 at 13:55