
Problem: mpirun / mpiexec Calls from Java Runtime.exec() Fail on Large MPI Problems

This is a follow-up to my earlier post:

Question: How to Launch an MPI Parallel Program Programmatically

An Iceberg Problem

I've been stuck on my MPI-Java integration experiment for more than a week and so far haven't found a way out of this problem. My problem is quite similar to this posting on the comp.parallel.mpi newsgroup. Unfortunately, the thread was first posted in the year 2001 and, almost five years later, no one has answered the question.

Let me explain my situation: I want to convert my parallel code, which is written in C and is based on the MPICH and PETSc parallel-computing libraries, into a Web service. From my point of view, three solutions are available. The first is to use the Apache Axis C++ Web Services library to extend the existing code into a Web service. The second is to leave the parallel code as is and develop a Java Web service that acts as a proxy, executing the parallel code via a Java Runtime.exec() method call. The third is to expose the parallel code the way old-fashioned Web applications were developed, through a Common Gateway Interface (CGI) mechanism.

On my first attempt, I went the Java Runtime.exec() plus mpiexec way. It worked well for a small Hello World program with simple matrix computations and MPI function calls, but it failed when I told Runtime.exec() to launch my parallel structural analysis code. An excerpt of the error is attached at the end of this post.
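For readers who want to try the same approach, here is a minimal sketch of a Runtime.exec() launcher. It is not the actual launcher behind this post; the class name MpiLauncher and the helper names drain and run are made up for illustration, and the OUTPUT>/ERROR> prefixes are chosen only to resemble the attached log. It reads the child's stdout and stderr on separate threads, because a child process can block if its output pipes fill up and nobody reads them. Whether that has anything to do with the crash described here is unknown.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;

final class MpiLauncher {

    /** Starts a thread that echoes every line of the given stream with a prefix. */
    static Thread drain(InputStream in, String prefix) {
        Thread t = new Thread(() -> {
            try (BufferedReader r = new BufferedReader(new InputStreamReader(in))) {
                String line;
                while ((line = r.readLine()) != null) {
                    System.out.println(prefix + ">" + line);
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        });
        t.start();
        return t;
    }

    /** Runs the command, drains both output streams, and returns the exit value. */
    static int run(String[] command) throws IOException, InterruptedException {
        System.out.println("Executing: " + String.join(" ", command));
        Process p = Runtime.getRuntime().exec(command);
        Thread out = drain(p.getInputStream(), "OUTPUT"); // child's stdout
        Thread err = drain(p.getErrorStream(), "ERROR");  // child's stderr
        int exit = p.waitFor();
        out.join(); // make sure all output has been printed
        err.join();
        System.out.println("ExitValue: " + exit);
        return exit;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java MpiLauncher <command> [args...]");
            return;
        }
        // Example (paths are placeholders):
        //   java MpiLauncher mpiexec -n 2 /path/to/program input.in output.out
        run(args);
    }
}
```

Passing the command as a String array rather than a single string avoids Runtime.exec()'s whitespace tokenizing, which matters when paths contain spaces.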

I also looked at the Axis C++ way, which, for me, is the most direct (but painful) way to create a Web service from existing C code. From a preliminary study, if I avoid mpiexec in order to make a self-contained Web service, it seems I will have to use the MPI_Comm_spawn function to initialize the master and the worker nodes. Looking at the sample code, my feeling is that this is a bit much for me, as I would have to learn yet another toolkit, Axis C++, and invest considerable time to become familiar enough to master it. I still remember how I spent almost one month in 2004 learning undocumented features of Apache Axis to create a custom serializer-deserializer module for my Java code.

I also glanced at CGI and, again, did not feel like investing my time in learning yet another technology.

I think I will have to stick with the "Java Runtime.exec() plus mpiexec" way. I will post further developments on this blog.


References:

Question: How to Launch an MPI Parallel Program Programmatically
Google Groups : comp.parallel.mpi
Problems with Runtime.getRuntime().exec




Update: Jan 18, 06

The poster of the comp.parallel.mpi thread also cross-posted to comp.lang.java.programmer and there are 5 messages on this thread. Here is the link: Problems with Runtime.getRuntime().exec. Unfortunately, this thread leads to no solution.


Update: Jan 18, 06

A friend of mine suggested another solution: letting the Java proxy Web service communicate through a socket with the master of the cluster, which is implemented in C. He also introduced me to ProcessBuilder, a class new in JDK 5.0, as an alternative to Runtime.exec().
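A minimal sketch of the ProcessBuilder alternative follows, assuming the goal is simply to run a command and collect its output. The class name MpiProcessBuilderDemo, the run method, and its parameters are hypothetical. It sets the working directory explicitly and merges stderr into stdout so that a single reader drains everything. There is no evidence yet that switching to ProcessBuilder fixes the failure described in this post.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.List;

final class MpiProcessBuilderDemo {

    /**
     * Runs the command in workDir (null means inherit the JVM's working
     * directory), appends each output line to captured, and returns the
     * exit value.
     */
    static int run(List<String> command, File workDir, List<String> captured)
            throws IOException, InterruptedException {
        ProcessBuilder pb = new ProcessBuilder(command);
        pb.directory(workDir);
        pb.redirectErrorStream(true); // stderr is merged into stdout
        Process p = pb.start();
        try (BufferedReader r =
                 new BufferedReader(new InputStreamReader(p.getInputStream()))) {
            String line;
            while ((line = r.readLine()) != null) {
                captured.add(line);
            }
        }
        return p.waitFor();
    }
}
```

A call might look like run(Arrays.asList("mpiexec", "-n", "2", program, input, output), new File(runDir), lines), where program, input, output, and runDir are placeholders for real paths.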


Attachment:

Error from Java Runtime.exec() plus mpiexec:

Executing: mpiexec -n 2 /home/shared/thitiv/parefg-20060107_1451/parefg /home/shared/thitiv/parefg-20060107_1451/tension.in /home/shared/thitiv/parefg-20060107_1451/tension.out
OUTPUT>Petsc Release Version 2.3.0, Patch 43, April, 26, 2005
OUTPUT>See docs/changes/index.html for recent updates.
OUTPUT>See docs/faq.html for hints about trouble shooting.
OUTPUT>See docs/index.html for manual pages.
OUTPUT>-----------------------------------------------------------------------
OUTPUT>/home/shared/thitiv/parefg-20060107_1451/parefg on a linux-gnu named ubuntu by thitiv Wed Jan 18 14:50:13 2006
OUTPUT>Libraries linked from /usr/local/lib/petsc-2.3.0/lib/linux-gnu
OUTPUT>Configure run at Tue Jan 3 09:36:34 2006
OUTPUT>Configure options --with-mpi-dir=/usr/local/bin/mpich2 --with-shared=0
OUTPUT>-----------------------------------------------------------------------
OUTPUT>[0]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
OUTPUT>[0]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
OUTPUT>[0]PETSC ERROR: likely location of problem given in stack below
OUTPUT>[0]PETSC ERROR: --------------- Stack Frames ---------------
OUTPUT>[0]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
OUTPUT>[0]PETSC ERROR: INSTEAD the line number of the start of the function
OUTPUT>[0]PETSC ERROR: is given.
OUTPUT>[0]PETSC ERROR: --------------------------------------------
OUTPUT>[0]PETSC ERROR: User provided function() line 0 in unknown directory unknown file
OUTPUT>[0]PETSC ERROR: Signal received!
OUTPUT>[0]PETSC ERROR: !
ERROR>[cli_0]: aborting job:
ERROR>application called MPI_Abort(MPI_COMM_WORLD, 59) - process 0
OUTPUT>rank 0 in job 3 ubuntu_32773 caused collective abort of all ranks
OUTPUT> exit status of rank 0: killed by signal 9
ExitValue: 137
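
A note on reading the last two lines of the log: by a common Unix convention, an exit value above 128 means the process was terminated by a signal, with the signal number equal to the exit value minus 128. Here 137 - 128 = 9, which matches "killed by signal 9" above. The helper below is a small sketch of that arithmetic; the class name ExitStatus is made up, and the result is only a heuristic, since a program may also exit with such a value on purpose.

```java
final class ExitStatus {

    /** Interprets an exit value using the 128 + signal-number convention. */
    static String describe(int exitValue) {
        if (exitValue > 128 && exitValue < 256) {
            return "killed by signal " + (exitValue - 128);
        }
        return "exited with code " + exitValue;
    }

    public static void main(String[] args) {
        System.out.println(describe(137)); // the ExitValue from the log above
    }
}
```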

