
MPICH2: mpdboot failure due to conflicting /etc/hosts entry

A single line in the /etc/hosts file on my Ubuntu Linux machine cost me an entire afternoon. I have some guesses about the cause, but I don't want to spend more time investigating what really went wrong. Anyway, here are the listings of what worked and what didn't:

/etc/hostname
ubuntu

mpd.hosts.cluster
master
worker-01

Bad /etc/hosts
127.0.0.1 localhost localhost.localdomain ubuntu
192.168.200.128 master
192.168.200.129 worker-01

Result
thitiv@ubuntu:~$ mpdboot -n 2 -f mpd.hosts.cluster
thitiv@master's password:
mpdboot_ubuntu (handle_mpd_output 359): failed to ping mpd on master; recvd output={}

Good /etc/hosts
127.0.0.1 localhost localhost.localdomain
192.168.200.128 ubuntu master
192.168.200.129 worker-01

Result
thitiv@ubuntu:~$ mpdboot -n 2 -f mpd.hosts.cluster
thitiv@worker-01's password:
thitiv@ubuntu:~$

At this point, with the /etc/hosts now fixed, the MPICH2 cluster could be booted up successfully.
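
The difference between the two listings can be checked mechanically. Below is a minimal Python sketch, not part of MPICH2; the function names loopback_names and hostname_is_loopback are made up for illustration. It assumes the trouble comes from the machine's own hostname ("ubuntu") appearing on the 127.0.0.1 line, which is the only thing that differs between the bad and the good file.

```python
import ipaddress

BAD_HOSTS = """\
127.0.0.1 localhost localhost.localdomain ubuntu
192.168.200.128 master
192.168.200.129 worker-01
"""

GOOD_HOSTS = """\
127.0.0.1 localhost localhost.localdomain
192.168.200.128 ubuntu master
192.168.200.129 worker-01
"""


def loopback_names(hosts_text):
    """Return every name that the hosts-file text maps to a loopback address."""
    names = set()
    for line in hosts_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        address, *aliases = line.split()
        try:
            is_loopback = ipaddress.ip_address(address).is_loopback
        except ValueError:
            continue  # skip lines whose first field is not a plain IP address
        if is_loopback:
            names.update(aliases)
    return names


def hostname_is_loopback(hosts_text, hostname):
    """True if the given hostname is listed on a loopback line."""
    return hostname in loopback_names(hosts_text)


if __name__ == "__main__":
    # "ubuntu" is the content of /etc/hostname in the listings above.
    print("bad file :", hostname_is_loopback(BAD_HOSTS, "ubuntu"))
    print("good file:", hostname_is_loopback(GOOD_HOSTS, "ubuntu"))
```

To check a real machine, one could pass the contents of /etc/hosts and /etc/hostname instead of the embedded strings.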

I don't want to invest my time trying to understand this right now, but I would really appreciate it if anyone could explain what went wrong.


Comments

wheatazogg said…
Good evening!

Usually, you have to make sure that all of the hosts are reachable from all of the machines on the cluster's internal network.

Is ubuntu an alias for one of the machines in the hosts file?

With your original /etc/hosts file:
127.0.0.1 localhost localhost.localdomain ubuntu
192.168.200.128 master
192.168.200.129 worker-01

If you share that hosts file on all of the machines, the name 'ubuntu' always refers to the machine you're on. That is, regardless of whether you're on master or worker-01, each machine thinks it's 'ubuntu' as well.

You can get serious identity issues when the MPDs try to ping each other if things are inconsistent. It helps to make sure that all of the nodes are listening on interfaces on the same subnet -- which you did by moving the name.

Matthew
Dr. Thiti said…
Thank you very much for pointing out the loop-back issue, Matthew. I think this is the cause of the problem.
Anonymous said…
A couple of questions regarding your MPI blog posts. Are you running Ubuntu on all the nodes? If so, how did you set up rsh/rlogin on the nodes and the master? I couldn't get beyond setting up openssh-server; I need to supply a password on every attempt:

ssh hostname date

requires a password.
Dr. Thiti said…
Hi Indy,

Yes, I was running Ubuntu on all nodes. I used apt-get install to install the OpenSSH package supplied with the Ubuntu distribution. I also compiled MPICH-2 from the latest source.

Unlike the MPICH-1 version that I used many years ago, I didn't have to set up an rlogin file for MPICH-2. I think MPICH-2 has switched from rsh/rlogin to SSH. So basically we should make sure that SSH itself works before we start installing MPICH-2.

I tried to simplify things a little bit by creating the same login names and passwords on all machines.

I would suggest consulting an SSH how-to document and trying to SSH between the machines before you proceed with the MPICH-2 installation.
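
A minimal Python sketch of such an SSH test, for illustration only: it wraps the same "ssh hostname date" command from the question, adding the OpenSSH options BatchMode=yes (fail instead of prompting for a password) and ConnectTimeout. The function names are hypothetical, and the host names are the ones from mpd.hosts.cluster in the post.

```python
import subprocess


def ssh_check_command(host, timeout=5):
    """Build an ssh command line that fails instead of asking for a password."""
    return [
        "ssh",
        "-o", "BatchMode=yes",               # never prompt for a password
        "-o", f"ConnectTimeout={timeout}",   # give up on unreachable hosts
        host,
        "date",
    ]


def can_ssh_without_password(host, timeout=5):
    """Return True if 'ssh host date' succeeds without any interaction."""
    try:
        result = subprocess.run(
            ssh_check_command(host, timeout),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout + 5,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False  # ssh is missing or the connection hung
    return result.returncode == 0


if __name__ == "__main__":
    for host in ["master", "worker-01"]:
        status = "ok" if can_ssh_without_password(host) else "needs password or unreachable"
        print(f"{host}: {status}")
```

A host reported as "needs password or unreachable" is not necessarily broken: in the post above, mpdboot still worked after a password was typed at the prompt.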

Thiti.
Kanibal said…
Hi, everything up to this point works on my cluster, but then I get this problem:
kanibal@kubuntu:~$ mpiexec -n 2 cpi
problem with execution of cpi on nodo2: [Errno 2] No such file or directory
problem with execution of cpi on kabuntu: [Errno 2] No such file or directory

Do you know a possible answer?
kanibal
kanibalv@gmail(NOSPAM).com
Dr. Thiti said…
Kanibal,

Something might have gone wrong with your shared file system – perhaps the NFS shared directory was not properly mounted. I would suggest you log on to each machine manually and verify that “cpi” is accessible from each node.
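
As a rough aid for that manual check, here is a minimal Python sketch that could be run on each node to see whether a program such as "cpi" can be found there. It is only an illustration: the function name find_executable is made up, and it looks in PATH (or in a directory you pass in), which is an assumption about how the program is being located.

```python
import shutil
import sys


def find_executable(name, search_path=None):
    """Return the full path of an executable, or None if it cannot be found.

    search_path is a PATH-style string; None means use the PATH variable.
    """
    return shutil.which(name, path=search_path)


if __name__ == "__main__":
    program = sys.argv[1] if len(sys.argv) > 1 else "cpi"
    location = find_executable(program)
    if location is None:
        print(f"{program}: not found on this node")
    else:
        print(f"{program}: found at {location}")
```

If the program is found on one node but not on another, the shared directory is probably not mounted on the second node.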

--
Thiti.
Kanibal said…
Thanks for writing back. Since I don't have much experience with NFS, I haven't been able to make it work properly so far. Would you help me with configuring the NFS file system on the nodes?
I was thinking, if it's not too much trouble, of a small tutorial for MPICH2 on Ubuntu.
Thanks for your motivation.

kanibal.-
Kanibal said…
Finally, it works...
Hi, I made a small tutorial; please tell me what you think of it (especially my English).
The link is:
http://kanibalv.blogspot.com/
"Installation and configuration of MPICH2 for a Beowulf Cluster".

Thanks for everything...

kanibal..
