Dvbmonkey’s Blog

February 27, 2009

Getting started with Open MPI on Fedora

Filed under: linux — dvbmonkey @ 4:36 pm
Tags: , , , ,

Recently rediscovered the world of parallel computing after wondering what to do with a bunch of mostly idle Linux boxes, all running various versions of Fedora Core Linux. I found this guide particularly useful and decided to elaborate on the subject here.

Background

Open MPI is an open-source implementation of the Message Passing Interface which allows programmers to write software that runs on several machines simultaneously. Furthermore it allows these copies of the program to communicate/cooperate with each other to say… share the load of an intensive calculation amongst each other or, daisy-chain the results from one ‘node’ to another. This is not new, its been around for decades and today it is one of the main techniques used in Supercomputing platforms.

The basic principle is you need two things, firstly the MPI development suite in order to build your MPI-capable applications (e.g. Open MPI) and secondly a client/server queue manager to distribute the programs to remote computers and return the results (e.g. TORQUE). Both these components are distributed by the Fedora Project and are readily available.

Setting up the TORQUE server

Firstly, you will need to doctor the /etc/hosts file, placing your preferred hostname infront of “localhost” on the “127.0.0.1” line, example:

127.0.0.1 mpimaster localhost.localdomain localhost

Now, you will need to install the following packages, using something like YUM, the package torque-client will require some GUI related libraries (freetype, libX*, tcl, tk etc.) even if you’re not using X on the torque-server.

$ sudo yum install torque torque-client torque-server torque-mom libtorque

Next you will need to do some setup stuff, if you get a warning that pbs_server is already running do a /etc/init.d/pbs_server stop:

$ sudo /usr/sbin/pbs_server -t create
$ sudo /usr/share/doc/torque-2.1.10/torque.setup root

Now, create the following file and put the hostname of this server.

/var/torque/mom_priv/config:
$pbsserver mpimaster

Create another file, this will contain a list of all the nodes/clients we’re going to be using. The parameter “np=4” describes the number of processors (or cores) available on this node, in both cases below the client will be a QuadCore processor so I have set “np=4”. If you need to add more nodes to your MPI cluster at a later time, this is where you configure them.

/var/torque/server_priv/nodes:
mpinode01 np=4
mpinode02 np=4

We create another config file, this time just containing the hostname of the server machine.

/var/torque/server_name:
mpiserver

Now we update IPTables to allow incoming connections to the server, an example of my own configuration with the additional two lines in bold opening up tcp/udp ports 15000 to 15004. Once done run $ sudo /etc/init.d/iptables restart to pickup the new settings.

/etc/sysconfig/iptables:
# Firewall configuration written by system-config-firewall
# Manual customization of this file is not recommended.
*filter
:INPUT ACCEPT [0:0]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [0:0]
-A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT
-A INPUT -p icmp -j ACCEPT
-A INPUT -i lo -j ACCEPT
-A INPUT -m state --state NEW -m tcp -p tcp --dport 22 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 15000:15004 -j ACCEPT
-A INPUT -p udp -m udp --dport 15000:15004 -j ACCEPT

-A INPUT -j REJECT --reject-with icmp-host-prohibited
-A FORWARD -j REJECT --reject-with icmp-host-prohibited
COMMIT

IMPORTANT: Commands are sent to the client nodes over RSH/SSH, in order to make this all work its assumed you’ve setup key-based SSH from the server to each of the client nodes.

All done, a quick restart of the torque server and we’re onto setting up our client nodes.

$ sudo /etc/init.d/pbs_server restart
$ sudo /etc/init.d/pbs_mom restart

Setting up the TORQUE client nodes

Going for speed/efficiency, I devised a one-line shell command to install and configure each of the clients if you are logged on as root:

# yum -y install torque-client torque-mom && echo -e "192.168.0.240\tmpimaster" >> /etc/hosts && echo "mpimaster" >> /var/torque/server_name && echo "\$pbsserver mpimaster" >> /var/torque/mom_priv/config && /etc/init.d/pbs_mom start

But basically it breaks down into the following:

* Install the client software
$ sudo yum install openmpi torque-client torque-mom

* Add the server’s hostname and address to the /etc/hosts file
# echo -e "192.168.0.240\tmpimaster" >> /etc/hosts

* Set the server’s hostname in the config file(s)
# echo "mpimaster" >> /var/torque/server_name
# echo "\$pbsserver mpimaster" >> /var/torque/mom_priv/config

* Start the service
/etc/init.d/pbs_mom start

Testing it out

From the the ‘mpimaster’ machine, you should be able to issue the command pbsnodes -a and see the client machines connected e.g.

$ pbsnodes -a
mpinode01
state = free
np = 4
ntype = cluster
status = opsys=linux,uname=Linux pepe 2.6.27.12-170.2.5.fc10.i686.PAE #1 SMP Wed Jan 21 01:54:56 EST 2009 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=861421,
totmem=5359032kb,availmem=5277996kb,physmem=4146624kb,ncpus=4,loadave=0.00,netload=104310870,state=free,jobs=? 0,rectime=1235751237

mpinode02
state = free
np = 4
ntype = cluster
status = opsys=linux,uname=Linux taz 2.6.27.12-170.2.5.fc10.i686.PAE #1 SMP Wed Jan 21 01:54:56 EST 2008 i686,sessions=? 0,nsessions=? 0,nusers=0,idletime=366959,
totmem=5359048kb,availmem=5277268kb,physmem=4146640kb,ncpus=4,loadave=0.00,netload=46008061,state=free,jobs=? 0,rectime=1235751223

If you see this, congratulations! you are ready to rock! If your client nodes are not connected, check the configuration, network connectivity and lastly, check the ‘pbs_mom’ service is running on each client, optionally try restarting the ‘pbs_mom’ service.

MPI Development

You’ll need to install a couple of additional packages on your development machine,

$ sudo yum install openmpi openmpi-devel openmpi-libs

Now lets start with the inevitable ‘Hello World!’ example,

hello.c:

#include <stdio.h>
#include <mpi.h>
#include <unistd.h>

int main(int argc, char *argv[]) {
   int numprocs, rank, namelen;
   char processor_name[MPI_MAX_PROCESSOR_NAME];
   MPI_Init(&argc, &argv);
   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
   MPI_Get_processor_name(processor_name, &namelen);
   printf("Hello World! from process %d out of %d on %s\n", rank, numprocs, processor_name);
   MPI_Finalize();
}

Normally we’d just use gcc to build this, but for convenience MPI provides a mpicc which handles the include and library paths for you.

$ mpicc hello.c -o hello

In order to tell Open MPI / Torque where to run your application we must provide it with a “hostfile”, similar to the file /var/torque/server_priv/nodes we made earlier:

./myhostfile:

mpinode01 slots=4
mpinode02 slots=4

Now, we’re ready to run it for the first time. Note, in this example I did my development work on the machine acting as the ‘mpiserver’ – if you try submitting an MPI job from another machine you might need slightly different configuration.

$ mpirun --hostfile myhostfile hello
Hello World! from process 0 out of 8 on mpinode01
Hello World! from process 1 out of 8 on mpinode01
Hello World! from process 2 out of 8 on mpinode01
Hello World! from process 3 out of 8 on mpinode01
Hello World! from process 4 out of 8 on mpinode02
Hello World! from process 5 out of 8 on mpinode02
Hello World! from process 6 out of 8 on mpinode02
Hello World! from process 7 out of 8 on mpinode02

Voilà, you have just submitted an MPI task and had it execute on a number of your processors.

MPI makes distributing & communication between copies of your programs easy, however its up to you to use this potential to provide a real speed up in a real-work application. A really simple example is a program that operates on a set of 8 large files. Normally, while running on a single processor you would process these files sequentially. Using MPI you could load 8 copies of your program on 8 processing nodes, and have each node process a different file. Effectively giving you a 8-times speed up compared to running it on a single processor.

I’ve loosely tested the approach described here on different systems running Fedora Core Linux versions 8, 9 & 10. Any questions / comments welcomed!

Troubleshooting
Firstly, try out the Open MPI FAQ’s, personally I encountered the following problems:

  • mpirun appears to ‘hang’: caused by iptables, I just shut down iptables to resolve the issue.
  • Fedora Core 7: the package sets the wrong library path in /etc/ld.so.conf
  • Fedora Core 7: the package included with the distribution ‘doesnt work’, library issues

Updated: 2nd March 2009
Ooppss! as Jeff Squyres pointed out in his comment below, the way I configured things in the original post meant that “mpirun” just spawned 8 processes on my localhost – not the remote nodes. I’ve reworked the configuration to account for this. Many thanks Jeff!

Advertisements

3 Comments »

  1. A minor clarification — when you invoke “mpirun …”, if you are running inside of a Torque job, Open MPI will look around and try to figure out what hosts to run your application on. If not, OMPI will simply run on the localhost. In your first example:

    mpirun hello

    If you’re just running on mpimaster, the output will be as you mentioned. However, note that the hello program _will be running on mpimaster_ — it won’t be sent to any back-end Torque nodes.

    Similarly, in your second example

    mpirun -n 8 hello

    Those 8 processes will be running _on mpimaster_ — there was no dispatch to the other nodes.

    To get Open MPI to dispatch the processes to other nodes, there’s two common options:

    1. Use a hostfile. Explicitly provide the –hostfile option to mpirun specifying which hosts to run on. For example:

    $ cat myhostfile
    node1
    node2
    $ mpirun -n 2 –hostfile myhostfile hello
    –> OMPI will run one copy of “hello” on node1 and one copy of “hello” on node2

    2. Run inside of a Torque job. If you submit a script to Torque, Torque will wait for appropriate nodes to be available and then run the script on the first node that was allocated to you. If that script contains

    mpirun hello

    Then OMPI will look around, realize that it’s in a Torque job with N cores, and launch N copies of “hello” on the nodes that were allocated by Torque. No hostfile is required because Torque may allocate different nodes / values of N for any given job. So OMPI directly queries Torque and asks “what is the value of N?” / “what hosts should I launch jobs on?”

    You can, of course, supply an explicit -n value on the command line, even in a Torque job. For example:

    mpirun -n 1 hello

    Will run one copy of hello, regardless of the value of N in your Torque job, etc. But usually we encourage you not to do so; you picked the value of N when you submitted the Torque job, so there’s no need to specify it again.

    If you’re ever confused about the launching pattern, note that you can use the MPI_GET_PROCESSOR_NAME function to see what host you’re on, or, if you’re using Open MPI, you can run non-MPI applications through mpirun (mainly for convenience), such as:

    mpirun hostname
    or
    mpirun -n 8 hostname

    Hope that helps!

    Comment by Jeff Squyres — February 28, 2009 @ 12:57 pm | Reply

  2. […] Filed under: linux — dvbmonkey @ 2:26 pm Tags: linux, mpi, open mpi Building on the Getting started… post from last week I’ve knocked up a quick example showing one way to get your MPI processes […]

    Pingback by An Open MPI Master & Servant Example « Dvbmonkey’s Blog — March 2, 2009 @ 2:27 pm | Reply

  3. hi,
    when we tried to run mpi “hello program” in two laptop ,the following error was occurred
    [root@MohammedGamal openmpi-1.3.2]# mpicc test.c -o hello
    [root@MohammedGamal openmpi-1.3.2]# mpirun –hostfile myhostfile hello
    ————————————————————————–
    Open RTE was unable to open the hostfile:
    myhostfile
    Check to make sure the path and filename are correct.
    ————————————————————————–
    [MohammedGamal:05599] [[9056,0],0] ORTE_ERROR_LOG: Not found in file base/ras_base_allocate.c at line 200
    [MohammedGamal:05599] [[9056,0],0] ORTE_ERROR_LOG: Not found in file base/plm_base_launch_support.c at line 72
    [MohammedGamal:05599] [[9056,0],0] ORTE_ERROR_LOG: Not found in file plm_rsh_module.c at line 990
    ————————————————————————–
    A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
    launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared
    libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
    location of the shared libraries on the remote nodes and this will
    automatically be forwarded to the remote nodes.
    ————————————————————————–
    ————————————————————————–
    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.
    ————————————————————————–
    mpirun: clean termination accomplished

    [root@MohammedGamal openmpi-1.3.2]# vim myhostfile
    [root@MohammedGamal openmpi-1.3.2]# mpirun –hostfile myhostfile hello
    root@aboallol’s password:
    [aboallol:05159] Error: unknown option “–daemonize”
    Usage: orted [OPTION]…
    Start an Open RTE Daemon

    –bootproxy Run as boot proxy for
    -d|–debug Debug the OpenRTE
    -d|–spin Have the orted spin until we can connect a debugger
    to it
    –debug-daemons Enable debugging of OpenRTE daemons
    –debug-daemons-file Enable debugging of OpenRTE daemons, storing output
    in files
    –gprreplica Registry contact information.
    -h|–help This help message
    –mpi-call-yield
    Have MPI (or similar) applications call yield when
    idle
    –name Set the orte process name
    –no-daemonize Don’t daemonize into the background
    –nodename Node name as specified by host/resource
    description.
    –ns-nds set sds/nds component to use for daemon (normally
    not needed)
    –nsreplica Name service contact information.
    –num_procs Set the number of process in this job
    –persistent Remain alive after the application process
    completes
    –report-uri Report this process’ uri on indicated pipe
    –scope Set restrictions on who can connect to this
    universe
    –seed Host replicas for the core universe services
    –set-sid Direct the orted to separate from the current
    session
    –tmpdir Set the root for the session directory tree
    –universe Set the universe name as
    username@hostname:universe_name for this
    application
    –vpid_start Set the starting vpid for this job
    ————————————————————————–
    A daemon (pid 5619) died unexpectedly with status 251 while attempting
    to launch so we are aborting.

    There may be more information reported by the environment (see above).

    This may be because the daemon was unable to find all the needed shared
    libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
    location of the shared libraries on the remote nodes and this will
    automatically be forwarded to the remote nodes.
    ————————————————————————–
    ————————————————————————–
    mpirun noticed that the job aborted, but has no info as to the process
    that caused that situation.
    ————————————————————————–
    mpirun: clean termination accomplished

    can you help us

    Comment by colena — June 20, 2009 @ 12:25 pm | Reply


RSS feed for comments on this post. TrackBack URI

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Blog at WordPress.com.

%d bloggers like this: