Once all the tests shown above pass, you should be able to run a program. From here on, the instructions are lam-specific.
Go back to the head node, log in as wolf, and enter the following commands:
cat > /mnt/wolf/lamhosts
wolf01
wolf02
wolf03
wolf04
<control d>
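If the cluster grows, typing the host list by hand gets tedious. The following is only a sketch of one way to generate it, assuming the nodes are named with a common prefix and a two-digit number as they are here. The function name make_hostfile and its arguments are made up for this illustration and are not part of lam.

```shell
#!/bin/sh
# Hypothetical helper (not part of lam): write PREFIX01 .. PREFIXnn,
# one hostname per line, into the given file.
make_hostfile() {
    prefix=$1     # common hostname prefix, e.g. wolf
    count=$2      # number of nodes to list
    outfile=$3    # host file to (over)write

    : > "$outfile"                      # start with an empty file
    i=1
    while [ "$i" -le "$count" ]; do
        # %02d pads the node number to two digits: 1 -> 01
        printf '%s%02d\n' "$prefix" "$i" >> "$outfile"
        i=$((i + 1))
    done
}
```

For example, make_hostfile wolf 4 /mnt/wolf/lamhosts would write wolf01 through wolf04, one per line.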
Go to the lam examples directory, and compile “hello.c”:
mpicc -o hello hello.c
cp hello /mnt/wolf
Then, as shown in the lam documentation, start up lam:
[wolf@wolf00 wolf]$ lamboot -v lamhosts
LAM 7.0/MPI 2 C++/ROMIO - Indiana University
n0<2572> ssi:boot:base:linear: booting n0 (wolf00)
n0<2572> ssi:boot:base:linear: booting n1 (wolf01)
n0<2572> ssi:boot:base:linear: booting n2 (wolf02)
n0<2572> ssi:boot:base:linear: booting n3 (wolf04)
n0<2572> ssi:boot:base:linear: finished
We are now finally ready to run an application. (Remember, I am using lam; your message-passing implementation may use different syntax.)
[wolf@wolf00 wolf]$ mpirun n0-3 /mnt/wolf/hello
Hello, world! I am 0 of 4
Hello, world! I am 3 of 4
Hello, world! I am 2 of 4
Hello, world! I am 1 of 4
[wolf@wolf00 wolf]$
Recall that I mentioned NFS above. Here I am telling every node to load the executable from the NFS-shared directory, which becomes a bottleneck as the number of boxes grows. Instead, you could copy the executable to each box and give mpirun a node-local path: mpirun n0-3 /home/wolf/hello. The prerequisite is that every file the program needs is available locally on each node. I have done this, and it worked better than running the NFS-shared executable. Of course, this approach breaks down if the cluster application needs to modify a file shared across the cluster.
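Copying the executable to each box by hand is easy to get wrong, so here is a rough sketch of how it could be scripted. It assumes scp works between the nodes without a password prompt, which may not match your setup. The function name distribute and the DRY_RUN variable are invented for this example. Setting DRY_RUN=1 only prints the commands, so you can check them before anything is copied.

```shell
#!/bin/sh
# Hypothetical helper: copy one file to the same local path on every
# host named in a host file (first word of each line is the hostname).
distribute() {
    hostfile=$1   # e.g. /mnt/wolf/lamhosts
    src=$2        # file to copy, e.g. /mnt/wolf/hello
    dest=$3       # node-local destination, e.g. /home/wolf/hello

    # "|| [ -n "$host" ]" also handles a last line with no newline.
    while read -r host _ || [ -n "$host" ]; do
        # Skip blank lines and comment lines.
        case $host in
            ''|'#'*) continue ;;
        esac
        if [ "${DRY_RUN:-0}" = 1 ]; then
            # Dry run: show what would be executed.
            echo "scp $src $host:$dest"
        else
            # </dev/null keeps scp from reading the host list on stdin.
            scp "$src" "$host:$dest" </dev/null || return 1
        fi
    done < "$hostfile"
}
```

A call such as distribute /mnt/wolf/lamhosts /mnt/wolf/hello /home/wolf/hello would then be followed by the node-local mpirun command. Any node that runs a process but is not listed in the host file still needs its own copy at the same path.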