Manual wolf
This document gives step by step instructions for building a Beowulf cluster. After reading all of the documentation that
was available, I felt there were enough gaps and omissions that a document of my own,
one that accurately describes how to build a Beowulf cluster, would be beneficial.
I first saw Thomas
Sterling’s article in Scientific American, and immediately got the book,
because its title was “How to Build a Beowulf”.
NO DOUBT - it was a valuable reference, but it really does not walk you
through instructions on exactly what to do.
So next, I consulted the
web. There were hundreds of web pages,
and still, certain important details were vaguely implied, or just plain left
out.
After working it out on my
own, off and on, for several months, I re-consulted the web, and found a very
good web page that stated all of the important details in no uncertain
terms: http://www.phy.duke.edu/brahma/beowulf_book/node62.html through node6n.html. It also made me feel better, because that
author did the same things I did.
Thanks, Brahma!
If you went to the above
web page, you will see that author suggests that you manually configure each
box, and then, later on, after you get the feel of doing this whole “wolfing
up” procedure, you can set up new nodes automatically, which I will describe in
a later document.
So here is a description
of what I got to work. It is only one
example – my example. You may choose a
different message passing interface; you may choose a different Linux
distribution. You may also spend as
much time as I did researching and experimenting, and learn on your own.
I have proven these
instructions to work on Red Hat 8 and 6; I am currently doing experiments on
making it work on other distributions.
Let’s briefly outline your
requirements:
1. More than one box, each
equipped with a network card.
2. A switch or hub to
connect them.
3. Linux
4. A message passing
interface. [I used lam]
I recall, during my
research, seeing someone use a binary tree of kvm
switches so he could switch between every single box in his cluster. It is not a requirement to have a kvm switch, but merely a TWO port switch is convenient
while setting up and / or debugging.
So let’s get wolfing. Choose the most powerful box to be the head
node. Install linux on there, and choose every
package you want. The only requirement
is that you [in RH speak] choose “Network Servers” because you need to have NFS
and rsh. That’s
all you need. But in my case, I was
going to do development of the Beowulf application, so I added X and C
development.
Those of you researching
Beowulf systems will also know how you can have a 2nd network card
on the head node so you can access it from the outside world. Pretty good advice, but in my case I
didn’t. The only thing on MY wolf
connected to the outside world is the power cord. Therefore I dispensed with all of the extra
work of setting up firewalls.
Log on to the head node as
root, because you will be doing sysadmin commands.
If you use lam as your
message passing interface, you will read in the manual to turn OFF the
firewalls, because they use random port numbers to communicate between
nodes. Here is a rule: If the manual tells you to do something, DO IT! The lam
manual also tells you to run as a non-root user. Make the same user for every
box. Every box on the cluster will have the same user “wolf” with the same
password.
Another thing I learned
the hard way: use a password that obeys the strong password constraints for
your distribution. I used an
easily-typed password like “a” for my user, and the
whole thing did not work. When I changed
my password to a good “B1l3M5l2a” or something, it worked.
In order to responsibly
set your cluster up, you should have some measure of security. After you create your user, create a group,
and add the user to the group. Then, you
may modify your files and directories to only be accessible by the users within
that group:
groupadd beowulf
usermod -g beowulf wolf
… and add the following to .bash_profile:
umask 007
Now any files created by
the user “wolf” [or any user within the group] will automatically be
accessible only to their owner and the group “beowulf”.
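If you want to see what that umask actually does before trusting it, here is a minimal sketch. It creates a scratch file and directory under umask 007 and prints their permission bits. It assumes GNU stat (as found on Linux), and demo_umask is only an illustrative name.

```shell
# Create a file and a directory under umask 007 and print their modes.
# Assumes GNU coreutils (stat -c); demo_umask is a hypothetical helper name.
demo_umask() {
    tmpdir=$(mktemp -d) || return 1
    (
        umask 007                  # strip every permission for "other"
        touch "$tmpdir/file"
        mkdir "$tmpdir/dir"
    )
    # %a prints the permission bits in octal
    echo "$(stat -c %a "$tmpdir/file") $(stat -c %a "$tmpdir/dir")"
    rm -rf "$tmpdir"
}

demo_umask
```

On a typical Linux box this prints 660 770: owner and group can read and write, and everybody else gets nothing.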
My network is
192.168.0.nnn because it is one of the “private” network
IP ranges. Thomas Sterling talks about
it on page 106 of his book. It is inside
my firewall, and works just fine. My
head node, which I call “wolf00” is 192.168.0.100, and
every other node is named “wolfnn”, with an ip of 192.168.0.100 + nn. I am following the sage advice of many of the
web pages out there, and setting myself up for an easier task of scaling up my
cluster.
Refer to the following web
site:
http://www.ibiblio.org/mdw/HOWTO/NFS-HOWTO/server.html#CONFIG
Print that up, and have it
at your side. I will be directing you
how to modify your system in order to create an NFS server, but I have found
this site invaluable, as you may also.
Make a directory for
everybody to share:
mkdir /mnt/wolf
chmod 770 /mnt/wolf
chown wolf:beowulf /mnt/wolf -R
Go to the /etc directory,
and add your “shared” directory to the exports file:
cd /etc
cat exports
cat >> exports
/mnt/wolf 192.168.0.0/255.255.255.0(rw)
<control d>
Now modify hosts. You will
see the comments telling you to leave the “localhost”
line alone. I blatantly ignored that
advice and fixed it to not include my hostname as the loopback
address.
The line used to say: 127.0.0.1 wolf00 localhost.localdomain localhost
It now says: 127.0.0.1 localhost.localdomain localhost
Then I added all the boxes
on my network. Note: This is not
required for the operation of a Beowulf cluster; only convenient for me, so
that I may type a simple “wolf01” instead of 192.168.0.101:
192.168.0.100 wolf00
192.168.0.101 wolf01
192.168.0.102 wolf02
192.168.0.103 wolf03
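Typing these lines by hand gets old once the cluster grows. As a rough sketch, assuming the wolfNN / 192.168.0.(100 + NN) convention described above, the entries can be generated. The function name gen_hosts is made up for this example.

```shell
# Print /etc/hosts lines for nodes 0..(count-1), following the
# wolfNN / 192.168.0.(100+NN) naming convention.
# gen_hosts is a hypothetical helper, not part of any package.
gen_hosts() {
    count=$1
    n=0
    while [ "$n" -lt "$count" ]; do
        printf '192.168.0.%d\twolf%02d\n' "$((100 + n))" "$n"
        n=$((n + 1))
    done
}

gen_hosts 4
```

Review the output first, then append it as root with something like gen_hosts 4 >> /etc/hosts.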
When you use rsh to remote login to another box, you will be prompted
for userid and password. You can fix that with hosts.equiv:
cat > hosts.equiv
wolf00
wolf01
wolf02
wolf03
<control d>
Make sure that the services
you want are up:
chkconfig --add rsh
chkconfig --add telnet
chkconfig --add nfs
chkconfig --add rexec
chkconfig --add rlogin
chkconfig --level 3 rsh on
chkconfig --level 3 telnet on
chkconfig --level 3 nfs on
chkconfig --level 3 rexec on
chkconfig --level 3 rlogin on
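Since you will repeat these commands on every node, a loop saves typing. The sketch below only prints the commands so you can look them over first. The service list is the one used in this document, and print_chkconfig_cmds is a name I made up.

```shell
# Print (do not run) the chkconfig commands for each service given.
# print_chkconfig_cmds is a hypothetical helper name.
services="rsh telnet nfs rexec rlogin"

print_chkconfig_cmds() {
    for svc in "$@"; do
        echo "chkconfig --add $svc"
        echo "chkconfig --level 3 $svc on"
    done
}

# $services is deliberately unquoted so it splits into separate words
print_chkconfig_cmds $services
```

When the output looks right, run it as root by piping it to sh.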
Telnet? I added
this just as a convenience. It is not
needed, but it is nice to have while debugging your NFS
setup. How are you going to log into a
box if you can’t rsh to it?
This is the only reason I used
the kvm switch: it is useful for going back and forth
between the head node and the node
I am currently setting up.
…And, during startup, I
saw some services that I know I don’t want, and in my opinion, could be
removed:
chkconfig --
chkconfig --
To be responsible, we make
ssh work.
chkconfig --add sshd
chkconfig --level 3 sshd on
Add to the end of /etc/rc.d/rc.local:
sshd &
Once, do this:
ssh-keygen -b 1024 -f filename -t rsa -N "big fat passphrase"
… and log on once, when it will ask you a question or two to initialize. After that you should be able to log on with
ssh and not be asked
anything [except the password, but we will fix this].
Lastly, put your message
passing interface on the box. You can
either build it using the supplied source, or use their precompiled
package. It is not in the scope of this
document to describe that – I just got the source and followed the directions,
and in another experiment I installed their rpm; both of them worked fine. Remember the whole reason we are doing this
is to learn – go forth and learn.
Okay, get your network
cables out. Install Linux on the first
non-head node. Going with my example
node names and IP addresses, this is what I chose during setup:
Workstation
auto partition
remove all partitions on system
use LILO as the boot loader
put boot loader on the MBR
host name wolf01
ip address 192.168.0.101
add the user “wolf” with the same password as on all other nodes
NO firewall
ONLY package installed: network servers. UN select all other packages.
I don’t care what else you
choose; this is the minimum of what you need.
Many Beowulf-ers are
using legacy hand-me-down boxes with limited resources, so why fill the box up
with non-essential software you will never use? My research has concentrated on finding
the minimal configuration to get up and running.
Here’s another very
important point. When you move on to an
automated install and config, you really will NEVER
log in to the box. Only during setup and
install do I type anything directly on the box. It makes me laugh when I think of the guy
with his pile of n-1 kvm switches.
When the computer starts
up, it will complain if it does not have a keyboard connected. I was not able to modify the BIOS, because I
had older discarded boxes with no documentation, so I just connected a “fake”
keyboard. I am in the computer industry,
and see hundreds of keyboards come and go, and some occasionally end up in the
garbage. I get the old dead keyboard out of the
garbage, remove JUST the cord with the tiny circuit board up there in the
corner, where the num lock and caps lock lights are. Then I plug the cord in, and the computer
boots without incident, thinking it has a complete keyboard. Again, you would be better off modifying your
BIOS, if you are able to. This is just a
trick to use in case you cannot get into the BIOS setup.
After your newly installed box reboots, log on as root again, and…
1. do the same chkconfig commands stated above to set up the right services.
2. modify hosts; remove “wolf0n” from localhost, and just add wolf0n and wolf00.
3. install lam
4. make the dir /mnt/wolf, chmod 777 /mnt/wolf
Up to this point, we are
pretty much the same as the head node.
I do NOT do the modification of the exports file. And, I do a new thing or two:
cat >> /etc/fstab
wolf00:/mnt/wolf /mnt/wolf nfs rw,hard,intr 0 0
<control d>
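One catch with cat >> is that running it twice leaves you with the same line twice. Here is a minimal sketch of an append that checks first. The helper name append_once is my own invention, and it assumes a grep that understands -q, -x and -F (GNU grep does).

```shell
# Append a line to a file only if that exact line is not already there.
# append_once is a hypothetical helper name.
append_once() {
    line=$1
    file=$2
    # -x: match the whole line, -F: fixed string (no regex), -q: quiet
    if ! grep -qxF -e "$line" "$file" 2>/dev/null; then
        printf '%s\n' "$line" >> "$file"
    fi
}

# Example (as root on a node):
#   append_once "wolf00:/mnt/wolf /mnt/wolf nfs rw,hard,intr 0 0" /etc/fstab
```

The same helper works for any of the other files modified with cat >> in this document.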
Then I modify /etc/lilo.conf. The 2nd
line of this file says timeout=nn
This is where my wondrous
use of cat and the redirection operators breaks down. Notice every modification I have done so far
has been using cat. So you need to
somehow modify that 2nd line to say “timeout=1200”. I broke down and used vi,
but you can do it however you want, and if you hate vi enough, edit
this lilo.conf file elsewhere, put it on a floppy, and just copy it to
your newly created system.
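If you would rather not open an editor at all, sed can make the change. This is only a sketch: it assumes GNU sed (for the -i option) and a line in the file that starts with timeout=. LILO counts this value in tenths of a second, so the helper takes seconds and multiplies by ten. The name set_lilo_timeout is made up.

```shell
# Rewrite the timeout= line of a lilo.conf style file in place.
# set_lilo_timeout is a hypothetical helper; it needs GNU sed for -i.
set_lilo_timeout() {
    file=$1
    seconds=$2
    tenths=$((seconds * 10))   # LILO's timeout is in tenths of a second
    sed -i "s/^timeout=.*/timeout=$tenths/" "$file"
}

# Example (as root): a 120 second wait becomes timeout=1200
#   set_lilo_timeout /etc/lilo.conf 120
```

Try it on a copy of the file first, since a broken lilo.conf can leave a box unbootable.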
After it is modified, as
root, say /sbin/lilo, and it will make the changes take effect. It will say “Added linux *”.
Why do I do this lilo modification?
If you were researching Beowulf on the web, and understand everything I
have done so far, you would wonder, “I don’t remember reading anything about lilo.conf.”
My Beowulf cluster all
sits on a single power strip. I turn on
the power strip, and every box on the cluster starts up immediately. As the startup procedure progresses, it
mounts file systems. Seeing that the
non-head nodes mount the shared directory from the head node, they all will
have to wait a little bit until the head node is up, with NFS ready to go. So, I make each non-head node wait 2 minutes
in the lilo step.
Meanwhile, the head node is coming up, and making the shared directory
available. By then, the non-head nodes
finally start booting up because lilo has waited 2
minutes.
All done! You are almost ready to start wolfing.
Reboot your boxes. Did they all come
up? Can you ping the head node from each
box? Can you ping each node from the head node? Can you telnet? Can you rsh? Don’t worry about doing rsh
as root; only as wolf. If you are
logged in as wolf, and rsh to a box, does it go
automatically, without prompting for password?
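Running these checks by hand against every box gets tedious, so here is a rough sketch of a loop that does it. It assumes the Linux ping options -c (count) and -W (seconds to wait). The names check_nodes and PROBE are my own. You can swap the probe for something like "rsh" plus a harmless command once ping works.

```shell
# Command used to probe a host: one echo request, 2 second wait.
# PROBE and check_nodes are hypothetical names for this sketch.
PROBE=${PROBE:-"ping -c 1 -W 2"}

# Probe every host given; print a line per host and
# return non-zero if any of them failed.
check_nodes() {
    rc=0
    for host in "$@"; do
        # $PROBE is deliberately unquoted so its options split into words
        if $PROBE "$host" >/dev/null 2>&1; then
            echo "$host ok"
        else
            echo "$host FAILED"
            rc=1
        fi
    done
    return $rc
}

# Example, from the head node:
#   check_nodes wolf01 wolf02 wolf03
```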
After the node boots up,
log in, and say “mount”. Does it show
wolf00:/mnt/wolf mounted? On the head
node, copy a file into /mnt/wolf. Can you read and write that file from the
node box? This is really not required;
it is merely convenient to have a common directory reside on the head node. You can easily do rcp
to copy files between boxes. Also
Once you can do all the
tests shown above, you should be able to run a program. From here on in, the instructions are lam
specific. Go back to the head node, log in as wolf, and:
cat > /mnt/wolf/lamhosts
wolf00
wolf01
wolf02
wolf03
wolf04
<control d>
Go to the lam examples
directory, and compile “hello.c”:
mpicc -o hello hello.c
cp hello /mnt/wolf
Then, as shown in the lam
documentation, start up lam:
[wolf@wolf00 wolf]$ lamboot -v lamhosts
LAM 7.0/MPI 2 C++/ROMIO -
n0<2572> ssi:boot:base:linear: booting n0 (wolf00)
n0<2572> ssi:boot:base:linear: booting n1 (wolf01)
n0<2572> ssi:boot:base:linear: booting n2 (wolf02)
n0<2572> ssi:boot:base:linear: booting n3 (wolf04)
n0<2572> ssi:boot:base:linear: finished
So we are now finally
ready to run an app. [Remember, I am
using lam; your message passing interface may have different syntax].
[wolf@wolf00 wolf]$ mpirun n0-3 /mnt/wolf/hello
Hello, world! I am 0 of 4
Hello, world! I am 3 of 4
Hello, world! I am 2 of 4
Hello, world! I am 1 of 4
[wolf@wolf00 wolf]$
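The n0-3 argument has to match the number of nodes lam booted. As a small sketch, the range can be worked out from the host file instead of typed. This assumes one host per line, with blank lines and lines starting with # ignored. The helper name lam_node_range is made up.

```shell
# Print a node range such as n0-3 covering every host in a host file.
# lam_node_range is a hypothetical helper name.
lam_node_range() {
    hostfile=$1
    # Count the lines that are neither blank nor comments
    count=$(grep -Evc '^[[:space:]]*(#|$)' "$hostfile")
    [ "$count" -gt 0 ] || return 1
    echo "n0-$((count - 1))"
}

# Example:
#   mpirun "$(lam_node_range /mnt/wolf/lamhosts)" /mnt/wolf/hello
```

This only makes sense if every host in the file actually booted, so check the lamboot output first.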
Recall I mentioned the use
of
So now you know how I did
it; hope this helps; have fun on your own project.
You may wonder – “Why does
he say ‘wolf’ and not ‘beowulf’ like it is supposed
to be called?” I say to you – “Because
it’s fun. Nobody else calls their
cluster a ‘bay wolf’ so I am.”
Auto–wolf
Now let’s automate the
install so you may create a node by merely inserting a floppy, and the box will
completely build itself with no user interaction.
On the head node, make
another directory, an install directory, for everybody to share:
mkdir /mnt/install
Go to the /etc directory,
and add your “shared” directory to the exports file:
cd /etc
cat exports
cat >> exports
/mnt/install 192.168.0.0/255.255.255.0(ro)
<control d>
Go to your original
distribution, and note the directory structure. A directory called RedHat contains a directory called RPMS, which contains all
the packages that you choose from when doing an install. Copy the whole RedHat
directory tree into the /mnt/install directory. After you are complete, modify their security
so they are only accessible to the group “beowulf”:
chmod 770 /mnt/install -R
chown wolf:beowulf /mnt/install -R
Remember, for non-head
nodes, we installed only the Network Servers package, and
no others. So you may carefully
choose just the RPMs that are necessary. To see which
ones they are, you can refer to /root/install.log on one
of your manually built non-head nodes.
Go to your install CD, and
just like you created your original install floppy to do your first Linux
install, choose the “bootnet.img” file to create a
network install floppy. Use that floppy
to install your next node. Instead of
asking you to insert a CD-ROM, it will ask you for the IP
address of your head node, and where the shared install directory is. Doing this build will also make sure you have
copied all of the right RPMs into your shared install
directory. If any of the files are
missing, you will see an error stating that a file is missing. Copy the missing file to your shared install
directory, and continue the install.
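You can also catch missing files before starting an install. The sketch below assumes you keep a plain text list of the RPM file names you need, one per line (needed.txt here is a hypothetical file, you would have to build it yourself), and reports every name that is not in the shared directory. The helper name missing_rpms is made up.

```shell
# Print every file name from a list that is missing from a directory.
# missing_rpms and the list file are hypothetical, for illustration only.
missing_rpms() {
    list=$1
    dir=$2
    while IFS= read -r rpm; do
        [ -n "$rpm" ] || continue          # skip blank lines
        [ -e "$dir/$rpm" ] || echo "$rpm"
    done < "$list"
}

# Example:
#   missing_rpms needed.txt /mnt/install/RedHat/RPMS
```

If it prints nothing, everything on your list is already in place.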
Next, install DHCP on your
head node. Go to http://tldp.org/HOWTO/DHCP/x369.html
and follow the instructions. Here is a
basic summary of what I did:
Gunzip and untar the file.
Go to the directory that
it just created: dhcp-3.0p12
Do the following commands:
./configure
make
make install
mv dhcp-3.0p12/work.linux-2.2/server/dhcpd /usr/sbin
route add -host 255.255.255.255 dev eth0
mkdir /var/state/dhcp
touch /var/state/dhcp/dhcpd.leases
and add the following line to the end of /etc/rc.d/rc.local:
/usr/sbin/dhcpd -d -f > /var/log/dhcp.log 2>&1 &
In my experience, I found
that I had to add the following line in dhcpd.conf:
ddns-update-style ad-hoc;
… after the “max-lease-time” line.
After successfully
installing the dhcp server on your head node, you may
now install a node again with your network install floppy like before. When it comes to the screen that has you
enter an IP address, you may now choose “use bootp / dhcp”. After the
install is complete, the newly installed box should have a dynamically assigned
IP address.
I saw, when specifying a
range of IP addresses in the dhcpd.conf file, the
addresses were handed out in reverse order – that is:
First box: x.x.x.255
2nd box: x.x.x.254
3rd box: x.x.x.253
and so on. The
web site mentioned before speaks about statically assigning IPs
to specific MAC addresses: http://www.phy.duke.edu/brahma/beowulf_book/node64.html
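For the static approach, dhcpd.conf needs one host entry per box. Here is a minimal sketch that prints such an entry, using the usual host / hardware ethernet / fixed-address form of the ISC dhcpd configuration file. The helper name dhcp_host_stanza and the MAC address in the example are made up, so substitute the real address of each network card.

```shell
# Print one dhcpd.conf host entry that ties a MAC address to a fixed IP.
# dhcp_host_stanza is a hypothetical helper name.
dhcp_host_stanza() {
    name=$1
    mac=$2
    ip=$3
    printf 'host %s {\n' "$name"
    printf '    hardware ethernet %s;\n' "$mac"
    printf '    fixed-address %s;\n' "$ip"
    printf '}\n'
}

# Example with an invented MAC address:
dhcp_host_stanza wolf01 00:11:22:33:44:55 192.168.0.101
```

Append the output to dhcpd.conf on the head node and restart dhcpd for it to take effect.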
Really, it does not matter
if you use one method over another – the goal is to build a box without any
user intervention, including the assignment of the IP address. The reason I chose to use static IPs is because I stick a label on the front of each box
saying what its IP address is. So when
192.168.0.103 has a problem, I know which
So now we are at a point
where we can completely automate our install.
By now you probably have reinstalled your nodes enough times to be bored
with it, and truly would appreciate if a program would perform these repetitive
tasks for us.
Run the kickstart configurator program,
which will ask you all the steps you have performed in your installs. It will save your answers in a file, ks.cfg.
In my experience, after making this file, I also had to manually modify it like so:
#Disk partitioning information
part swap --recommended
part / --fstype ext3 --size=1 --grow --maxsize 100000
Sure, there could have
possibly been a proper set of responses within the configurator
to create this output, but I found it easier to just manually modify the file
as shown.
On your network install
floppy, modify the file syslinux.cfg. As you recall, after the box boots off this
floppy, it gives you a minute or so to enter any parameters, and then
automatically goes into the interactive install. We want to force it to use the kickstart file we generated.
Change the line “default linux” to say
default ks
Change the line “timeout nnn” to say
timeout 2
On the label ks section, add “append ks=floppy” like
so:
label ks
kernel vmlinuz
append ks=floppy initrd=initrd.img lang= devfs=nomount ramdisk_size=9216
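The default and timeout changes are simple enough to script. This is just a sketch: it assumes GNU sed and a syslinux.cfg whose lines start with “default ” and “timeout ”. It leaves the append line of the ks section for you to edit by hand. The helper name set_syslinux_ks and the mount point in the example are assumptions.

```shell
# Point a syslinux.cfg at the "ks" label and shorten the boot timeout.
# set_syslinux_ks is a hypothetical helper; it needs GNU sed for -i.
set_syslinux_ks() {
    cfg=$1
    sed -i -e 's/^default .*/default ks/' \
           -e 's/^timeout .*/timeout 2/' "$cfg"
}

# Example, with the floppy mounted at /mnt/floppy (an assumed mount point):
#   set_syslinux_ks /mnt/floppy/syslinux.cfg
```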
Copy
this modified syslinux.cfg on to
the network install floppy, along with your ks.cfg
file, and do another install. You may
sit back and watch the show on your monitor.
You should not have to type a thing.
It should go through the whole install process, answering every question
the way you would have manually entered it.
Lastly,
we will add the “post” section to the ks.cfg
file. As the earlier instructions show,
we do a series of modifications to our newly installed system to make it a
Beowulf node: install lam, change files in /etc, start and stop services, and
so on.
Here
is the resulting ks.cfg file:
#Generated by Kickstart Configurator
#System language
lang en_US
#Language modules to install
langsupport en_US
#System keyboard
keyboard us
#System mouse
mouse none
#System timezone
timezone --utc America/Chicago
#Root password
rootpw --iscrypted big garbage string
#Reboot after installation
reboot
#Use text mode install
text
#Install Red Hat Linux instead of upgrade
install
#Use NFS installation media
nfs --server 192.168.0.100 --dir /mnt/install
#System bootloader configuration
bootloader --useLilo --linear --location=mbr --append ks=floppy
#Clear the Master Boot Record
zerombr yes
#Clear only Linux partitions from the disk
clearpart --linux --initlabel
#Disk partitioning information
part swap --recommended
part / --fstype ext3 --size=1 --grow --maxsize 100000
#Use DHCP networking
network --bootproto dhcp
#System authorization information
auth --useshadow --enablemd5
#Firewall configuration
firewall --disabled
#Do not configure the X Window System
skipx
%packages --resolvedeps
@Network Servers
%post
mkdir /mnt/wolf
chmod 777 /mnt/wolf
/usr/sbin/useradd wolf
sleep 20
/usr/sbin/usermod -p 'big garbage string' wolf
/sbin/chkconfig --add telnet
/sbin/chkconfig --add rsh
/sbin/chkconfig --add nfs
/sbin/chkconfig --add rexec
/sbin/chkconfig --add rlogin
/sbin/chkconfig --level 3 telnet on
/sbin/chkconfig --level 3 rsh on
/sbin/chkconfig --level 3 nfs on
/sbin/chkconfig --level 3 rexec on
/sbin/chkconfig --level 3 rlogin on
/sbin/chkconfig --
/sbin/chkconfig --
/sbin/chkconfig --
/sbin/chkconfig --
mkdir /mnt/inst
chmod 777 /mnt/inst
mount 192.168.0.100:/mnt/install /mnt/inst
ls -l /mnt/inst >> /home/wolf/proof1.txt
cp /mnt/inst/etc/lilo.conf /etc
/sbin/lilo >> /home/wolf/proof2.txt 2>&1
ls -l /mnt/inst/etc/ho* >> /home/wolf/proof3.txt
cp /mnt/inst/etc/ho* /etc
cat /mnt/inst/etc/fstab >> /etc/fstab
ls -l /mnt/inst/home >> /home/wolf/proof4.txt
cp /mnt/inst/home/* /home/wolf
rpm -i /home/wolf/lam.rpm
As
you see, I chose “reboot” on the end, which puts a little responsibility on
your shoulders. If you put the floppy
in, turn on the box, and walk away, it will go through all the steps, and then
reboot. Upon rebooting, it will boot
from floppy, and start the whole process over again.
You could do two things:
1. Skip the reboot, and let the box just sit there when it is done installing.
Then you will have to eject the floppy, and reboot the box with the reboot
command, or the “trip over the power cord” reboot, which has its own ups and
downs.
2. Eject the floppy once the box has read all of its data off it. I timed this,
and it took a minute or two. The long part, approximately 20 minutes,
will go on and on, and the box will reboot itself safely, because I got the floppy out of
the way. I have an 18 minute window to remember to eject the floppy.
There
is another issue worthy of mention: this
easy “insert a floppy” install implies that you have saved off any important
data from the box, because it will get completely erased and rebuilt. But as the Beowulf documentation out there
describes, you should not be saving important data on the nodes – you should
view each node as an expendable resource, and at the moment it acts up, you
would have no hard feelings in completely destroying and rebuilding it.