Requirements
2x Linux servers as compute nodes
1x Linux server as master/login node
Debian 10 Buster installed on all servers
Making the Linux Servers Ready
Install Debian 10 Buster on each server and configure a static network address by editing /etc/network/interfaces.
# In my case, the settings for the master are as below:
auto ens32
iface ens32 inet static
address 192.168.137.110
netmask 255.255.255.0
gateway 192.168.137.2
dns-nameservers 192.168.137.2 8.8.8.8
Configure the same for the compute nodes with different IP addresses, as assigned below:
master 192.168.137.110
node1 192.168.137.111
node2 192.168.137.112
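For example, the matching /etc/network/interfaces stanza on node1 would look like the following (a sketch that assumes the interface is also named ens32; check the actual name with ip link on each machine):
auto ens32
iface ens32 inet static
address 192.168.137.111
netmask 255.255.255.0
gateway 192.168.137.2
dns-nameservers 192.168.137.2 8.8.8.8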
Setting the hostname
Configure the hostname for each server as below by editing the /etc/hostname file.
On Master
echo 'master' > /etc/hostname
On Node1
echo 'node1' > /etc/hostname
On node2
echo 'node2' > /etc/hostname
Add the IP addresses against the hostnames on all three servers by editing the /etc/hosts file on each one:
#127.0.0.1 localhost
192.168.137.110 master
192.168.137.111 node1
192.168.137.112 node2
# The following lines are desirable for IPv6 capable hosts
#::1 localhost ip6-localhost ip6-loopback
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
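Once /etc/hosts is in place on every server, a quick optional sanity check is to ping each node by name, for example from the master:
ping -c 1 node1
ping -c 1 node2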
Set the correct timezone
It is important in a cluster configuration that every node has the correct time; the clocks must stay synchronized across all nodes. To set the correct timezone for your server, edit the /etc/timezone file and add your timezone to it. In my case I added Asia/Karachi.
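On Debian 10 you can also set the timezone non-interactively and confirm that the clock is being synchronized; a small sketch (substitute your own timezone):
sudo timedatectl set-timezone Asia/Karachi
timedatectl    # shows the current timezone and whether NTP synchronization is active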
After above configurations reboot servers.
Configuration of the Master/Login Node
Shared storage:
In order for a cluster to work well, a job should be able to be run on any of the nodes in the cluster. This means that each node needs to be able to access the same files.
So we have to configure an NFS server on the master and mount it on all compute nodes.
(Note that this path should be the same across all the nodes.) In my cluster, I used /clusterfs:
sudo mkdir /clusterfs
sudo chown -R nobody:nogroup /clusterfs
sudo chmod 777 -R /clusterfs
NFS server
Now, we need to export the /clusterfs as a network file system share so the other nodes can access it. Do this process on the master node.
Install the NFS server.
sudo apt install nfs-kernel-server -y
Edit /etc/exports and add the following line:
/clusterfs <ip addr>(rw,sync,no_root_squash,no_subtree_check)
Replace <ip addr> with the IP address schema used on your local network. This will allow any LAN client to mount the share. For example, if your LAN addresses were 192.168.137.X, you would have:
/clusterfs 192.168.137.0/24(rw,sync,no_root_squash,no_subtree_check)
rw gives the client read-write access, sync forces changes to be written on each transaction, no_root_squash enables the root users of clients to write files with root permissions, and no_subtree_check prevents errors caused by a file being changed while another system is using it.
Lastly, run the following command to update the NFS kernel server:
sudo exportfs -a
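To confirm the directory is actually being exported, you can list the active exports on the master (optional check):
sudo exportfs -v
# should list /clusterfs with the options set above, e.g. 192.168.137.0/24(rw,...)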
Mount the NFS Share on the Clients
Now that we’ve got the NFS share exported from the master node, we want to mount it on all of the other nodes so they can access it. Repeat this process for all of the other nodes.
Install the NFS client.
sudo apt install nfs-common -y
Create the mount folder.
This should be the same directory that you created and exported on the master node. In my case, this is /clusterfs:
sudo mkdir /clusterfs
sudo chown nobody:nogroup /clusterfs
sudo chmod -R 777 /clusterfs
Set up automatic mounting.
We want the NFS share to mount automatically when the nodes boot. Edit /etc/fstab to accomplish this by adding the following line:
192.168.137.110:/clusterfs /clusterfs nfs defaults 0 0
Now mount it with:
sudo mount -a
You should now be able to create a file in /clusterfs and have it show up at the same path across all the nodes.
Install required packages
sudo apt install openssh-server vim net-tools slurm-wlm munge -y
SLURM Configuration
We’ll use the default SLURM configuration file as a base. Copy it over:
cd /etc/slurm-llnl
cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .
gzip -d slurm.conf.simple.gz
mv slurm.conf.simple slurm.conf
Then edit /etc/slurm-llnl/slurm.conf.
Set the control machine info.
Modify the first configuration line to include the hostname of the master node and its IP address:
SlurmctldHost=master(<ip addr of master>)
# e.g.: master(192.168.137.110)
Customize the scheduler algorithm.
SLURM can allocate resources to jobs in a number of different ways, but for our cluster we’ll use the “consumable resources” method. This basically means that each node has a consumable resource (in this case, CPU cores), and it allocates resources to jobs based on these resources. So, edit the SelectType field and provide parameters, like so:
SelectType=select/cons_res
SelectTypeParameters=CR_Core
Set the cluster name. This is somewhat superficial, but you can customize the cluster name in the “LOGGING AND ACCOUNTING” section:
ClusterName=Cluster101
Add the nodes.
Now we need to tell SLURM about the compute nodes. Near the end of the file, there should be an example entry for the compute node. Delete it, and add the following configurations for the cluster nodes:
NodeName=master NodeAddr=192.168.137.110 CPUs=4 State=UNKNOWN
NodeName=node1 NodeAddr=192.168.137.111 CPUs=4 State=UNKNOWN
NodeName=node2 NodeAddr=192.168.137.112 CPUs=4 State=UNKNOWN
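If you are unsure what value to use for CPUs (or want to add memory and socket details later), running slurmd -C on each node prints a NodeName line describing that machine's hardware, which you can adapt. For example:
slurmd -C
# example output (values vary by machine):
# NodeName=node1 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=3936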
Create a partition.
SLURM runs jobs on ‘partitions,’ or groups of nodes. We’ll create a default partition and add our two compute nodes to it. Be sure to delete the example partition in the file, then add the following on one line:
PartitionName=mycluster Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP
Configure cgroups Support
The latest update of SLURM brought integrated support for cgroups kernel isolation, which restricts access to system resources. We need to tell SLURM what resources to allow jobs to access. To do this, create the file /etc/slurm-llnl/cgroup.conf:
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
AllowedDevicesFile="/etc/slurm-llnl/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
Now, whitelist system devices by creating the file /etc/slurm-llnl/cgroup_allowed_devices_file.conf:
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/clusterfs*
Note that this configuration is pretty permissive, but for our purposes, this is okay. You could always tighten it up to suit your needs.
Copy the Configuration Files to Shared Storage
In order for the other nodes to be controlled by SLURM, they need to have the same configuration file, as well as the Munge key file. Copy those to shared storage to make them easier to access, like so:
sudo cp slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /clusterfs
sudo cp /etc/munge/munge.key /clusterfs
A word about Munge:
Munge is the authentication system that SLURM uses to run commands and processes on the other nodes. Similar to key-based SSH, it uses a private key on all the nodes; requests are timestamp-encrypted and sent to the node, which decrypts them using the identical key. This is why it is so important that the system times be in sync and that all nodes have the same munge.key file.
Enable and start SSH and Munge:
sudo systemctl enable ssh
sudo systemctl start ssh
sudo systemctl enable munge
sudo systemctl start munge
The SLURM daemon:
sudo systemctl enable slurmd
sudo systemctl start slurmd
And the control daemon:
sudo systemctl enable slurmctld
sudo systemctl start slurmctld
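Before moving on, it is worth checking that both services came up cleanly; this catches slurm.conf typos early (optional):
sudo systemctl status munge slurmctld
# if slurmctld failed, the controller log (typically /var/log/slurm-llnl/slurmctld.log on Debian) explains why
sudo tail /var/log/slurm-llnl/slurmctld.log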
Reboot. (optional)
This step is optional, but if you are having problems with Munge authentication, or your nodes can’t communicate with the SLURM controller, try rebooting the master node.
Configure the Compute Nodes
Install Required packages
sudo apt install slurmd slurm-client munge openssh-server net-tools vim -y
Copy the Configuration Files
We need to make sure that the configuration on the compute nodes matches the configuration on the master node exactly. So, copy it over from shared storage:
sudo cp /clusterfs/munge.key /etc/munge/munge.key
sudo cp /clusterfs/slurm.conf /etc/slurm-llnl/slurm.conf
sudo cp /clusterfs/cgroup* /etc/slurm-llnl
Munge! Important
We will test that the Munge key copied correctly and that the SLURM controller can successfully authenticate with the client nodes.
Enable and start Munge.
sudo systemctl enable munge
sudo systemctl start munge
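If munge refuses to start on a node, it is usually because the copied key's ownership or permissions are wrong; Munge requires the key to be owned by the munge user and not be readable by others. A sketch of the usual fix:
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl restart munge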
Test Munge.
We can manually test Munge to see if it is communicating. Run the following on the master node; it generates a credential on the compute node and decodes it on the master:
ssh root@node1 munge -n | unmunge
If it works, you should see something like this:
root@master ~> ssh root@node1 munge -n | unmunge
root@node1's password:
STATUS: Success (0)
ENCODE_HOST: node1
ENCODE_TIME: 2018-11-15 15:48:56 -0600 (1542318536)
DECODE_TIME: 2018-11-15 15:48:56 -0600 (1542318536)
TTL: 300
CIPHER: aes128 (4)
MAC: sha1 (3)
ZIP: none (0)
UID: root
GID: root
LENGTH: 0
If you get an error, make sure that the /etc/munge/munge.key file is the same across all the nodes, then reboot them all and try again.
Start the Daemons on the Compute Nodes
sudo systemctl enable ssh
sudo systemctl start ssh
sudo systemctl enable slurmd
sudo systemctl start slurmd
Complete this configuration on each of the compute nodes.
Test SLURM
Now that we’ve configured the SLURM controller and each of the nodes, we can check to make sure that SLURM can see all of the nodes by running sinfo on the master node (a.k.a. “the login node”):
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
mycluster*    up   infinite      2   idle  node[1-2]
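If a node shows up as down or drain instead of idle, you can ask SLURM why and, once the cause is fixed, return it to service (node1 used here as an example):
scontrol show node node1
# the Reason= field explains the state
sudo scontrol update NodeName=node1 State=RESUME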
Now we can run a test job by telling SLURM to give us 2 nodes, and run the hostname command on each of them:
srun --nodes=2 hostname
If all goes well, we should see something like:
node1
node2
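As a further test, you can submit the same thing as a batch job from the shared storage. This is a minimal sketch; the script name and output path are just examples:
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=2
#SBATCH --output=/clusterfs/hello_%j.out
srun hostname
Save it as /clusterfs/hello.sh, submit it with sbatch /clusterfs/hello.sh, watch it with squeue, and the hostnames of both compute nodes should appear in the output file.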
To be continued.......
Further OpenMPI configurations will be updated in this topic. These guides are designed to be followed in a top-down sequential order. If you’re having problems with a command, feel free to leave a comment below with the exact step you are stuck on, and I’ll try to answer if I can.