Requirements
2x Linux servers as compute nodes
1x Linux server as master/login node
Debian 10 Buster installed on all servers
Making the Linux Servers Ready
Install Debian 10 Buster on each server and configure a static network address by editing /etc/network/interfaces.
# In my case, the settings for the master are as below:
auto ens32
iface ens32 inet static
address 192.168.137.110
netmask 255.255.255.0
gateway 192.168.137.2
dns-nameservers 192.168.137.2 8.8.8.8
Configure the same for the compute nodes with different IP addresses, as assigned below:
master 192.168.137.110
node1 192.168.137.111
node2 192.168.137.112
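For example, the matching /etc/network/interfaces stanza on node1 would look like the following (a sketch that assumes the interface is also named ens32; check the actual name with ip link on each machine):
auto ens32
iface ens32 inet static
address 192.168.137.111
netmask 255.255.255.0
gateway 192.168.137.2
dns-nameservers 192.168.137.2 8.8.8.8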
Setting the hostname
Configure the hostname for each server as below by editing the /etc/hostname file.
On Master
echo 'master' > /etc/hostname
On Node1
echo 'node1' > /etc/hostname
On node2
echo 'node2' > /etc/hostname
Add the IP addresses against the hostnames on all three servers by editing the /etc/hosts file on each one:
#127.0.0.1 localhost
192.168.137.110 master
192.168.137.111 node1
192.168.137.112 node2
# The following lines are desirable for IPv6 capable hosts
#::1 localhost ip6-localhost ip6-loopback
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
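Once /etc/hosts is in place on every server, a quick optional sanity check is to ping each node by name, for example from the master:
ping -c 1 node1
ping -c 1 node2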
Set the correct timezone
It is important in a cluster configuration that every node has the correct time; the clocks must stay synchronized across all nodes. To set the correct timezone for your server, edit the /etc/timezone file and add your timezone to it. In my case I added Asia/Karachi.
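On Debian 10 you can also set the timezone non-interactively and confirm that the clock is being synchronized; a small sketch (substitute your own timezone):
sudo timedatectl set-timezone Asia/Karachi
timedatectl    # shows the current timezone and whether NTP synchronization is active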
After above configurations reboot servers.
Configuration of the Master/Login Node
Shared storage:
In order for a cluster to work well, a job should be able to be run on any of the nodes in the cluster. This means that each node needs to be able to access the same files.
So we have to configure an NFS server on the master and mount it on all compute nodes.
(Note that this path should be the same across all the nodes.) In my cluster, I used /clusterfs:
sudo mkdir /clusterfs
sudo chown -R nobody:nogroup /clusterfs
sudo chmod 777 -R /clusterfs
NFS server
Now, we need to export the /clusterfs as a network file system share so the other nodes can access it. Do this process on the master node.
Install the NFS server.
sudo apt install nfs-kernel-server -y
Edit /etc/exports and add the following line:
/clusterfs <ip addr>(rw,sync,no_root_squash,no_subtree_check)
Replace <ip addr> with the IP address schema used on your local network. This will allow any LAN client to mount the share. For example, if your LAN addresses were 192.168.137.X, you would have:
/clusterfs 192.168.137.0/24(rw,sync,no_root_squash,no_subtree_check)
rw gives the client read-write access, sync forces changes to be written on each transaction, no_root_squash enables the root users of clients to write files with root permissions, and no_subtree_check prevents errors caused by a file being changed while another system is using it.
Lastly, run the following command to update the NFS kernel server:
sudo exportfs -a
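To confirm the directory is actually being exported, you can list the active exports on the master (optional check):
sudo exportfs -v
# should list /clusterfs with the options set above, e.g. 192.168.137.0/24(rw,...)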
Mount the NFS Share on the Clients
Now that we’ve got the NFS share exported from the master node, we want to mount it on all of the other nodes so they can access it. Repeat this process for all of the other nodes.
Install the NFS client.
sudo apt install nfs-common -y
Create the mount folder.
This should be the same directory that you created and exported on the master node. In my case, this is /clusterfs:
sudo mkdir /clusterfs
sudo chown nobody:nogroup /clusterfs
sudo chmod -R 777 /clusterfs
Set up automatic mounting.
We want the NFS share to mount automatically when the nodes boot. Edit /etc/fstab to accomplish this by adding the following line:
192.168.137.110:/clusterfs /clusterfs nfs defaults 0 0
Now mount it with:
sudo mount -a
You should now be able to create a file in /clusterfs and have it show up at the same path across all the nodes.
Install required packages
sudo apt install openssh-server vim net-tools slurm-wlm munge -y
SLURM Configuration
We’ll use the default SLURM configuration file as a base. Copy it over:
cd /etc/slurm-llnl
cp /usr/share/doc/slurm-client/examples/slurm.conf.simple.gz .
gzip -d slurm.conf.simple.gz
mv slurm.conf.simple slurm.conf
Then edit /etc/slurm-llnl/slurm.conf.
Set the control machine info.
Modify the first configuration line to include the hostname of the master node and its IP address:
SlurmctldHost=master(<ip addr of master>)
# e.g.: master(192.168.137.110)
Customize the scheduler algorithm.
SLURM can allocate resources to jobs in a number of different ways, but for our cluster we’ll use the “consumable resources” method. This basically means that each node has a consumable resource (in this case, CPU cores), and it allocates resources to jobs based on these resources. So, edit the SelectType field and provide parameters, like so:
SelectType=select/cons_res
SelectTypeParameters=CR_Core
Set the cluster name. This is somewhat superficial, but you can customize the cluster name in the “LOGGING AND ACCOUNTING” section:
ClusterName=Cluster101
Add the nodes.
Now we need to tell SLURM about the compute nodes. Near the end of the file, there should be an example entry for the compute node. Delete it, and add the following configurations for the cluster nodes:
NodeName=master NodeAddr=192.168.137.110 CPUs=4 State=UNKNOWN
NodeName=node1 NodeAddr=192.168.137.111 CPUs=4 State=UNKNOWN
NodeName=node2 NodeAddr=192.168.137.112 CPUs=4 State=UNKNOWN
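If you are unsure what value to use for CPUs (or want to add memory and socket details later), running slurmd -C on each node prints a NodeName line describing that machine's hardware, which you can adapt. For example:
slurmd -C
# example output (values vary by machine):
# NodeName=node1 CPUs=4 Boards=1 SocketsPerBoard=1 CoresPerSocket=4 ThreadsPerCore=1 RealMemory=3936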
Create a partition.
SLURM runs jobs on ‘partitions,’ or groups of nodes. We’ll create a default partition and add our two compute nodes to it. Be sure to delete the example partition in the file, then add the following on one line:
PartitionName=mycluster Nodes=node[1-2] Default=YES MaxTime=INFINITE State=UP
Configure cgroups Support
The latest update of SLURM brought integrated support for cgroups kernel isolation, which restricts access to system resources. We need to tell SLURM what resources to allow jobs to access. To do this, create the file /etc/slurm-llnl/cgroup.conf:
CgroupMountpoint="/sys/fs/cgroup"
CgroupAutomount=yes
CgroupReleaseAgentDir="/etc/slurm-llnl/cgroup"
AllowedDevicesFile="/etc/slurm-llnl/cgroup_allowed_devices_file.conf"
ConstrainCores=no
TaskAffinity=no
ConstrainRAMSpace=yes
ConstrainSwapSpace=no
ConstrainDevices=no
AllowedRamSpace=100
AllowedSwapSpace=0
MaxRAMPercent=100
MaxSwapPercent=100
MinRAMSpace=30
Now, whitelist system devices by creating the file /etc/slurm-llnl/cgroup_allowed_devices_file.conf:
/dev/null
/dev/urandom
/dev/zero
/dev/sda*
/dev/cpu/*/*
/dev/pts/*
/clusterfs*
Note that this configuration is pretty permissive, but for our purposes, this is okay. You could always tighten it up to suit your needs.
Copy the Configuration Files to Shared Storage
In order for the other nodes to be controlled by SLURM, they need to have the same configuration file, as well as the Munge key file. Copy those to shared storage to make them easier to access, like so:
sudo cp slurm.conf cgroup.conf cgroup_allowed_devices_file.conf /clusterfs
sudo cp /etc/munge/munge.key /clusterfs
A word about Munge:
Munge is the authentication system that SLURM uses to run commands and processes on the other nodes. Similar to key-based SSH, it uses a private key on all the nodes; requests are timestamp-encrypted and sent to the node, which decrypts them using the identical key. This is why it is so important that the system times be in sync and that all nodes have the same munge.key file.
Enable and start SSH and Munge:
sudo systemctl enable ssh
sudo systemctl start ssh
sudo systemctl enable munge
sudo systemctl start munge
The SLURM daemon:
sudo systemctl enable slurmd
sudo systemctl start slurmd
And the control daemon:
sudo systemctl enable slurmctld
sudo systemctl start slurmctld
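Before moving on, it is worth checking that both services came up cleanly; this catches slurm.conf typos early (optional):
sudo systemctl status munge slurmctld
# if slurmctld failed, the controller log (typically /var/log/slurm-llnl/slurmctld.log on Debian) explains why
sudo tail /var/log/slurm-llnl/slurmctld.log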
Reboot. (optional)
This step is optional, but if you are having problems with Munge authentication, or your nodes can’t communicate with the SLURM controller, try rebooting the master node.
Configure the Compute Nodes
Install Required packages
sudo apt install slurmd slurm-client munge openssh-server net-tools vim -y
Copy the Configuration Files
We need to make sure that the configuration on the compute nodes matches the configuration on the master node exactly. So, copy it over from shared storage:
sudo cp /clusterfs/munge.key /etc/munge/munge.key
sudo cp /clusterfs/slurm.conf /etc/slurm-llnl/slurm.conf
sudo cp /clusterfs/cgroup* /etc/slurm-llnl
Munge! Important
We will test that the Munge key copied correctly and that the SLURM controller can successfully authenticate with the client nodes.
Enable and start Munge.
sudo systemctl enable munge
sudo systemctl start munge
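If munge refuses to start on a node, it is usually because the copied key's ownership or permissions are wrong; Munge requires the key to be owned by the munge user and not be readable by others. A sketch of the usual fix:
sudo chown munge:munge /etc/munge/munge.key
sudo chmod 400 /etc/munge/munge.key
sudo systemctl restart munge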
Test Munge.
We can manually test Munge to see if it is communicating. Run the following on the master node; it generates a credential on the compute node and decodes it on the master:
ssh root@node1 munge -n | unmunge
If it works, you should see something like this:
root@master ~> ssh root@node1 munge -n | unmunge
root@node1's password:
STATUS: Success (0)
ENCODE_HOST: node1
ENCODE_TIME: 2018-11-15 15:48:56 -0600 (1542318536)
DECODE_TIME: 2018-11-15 15:48:56 -0600 (1542318536)
TTL: 300
CIPHER: aes128 (4)
MAC: sha1 (3)
ZIP: none (0)
UID: root
GID: root
LENGTH: 0
If you get an error, make sure that the /etc/munge/munge.key file is the same across all the nodes, then reboot them all and try again.
Start the Daemons on the Compute Nodes
sudo systemctl enable ssh
sudo systemctl start ssh
sudo systemctl enable slurmd
sudo systemctl start slurmd
Complete this configuration on each of the compute nodes.
Test SLURM
Now that we’ve configured the SLURM controller and each of the nodes, we can check to make sure that SLURM can see all of the nodes by running sinfo on the master node (a.k.a. “the login node”):
PARTITION  AVAIL  TIMELIMIT  NODES  STATE  NODELIST
mycluster*    up   infinite      2   idle  node[1-2]
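If a node shows up as down or drain instead of idle, you can ask SLURM why and, once the cause is fixed, return it to service (node1 used here as an example):
scontrol show node node1
# the Reason= field explains the state
sudo scontrol update NodeName=node1 State=RESUME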
Now we can run a test job by telling SLURM to give us 2 nodes, and run the hostname command on each of them:
srun --nodes=2 hostname
If all goes well, we should see something like:
node1
node2
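As a further test, you can submit the same thing as a batch job from the shared storage. This is a minimal sketch; the script name and output path are just examples:
#!/bin/bash
#SBATCH --job-name=hello
#SBATCH --nodes=2
#SBATCH --output=/clusterfs/hello_%j.out
srun hostname
Save it as /clusterfs/hello.sh, submit it with sbatch /clusterfs/hello.sh, watch it with squeue, and the hostnames of both compute nodes should appear in the output file.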
To be continued.......
Further OpenMPI configurations will be updated in this topic. These guides are designed to be followed in a top-down sequential order. If you’re having problems with a command, feel free to leave a comment below with the exact step you are stuck on, and I’ll try to answer if I can.