Lustre Server

Installation, configuration, and management of a Lustre server cluster.

Terminology

  • MDT: Metadata Target

  • MDS: Metadata Server

  • MGT: Management Target

  • MGS: Management Service

  • OST: Object Storage Target

  • OSS: Object Storage Server

Installation

Here we’ll be using Rocky Linux 8.6 as a base.

First, set your HTTP/HTTPS proxy so you can reach the greater internet:

cat >> /etc/environment<< EOF
#Proxies for LR1
http_proxy="http://proxy.houston.hpecorp.net:8080/"
https_proxy="http://proxy.houston.hpecorp.net:8080/"
ftp_proxy="http://proxy.houston.hpecorp.net:8080/"
no_proxy="localhost,127.0.0.1,.us.cray.com,.americas.cray.com,.dev.cray.com,.eag.rdlabs.hpecorp.net"
EOF
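
Note that re-running the cat >> command above appends a second copy of the block. A minimal sketch of an idempotent variant follows; the function name add_proxy_block is made up for this example, the marker is the comment line used above, and no_proxy is left out for brevity:

```shell
# Append the proxy block to an environment file only once.
# add_proxy_block is a hypothetical helper, not part of any package.
add_proxy_block() {
    local env_file="$1" proxy_url="$2"
    local marker="#Proxies for LR1"

    # A previous run already wrote the block: nothing to do.
    if grep -qxF "$marker" "$env_file" 2>/dev/null; then
        return 0
    fi

    cat >> "$env_file" << EOF
$marker
http_proxy="$proxy_url"
https_proxy="$proxy_url"
ftp_proxy="$proxy_url"
EOF
}
```

Usage would be something like add_proxy_block /etc/environment "http://proxy.houston.hpecorp.net:8080/", which is safe to run more than once.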

Next, create a new repo file for the following repos:

  • Lustre server pieces

    • Patched kernel, kernel modules, and utilities built with MOFED Infiniband support

  • e2fsprogs for Lustre

    • Normally e2fsprogs is a set of utilities for maintaining ext2, ext3, and ext4 filesystems

    • Lustre ships its own patched build of these utilities for managing Lustre filesystems

  • Lustre client pieces

    • Modules built with MOFED Infiniband support

cat >> /etc/yum.repos.d/lustre.repo<< EOF
[lustre-server]
name=rl8.6-ib - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/lustre-2.15.1-ib/MOFED-5.6-2.0.9.0/el8.6/server/
gpgcheck=0

[e2fsprogs]
name=rl8.6-ib - Ldiskfs
baseurl=https://downloads.whamcloud.com/public/e2fsprogs/latest/el8/
gpgcheck=0

[lustre-client]
name=rl8.6-ib - Lustre
baseurl=https://downloads.whamcloud.com/public/lustre/lustre-2.15.1-ib/MOFED-5.6-2.0.9.0/el8.6/client/
gpgcheck=0
EOF
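
The two Lustre baseurl values above encode the Lustre version, the MOFED version, the EL release, and the role (server or client). As a rough sketch, the path can be assembled from those pieces. The helper name lustre_baseurl is invented for this example, and other version combinations are not guaranteed to exist on the download server:

```shell
# Build a Whamcloud download URL following the layout of the repo file above.
# lustre_baseurl is a hypothetical helper; only the combination shown in this
# document is known to exist.
lustre_baseurl() {
    local lustre_ver="$1" mofed_ver="$2" el_ver="$3" role="$4"
    printf 'https://downloads.whamcloud.com/public/lustre/lustre-%s-ib/MOFED-%s/el%s/%s/\n' \
        "$lustre_ver" "$mofed_ver" "$el_ver" "$role"
}
```

For example, lustre_baseurl 2.15.1 5.6-2.0.9.0 8.6 server prints the [lustre-server] baseurl used above.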

Now that you’ve added these repos, install the following packages using dnf:

  • epel: Extra packages for Enterprise Linux, needed as a dependency in Lustre install

dnf install epel-release e2fsprogs lustre -y

Reboot so the newly installed patched kernel is loaded:

reboot

Load the IP over Infiniband (ipoib) module, allowing us to assign our Infiniband device an IP address.

modprobe ib_ipoib

Install opensm, an Infiniband-compliant Subnet Manager. The Infiniband switch we are connected to is unmanaged, so it can’t act as the Subnet Manager, and some host on the Infiniband network has to manage the fabric instead. This can be the first node or a dedicated management node.

If we had a managed Infiniband switch we could run the Subnet Manager there.

How to do this with a Managed Switch… TODO

dnf install -y opensm

Load the Infiniband Userspace MAD (Management Datagrams) module. This is needed for Open Subnet Manager in the following step.

modprobe ib_umad

Start the Infiniband-compliant Subnet Manager opensm systemd service

systemctl start opensm

Assign a static IP address to the ib0 device, and set the link state to UP

ip addr add 192.168.0.103/24 dev ib0
ip link set dev ib0 up

Load the LNET module

modprobe -v lnet

Load the lustre server modules

modprobe -v lustre

The ip addr and ip link commands above do not persist across a reboot. A better way to assign the address persistently is to set the following fields in /etc/sysconfig/network-scripts/ifcfg-ib0:

ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.0.103
NETMASK=255.255.255.0

A prerequisite for this is to have the ib_ipoib module loaded, which can be done by adding an entry to /etc/modules-load.d/. While we’re here we can also add on-boot modprobing for lnet and lustre.

echo ib_ipoib > /etc/modules-load.d/ipoib.conf
echo lnet > /etc/modules-load.d/lnet.conf
echo lustre > /etc/modules-load.d/lustre.conf

Configure LNET, and add the ib0 physical interface as the o2ib network

lnetctl lnet configure
lnetctl net add --net o2ib --if ib0

Bring up the LNET network

lctl network up

At this point we should have the following modules loaded and visible via lsmod

[root@mawenzi-03 ~]# lsmod | grep -i mlx
mlx5_ib               454656  0
ib_uverbs             155648  1 mlx5_ib
ib_core               438272  8 rdma_cm,ib_ipoib,ko2iblnd,iw_cm,ib_umad,ib_uverbs,mlx5_ib,ib_cm
mlx5_core            1912832  1 mlx5_ib
mlxfw                  28672  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core
tls                   102400  1 mlx5_core
psample                20480  1 mlx5_core
mlxdevm               180224  1 mlx5_core
mlx_compat             16384  11 rdma_cm,ib_ipoib,mlxdevm,ko2iblnd,iw_cm,ib_umad,ib_core,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
[root@mawenzi-03 ~]# lsmod | grep -i ib
ko2iblnd              237568  1
rdma_cm               118784  1 ko2iblnd
lnet                  704512  7 osc,ko2iblnd,obdclass,ptlrpc,lmv,lustre
libcfs                266240  11 fld,lnet,osc,fid,ko2iblnd,obdclass,ptlrpc,lov,mdc,lmv,lustre
ib_umad                28672  6
ib_ipoib              155648  0
ib_cm                 114688  2 rdma_cm,ib_ipoib
nft_fib_inet           16384  1
nft_fib_ipv4           16384  1 nft_fib_inet
nft_fib_ipv6           16384  1 nft_fib_inet
nft_fib                16384  3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_tables             180224  235 nft_ct,nft_reject_inet,nft_fib_ipv6,nft_fib_ipv4,nft_chain_nat,nf_tables_set,nft_reject,nft_fib,nft_fib_inet
libcrc32c              16384  4 nf_conntrack,nf_nat,nf_tables,xfs
mlx5_ib               454656  0
ib_uverbs             155648  1 mlx5_ib
ib_core               438272  8 rdma_cm,ib_ipoib,ko2iblnd,iw_cm,ib_umad,ib_uverbs,mlx5_ib,ib_cm
mlx5_core            1912832  1 mlx5_ib
mlx_compat             16384  11 rdma_cm,ib_ipoib,mlxdevm,ko2iblnd,iw_cm,ib_umad,ib_core,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
[root@mawenzi-03 ~]# lsmod | grep -i lustre
lustre               1040384  0
lmv                   204800  1 lustre
mdc                   282624  1 lustre
lov                   344064  2 mdc,lustre
ptlrpc               2478080  7 fld,osc,fid,lov,mdc,lmv,lustre
obdclass             3624960  8 fld,osc,fid,ptlrpc,lov,mdc,lmv,lustre
lnet                  704512  7 osc,ko2iblnd,obdclass,ptlrpc,lmv,lustre
libcfs                266240  11 fld,lnet,osc,fid,ko2iblnd,obdclass,ptlrpc,lov,mdc,lmv,lustre
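
Rather than eyeballing the lsmod output, the check can be scripted. This is a small sketch only: missing_modules is a made-up helper name, and the list of modules you pass in is your own choice.

```shell
# Print every required module that does not appear in lsmod-style input.
# Reads lsmod output on stdin; required module names are the arguments.
# missing_modules is a hypothetical helper for this example.
missing_modules() {
    local loaded mod
    # The first column of lsmod is the module name; drop the header line.
    loaded=$(awk '$1 != "Module" { print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF "$mod" || echo "$mod"
    done
}
```

Running lsmod | missing_modules ib_ipoib ib_umad ko2iblnd lnet lustre should print nothing when everything listed above is loaded.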

Lustre Filesystem Creation

Create MGT on /dev/sdb, make a directory under /mnt for it, then mount /dev/sdb to the directory.

mkfs.lustre --mgs /dev/sdb
mkdir /mnt/mgt
mount -t lustre /dev/sdb /mnt/mgt

Create MDT on /dev/sdc, make a directory under /mnt for it, then mount /dev/sdc to the directory.

mkfs.lustre --fsname=lustre --mgsnode=192.168.0.103@o2ib --mdt --index=0 /dev/sdc
mkdir /mnt/mdt
mount -t lustre /dev/sdc /mnt/mdt

Create OST on /dev/sdd, make a directory under /mnt for it, then mount /dev/sdd to the directory.

mkfs.lustre --fsname=lustre --ost --mgsnode=192.168.0.103@o2ib --index=0 /dev/sdd
mkdir /mnt/ost0
mount -t lustre /dev/sdd /mnt/ost0

Example: Setting Up Vanilla Rocky 9.2 Node as Lustre Server

In this example we’ll be setting up a node installed with Rocky Linux 9.2 (minimal) as a Lustre Server, everything built from scratch.

This example begins just after I’ve installed Rocky 9.2 (minimal) on the node, but before installing any dependencies or building anything.

Important: note the kernel version that was installed by default, as reported by uname -r. In my case it was 5.14.0-284.11.1.el9_2.x86_64.
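
Later steps use this string both with and without the architecture suffix (the dependency list pins packages to 5.14.0-284.11.1.el9_2 and appends .x86_64 itself). A minimal sketch for splitting it, assuming the uname -r value ends in ".<arch>" as it does here; both helper names are invented for this example:

```shell
# Split a `uname -r` style string into release and architecture.
# Assumes the string ends in ".<arch>", as on this Rocky install.
kernel_release() { printf '%s\n' "${1%.*}"; }   # drop the trailing ".x86_64"
kernel_arch()    { printf '%s\n' "${1##*.}"; }  # keep only "x86_64"
```

For instance, kernel_release "$(uname -r)" gives the value to use for KERNEL_VERSION later on.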

dnf Proxy Configurations

First thing to do is set up any dnf proxy information so our node can reach the internet from the lab. Replace /etc/dnf/dnf.conf with this file:

[main]
gpgcheck=1
installonly_limit=3
clean_requirements_on_remove=True
best=True
skip_if_unavailable=False
proxy=http://proxy.houston.hpecorp.net:8080

dnf/yum Repos

By default, Rocky 9.X repo URLs will point to the latest-and-greatest packages being hosted by Rocky. This presents a problem for trying to install kernel-related packages that match the "kickstart" kernel version we got out of the box from our 9.2 install. Let dnf find the right packages for our default install by adding a repofile /etc/yum.repos.d/Rocky-92-Development.repo:

[devel92]
name=Rocky Linux 9.2 - Devel (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/devel/x86_64/kickstart/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[extras92]
name=Rocky Linux 9.2 - Extras (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/extras/x86_64/kickstart/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[plus92]
name=Rocky Linux 9.2 - Plus (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/plus/x86_64/kickstart/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[baseos92]
name=Rocky Linux 9.2 - BaseOS (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/BaseOS/x86_64/kickstart/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[appstream92]
name=Rocky Linux 9.2 - AppStream (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/AppStream/x86_64/kickstart/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[baseos-debug92]
name=Rocky Linux 9.2 - BaseOS Debug (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/BaseOS/x86_64/debug/tree/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[crb92]
name=Rocky Linux 9.2 - CRB (kickstart)
baseurl=https://dl.rockylinux.org/vault/rocky/9.2/CRB/x86_64/kickstart/
gpgcheck=1
enabled=1
countme=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-Rocky-9

[epel92]
name=Rocky Linux 9.2 - Fedora EPEL
baseurl=https://dl.fedoraproject.org/pub/archive/epel/9.2/Everything/x86_64/
gpgcheck=0
enabled=1
countme=1

These point at the archived "kickstart" and "debug" repos that have packages for our default-installed kernel.

e2fsprogs

We’ll also need to install Whamcloud’s build of e2fsprogs. It replaces the default e2fsprogs on the system with a version that adds the functionality ldiskfs/ldiskfsprogs needs to work.

Important: You MUST install these packages if you want to build/install Lustre server packages with ldiskfs as a backend.

Create /etc/yum.repos.d/e2fsprogs.repo

[e2fsprogs]
name=Whamcloud e2fsprogs
baseurl="https://downloads.whamcloud.com/public/e2fsprogs/latest/el9/"
gpgcheck=0
enabled=1
countme=1

Install Dependencies

Hint: always check which versions of a package are available before installing. dnf picks what it considers the best candidate (usually the latest version) and hides the other choices. That is a problem when you want a specific version from a specific repo, such as e2fsprogs-devel-1.47.1-wc1.el9.x86_64 from the e2fsprogs repo we created before, but a plain dnf install e2fsprogs-devel picks the generic e2fsprogs-devel hosted by the appstream repo.

You can show all versions of a package, along with the repos they come from, by using

dnf search --showduplicates --verbose e2fsprogs-devel

Example:

[root@mawenzi-01 ~]# dnf search e2fsprogs-devel --showduplicates  --verbose
Loaded plugins: builddep, changelog, config-manager, copr, debug, debuginfo-install, download, generate_completion_cache, groups-manager, needs-restarting, playground, repoclosure, repodiff, repograph, repomanage, reposync, system-upgrade
DNF version: 4.14.0
cachedir: /var/cache/dnf
Last metadata expiration check: 0:49:37 ago on Wed 28 Aug 2024 07:48:08 AM MDT.
================================================================== Name Exactly Matched: e2fsprogs-devel ==================================================================
e2fsprogs-devel-1.47.1-wc1.el9.x86_64 : Ext2/3/4 file system specific libraries and headers
Repo        : @System
Matched from:
Provide    : e2fsprogs-devel = 1.47.1-wc1.el9

e2fsprogs-devel-1.46.5-3.el9.x86_64 : Ext2/3/4 file system specific libraries and headers
Repo        : devel92
Matched from:
Provide    : e2fsprogs-devel = 1.46.5-3.el9

e2fsprogs-devel-1.46.5-3.el9.i686 : Ext2/3/4 file system specific libraries and headers
Repo        : appstream92
Matched from:
Provide    : e2fsprogs-devel = 1.46.5-3.el9

e2fsprogs-devel-1.46.5-3.el9.x86_64 : Ext2/3/4 file system specific libraries and headers
Repo        : appstream92
Matched from:
Provide    : e2fsprogs-devel = 1.46.5-3.el9

e2fsprogs-devel-1.47.1-wc1.el9.x86_64 : Ext2/3/4 file system specific libraries and headers
Repo        : e2fsprogs
Matched from:
Provide    : e2fsprogs-devel = 1.47.1-wc1.el9

e2fsprogs-devel-1.46.5-5.el9.i686 : Ext2/3/4 file system specific libraries and headers
Repo        : appstream
Matched from:
Provide    : e2fsprogs-devel = 1.46.5-5.el9

e2fsprogs-devel-1.46.5-5.el9.x86_64 : Ext2/3/4 file system specific libraries and headers
Repo        : appstream
Matched from:
Provide    : e2fsprogs-devel = 1.46.5-5.el9
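
The verbose output is long, so it can help to boil it down to one "package repo" pair per line. The sketch below parses the text layout shown in the example above; list_candidates is a made-up name, and scraping human-readable dnf output like this is fragile, so treat it as a convenience rather than something to build on:

```shell
# Reduce `dnf search --showduplicates --verbose` output to "package repo" pairs.
# list_candidates is a hypothetical helper; it relies on the layout shown above.
list_candidates() {
    awk '
        # Package lines look like "<name-version-release.arch> : <summary>".
        / : / && $1 != "Repo" && $1 != "Provide" { pkg = $1 }
        # The "Repo" line that follows names the repository for that package.
        $1 == "Repo" { print pkg, $3 }
    '
}
```

Usage: dnf search --showduplicates --verbose e2fsprogs-devel | list_candidates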

With the proper repos in place, install the following dependencies, using:

  • KERNEL_VERSION="5.14.0-284.11.1.el9_2", and

  • E2FSPROGS_VERSION="1.47.1-wc1.el9.x86_64"

#!/bin/bash

E2FSPROGS_VERSION="1.47.1-wc1.el9.x86_64"
KERNEL_VERSION="5.14.0-284.11.1.el9_2"

dnf install -y \
    audit-libs-devel   \
    automake           \
    bc                 \
    binutils-devel     \
    createrepo         \
    dkms               \
    e2fsprogs-${E2FSPROGS_VERSION}	       \
    e2fsprogs-devel-${E2FSPROGS_VERSION}       \
    e2fsprogs-libs-${E2FSPROGS_VERSION}        \
    git                \
    gcc                \
    gcc-gfortran       \
    kernel-abi-stablelists-${KERNEL_VERSION}.noarch     \
    kernel-core-${KERNEL_VERSION}.x86_64	        \
    kernel-devel-${KERNEL_VERSION}.x86_64	        \
    kernel-debug-devel-${KERNEL_VERSION}.x86_64	        \
    kernel-headers-${KERNEL_VERSION}.x86_64	        \
    kernel-modules-${KERNEL_VERSION}.x86_64		\
    kernel-modules-core-${KERNEL_VERSION}.x86_64	\
    kernel-modules-extra-${KERNEL_VERSION}.x86_64	\
    kernel-debuginfo-common-x86_64-${KERNEL_VERSION}.x86_64  \
    kernel-srpm-macros \
    kernel-rpm-macros  \
    lftp               \
    libaio-devel       \
    libattr-devel      \
    libblkid-devel     \
    libmount           \
    libmount-devel     \
    libnl3-devel       \
    libselinux-devel   \
    libssh-devel       \
    libtirpc-devel     \
    libtool            \
    libuuid-devel      \
    libyaml            \
    libyaml-devel      \
    llvm-toolset       \
    lsof	       \
    m4                 \
    ncurses-devel      \
    openldap-devel     \
    openssl-devel      \
    pciutils-devel     \
    perl               \
    perl-devel         \
    python39           \
    python3-devel      \
    python3-docutils   \
    redhat-lsb         \
    rpm-build          \
    texinfo            \
    texinfo-tex        \
    tk		       \
    tcsh	       \
    wget               \
    vim

Disable System Firewall

To make things easier for us later, when we’re trying to send IPoIB traffic between nodes, go ahead and disable the system firewall and switch SELinux to permissive mode (setenforce 0 only lasts until the next reboot):

systemctl stop firewalld
systemctl disable firewalld
setenforce 0

Install MOFED

Once all these dependencies have installed, we’ll need to acquire and install MOFED on the system. In our example, we’re using MOFED-5.8-3.0.7.0, but this might be an old version by the time you’re reading this, so just install the latest version that’s been built for your OS.

Get the MOFED sources from here:

Choose your MOFED version, OS version, and download the .tgz file, i.e:

You’ll have to accept a EULA first. Then scp the .tgz from your laptop’s downloads folder to the target machine, and unpack it using tar -xzvf MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.2-x86_64.tgz

This will leave you with the MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.2-x86_64 directory.

Follow NVIDIA’s documentation for installing the MOFED packages, adding kernel support, etc. here:

I was lucky in that the MOFED suite I downloaded was already built for the default kernel. If yours is not, you’ll have to rebuild the RPMs for the kernel you’re running. Here are the options to use for that.

./mlnx_add_kernel_support.sh \
        --make-tgz \
        --tmpdir /tmp \
        --kernel "5.14.0-284.11.1.el9_2.x86_64" \
        --kernel-sources /usr/src/kernels/5.14.0-284.11.1.el9_2.x86_64/ \
        --mlnx_ofed /root/MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.2-x86_64

Example:

# ./mlnx_add_kernel_support.sh --make-tgz --tmpdir /tmp --kernel "5.14.0-284.11.1.el9_2.x86_64" --kernel-sources /usr/src/kernels/5.14.0-284.11.1.el9_2.x86_64/ --mlnx_ofed /root/MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.2-x86_64
Note: This program will create MLNX_OFED_LINUX TGZ for rhel9.2 under /tmp directory.
Do you want to continue?[y/N]:y
See log file /tmp/mlnx_iso.263699_logs/mlnx_ofed_iso.263699.log

Checking if all needed packages are installed...
Building MLNX_OFED_LINUX RPMS . Please wait...
Creating metadata-rpms for 5.14.0-284.11.1.el9_2.x86_64 ...
WARNING: If you are going to configure this package as a repository, then please note
WARNING: that it contains unsigned rpms, therefore, you need to disable the gpgcheck
WARNING: by setting 'gpgcheck=0' in the repository conf file.
Created /tmp/MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.2-x86_64-ext.tgz

Then, take the .tgz that was just created above and unpack it in your home directory. Check the .supported_kernels file in the unpacked directory.

# cat .supported_kernels
5.14.0-284.11.1.el9_2.x86_64
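
The same check can be scripted so it fails loudly when the running kernel is not covered. A minimal sketch; kernel_supported is an invented helper name, and it assumes the file lists one kernel string per line as shown above:

```shell
# Succeed only if the given kernel (default: the running one) is listed
# verbatim in a .supported_kernels file.
# kernel_supported is a hypothetical helper for this example.
kernel_supported() {
    local list_file="$1" kernel="${2:-$(uname -r)}"
    grep -qxF "$kernel" "$list_file"
}
```

Usage: kernel_supported .supported_kernels || echo "rebuild MOFED for $(uname -r)"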

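Now, add the path to the unpacked RPMS directory to a yum repofile /etc/yum.repos.d/mlnx_ofed.repo (the baseurl below points under /root/test; adjust it to wherever you unpacked the tarball):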

[mlnx_ofed]
name=MLNX_OFED Repository
baseurl=file:///root/test/MLNX_OFED_LINUX-5.8-3.0.7.0-rhel9.2-x86_64-ext/RPMS
enabled=1
gpgcheck=0

Then, install MOFED using dnf:

dnf install --nogpgcheck mlnx-ofed-all opensm mlnx-ofa_kernel-devel

You should now have /usr/src/ofa_kernel installed on the machine.

Load the ib_umad and mlx5_ib modules:

modprobe ib_umad
modprobe mlx5_ib

If this machine will be running the Subnet Manager for the fabric, start it with systemctl start opensm if it’s installed as a systemd service; otherwise launch it through its init script:

/etc/init.d/opensmd start

Now that OpenSM is started, go ahead and load the ipoib module:

modprobe ib_ipoib

Use NetworkManager CLI to set a static IP address on your ibs1 interface:

nmcli connection modify ibs1 ipv4.method manual ipv4.addresses 192.168.0.101/24
nmcli connection up ibs1
nmcli connection modify ibs1 connection.autoconnect yes

At this point, the ConnectX-6 cards on the system show the following for ip a:

7: ibs1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:94:40:c9:ff:ff:b3:4b:d0 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp133s0
    inet 192.168.0.101/24 brd 192.168.0.255 scope global noprefixroute ibs1
       valid_lft forever preferred_lft forever
    inet6 fe80::9640:c9ff:ffb3:4bd0/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
8: ibs2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:94:40:c9:ff:ff:b3:5b:4c brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    altname ibp3s0

and ibstat shows:

CA 'mlx5_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.37.1700
	Hardware version: 0
	Node GUID: 0x9440c9ffffb34bd0
	System image GUID: 0x9440c9ffffb34bd0
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 1
		LMC: 0
		SM lid: 1
		Capability mask: 0xa659e84a
		Port GUID: 0x9440c9ffffb34bd0
		Link layer: InfiniBand
CA 'mlx5_1'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.37.1700
	Hardware version: 0
	Node GUID: 0x9440c9ffffb35b4c
	System image GUID: 0x9440c9ffffb35b4c
	Port 1:
		State: Down
		Physical state: Disabled
		Rate: 10
		Base lid: 65535
		LMC: 0
		SM lid: 0
		Capability mask: 0xa659e848
		Port GUID: 0x9440c9ffffb35b4c
		Link layer: InfiniBand
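
To pick out which adapters actually have an active port without reading the whole listing, the ibstat text can be filtered. This is a sketch that relies on the output layout shown above; active_cas is a made-up helper name:

```shell
# Print the name of every CA that has a port in State: Active.
# Reads ibstat output on stdin; active_cas is a hypothetical helper.
active_cas() {
    awk -v q="'" '
        # Header lines look like: CA <quoted name>; skip the "CA type:" lines.
        $1 == "CA" && $2 != "type:" { gsub(q, "", $2); ca = $2 }
        # "Physical state:" lines start with "Physical", so they do not match.
        $1 == "State:" && $2 == "Active" { print ca }
    '
}
```

With the output above, ibstat | active_cas would list only mlx5_0, since the port on mlx5_1 is down.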

This concludes the MOFED/network layer configuration for the Lustre server.

Get ext4 Source from Kernel Sources

In order to build the Lustre server RPMs with ldiskfs, we’ll need the ext4 source in place. Currently, the kernel-devel packages put an incomplete fs/ext4/ directory in place without any sources, so we’ll need to get the full source from the kernel source RPM and extract it to the right spot.

Download the kernel source RPM for your target kernel. I had to get mine from a third-party mirror as Rocky was no longer hosting the .src.rpm in their archives. Install the .src.rpm, unpack the kernel tarball it places in /root/rpmbuild/SOURCES, then replace the contents of /usr/src/kernels/5.14.0-284.11.1.el9_2.x86_64/fs/ext4 with the extracted /root/rpmbuild/SOURCES/linux-5.14.0-284.11.1.el9_2/fs/ext4.

https_proxy=http://proxy.houston.hpecorp.net:8080 wget https://mirror.math.princeton.edu/pub/centos-stream/SIGs/9/kmods/source/kernels/kernel-5.14.0-284.11.1.el9_2.src.rpm
rpm -ivh kernel-5.14.0-284.11.1.el9_2.src.rpm
cd ~/rpmbuild/SOURCES
tar xJf linux-5.14.0-284.11.1.el9_2.tar.xz
cd /usr/src/kernels/5.14.0-284.11.1.el9_2.x86_64/fs
mv ext4/ ext4.orig
cp -r /root/rpmbuild/SOURCES/linux-5.14.0-284.11.1.el9_2/fs/ext4 .
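
Before moving on to the build, it is worth confirming that the copied directory really contains C sources and not just the stub tree. A quick sketch; ext4_sources_present is an invented helper, and checking for super.c and inode.c (two core ext4 files) is only a heuristic:

```shell
# Succeed if the directory looks like a full ext4 source tree.
# ext4_sources_present is a hypothetical helper; the two file names are a
# heuristic, not an exhaustive check.
ext4_sources_present() {
    local ext4_dir="$1"
    [ -f "$ext4_dir/super.c" ] && [ -f "$ext4_dir/inode.c" ]
}
```

Usage: ext4_sources_present /usr/src/kernels/5.14.0-284.11.1.el9_2.x86_64/fs/ext4 && echo "ext4 sources in place"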

Clone Lustre Repo

Here we’ll clone the lustre-wc-rel repo and check out the git refspec we want to build off of. Unless you want to set up SSH keys or other auth, just use HTTP to anonymously clone the git repo:

git clone http://es-gerrit.hpc.amslabs.hpecorp.net/lustre-wc-rel

Fetch/checkout the Gerrit change you want to build:

cd lustre-wc-rel/
git fetch http://es-gerrit.hpc.amslabs.hpecorp.net/lustre-wc-rel refs/changes/31/162631/1 && git checkout FETCH_HEAD

Build Lustre Server RPMs

Run the following script, build_server_rpms.sh, to build the server RPMs:

#!/bin/bash

# git clone lustre-wc-rel, check out whatever branch
cd lustre-wc-rel

# build dependencies are assumed to be installed already (see Install Dependencies)

# set build vars
KERNEL_VERSION=$(uname -r)
LINUX_OBJ_DIR=$(ls -d /usr/src/kernels/$KERNEL_VERSION)
LINUX_DIR=$(ls -d /usr/src/kernels/$KERNEL_VERSION)

# Configure autotools
sh autogen.sh

# configure
./configure \
  --enable-server \
  --disable-gss-keyring \
  --enable-gss="no" \
  --enable-mpitests="no" \
  --enable-ldiskfs \
  --with-o2ib="/usr/src/ofa_kernel/default/" \
  --with-linux="$LINUX_DIR" \
  --with-linux-obj="$LINUX_OBJ_DIR"

# make server rpms
make rpms

If everything completes successfully you’ll have the following RPMs built:

[root@mawenzi-01 ~]# ls lustre-wc-rel/*.rpm
lustre-wc-rel/kmod-lustre-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/kmod-lustre-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/kmod-lustre-osd-ldiskfs-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/kmod-lustre-osd-ldiskfs-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/kmod-lustre-tests-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/kmod-lustre-tests-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-2.15.1.2_cray_416_g3ab60c6-1.src.rpm
lustre-wc-rel/lustre-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-debugsource-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-devel-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-iokit-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-osd-ldiskfs-mount-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-osd-ldiskfs-mount-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-resource-agents-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-tests-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
lustre-wc-rel/lustre-tests-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm

Install Server RPMs

Finally, install the server RPMs:

#!/bin/bash

cd lustre-wc-rel/
dnf install \
	kmod-lustre-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	kmod-lustre-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	kmod-lustre-osd-ldiskfs-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	kmod-lustre-osd-ldiskfs-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	kmod-lustre-tests-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	kmod-lustre-tests-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-debugsource-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-devel-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-iokit-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-osd-ldiskfs-mount-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-osd-ldiskfs-mount-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-resource-agents-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-tests-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm \
	lustre-tests-debuginfo-2.15.1.2_cray_416_g3ab60c6-1.el9.x86_64.rpm
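
The version string in those file names changes with every build, so a glob is less brittle than a hand-typed list. A sketch under that assumption; installable_rpms is a made-up helper that simply lists every .rpm in a directory except the source RPM, which matches the set installed above:

```shell
# List all binary RPMs in a directory, skipping the .src.rpm.
# installable_rpms is a hypothetical helper for this example.
installable_rpms() {
    local dir="$1" rpm
    for rpm in "$dir"/*.rpm; do
        [ -e "$rpm" ] || continue      # no matches: the glob stays literal
        case "$rpm" in
            *.src.rpm) ;;               # source RPM, not installed here
            *) printf '%s\n' "$rpm" ;;
        esac
    done
}
```

Usage: dnf install $(installable_rpms lustre-wc-rel)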

Configure LNet

First, make sure the right modules reload if the server is rebooted.

echo ib_ipoib > /etc/modules-load.d/ib_ipoib.conf
echo lnet > /etc/modules-load.d/lnet.conf
echo lustre > /etc/modules-load.d/lustre.conf

Then load them manually now.

modprobe ib_ipoib
modprobe lustre
modprobe lnet

Configure LNet using the static IP address of the ibs1 device you assigned earlier, 192.168.0.101/24:

[root@mawenzi-01 ~]# lnetctl lnet configure
[root@mawenzi-01 ~]# lnetctl net add --net o2ib --if ibs1
[root@mawenzi-01 ~]# lctl network up
LNET configured
[root@mawenzi-01 ~]# lnetctl net show
net:
    - net type: lo
      local NI(s):
        - nid: 0@lo
          status: up
    - net type: o2ib
      local NI(s):
        - nid: 192.168.0.101@o2ib
          status: up
          interfaces:
              0: ibs1

Make the Lustre Filesystem

Context matters for this; in our case, this is our disk layout:

[root@mawenzi-01 ~]# lsblk
NAME                    MAJ:MIN RM   SIZE RO TYPE MOUNTPOINTS
sda                       8:0    0   1.6T  0 disk
├─sda1                    8:1    0   600M  0 part /boot/efi
├─sda2                    8:2    0     1G  0 part /boot
└─sda3                    8:3    0   1.6T  0 part
  ├─rl_mawenzi--01-root 253:0    0    70G  0 lvm  /
  ├─rl_mawenzi--01-swap 253:1    0     4G  0 lvm  [SWAP]
  └─rl_mawenzi--01-home 253:2    0   1.6T  0 lvm  /home
sdb                       8:16   0 372.6G  0 disk
sdc                       8:32   0   1.5T  0 disk
sdd                       8:48   0   1.5T  0 disk

/dev/sda is the OS disk. We want /dev/sdb to be our combined MGT/MDT, while /dev/sdc and /dev/sdd can be OSTs. These look like spinning disks but are actually SAS SSDs.

Create the combined MDT/MGT on /dev/sdb:

mkfs.lustre --fsname=<fs_name> --index=0 --mgs --mdt /dev/sdb

Example:

[root@mawenzi-01 ~]# mkfs.lustre --fsname=testfs --index=0 --mgs --mdt /dev/sdb

   Permanent disk data:
Target:     testfs:MDT0000
Index:      0
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x65
              (MDT MGS first_time update )
Persistent mount opts: user_xattr,errors=remount-ro
Parameters:

checking for existing Lustre data: not found
device size = 381554MB
formatting backing filesystem ldiskfs on /dev/sdb
	target name   testfs:MDT0000
	kilobytes     390711384
	options        -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,^fast_commit,flex_bg -E lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:MDT0000  -J size=4096 -I 1024 -i 2560 -q -O dirdata,uninit_bg,^extents,dir_nlink,quota,project,huge_file,ea_inode,large_dir,^fast_commit,flex_bg -E lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdb 390711384k
Writing CONFIGS/mountdata

Formatting the MDT may take up to about 10 minutes to complete. OST formatting is faster.

Make a directory and mount the MDT. Also, set the MDT identity upcall to NONE, which disables the user/group lookup upcall.

mkdir /mnt/mdt
mount -t lustre /dev/sdb /mnt/mdt/
lctl set_param mdt.*.identity_upcall=NONE

Reformat/create an OST on /dev/sdc, using the MGS NID of our server:

[root@mawenzi-01 ~]# mkfs.lustre --reformat --index=0 --fsname=testfs --ost --mgsnode=192.168.0.101@o2ib /dev/sdc

   Permanent disk data:
Target:     testfs:OST0000
Index:      0
Lustre FS:  testfs
Mount type: ldiskfs
Flags:      0x62
              (OST first_time update )
Persistent mount opts: ,errors=remount-ro
Parameters: mgsnode=192.168.0.101@o2ib

device size = 1526185MB
formatting backing filesystem ldiskfs on /dev/sdc
	target name   testfs:OST0000
	kilobytes     1562813784
	options        -J size=1024 -I 512 -i 262144 -q -O extents,uninit_bg,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F
mkfs_cmd = mke2fs -j -b 4096 -L testfs:OST0000  -J size=1024 -I 512 -i 262144 -q -O extents,uninit_bg,dir_nlink,quota,project,huge_file,^fast_commit,flex_bg -G 256 -E resize="4290772992",lazy_journal_init="0",lazy_itable_init="0" -F /dev/sdc 1562813784k
Writing CONFIGS/mountdata

Mount the OST at /mnt/ost:

mkdir -p /mnt/ost
mount -t lustre /dev/sdc /mnt/ost/
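
Only /dev/sdc is formatted here; adding /dev/sdd as a second OST follows the same pattern with the next index. The sketch below only prints the commands (a dry run) so they can be reviewed first. print_ost_cmds is a made-up helper, and the /mnt/ost<index> mount points are an assumption that differs from the single /mnt/ost used above:

```shell
# Print (do not run) the format/mount commands for a list of OST devices.
# Indexes start at 0 and follow the order of the device arguments.
# print_ost_cmds and the /mnt/ost<index> naming are assumptions of this sketch.
print_ost_cmds() {
    local fsname="$1" mgs_nid="$2"
    shift 2
    local index=0 dev
    for dev in "$@"; do
        printf 'mkfs.lustre --fsname=%s --ost --mgsnode=%s --index=%d %s\n' \
            "$fsname" "$mgs_nid" "$index" "$dev"
        printf 'mkdir -p /mnt/ost%d\n' "$index"
        printf 'mount -t lustre %s /mnt/ost%d\n' "$dev" "$index"
        index=$((index + 1))
    done
}
```

Usage: print_ost_cmds testfs 192.168.0.101@o2ib /dev/sdc /dev/sdd, then run the printed lines once they look right. Each OST in a filesystem needs its own unique index.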

Verify the filesystem creation by creating a client mountpoint and mounting the FS there.

[root@mawenzi-01 ~]# mkdir /mnt/testfs
[root@mawenzi-01 ~]# mount -t lustre 192.168.0.101@o2ib:/testfs /mnt/testfs
[root@mawenzi-01 ~]# mount -t lustre
/dev/sdb on /mnt/mdt type lustre (ro,svname=testfs-MDT0000,mgs,osd=osd-ldiskfs,user_xattr,errors=remount-ro)
/dev/sdc on /mnt/ost type lustre (ro,svname=testfs-OST0000,mgsnode=192.168.0.101@o2ib,osd=osd-ldiskfs,errors=remount-ro)
192.168.0.101@o2ib:/testfs on /mnt/testfs type lustre (rw,seclabel,checksum,flock,nouser_xattr,lruresize,lazystatfs,nouser_fid2path,verbose,encrypt)

This concludes the Lustre Server installation for Rocky Linux 9.2. To build and install a client on a separate node that mounts this filesystem over the network, see my Lustre Client doc.