Lustre Development

Git Configuration

After cloning the Git repo, replace .git/config with the following:

[core]
        repositoryformatversion = 0
        filemode = true
        bare = false
        logallrefupdates = true
        ignorecase = true
        precomposeunicode = true
[remote "origin"]
        url = ssh://carlsonc@es-gerrit.hpc.amslabs.hpecorp.net:29418/lustre-wc-rel
        fetch = +refs/heads/*:refs/remotes/origin/*
        fetch = +refs/dev/*:refs/remotes/origin/dev/*
[remote "gh"]
        url = git@github.hpe.com:hpe/hpc-lus-filesystem.git
        fetch = +refs/heads/*:refs/remotes/gh/*
[remote "wc"]
        url = ssh://carlsonc@review.whamcloud.com:29418/fs/lustre-release
        fetch = +refs/heads/*:refs/remotes/wc/*
[branch "master"]
        remote = origin
        merge = refs/heads/master
[branch "cray-2.15-int"]
        remote = origin
        merge = refs/heads/cray-2.15-int
[user]
        email = caleb.carlson@hpe.com

Add a branch entry (like the ones for master and cray-2.15-int above) for each local dev branch that should track its remote counterpart. That lets you run git pull on the branch to update it from the remote.
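As a sketch of what such an entry looks like, the helper below prints a stanza in the same layout as the config above. The function name branch_stanza is made up for illustration, and the refs/dev/ ref in the usage line is an assumption based on the origin fetch refspec; check the actual ref name on the remote before relying on it.

```shell
# Print a .git/config stanza that makes a local branch track a remote ref.
# branch_stanza is a hypothetical helper, not part of git.
#   $1 = local branch name
#   $2 = remote name (e.g. origin)
#   $3 = full ref name on the remote (e.g. refs/heads/cray-2.15-int)
branch_stanza() {
  printf '[branch "%s"]\n' "$1"
  printf '        remote = %s\n' "$2"
  printf '        merge = %s\n' "$3"
}

# Example (assumed ref layout): append a stanza for a dev branch.
#   branch_stanza LUS-12345-foo origin refs/dev/LUS-12345-foo >> .git/config
```

If you'd rather not edit the file, running git branch --set-upstream-to=<remote>/<branch> on the checked-out branch should write an equivalent entry.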

Rebasing a Development Branch

Occasionally the integration branch cray-2.15-int gets rebased onto the release branch cray-2.15.

When commits are cherry-picked into cray-2.15, they get different commit hashes than the corresponding commits in cray-2.15-int, and their order can change as well.

The release branch therefore ends up with a completely different history from the integration branch, and rebasing cray-2.15-int onto it rewrites the integration branch’s history too.

When you then rebase your dev branch (which is still based on the old cray-2.15-int history) onto the new cray-2.15-int with git rebase -i origin/cray-2.15-int, git will find most of the commits that are already upstream and skip them, but not all of them. The ones it misses show up in the rebase list alongside your actual new commits. Don’t pick the commits you didn’t author: they are most likely already in cray-2.15-int, and git just couldn’t match them.

git checkout LUS-12345-foo
git fetch origin
git rebase -i origin/cray-2.15-int

Then, in the editor, keep only the pick lines for your own commits and delete the rest.
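To decide which lines to keep, it can help to list the commits you authored before starting the rebase. A minimal sketch: own_commits is a hypothetical filter over git log output in a "hash email subject" layout, and it assumes your commits carry the email set in the [user] section.

```shell
# Keep only the lines whose second field (author email) matches $1.
# Expects "hash email subject" lines on stdin, e.g. from
#   git log --reverse --format='%h %ae %s' origin/cray-2.15-int..HEAD
own_commits() {
  awk -v me="$1" '$2 == me'
}

# Example: show the commits on the current branch that you authored.
#   git log --reverse --format='%h %ae %s' origin/cray-2.15-int..HEAD \
#     | own_commits "$(git config user.email)"
```

Anything in the rebase list that does not show up in this output is a candidate for deletion.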

SSH Configuration

Make sure you have SSH entries configured for the different git remotes. The Gerrit server is somewhat outdated, so it needs an older RSA key and the legacy algorithms enabled in its entry below.

# GitHub
Host github.com
  Hostname github.com
  IdentityFile ~/.ssh/caleb_id_ecdsa

# Es-Gerrit
Host es-gerrit.hpc.amslabs.hpecorp.net
  Hostname es-gerrit.hpc.amslabs.hpecorp.net
  KexAlgorithms +diffie-hellman-group1-sha1
  HostkeyAlgorithms +ssh-rsa
  PubkeyAcceptedAlgorithms +ssh-rsa
  User carlsonc
  IdentityFile ~/.ssh/id_rsa

# Whamcloud
Host review.whamcloud.com
  HostName review.whamcloud.com
  User carlsonc
  IdentityFile ~/.ssh/caleb_id_ecdsa

# HPE GitHub
Host github.hpe.com
  Hostname github.hpe.com
  IdentityFile ~/.ssh/caleb_id_ecdsa
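
To check which settings ssh will actually apply to a host without connecting, ssh -G prints the resolved configuration. The wrapper name ssh_effective is hypothetical, and -G assumes a reasonably recent OpenSSH client.

```shell
# Print the resolved user, hostname, key file, and algorithm settings
# that ssh would use for host $1. Does not open a connection.
ssh_effective() {
  ssh -G "$1" | grep -E '^(user|hostname|identityfile|kexalgorithms|hostkeyalgorithms) '
}

# Example:
#   ssh_effective es-gerrit.hpc.amslabs.hpecorp.net
```

If the Gerrit entry is being picked up, the output should list the RSA key file and the extra key exchange and host key algorithms.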

Pushing a Dev Branch to the HPE GitHub Remote

hornc@cassini-hosta:~/lustre-wc-rel> git branch dev/LUS-12345-test
hornc@cassini-hosta:~/lustre-wc-rel> git push gh dev/LUS-12345-test
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
remote:
remote: Create a pull request for 'dev/LUS-12345-test' on GitHub by visiting:
remote:      https://github.hpe.com/hpe/hpc-lus-filesystem/pull/new/dev/LUS-12345-test
remote:
To github.hpe.com:hpe/hpc-lus-filesystem.git
 * [new branch]            dev/LUS-12345-test -> dev/LUS-12345-test
hornc@cassini-hosta:~/lustre-wc-rel>

Checking Out a Branch from Another Remote

Make sure you’ve fetched the latest versions of the remote branches first.

➜  lustre-wc-rel git:(cray-2.15-int) git checkout gh/release/uss-1.1
Note: switching to 'gh/release/uss-1.1'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at 4846b50d77 LUS-12377 dkms: mofed fallback to kABI
➜  lustre-wc-rel git:(4846b50d77) git checkout wc/master
Previous HEAD position was 4846b50d77 LUS-12377 dkms: mofed fallback to kABI
HEAD is now at 8b6719f1b3 LU-17887 obd: do not update obd_memory from RCU

Gathering Build Logs for Trivial Changes

Clone lustre-wc-rel on a test system.

#!/bin/bash

set -ex

# Git settings
cd lustre-wc-rel
git fetch -p
git reset --hard HEAD
git checkout <branch>
git clean -dfx > /dev/null
git log --pretty=oneline | head -4

# Modify this for respective distro you're using
KERNEL_VERSION="5.14.21-150500.53"
ARCH="x86_64"
LINUX_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION})
LINUX_OBJ_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION}-obj/${ARCH}/default)

./LUSTRE-VERSION-GEN

# Modify this to include configure options for the build you're doing
sh ./autogen.sh
./configure \
  --enable-client \
  --disable-server \
  --disable-gss-keyring \
  --enable-gss="no" \
  --enable-mpitests="no" \
  --enable-ldap="no" \
  --with-o2ib="/usr/src/ofa_kernel/default" \
  --with-linux="$LINUX_DIR" \
  --with-linux-obj="$LINUX_OBJ_DIR"

make rpms
# -p queries the RPM file itself rather than an installed package
rpm -qp --requires lustre-client-2.15.3.*.x86_64.rpm | grep ldap

Save the script above as build.sh, then run ./build.sh 2>&1 | tee build_<commit-id>.log to capture the build log.
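
One way to fill in the <commit-id> part consistently is to build the file name from an abbreviated commit ID. A minimal sketch; build_log_name is a hypothetical helper.

```shell
# Print the log file name for a commit ID, e.g. build_4846b50d77.log.
build_log_name() {
  printf 'build_%s.log\n' "$1"
}

# Example: git rev-parse --short prints an abbreviated ID for a ref.
#   id=$(git -C lustre-wc-rel rev-parse --short HEAD)
#   ./build.sh 2>&1 | tee "$(build_log_name "$id")"
```

Note that build.sh checks out its own branch, so resolve the ID of the ref it builds rather than whatever happens to be checked out beforehand.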

Building with `rpmbuild`

#!/bin/bash

function print_usage {
  echo -e "\nUsage: ./build_lustre_client.sh <lustre_version> <kernel_version>"
  echo -e "Example:\n\t./build_lustre_client.sh cray-2.15-int 5.14.21-150500.53"
}

function error {
  echo "$@" 1>&2; exit 1
}

# Check args
[[ $# -ne 2 ]] && print_usage && exit 1

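set -ex
# Without pipefail, "rpmbuild ... | tee ..." below reports tee's exit status
# and the "|| error" fallback never fires.
set -o pipefail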

LUSTRE_REFSPEC=$1
KERNEL_VERSION=$2

# Set architecture type, arm64 or x86_64. Default is x86_64.
ARCH="x86_64"
[[ $PLATFORM == "linux/arm64" ]] && ARCH="aarch64"

cd lustre-wc-rel
#  git fetch --all --tags --prune && \
#  git checkout ${LUSTRE_REFSPEC}

sh ./autogen.sh && ./configure --enable-dist || error "Unable to autogen and configure"
make lustre.spec lustre-dkms.spec dist Makefile || error "Unable to make dist and spec files"

# Find the Linux kernel source and kernel object directories.
# On RHEL they're the same directory, but on openSUSE and other
# distros they are usually separate directories under /usr/src.
LINUX_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION})
LINUX_OBJ_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION}-obj/${ARCH}/default)
RPMBUILD_DIR="/tmp/work/rpmbuild"

# Create rpmbuild dir
rm -rf $RPMBUILD_DIR/
mkdir -p $RPMBUILD_DIR/SPECS $RPMBUILD_DIR/SOURCES
cp -v rpm/* lustre-*.tar.gz $RPMBUILD_DIR/SOURCES/
cp -v lustre.spec lustre-dkms.spec $RPMBUILD_DIR/SPECS

CONFIGURE_ARGS="'--disable-gss-keyring' '--enable-gss=no' '--enable-mpitests=no'"
[[ -n ${MOFED_VERSION} ]] && CONFIGURE_ARGS="${CONFIGURE_ARGS} '--with-o2ib=/usr/src/ofa_kernel/default'"

# Build the userspace, devel, debug, and kmod/kmp RPMs (iokit is disabled below)
rpmbuild \
  --without mpi \
  --without servers \
  --without lustre_tests \
  --without lustre_iokit \
  --define "_topdir $RPMBUILD_DIR" \
  --define "kobjdir $LINUX_OBJ_DIR" \
  --define "kver $KERNEL_VERSION" \
  --define "kversion $KERNEL_VERSION" \
  --define "kdir $LINUX_DIR" \
  --define "_with_lnet_dlc lnet_dlc" \
  --define "configure_args $CONFIGURE_ARGS" \
  -ba lustre.spec 2>&1 | tee /tmp/work/rpmbuild.log \
  || error "Failed to build lustre.spec"
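
The script above only looks for the SUSE-style layout, even though its comment mentions RHEL. A sketch of a lookup that handles both follows. find_kernel_dirs is a hypothetical helper, and the RHEL path /usr/src/kernels/<version>.<arch> is an assumption about where the kernel-devel package installs; adjust it for your distro.

```shell
# Print "<source dir> <object dir>" for a kernel version.
#   $1 = kernel version, $2 = architecture
#   $3 = root to search (default /usr/src); a parameter only so the
#        function can be tried against a scratch directory
find_kernel_dirs() {
  kver=$1
  arch=$2
  ksrc_root=${3:-/usr/src}
  if [ -d "$ksrc_root/linux-$kver-obj/$arch/default" ]; then
    # SUSE-style layout: separate source and object trees
    echo "$ksrc_root/linux-$kver $ksrc_root/linux-$kver-obj/$arch/default"
  elif [ -d "$ksrc_root/kernels/$kver.$arch" ]; then
    # RHEL-style layout (assumed path): one directory serves as both
    echo "$ksrc_root/kernels/$kver.$arch $ksrc_root/kernels/$kver.$arch"
  else
    echo "no kernel tree for $kver under $ksrc_root" >&2
    return 1
  fi
}

# Example:
#   set -- $(find_kernel_dirs 5.14.21-150500.53 x86_64)
#   LINUX_DIR=$1 LINUX_OBJ_DIR=$2
```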

Debugging

Using the Ring Buffer

Lustre stores debug statements in a ring buffer on the system. What goes into this ring buffer is determined by the debug parameter (the libcfs module parameter libcfs_debug), a mask of the enabled message types.

Here’s an example default debug parameter value:

mawenzi-06:~ # lctl get_param debug
debug=ioctl neterror warning error emerg ha config console lfsck

In the code, you’ll want to add CDEBUG statements to print messages to the ring buffer.

Here’s an example CDEBUG message printed in the LNet code path:

CDEBUG(D_NET, "Allocate new FMR pool\n");

These messages won’t appear in the ring buffer unless you add net to the debug parameter:

lctl set_param debug=+net

You can then trigger some LNet activity by pinging another network interface over LNet.

lctl ping 192.168.0.103@o2ib

Then, dump the contents of the ring buffer to a file:

lctl dk > /tmp/dk.log

Your message should then appear somewhere in the output file /tmp/dk.log.
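
If the message is missing, first confirm that the flag actually took effect. Because neterror is enabled by default, a substring search for net in the debug value gives a false positive. A small sketch of a whole-word check follows; has_debug_flag is a hypothetical helper, and the usage line assumes lctl get_param -n, which prints the value without the debug= prefix.

```shell
# Succeed if flag $2 appears as a whole word in the debug mask $1.
has_debug_flag() {
  case " $1 " in
    *" $2 "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   has_debug_flag "$(lctl get_param -n debug)" net || lctl set_param debug=+net
#   grep -F "Allocate new FMR pool" /tmp/dk.log
```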

Testing LNet Dev Changes

Kernel modules can’t be hot-swapped: the old ones must be unloaded before the new ones are loaded. Most likely, you won’t need the filesystem mounted at all; you’ll just need LNet loaded and configured. You can skip dealing with RPMs by loading the .ko files straight out of the source tree after running make. To find the paths of the built .ko files, run:

mawenzi-06:~ # find lustre-wc-rel/ -name "*.ko"
lustre-wc-rel/libcfs/libcfs/libcfs.ko
lustre-wc-rel/lnet/klnds/o2iblnd/ko2iblnd.ko
lustre-wc-rel/lnet/klnds/socklnd/ksocklnd.ko
lustre-wc-rel/lnet/lnet/lnet.ko
lustre-wc-rel/lnet/selftest/lnet_selftest.ko
lustre-wc-rel/lustre/fid/fid.ko
lustre-wc-rel/lustre/fld/fld.ko
lustre-wc-rel/lustre/llite/lustre.ko
lustre-wc-rel/lustre/lmv/lmv.ko
lustre-wc-rel/lustre/lov/lov.ko
lustre-wc-rel/lustre/mdc/mdc.ko
lustre-wc-rel/lustre/mgc/mgc.ko
lustre-wc-rel/lustre/obdclass/llog_test.ko
lustre-wc-rel/lustre/obdclass/obdclass.ko
lustre-wc-rel/lustre/obdecho/obdecho.ko
lustre-wc-rel/lustre/osc/osc.ko
lustre-wc-rel/lustre/ptlrpc/ptlrpc.ko
lustre-wc-rel/lustre/tests/kernel/kinode.ko

Insert the LNet kernel modules from the local paths. This is for an o2ib net.

lustre="/root/lustre-wc-rel"
insmod $lustre/libcfs/libcfs/libcfs.ko
insmod $lustre/lnet/lnet/lnet.ko
insmod $lustre/lnet/klnds/o2iblnd/ko2iblnd.ko

If you’re changing userspace tools, put the source tree’s directories first in PATH so that your newly built binaries and scripts are found before the ones installed by earlier RPMs (unless you removed those RPMs beforehand):

lustre="/root/lustre-wc-rel"
export PATH="$lustre/lustre/utils:$lustre/lnet/utils:$lustre/lustre/scripts:$PATH"
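
To confirm that the shell now resolves a tool to your build rather than the installed copy, you can compare the resolved path against the source tree. A minimal sketch; uses_tree_binary is a hypothetical helper.

```shell
# Succeed if command $1 resolves to a file under directory $2.
uses_tree_binary() {
  resolved=$(command -v "$1") || return 1
  case "$resolved" in
    "$2"/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example:
#   uses_tree_binary lctl /root/lustre-wc-rel || echo "lctl is not from the source tree"
```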

Here’s a script I set up to do all the above in one go:

#!/bin/bash

echo -e "Make sure you've checked out your latest changes with git and have run ./configure"

set -ex

# Uninstall the old stuff if it exists
for entry in $(mount -t lustre | awk '{print $3}'); do
	echo "Unmounting $entry"
	umount -t lustre $entry
done
which lustre_rmmod && lustre_rmmod
#zypper remove --no-confirm lustre-client lustre-client-dkms lustre-client-kmp-default

# Build the client utils/binaries and kernel objects (.ko)
lustre="/root/lustre-wc-rel"
cd $lustre
make -j 16

# Insert kernel modules, and configure bin/sbin tools
export PATH="$lustre/lustre/utils:$lustre/lnet/utils:$lustre/lustre/scripts:$PATH"
insmod $lustre/libcfs/libcfs/libcfs.ko
insmod $lustre/lnet/lnet/lnet.ko
insmod $lustre/lnet/klnds/o2iblnd/ko2iblnd.ko
insmod $lustre/lnet/klnds/socklnd/ksocklnd.ko
# Use the absolute path: the script has already cd'ed into $lustre
cp $lustre/lustre/scripts/ksocklnd-config /usr/sbin

# Configure LNet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0
lnetctl net add --net tcp --if eth0
lctl network up
lnetctl net show

# Run debugging steps
lctl set_param debug=+net
lctl ping 192.168.0.103@o2ib
lctl ping 10.214.130.4@tcp
lctl dk > /tmp/dk.log

Note: the ksocklnd-config script manipulates IP routes and rules. This can break things on ClusterStor or Shasta, where the IP network configuration is already set up a particular way. To keep lnetctl from calling it on net add, pass this option:

lnetctl net add --skip-mr-route-setup …

There’s also a kernel module parameter:

options ksocklnd skip_mr_route_setup=1

Configuring Kernel Module Parameters

Pass parameter assignments to insmod to set module parameter values at load time:

cassini-hosta:/home/hornc/lustre-wc-rel # insmod libcfs/libcfs/libcfs.ko libcfs_debug=-1
cassini-hosta:/home/hornc/lustre-wc-rel # cat /sys/module/libcfs/parameters/libcfs_debug
-1

modprobe, unlike insmod, reads module options from .conf files under /etc/modprobe.d (insmod ignores these files, so pass options on its command line as shown above):

cassini-hosta:/home/hornc/lustre-wc-rel # cat /etc/modprobe.d/lustre.conf
options ksocklnd  skip_mr_route_setup=1
options libcfs cpu_npartitions=8 cpu_pattern=""
options kkfilnd traffic_class=bulk_data
options lnet ip2nets="tcp(heth0) 172.18.2.[5-6]; tcp(enp137s0f0np0) 172.18.2.[7-8]"
options lnet lock_prim_nid=1
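
When loading modules from the source tree with insmod, one way to reuse the options from such a file is to extract them and pass them along by hand. A rough sketch; modprobe_opts is a hypothetical helper, and it only handles simple "options <module> ..." lines, printing one output line per matching line in the file.

```shell
# Print the option strings configured for module $1 in modprobe conf file $2.
modprobe_opts() {
  sed -n "s/^options[[:space:]][[:space:]]*$1[[:space:]][[:space:]]*//p" "$2"
}

# Example: values containing spaces (such as ip2nets) would need careful
# quoting, so this is only suitable for simple key=value options.
#   insmod lnet/klnds/socklnd/ksocklnd.ko $(modprobe_opts ksocklnd /etc/modprobe.d/lustre.conf)
```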

The current values of these parameters are exposed under /sys/module/<module>/parameters:

cassini-hosta:/home/hornc/lustre-wc-rel # cat /sys/module/lnet/parameters/lnet_transaction_timeout
50
cassini-hosta:/home/hornc/lustre-wc-rel # cat /sys/module/lnet/parameters/sock_timeout
0
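
To see every parameter of a loaded module at once instead of reading the files one by one, a loop over that directory is enough. A minimal sketch; module_params is a hypothetical helper, and its second argument exists only so it can be pointed at a scratch directory.

```shell
# Print name=value for each readable parameter of module $1.
# $2 overrides the sysfs module root (default /sys/module).
module_params() {
  dir="${2:-/sys/module}/$1/parameters"
  for f in "$dir"/*; do
    [ -r "$f" ] || continue
    printf '%s=%s\n' "${f##*/}" "$(cat "$f")"
  done
}

# Example:
#   module_params lnet
#   module_params ksocklnd | grep skip_mr_route_setup
```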

Running LNet Sanity Tests Locally

# After ./configure && make -j 16 ...
cd lustre-wc-rel/lustre/tests/
./auster -Nv sanity-lnet --only 232