Lustre Development
Git Configuration
After cloning the Git repo, replace .git/config with the following:
[core]
repositoryformatversion = 0
filemode = true
bare = false
logallrefupdates = true
ignorecase = true
precomposeunicode = true
[remote "origin"]
url = ssh://carlsonc@es-gerrit.hpc.amslabs.hpecorp.net:29418/lustre-wc-rel
fetch = +refs/heads/*:refs/remotes/origin/*
fetch = +refs/dev/*:refs/remotes/origin/dev/*
[remote "gh"]
url = git@github.hpe.com:hpe/hpc-lus-filesystem.git
fetch = +refs/heads/*:refs/remotes/gh/*
[remote "wc"]
url = ssh://carlsonc@review.whamcloud.com:29418/fs/lustre-release
fetch = +refs/heads/*:refs/remotes/wc/*
[branch "master"]
remote = origin
merge = refs/heads/master
[branch "cray-2.15-int"]
remote = origin
merge = refs/heads/cray-2.15-int
[user]
email = caleb.carlson@hpe.com
Add a [branch] entry for each remote dev branch you want to track. This lets you git pull from the remote into your local branch.
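The same [branch] entries can also be written with git config instead of editing the file by hand. This is a minimal sketch: track_branch is a hypothetical helper name, and the example assumes your dev branch exists on the gh remote as refs/heads/dev/LUS-12345-test (the name used in the push example below).

```shell
#!/bin/bash
# Hypothetical helper: write a [branch "<name>"] tracking entry into
# .git/config, equivalent to adding the entry by hand.
# Usage: track_branch <local-branch> <remote> <remote-ref>
track_branch() {
    local branch=${1:-} remote=${2:-} ref=${3:-}
    if [[ -z $branch || -z $remote || -z $ref ]]; then
        echo "Usage: track_branch <local-branch> <remote> <remote-ref>" >&2
        return 1
    fi
    git config "branch.${branch}.remote" "$remote"
    git config "branch.${branch}.merge" "$ref"
}
```

For example, track_branch dev/LUS-12345-test gh refs/heads/dev/LUS-12345-test adds a [branch "dev/LUS-12345-test"] section with remote = gh and merge = refs/heads/dev/LUS-12345-test. Once the remote-tracking ref has been fetched, git's built-in git branch --set-upstream-to=gh/dev/LUS-12345-test does the same for the current branch.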
Rebasing a Development Branch
Occasionally the integration branch, cray-2.15-int, gets rebased onto the release branch, cray-2.15.
When commits are cherry-picked into cray-2.15, they get different commit hashes than the ones in cray-2.15-int, and their order can change too. The history of the release branch ends up completely different from that of the integration branch.
When you then rebase your dev branch (still based on the old cray-2.15-int history) onto the new cray-2.15-int with git rebase -i origin/cray-2.15-int, git will "mostly" find the commits that are already there and skip them, but it won't catch all of them. Git matches commits by the changes they make rather than by hash, so a commit whose diff was altered while being cherry-picked (for example, to resolve a conflict) won't be recognized.
The ones it doesn't catch are listed in the rebase todo list alongside your actual new commits. All you have to do is not pick the commits you didn't make: they're most likely already in cray-2.15-int, and git just didn't find a match for them.
git checkout LUS-12345-foo
git fetch origin
git rebase -i origin/cray-2.15-int
Then, in the todo list, only pick your own commits and delete the rest.
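To tell which todo-list entries are yours before editing, you can list the commits between the new integration branch and your branch that carry your author email. A sketch, assuming your commits were authored with the user.email set in .git/config; my_commits is a hypothetical helper name. Commits you authored that were already picked into cray-2.15-int (but not matched by git) will still show up, so treat the output as a starting point.

```shell
#!/bin/bash
# Hypothetical helper: list commits in <upstream>..HEAD authored with your
# configured user.email, oldest first (the same order as the rebase todo list).
# Usage: my_commits [upstream]   (default: origin/cray-2.15-int)
my_commits() {
    local upstream=${1:-origin/cray-2.15-int} email
    if ! email=$(git config user.email); then
        echo "user.email is not set" >&2
        return 1
    fi
    git log --reverse --oneline --author="$email" "${upstream}..HEAD"
}
```

Run my_commits origin/cray-2.15-int after git fetch origin; the lines it prints correspond to the commits to keep as pick.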
SSH Configuration
Make sure you have SSH entries configured for each of the Git remotes. Gerrit is a little outdated, so you'll have to use an older-style RSA key and re-enable the legacy algorithms it relies on:
# GitHub
Host github.com
Hostname github.com
IdentityFile ~/.ssh/caleb_id_ecdsa
# Es-Gerrit
Host es-gerrit.hpc.amslabs.hpecorp.net
Hostname es-gerrit.hpc.amslabs.hpecorp.net
KexAlgorithms +diffie-hellman-group1-sha1
HostkeyAlgorithms +ssh-rsa
PubkeyAcceptedAlgorithms +ssh-rsa
User carlsonc
IdentityFile ~/.ssh/id_rsa
# Whamcloud
Host review.whamcloud.com
HostName review.whamcloud.com
User carlsonc
IdentityFile ~/.ssh/caleb_id_ecdsa
# HPE GitHub
Host github.hpe.com
Hostname github.hpe.com
IdentityFile ~/.ssh/caleb_id_ecdsa
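To confirm which of these settings ssh will actually use for a host, ssh -G prints the configuration it would apply and exits without connecting. A small sketch; show_ssh_config is a hypothetical helper name, and the grep only keeps the options set in the entries above.

```shell
#!/bin/bash
# Hypothetical helper: show the effective ssh settings for a host.
# ssh -G prints the resolved configuration and exits without connecting.
# Usage: show_ssh_config <host> [config-file]   (default: normal ssh config)
show_ssh_config() {
    local host=$1 cfg=${2:-}
    if [[ -n $cfg ]]; then
        ssh -G -F "$cfg" "$host"
    else
        ssh -G "$host"
    fi | grep -E '^(user|hostname|port|identityfile|kexalgorithms|hostkeyalgorithms|pubkeyacceptedalgorithms) '
}
```

For example, show_ssh_config es-gerrit.hpc.amslabs.hpecorp.net should show user carlsonc and the id_rsa identity file. For an end-to-end check of the key, Gerrit also answers ssh -p 29418 carlsonc@es-gerrit.hpc.amslabs.hpecorp.net gerrit version.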
Push dev branch to HPE GH remote
hornc@cassini-hosta:~/lustre-wc-rel> git branch dev/LUS-12345-test
hornc@cassini-hosta:~/lustre-wc-rel> git push gh dev/LUS-12345-test
Total 0 (delta 0), reused 0 (delta 0), pack-reused 0
remote:
remote: Create a pull request for 'dev/LUS-12345-test' on GitHub by visiting:
remote: https://github.hpe.com/hpe/hpc-lus-filesystem/pull/new/dev/LUS-12345-test
remote:
To github.hpe.com:hpe/hpc-lus-filesystem.git
* [new branch] dev/LUS-12345-test -> dev/LUS-12345-test
hornc@cassini-hosta:~/lustre-wc-rel>
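If you'd rather not create a separate local dev/ branch, you can push the current branch straight to dev/<branch> on gh and record it as the upstream in one step. A sketch, assuming the same dev/<name> naming shown above; push_dev_branch is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: push the current branch to dev/<branch> on a remote
# and set that as its upstream. Assumes the dev/<name> naming used on gh.
# Usage: push_dev_branch [remote]   (default: gh)
push_dev_branch() {
    local remote=${1:-gh} branch
    # Fails (and returns) if HEAD is detached
    branch=$(git symbolic-ref --short HEAD) || return 1
    git push -u "$remote" "HEAD:refs/heads/dev/${branch}"
}
```

Run it with LUS-12345-test checked out: push_dev_branch gh. Afterwards, git pull on that branch fetches from gh's dev/LUS-12345-test.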
Check out branch from another remote
Make sure you’ve fetched the latest versions of the remote branches first.
➜ lustre-wc-rel git:(cray-2.15-int) git checkout gh/release/uss-1.1
Note: switching to 'gh/release/uss-1.1'.
You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.
If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:
git switch -c <new-branch-name>
Or undo this operation with:
git switch -
Turn off this advice by setting config variable advice.detachedHead to false
HEAD is now at 4846b50d77 LUS-12377 dkms: mofed fallback to kABI
➜ lustre-wc-rel git:(4846b50d77) git checkout wc/master
Previous HEAD position was 4846b50d77 LUS-12377 dkms: mofed fallback to kABI
HEAD is now at 8b6719f1b3 LU-17887 obd: do not update obd_memory from RCU
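Checking out a remote ref like gh/release/uss-1.1 leaves you in detached HEAD state, as the output above shows. To work on it, create a local branch that tracks the remote one. A sketch; checkout_remote_branch is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: fetch a remote and create a local branch that tracks
# one of its branches, instead of checking out the remote ref directly.
# Usage: checkout_remote_branch <remote> <branch> [local-name]
checkout_remote_branch() {
    local remote=$1 branch=$2 local_name=${3:-$2}
    git fetch "$remote" || return 1
    git checkout --track -b "$local_name" "${remote}/${branch}"
}
```

For example, checkout_remote_branch gh release/uss-1.1 uss-1.1 fetches gh and leaves you on a local uss-1.1 branch whose upstream is gh/release/uss-1.1.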
Gathering Build Logs for Trivial Changes
Clone lustre-wc-rel on a test system, then save the following script as build.sh in the directory that contains the clone:
#!/bin/bash
set -ex
# Git settings
cd lustre-wc-rel
git fetch -p
git reset --hard HEAD
git checkout <branch>
# Remove untracked and ignored files left over from earlier builds
git clean -dfx > /dev/null
git log --pretty=oneline | head -4
# Modify these for the distro you're using
KERNEL_VERSION="5.14.21-150500.53"
ARCH="x86_64"
LINUX_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION})
LINUX_OBJ_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION}-obj/${ARCH}/default)
./LUSTRE-VERSION-GEN
# Modify this to include configure options for the build you're doing
sh ./autogen.sh
./configure \
--enable-client \
--disable-server \
--disable-gss-keyring \
--enable-gss="no" \
--enable-mpitests="no" \
--enable-ldap="no" \
--with-o2ib="/usr/src/ofa_kernel/default" \
--with-linux="$LINUX_DIR" \
--with-linux-obj="$LINUX_OBJ_DIR"
make rpms
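# List any ldap dependencies of the built client RPM (-p queries the package file, not the RPM database)
rpm -qp --requires lustre-client-2.15.3.*.x86_64.rpm | grep ldap || echo "No ldap dependency found"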
Then, run ./build.sh 2>&1 | tee build_<commit-id>.log to capture the build log.
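Before attaching the log, it can help to scan it for compiler diagnostics. A minimal sketch, assuming gcc-style "warning:" and "error:" messages in the log; summarize_build_log is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: count gcc-style diagnostics in a build log and show
# the most frequent ones. Assumes messages contain "warning:" or "error:".
summarize_build_log() {
    local log=$1 warnings errors
    if [[ ! -f $log ]]; then
        echo "No such log: $log" >&2
        return 1
    fi
    # grep -c prints 0 but exits 1 when nothing matches; ignore that status
    warnings=$(grep -c 'warning:' "$log" || true)
    errors=$(grep -c 'error:' "$log" || true)
    echo "warnings=${warnings} errors=${errors}"
    # Most frequent diagnostic lines first
    grep -E 'warning:|error:' "$log" | sort | uniq -c | sort -rn | head -n 20
}
```

summarize_build_log build_<commit-id>.log prints the totals on the first line, then the most frequent warning and error lines.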
Building with rpmbuild
#!/bin/bash
function print_usage {
echo -e "\nUsage: ./build_lustre_client.sh <lustre_version> <kernel_version>"
echo -e "Example:\n\t./build_lustre_client.sh cray-2.15-int 5.14.21-150500.53"
}
function error {
echo "$@" 1>&2; exit 1
}
# Check args
[[ $# -ne 2 ]] && print_usage && exit 1
# pipefail makes the rpmbuild | tee pipeline below fail when rpmbuild fails
set -exo pipefail
LUSTRE_REFSPEC=$1
KERNEL_VERSION=$2
# Set the architecture: aarch64 when PLATFORM is linux/arm64, otherwise x86_64 (the default).
ARCH="x86_64"
[[ $PLATFORM == "linux/arm64" ]] && ARCH="aarch64"
cd lustre-wc-rel
# git fetch --all --tags --prune && \
# git checkout ${LUSTRE_REFSPEC}
sh ./autogen.sh && ./configure --enable-dist || error "Unable to autogen and configure"
make lustre.spec lustre-dkms.spec dist Makefile || error "Unable to make dist and spec files"
# Find linux kernel source and linux kernel object source.
# On RHEL they're the same directory, but OpenSUSE and other
# distros they are usually different directories under /usr/src.
LINUX_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION})
LINUX_OBJ_DIR=$(ls -d /usr/src/linux-${KERNEL_VERSION}-obj/${ARCH}/default)
RPMBUILD_DIR="/tmp/work/rpmbuild"
# Create rpmbuild dir
rm -rf $RPMBUILD_DIR/
mkdir -p $RPMBUILD_DIR/SPECS $RPMBUILD_DIR/SOURCES
cp -v rpm/* lustre-*.tar.gz $RPMBUILD_DIR/SOURCES/
cp -v lustre.spec lustre-dkms.spec $RPMBUILD_DIR/SPECS
CONFIGURE_ARGS="'--disable-gss-keyring' '--enable-gss=no' '--enable-mpitests=no'"
[[ -n ${MOFED_VERSION} ]] && CONFIGURE_ARGS="${CONFIGURE_ARGS} '--with-o2ib=/usr/src/ofa_kernel/default'"
# Build the client userspace, devel, debug, and kmod/kmp RPMs (MPI, servers, tests, and iokit are disabled below)
rpmbuild \
--without mpi \
--without servers \
--without lustre_tests \
--without lustre_iokit \
--define "_topdir $RPMBUILD_DIR" \
--define "kobjdir $LINUX_OBJ_DIR" \
--define "kver $KERNEL_VERSION" \
--define "kversion $KERNEL_VERSION" \
--define "kdir $LINUX_DIR" \
--define "_with_lnet_dlc lnet_dlc" \
--define "configure_args $CONFIGURE_ARGS" \
-ba lustre.spec 2>&1 | tee /tmp/work/rpmbuild.log \
|| error "Failed to build lustre.spec"
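The script above only handles the SUSE-style layout, even though its comment notes that RHEL keeps kernel source and objects in one directory. A sketch of a lookup that handles both, assuming RHEL's kernel-devel tree is /usr/src/kernels/<kernel_version> and SUSE's is the linux-<version> and linux-<version>-obj/<arch>/default pair the script already uses; find_kernel_dirs is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: print "<kdir> <kobjdir>" for a kernel version.
# Usage: find_kernel_dirs <kernel_version> [arch] [root]   (root: /usr/src)
find_kernel_dirs() {
    local kver=$1 arch=${2:-x86_64} root=${3:-/usr/src}
    if [[ -d $root/kernels/$kver ]]; then
        # RHEL-style kernel-devel layout: source and objects share one directory
        echo "$root/kernels/$kver $root/kernels/$kver"
    elif [[ -d $root/linux-$kver && -d $root/linux-$kver-obj/$arch/default ]]; then
        # SUSE-style layout: separate source and -obj trees
        echo "$root/linux-$kver $root/linux-$kver-obj/$arch/default"
    else
        echo "No kernel source found for $kver under $root" >&2
        return 1
    fi
}
```

In the script, the two ls -d lines could then become dirs=$(find_kernel_dirs "$KERNEL_VERSION" "$ARCH") || error "Kernel source not found" followed by read -r LINUX_DIR LINUX_OBJ_DIR <<< "$dirs" (this assumes the paths contain no spaces).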
Debugging
Using the Ring Buffer
Lustre stores debug messages in a ring buffer on the system. What goes into this ring buffer is determined by the debug parameter, which you can read and set with lctl.
Here's an example of the default debug parameter value:
mawenzi-06:~ # lctl get_param debug
debug=ioctl neterror warning error emerg ha config console lfsck
In the code, you'll want to add CDEBUG statements to print messages to the ring buffer.
Here's an example CDEBUG message printed in the LNet code path:
CDEBUG(D_NET, "Allocate new FMR pool\n");
These messages won't be present in the ring buffer by default; D_NET messages only show up once you add net to the debug parameter:
lctl set_param debug=+net
You can then trigger some LNet activity by pinging another network interface over LNet.
lctl ping 192.168.0.103@o2ib
Then, dump the contents of the ring buffer to a file:
lctl dk > /tmp/dk.log
Your message should appear somewhere in the output file /tmp/dk.log.
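The steps above can be wrapped into one function that also restores the debug mask afterwards. A sketch, assuming lctl is in PATH; capture_debug is a hypothetical helper name, and it adds lctl clear before the command so the dump only covers that command.

```shell
#!/bin/bash
# Hypothetical helper (not part of lctl): add debug flags, clear the ring
# buffer, run a command, dump the buffer to a file, then restore the old mask.
# Usage: capture_debug <flags> <outfile> <command> [args...]
capture_debug() {
    local flags=$1 outfile=$2
    shift 2
    local saved rc=0
    # Remember the current mask so it can be restored afterwards
    saved=$(lctl get_param -n debug) || return 1
    lctl set_param debug="+${flags}" >/dev/null || return 1
    # Start from an empty buffer so the dump only covers this command
    lctl clear
    "$@" || rc=$?
    # Write the ring buffer contents to the output file
    lctl dk > "$outfile"
    lctl set_param debug="$saved" >/dev/null
    return "$rc"
}
```

For example: capture_debug net /tmp/dk.log lctl ping 192.168.0.103@o2ib.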
Testing LNet Dev Changes
You can't hot-swap the kernel modules: the old ones must be unloaded and the new ones loaded. Most likely you won't actually need the filesystem mounted; you'll just need LNet loaded and configured. You can skip dealing with RPMs by loading the .ko files out of the source tree after running make. You can find the paths of these built .ko files by running the following:
mawenzi-06:~ # find lustre-wc-rel/ -name "*.ko"
lustre-wc-rel/libcfs/libcfs/libcfs.ko
lustre-wc-rel/lnet/klnds/o2iblnd/ko2iblnd.ko
lustre-wc-rel/lnet/klnds/socklnd/ksocklnd.ko
lustre-wc-rel/lnet/lnet/lnet.ko
lustre-wc-rel/lnet/selftest/lnet_selftest.ko
lustre-wc-rel/lustre/fid/fid.ko
lustre-wc-rel/lustre/fld/fld.ko
lustre-wc-rel/lustre/llite/lustre.ko
lustre-wc-rel/lustre/lmv/lmv.ko
lustre-wc-rel/lustre/lov/lov.ko
lustre-wc-rel/lustre/mdc/mdc.ko
lustre-wc-rel/lustre/mgc/mgc.ko
lustre-wc-rel/lustre/obdclass/llog_test.ko
lustre-wc-rel/lustre/obdclass/obdclass.ko
lustre-wc-rel/lustre/obdecho/obdecho.ko
lustre-wc-rel/lustre/osc/osc.ko
lustre-wc-rel/lustre/ptlrpc/ptlrpc.ko
lustre-wc-rel/lustre/tests/kernel/kinode.ko
Insert the LNet kernel modules from the local paths. This is for an o2ib net.
lustre="/root/lustre-wc-rel"
insmod $lustre/libcfs/libcfs/libcfs.ko
insmod $lustre/lnet/lnet/lnet.ko
insmod $lustre/lnet/klnds/o2iblnd/ko2iblnd.ko
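Because modules can't be hot-swapped, insmod fails if an older copy is still loaded. A sketch that inserts modules in the given order and stops early with a clearer message instead; insert_modules is a hypothetical helper name, and SYSMOD_DIR is a hypothetical override (default /sys/module) used only so the check can be tested.

```shell
#!/bin/bash
# Hypothetical helper: insmod each .ko in the order given, refusing to
# continue if a module of the same name is already loaded (an old copy
# has to be unloaded first, e.g. with lustre_rmmod).
# Assumes each file name matches its module name (true for the list above).
# SYSMOD_DIR is a hypothetical override for testing; default /sys/module.
insert_modules() {
    local sysmod=${SYSMOD_DIR:-/sys/module} ko name
    for ko in "$@"; do
        name=$(basename "$ko" .ko)
        if [[ ! -f $ko ]]; then
            echo "Missing $ko (did make finish?)" >&2
            return 1
        fi
        if [[ -d $sysmod/$name ]]; then
            echo "$name is already loaded; unload it first (e.g. lustre_rmmod)" >&2
            return 1
        fi
        insmod "$ko" || { echo "insmod $ko failed" >&2; return 1; }
    done
}
```

Pass the modules in dependency order, as above: insert_modules $lustre/libcfs/libcfs/libcfs.ko $lustre/lnet/lnet/lnet.ko $lustre/lnet/klnds/o2iblnd/ko2iblnd.ko.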
If you're changing userspace tools, then you'll want to modify PATH so that it finds your built binaries and scripts before the ones installed by previous RPMs (unless you remove the RPMs beforehand):
lustre="/root/lustre-wc-rel"
export PATH="$lustre/lustre/utils:$lustre/lnet/utils:$lustre/lustre/scripts:$PATH"
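To confirm the shell now picks up the tools from your tree rather than from installed RPMs, you can check where each command resolves. A sketch; check_tool_paths is a hypothetical helper name.

```shell
#!/bin/bash
# Hypothetical helper: report where each command resolves and fail if any
# of them is not coming from the given source tree.
# Usage: check_tool_paths <tree> <command>...
check_tool_paths() {
    local tree=$1
    shift
    local tool path rc=0
    for tool in "$@"; do
        if ! path=$(command -v "$tool"); then
            echo "$tool: not found in PATH" >&2
            rc=1
            continue
        fi
        case $path in
            "$tree"/*) echo "$tool -> $path" ;;
            *) echo "$tool -> $path (not from $tree)" >&2; rc=1 ;;
        esac
    done
    return "$rc"
}
```

For example, check_tool_paths "$lustre" lctl lnetctl prints each resolved path and returns non-zero if either one still comes from outside the tree.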
Here’s a script I set up to do all the above in one go:
#!/bin/bash
echo -e "Make sure you've checked out your latest changes with git and have run ./configure"
set -ex
# Uninstall the old stuff if it exists
for entry in $(mount -t lustre | awk '{print $3}'); do
echo "Unmounting $entry"
umount -t lustre $entry
done
which lustre_rmmod && lustre_rmmod
#zypper remove --no-confirm lustre-client lustre-client-dkms lustre-client-kmp-default
# Build the client utils/binaries and kernel objects (.ko)
lustre="/root/lustre-wc-rel"
cd $lustre
make -j 16
# Insert kernel modules, and configure bin/sbin tools
export PATH="$lustre/lustre/utils:$lustre/lnet/utils:$lustre/lustre/scripts:$PATH"
insmod $lustre/libcfs/libcfs/libcfs.ko
insmod $lustre/lnet/lnet/lnet.ko
insmod $lustre/lnet/klnds/o2iblnd/ko2iblnd.ko
insmod $lustre/lnet/klnds/socklnd/ksocklnd.ko
cp "$lustre/lustre/scripts/ksocklnd-config" /usr/sbin
# Configure LNet
lnetctl lnet configure
lnetctl net add --net o2ib --if ib0
lnetctl net add --net tcp --if eth0
lctl network up
lnetctl net show
# Run debugging steps
lctl set_param debug=+net
lctl ping 192.168.0.103@o2ib
lctl ping 10.214.130.4@tcp
lctl dk > /tmp/dk.log
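With several NIDs to check after LNet comes up, a small sketch that pings each one and summarizes which answered; ping_nids is a hypothetical helper name, and it only relies on lctl ping returning non-zero on failure.

```shell
#!/bin/bash
# Hypothetical helper: lctl ping each NID and summarize which ones answered.
# Returns non-zero if any NID fails to respond.
ping_nids() {
    local nid failed=0
    for nid in "$@"; do
        if lctl ping "$nid" >/dev/null 2>&1; then
            echo "OK   $nid"
        else
            echo "FAIL $nid"
            failed=$((failed + 1))
        fi
    done
    echo "${failed} of $# NIDs did not respond"
    (( failed == 0 ))
}
```

For example: ping_nids 192.168.0.103@o2ib 10.214.130.4@tcp.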
Note: the ksocklnd-config script manipulates IP routes and rules. This can break things on ClusterStor or Shasta systems, where the IP network configuration is already defined a certain way. You can avoid calling it on net add by passing an option:
lnetctl net add --skip-mr-route-setup …
There's also a ksocklnd module parameter that does the same, set in a modprobe .conf file:
options ksocklnd skip_mr_route_setup=1
Configuring Kernel Module Parameters
You can pass parameter values as arguments to insmod when inserting a module:
cassini-hosta:/home/hornc/lustre-wc-rel # insmod libcfs/libcfs/libcfs.ko libcfs_debug=-1
cassini-hosta:/home/hornc/lustre-wc-rel # cat /sys/module/libcfs/parameters/libcfs_debug
-1
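modprobe loads a module by name along with its dependencies, and reads module options from .conf files under /etc/modprobe.d/. insmod does not read these files, so options set there only apply when the module is loaded with modprobe (for example, from installed RPMs):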
cassini-hosta:/home/hornc/lustre-wc-rel # cat /etc/modprobe.d/lustre.conf
options ksocklnd skip_mr_route_setup=1
options libcfs cpu_npartitions=8 cpu_pattern=""
options kkfilnd traffic_class=bulk_data
options lnet ip2nets="tcp(heth0) 172.18.2.[5-6]; tcp(enp137s0f0np0) 172.18.2.[7-8]"
options lnet lock_prim_nid=1
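Because insmod doesn't read /etc/modprobe.d, options listed there aren't applied when you insmod modules out of the source tree. A sketch that pulls one module's options out of such a file so you can pass them to insmod yourself; module_options is a hypothetical helper name, and it only handles simple one-line "options <module> ..." entries.

```shell
#!/bin/bash
# Hypothetical helper: print the options configured for one module in
# modprobe.d-style files, joined on a single line.
# Only simple "options <module> <params...>" lines are handled.
# Usage: module_options <module> [file...]   (default: /etc/modprobe.d/*.conf)
module_options() {
    local module=$1
    shift
    local files=("$@")
    (( ${#files[@]} )) || files=(/etc/modprobe.d/*.conf)
    awk -v m="$module" '
        $1 == "options" && $2 == m {
            $1 = ""; $2 = ""        # drop the "options <module>" prefix
            sub(/^ +/, "")
            printf "%s%s", sep, $0
            sep = " "
        }
        END { if (sep != "") print "" }
    ' "${files[@]}"
}
```

Usage: insmod "$lustre/lnet/lnet/lnet.ko" "$(module_options lnet)". This passes everything as a single argument and assumes insmod hands it to the kernel as one option string, which the kernel splits on spaces while respecting double quotes, so a value like ip2nets="..." keeps its spaces. Verify that assumption on your system before relying on it.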
You can also read the current value of any parameter of a loaded module under /sys/module/<module>/parameters:
cassini-hosta:/home/hornc/lustre-wc-rel # cat /sys/module/lnet/parameters/lnet_transaction_timeout
50
cassini-hosta:/home/hornc/lustre-wc-rel # cat /sys/module/lnet/parameters/sock_timeout
0
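To see all of a module's current parameter values at once rather than one cat at a time, a small sketch; show_module_params is a hypothetical helper name, and SYSMOD_DIR is a hypothetical override (default /sys/module) so it can be tested without Lustre loaded.

```shell
#!/bin/bash
# Hypothetical helper: print name=value for every parameter of a loaded module.
# SYSMOD_DIR is a hypothetical override for testing; it defaults to /sys/module.
show_module_params() {
    local module=$1
    local dir="${SYSMOD_DIR:-/sys/module}/${module}/parameters" f
    if [[ ! -d $dir ]]; then
        echo "No parameters found for $module (is it loaded?)" >&2
        return 1
    fi
    for f in "$dir"/*; do
        [[ -e $f ]] || continue     # empty directory: glob did not match
        # Some parameters are write-only; show an empty value for those
        printf '%s=%s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}
```

For example, show_module_params lnet lists lnet_transaction_timeout, sock_timeout, and the rest of lnet's parameters with their current values.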