InfiniBand

Notes on InfiniBand (IB) management, commands, configuration, and installation.

Subnet Manager

The Subnet Manager (SM) is an entity that runs on a managed switch, on a node, or elsewhere in the fabric. It is responsible for discovering and configuring all of the InfiniBand fabric devices so that traffic can flow between them.

OpenSM

opensm is an InfiniBand compliant Subnet Manager and Subnet Administrator that runs on top of the Mellanox OFED stack.

Installing opensm

For opensm to function, first load the ib_umad kernel module so that opensm can use userspace MADs (management datagrams):

modprobe ib_umad

Then, install opensm via dnf:

dnf install -y opensm

And start the daemon via systemctl:

systemctl start opensm

Verify it’s running:

[root@mawenzi-03 ~]# systemctl status opensm
● opensmd.service - LSB: Activates/Deactivates InfiniBand Subnet Manager
   Loaded: loaded (/etc/rc.d/init.d/opensmd; generated)
   Active: active (running) since Fri 2023-06-30 20:28:27 MDT; 6s ago
     Docs: man:systemd-sysv-generator(8)
  Process: 16797 ExecStart=/etc/rc.d/init.d/opensmd start (code=exited, status=0/SUCCESS)
 Main PID: 16806 (opensm)
    Tasks: 144 (limit: 821428)
   Memory: 15.6M
   CGroup: /system.slice/opensmd.service
           ├─16806 /usr/sbin/opensm --daemon
           └─16809 osm_crashd

Jun 30 20:28:26 mawenzi-03 systemd[1]: Starting LSB: Activates/Deactivates InfiniBand Subnet Manager...
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: /var/log/opensm.log log file opened
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: OpenSM 5.11.0.MLNX20220418.fd3d650
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: Entering DISCOVERING state
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: Entering STANDBY state
Jun 30 20:28:27 mawenzi-03 opensmd[16797]: Starting IB Subnet Manager.
Jun 30 20:28:27 mawenzi-03 opensmd[16952]: Starting IB Subnet Manager.
Jun 30 20:28:27 mawenzi-03 opensmd[16952]: hich: no ibdiagm.sh in (/sbi
Jun 30 20:28:27 mawenzi-03 opensmd[16797]: hich: no
Jun 30 20:28:27 mawenzi-03 systemd[1]: Started LSB: Activates/Deactivates InfiniBand Subnet Manager.
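
In the log excerpt above, OpenSM passes through DISCOVERING and settles in STANDBY, which usually means another Subnet Manager on the fabric is already acting as master. As a minimal sketch, the last state transition can be pulled out of log text like this. The function name last_sm_state is made up for this example, and the journalctl unit name in the usage comment is an assumption taken from the status output above.

```shell
# Print the most recent OpenSM state transition found in log text on stdin.
# Matches lines such as "... OpenSM[16806]: Entering STANDBY state".
last_sm_state() {
    sed -n 's/.*Entering \([A-Z_]*\) state.*/\1/p' | tail -n 1
}

# Possible usage on a node running the daemon (not executed here):
#   journalctl -u opensmd --no-pager | last_sm_state
```

If this prints STANDBY on every node, it is worth checking which host the fabric reports as the SM (see the SM lid field in ibstat).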

Mellanox OFED (MOFED)

MOFED Utilities

  • mlxfwmanager: Firmware update and query tool.

  • mlxlink: Checks and debugs link status and related issues. It can be used on different links and cables (passive, active, transceiver, and backplane).

  • mlxconfig: Allows the user to change some of the device configuration without having to create and burn a new firmware image.

  • mlxfwreset: Resets the device in order to load new firmware. It provides the following functionality:

    1. Query the device for the supported reset-level and reset-type

    2. Perform reset operation on the device

  • mlxcables: Mellanox Cables Tool

MOFED Kernel Module Relationships

This section describes how the MLNX_OFED modules relate to the other Linux kernel modules.

MOFED Installation

In this example we’ll be installing MOFED for Rocky Linux 8.6.

Go to the NVIDIA Mellanox download center and download the .iso for Rocky 8.6:

MOFED Download Center

Once you have it on your target system, mount it on /mnt:

mount -o ro,loop MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.6-x86_64.iso /mnt
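
Judging only from the file name used above, the ISO name appears to encode the MOFED version, the target distribution, and the architecture. Under that assumption, a small helper can extract the distribution tag so it can be compared with the host before installing. The function name mofed_iso_distro is hypothetical, and the naming pattern is inferred from this one example rather than from any documented scheme.

```shell
# Extract the distribution tag from an ISO name of the form
#   MLNX_OFED_LINUX-<major.minor>-<build>-<distro>-<arch>.iso
# The pattern is an assumption based on the single file name in these notes.
mofed_iso_distro() {
    basename "$1" .iso |
        sed -n 's/^MLNX_OFED_LINUX-[0-9.]*-[0-9.]*-\(.*\)-[^-]*$/\1/p'
}

# Possible usage (not executed here): compare with the running system.
#   mofed_iso_distro MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.6-x86_64.iso
#   grep -E '^(ID|VERSION_ID)=' /etc/os-release
```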

Then, install the dependencies needed by the install script:

dnf install perl gcc-gfortran python36 tk lsof tcl tcsh pkgconf-pkg-config pciutils

Finally, run the install script itself:

/mnt/mlnxofedinstall

Once the installation has finished, restart the openibd service:

/etc/init.d/openibd restart

Verifying Installation

Install the InfiniBand diagnostics utility package:

dnf install infiniband-diags

Make sure all the right modules are loaded with lsmod:

[root@mawenzi-06 ~]# lsmod | grep -P "(ib_|_ib|mlx|rdma)"
rdma_ucm               32768  0
rdma_cm               118784  1 rdma_ucm
iw_cm                  53248  1 rdma_cm
ib_ipoib              151552  0
ib_cm                  57344  2 rdma_cm,ib_ipoib
ib_umad                28672  0
mlx5_ib               430080  0
mlx5_core            1789952  1 mlx5_ib
mlxdevm               176128  1 mlx5_core
ib_uverbs             151552  2 rdma_ucm,mlx5_ib
ib_core               421888  8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx_compat             16384  11 rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
psample                20480  1 mlx5_core
mlxfw                  28672  1 mlx5_core
tls                   102400  1 mlx5_core
pci_hyperv_intf        16384  1 mlx5_core
nft_fib_inet           16384  1
nft_fib_ipv4           16384  1 nft_fib_inet
nft_fib_ipv6           16384  1 nft_fib_inet
nft_fib                16384  3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_tables             180224  235 nft_ct,nft_reject_inet,nft_fib_ipv6,nft_fib_ipv4,nft_chain_nat,nf_tables_set,nft_reject,nft_fib,nft_fib_inet
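
Scanning the lsmod listing by eye is error-prone, so the check can be scripted. The sketch below reads lsmod output on stdin and prints any of the named modules that are not loaded. The function name missing_modules and the module list in the usage comment are choices made for this example, based on the modules visible in the listing above.

```shell
# Read lsmod output on stdin and print each module named in the arguments
# that does not appear in the first column.
missing_modules() {
    loaded=$(awk '{ print $1 }')
    for mod in "$@"; do
        printf '%s\n' "$loaded" | grep -qxF -- "$mod" || printf '%s\n' "$mod"
    done
}

# Possible usage (not executed here):
#   lsmod | missing_modules ib_core ib_umad ib_uverbs ib_ipoib mlx5_core mlx5_ib
```

No output means every listed module is loaded.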

Run ibstat to view local card info:

[root@mawenzi-06 ~]# ibstat
CA 'mlx5_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.35.2000
	Hardware version: 0
	Node GUID: 0x9440c9ffffb33b60
	System image GUID: 0x9440c9ffffb33b60
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 8
		LMC: 0
		SM lid: 1
		Capability mask: 0xa659e848
		Port GUID: 0x9440c9ffffb33b60
		Link layer: InfiniBand
CA 'mlx5_1'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.35.2000
	Hardware version: 0
	Node GUID: 0x9440c9ffff88dd98
	System image GUID: 0x9440c9ffff88dd98
	Port 1:
		State: Down
		Physical state: Disabled
		Rate: 10
		Base lid: 65535
		LMC: 0
		SM lid: 0
		Capability mask: 0xa659e848
		Port GUID: 0x9440c9ffff88dd98
		Link layer: InfiniBand

Here we can see two single-port ConnectX-6 (CX-6) cards: one (mlx5_1) is disconnected, with nothing plugged in, and the other (mlx5_0) is connected to the InfiniBand switch. We can also see the Local Identifier (LID) of the connected port, 8, and the LID of the Subnet Manager (SM), 1.
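
The fields of interest (port state, LID, and SM LID) can also be condensed into one line per card. This is a rough sketch that assumes the ibstat layout shown above, with one port per card. The function name ibstat_summary is made up for this example.

```shell
# Condense ibstat output (stdin) to one line per CA:
#   <ca name> <port state> lid=<base lid> sm_lid=<sm lid>
# Assumes single-port cards and the field order shown in these notes.
ibstat_summary() {
    awk -v q="'" '
        $1 == "CA" && NF == 2 { ca = $2; gsub(q, "", ca) }   # CA header line
        /^[ \t]*State:/       { state = $2 }                 # logical port state
        /Base lid:/           { lid = $3 }
        /SM lid:/             { print ca, state, "lid=" lid, "sm_lid=" $3 }
    '
}

# Possible usage (not executed here):
#   ibstat | ibstat_summary
```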

Next, we can run iblinkinfo to view information about the whole InfiniBand fabric. Note our own node, mawenzi-06, at the bottom.

[root@mawenzi-06 ~]# iblinkinfo
CA: mawenzi-05 mlx5_0:
      0x9440c9ffffb33bdc      7    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    9[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-07 mlx5_0:
      0x9440c9ffffb32bd4      6    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3   13[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-01 mlx5_0:
      0x9440c9ffffb34bd0      1    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    1[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-04 mlx5_0:
      0x9440c9ffffb31bc4      5    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    7[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-03 mlx5_0:
      0x9440c9ffffb35b44      2    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    5[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-02 mlx5_0:
      0x9440c9ffffb34bf4      4    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    3[  ] "SwitchIB Mellanox Technologies" ( )
Switch: 0x248a07030074dd50 SwitchIB Mellanox Technologies:
           3    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       1    1[  ] "mawenzi-01 mlx5_0" ( )
           3    2[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    3[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       4    1[  ] "mawenzi-02 mlx5_0" ( )
           3    4[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    5[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       2    1[  ] "mawenzi-03 mlx5_0" ( )
           3    6[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    7[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       5    1[  ] "mawenzi-04 mlx5_0" ( )
           3    8[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    9[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       7    1[  ] "mawenzi-05 mlx5_0" ( )
           3   10[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   11[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       8    1[  ] "mawenzi-06 HCA-1" ( )
           3   12[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   13[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       6    1[  ] "mawenzi-07 mlx5_0" ( )
           3   14[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   15[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   16[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   17[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   18[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   19[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   20[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   21[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   22[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   23[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   24[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   25[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   26[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   27[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   28[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   29[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   30[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   31[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   32[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   33[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   34[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   35[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   36[  ] ==(                Down/ Polling)==>             [  ] "" ( )
CA: mawenzi-06 HCA-1:
      0x9440c9ffffb33b60      8    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3   11[  ] "SwitchIB Mellanox Technologies" ( )
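
On a 36-port switch most of this output is unused ports, so a quick count of up and down switch ports can be easier to read. The sketch below only counts lines under Switch: headers, since each host link also appears a second time under its CA: header. The function name switch_port_counts is hypothetical.

```shell
# Count active and down ports in the "Switch:" sections of iblinkinfo
# output read from stdin. Lines under "CA:" headers are skipped so that
# each link is counted once.
switch_port_counts() {
    awk '
        /^Switch:/ { in_switch = 1; next }
        /^CA:/     { in_switch = 0; next }
        in_switch && /Active\// { up++ }
        in_switch && /Down\//   { down++ }
        END { printf "up=%d down=%d\n", up, down }
    '
}

# Possible usage (not executed here):
#   iblinkinfo | switch_port_counts
```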

Refer to the InfiniBand Utilities and MOFED Utilities sections for other diagnostic tools.

Card Configuration

Here we’ll be using a Mellanox ConnectX-6 card for this set of examples. Make sure that you’ve installed MOFED and have loaded all the required modules.

Enable Card on Boot

Rocky Linux 8.6

For Rocky 8.6, we’ll be using the network-scripts ifcfg configuration file to persist card configuration.

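Edit /etc/sysconfig/network-scripts/ifcfg-ib0, enabling ONBOOT and disabling DHCP as the boot protocol. The patterns below also match values that are wrapped in double quotes, as in kickstart-generated files:

sed -i -E -e 's/^ONBOOT="?no"?$/ONBOOT=yes/' -e 's/^BOOTPROTO="?dhcp"?$/BOOTPROTO=none/' /etc/sysconfig/network-scripts/ifcfg-ib0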

Now, reboot the node.

Rocky Linux 9.1

For Rocky 9.x onwards, network configuration is handled by NetworkManager connection profiles rather than ifcfg files. You can still convert your old ifcfg files to the new format with nmcli connection migrate.

Update Firmware

Find the PCI address of each card using lspci:

[root@mawenzi-06 ~]# lspci | grep Mellanox
03:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
87:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]

03:00.0 and 87:00.0 are the PCI addresses (bus:device.function) of the two cards on this system. The MOFED tools accept these as device names.
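
When several cards are installed, it can help to collect those addresses programmatically and loop over them. A minimal sketch, with the function name mellanox_pci_addrs invented for this example:

```shell
# Print the PCI address (first column) of every Mellanox device found in
# lspci output read from stdin.
mellanox_pci_addrs() {
    awk '/Mellanox/ { print $1 }'
}

# Possible usage (not executed here): show the link type of every card.
#   for dev in $(lspci | mellanox_pci_addrs); do
#       mlxconfig -d "$dev" query | grep LINK_TYPE
#   done
```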

HPE-Branded Firmware Updates

Check whether the cards are HPE-branded by running lspci in verbose mode against the selected device. Under Vital Product Data, note the entry: Product Name: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter. This means we can't update the firmware with the generic images from the Mellanox website; we have to use the ones from HPE instead. Use the product info to find the right firmware image on HPE's firmware page.

Search that page (Ctrl+F) for the part number, P24250-001, which comes from the following lspci output:

[root@mawenzi-04 ~]# lspci -vv -s 85:00.0
85:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
	Subsystem: Mellanox Technologies Device 0068
	Physical Slot: 1
	Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
	Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
	Latency: 0, Cache Line Size: 64 bytes
	Interrupt: pin A routed to IRQ 157
	NUMA node: 0
	IOMMU group: 28
	Region 0: Memory at ac000000 (64-bit, prefetchable) [size=32M]
	Expansion ROM at ab400000 [virtual] [disabled] [size=1M]
	Capabilities: [60] Express (v2) Endpoint, MSI 00
		DevCap:	MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
			ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
		DevCtl:	CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
			RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
			MaxPayload 512 bytes, MaxReadReq 4096 bytes
		DevSta:	CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
		LnkCap:	Port #0, Speed 16GT/s, Width x16, ASPM not supported
			ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
		LnkCtl:	ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
			ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
		LnkSta:	Speed 16GT/s (ok), Width x16 (ok)
			TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
		DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
			 10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
			 EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
			 FRS- TPHComp- ExtTPHComp-
			 AtomicOpsCap: 32bit- 64bit- 128bitCAS-
		DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
			 AtomicOpsCtl: ReqEn+
		LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
		LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
			 Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
			 Compliance De-emphasis: -6dB
		LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
			 EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
			 Retimer- 2Retimers- CrosslinkRes: unsupported
	Capabilities: [48] Vital Product Data
		Product Name: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
		Read-only fields:
			[PN] Part number: P24250-001
			[EC] Engineering changes: A5
			[V2] Vendor specific: P24250-001
			[SN] Serial number: IL203002KT
			[V3] Vendor specific: 60c190dc0ccdea1180009440c9b31bc4
			[VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX653105A
			[V0] Vendor specific: PCIeGen4 x16
			[VU] Vendor specific: IL203002KTMLNXS0D0F0
			[RV] Reserved: checksum good, 1 byte(s) reserved
		End
	Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
		Vector table: BAR=0 offset=00002000
		PBA: BAR=0 offset=00003000
	Capabilities: [c0] Vendor Specific Information: Len=18 <?>
	Capabilities: [40] Power Management version 3
		Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
		Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
	Capabilities: [100 v1] Advanced Error Reporting
		UESta:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UEMsk:	DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
		UESvrt:	DLP- SDES- TLP+ FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq- ACSViol-
		CESta:	RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
		CEMsk:	RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
		AERCap:	First Error Pointer: 08, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
			MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
		HeaderLog: 00000000 00000000 00000000 00000000
	Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
		ARICap:	MFVC- ACS-, Next Function: 0
		ARICtl:	MFVC- ACS-, Function Group: 0
	Capabilities: [1c0 v1] Secondary PCI Express
		LnkCtl3: LnkEquIntrruptEn- PerformEqu-
		LaneErrStat: 0
	Capabilities: [320 v1] Lane Margining at the Receiver <?>
	Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
	Capabilities: [420 v1] Data Link Feature <?>
	Kernel driver in use: mlx5_core
	Kernel modules: mlx5_core
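
The part number is buried deep in that listing, so it can be handy to extract it directly. The sketch below reads lspci -vv output on stdin and prints the first [PN] field it finds. The function name vpd_part_number is made up for this example. Note that lspci generally needs root privileges to show the Vital Product Data section.

```shell
# Print the first VPD part number found in "lspci -vv" output on stdin.
# Matches lines such as "[PN] Part number: P24250-001".
vpd_part_number() {
    sed -n 's/.*\[PN\] Part number: *//p' | head -n 1
}

# Possible usage (not executed here):
#   lspci -vv -s 85:00.0 | vpd_part_number
```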

Go to the Firmware page, track down the latest GA directory, and download the .bin firmware file. Once you have a file like fw-ConnectX6-rel-20_37_1700-MCX653105A-HDA_HPE_Ax-UEFI-14.30.13-FlexBoot-3.7.102.signed.bin in the current working directory, run mlxfwmanager. It detects the installed cards and any available firmware updates:

[root@mawenzi-04 ~]# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6
  Part Number:      MCX653105A-HDA_HPE_Ax
  Description:      HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
  PSID:             MT_0000000451
  PCI Device Name:  0000:85:00.0
  Base GUID:        9440c9ffffb31bc4
  Versions:         Current        Available
     FW             20.35.1012     20.37.1700
     PXE            3.6.0804       3.7.0102
     UEFI           14.28.0015     14.30.0013

  Status:           Update required

---------
Found 1 device(s) requiring firmware update. Please use -u flag to perform the update.
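
On nodes with more than one card, the per-device status can be reduced to one line each. This sketch assumes the report layout shown above, where each device block has a PCI Device Name line followed later by a Status line. The function name fw_status is hypothetical.

```shell
# Reduce mlxfwmanager query output (stdin) to "<pci address> <status>"
# lines, one per device.
fw_status() {
    awk '
        /PCI Device Name:/ { dev = $NF }
        /^[ \t]*Status:/   { sub(/^[ \t]*Status:[ \t]*/, ""); print dev, $0 }
    '
}

# Possible usage (not executed here):
#   mlxfwmanager | fw_status
```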

Run mlxfwmanager -u in the directory with the .bin firmware image file to update the card(s):

[root@mawenzi-04 ~]# mlxfwmanager -u
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6
  Part Number:      MCX653105A-HDA_HPE_Ax
  Description:      HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
  PSID:             MT_0000000451
  PCI Device Name:  0000:85:00.0
  Base GUID:        9440c9ffffb31bc4
  Versions:         Current        Available
     FW             20.35.1012     20.37.1700
     PXE            3.6.0804       3.7.0102
     UEFI           14.28.0015     14.30.0013

  Status:           Update required

---------
Found 1 device(s) requiring firmware update...

Perform FW update? [y/N]: y
Device #1: Updating FW ...
FSMST_INITIALIZE -   OK
Writing Boot image component -   OK
Done

Restart needed for updates to take effect.

Reboot once the update has succeeded.

InfiniBand Utilities

You may need to modprobe ib_umad before using some of these tools.

iblinkinfo: Shows information about all of the links on the fabric, including Local Identifiers (LIDs) and link speeds.

  • Comes from the infiniband-diags package.

[root@mawenzi-01 ~]# iblinkinfo
CA: mawenzi-06 HCA-1:
      0x9440c9ffffb33b60      8    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3   11[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-05 mlx5_0:
      0x9440c9ffffb33bdc      7    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    9[  ] "SwitchIB Mellanox Technologies" ( )
CA: localhost mlx5_0:
      0x9440c9ffffb31bc4      5    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    7[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-03 mlx5_0:
      0x9440c9ffffb35b44      2    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    5[  ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-02 mlx5_0:
      0x9440c9ffffb34bf4      4    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    3[  ] "SwitchIB Mellanox Technologies" ( )
Switch: 0x248a07030074dd50 SwitchIB Mellanox Technologies:
           3    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       1    1[  ] "mawenzi-01 mlx5_0" ( )
           3    2[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    3[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       4    1[  ] "mawenzi-02 mlx5_0" ( )
           3    4[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    5[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       2    1[  ] "mawenzi-03 mlx5_0" ( )
           3    6[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    7[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       5    1[  ] "localhost mlx5_0" ( )
           3    8[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3    9[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       7    1[  ] "mawenzi-05 mlx5_0" ( )
           3   10[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   11[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       8    1[  ] "mawenzi-06 HCA-1" ( )
           3   12[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   13[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   14[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   15[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   16[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   17[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   18[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   19[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   20[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   21[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   22[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   23[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   24[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   25[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   26[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   27[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   28[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   29[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   30[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   31[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   32[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   33[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   34[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   35[  ] ==(                Down/ Polling)==>             [  ] "" ( )
           3   36[  ] ==(                Down/ Polling)==>             [  ] "" ( )
CA: mawenzi-01 mlx5_0:
      0x9440c9ffffb34bd0      1    1[  ] ==( 4X      25.78125 Gbps Active/  LinkUp)==>       3    1[  ] "SwitchIB Mellanox Technologies" ( )

ibswitches: Shows information about the InfiniBand switches on the fabric

  • Comes from the infiniband-diags package.

[root@mawenzi-01 ~]# ibswitches
Switch	: 0x248a07030074dd50 ports 36 "SwitchIB Mellanox Technologies" base port 0 lid 3 lmc 0

ibstat: Shows information about the local InfiniBand devices (HCAs):

[root@mawenzi-01 ~]# ibstat
CA 'mlx5_0'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.37.1700
	Hardware version: 0
	Node GUID: 0x9440c9ffffb34bd0
	System image GUID: 0x9440c9ffffb34bd0
	Port 1:
		State: Active
		Physical state: LinkUp
		Rate: 100
		Base lid: 1
		LMC: 0
		SM lid: 2
		Capability mask: 0xa651e848
		Port GUID: 0x9440c9ffffb34bd0
		Link layer: InfiniBand
CA 'mlx5_1'
	CA type: MT4123
	Number of ports: 1
	Firmware version: 20.37.1700
	Hardware version: 0
	Node GUID: 0x9440c9ffffb35b4c
	System image GUID: 0x9440c9ffffb35b4c
	Port 1:
		State: Down
		Physical state: Disabled
		Rate: 10
		Base lid: 65535
		LMC: 0
		SM lid: 0
		Capability mask: 0xa651e848
		Port GUID: 0x9440c9ffffb35b4c
		Link layer: InfiniBand

Tasks

Show information about a Mellanox card link

[root@mawenzi-01 ~]# mlxlink -d mlx5_0

Operational Info
----------------
State                           : Active
Physical state                  : LinkUp
Speed                           : IB-EDR
Width                           : 4x
FEC                             : Standard LL RS-FEC - RS(271,257)
Loopback Mode                   : No Loopback
Auto Negotiation                : ON

Supported Info
--------------
Enabled Link Speed              : 0x00000027 (EDR,QDR,DDR,SDR)
Supported Cable Speed           : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)

Troubleshooting Info
--------------------
Status Opcode                   : 0
Group Opcode                    : N/A
Recommendation                  : No issue was observed.

Tool Information
----------------
Firmware Version                : 20.37.1700
amBER Version                   : 2.02
MFT Version                     : mft 4.21.0-102

Query Mellanox HCA configuration

[root@mawenzi-06 ~]# mlxconfig -d 87:00.0 query

Device #1:
----------

Device type:    ConnectX6
Name:           MCX653105A-HDA_HPE_Ax
Description:    HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
Device:         87:00.0

Configurations:                                      Next Boot
         MEMIC_BAR_SIZE                              0
         MEMIC_SIZE_LIMIT                            _256KB(1)
         HOST_CHAINING_MODE                          DISABLED(0)
         HOST_CHAINING_CACHE_DISABLE                 False(0)
         HOST_CHAINING_DESCRIPTORS                   Array[0..7]
         HOST_CHAINING_TOTAL_BUFFER_SIZE             Array[0..7]
         FLEX_PARSER_PROFILE_ENABLE                  0
         FLEX_IPV4_OVER_VXLAN_PORT                   0
         ROCE_NEXT_PROTOCOL                          254
         ESWITCH_HAIRPIN_DESCRIPTORS                 Array[0..7]
         ESWITCH_HAIRPIN_TOT_BUFFER_SIZE             Array[0..7]
         PF_BAR2_SIZE                                0
         ...

View HCA link type (IB or ETH)

[root@mawenzi-06 ~]# mlxconfig -d 87:00.0 query | grep LINK_TYPE
         LINK_TYPE_P1                                IB(1)

Flip HCA from InfiniBand to Ethernet

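The link type value is 1 for IB and 2 for ETH. The new setting is stored as a next-boot value, so it takes effect after the node is rebooted: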
yes | mlxconfig -d 87:00.0 set LINK_TYPE_P1=2
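
To avoid memorising the numeric values, a small lookup can translate the names, and the current setting can be read back from the query output. Both function names below (link_type_value and link_types) are invented for this sketch. The only mapping used is the one stated above (IB is 1, ETH is 2).

```shell
# Map a link type name to the numeric value used by LINK_TYPE_P<n>.
link_type_value() {
    case $1 in
        IB|ib)   echo 1 ;;
        ETH|eth) echo 2 ;;
        *) echo "unknown link type: $1" >&2; return 1 ;;
    esac
}

# Print NAME=VALUE for every LINK_TYPE_P<n> row in "mlxconfig query"
# output read from stdin.
link_types() {
    awk '$1 ~ /^LINK_TYPE_P[0-9]+$/ { print $1 "=" $2 }'
}

# Possible usage (not executed here):
#   mlxconfig -d 87:00.0 query | link_types
#   yes | mlxconfig -d 87:00.0 set LINK_TYPE_P1="$(link_type_value ETH)"
```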

Use Mellanox Firmware Manager to query device firmware

[root@mawenzi-06 ~]# mlxfwmanager
Querying Mellanox devices firmware ...

Device #1:
----------

  Device Type:      ConnectX6
  Part Number:      MCX653105A-HDA_HPE_Ax
  Description:      HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
  PSID:             MT_0000000451
  PCI Device Name:  0000:03:00.0
  Base GUID:        9440c9ffff88dd98
  Versions:         Current        Available
     FW             20.35.1012     N/A
     PXE            3.6.0804       N/A
     UEFI           14.28.0015     N/A

  Status:           No matching image found

Device #2:
----------

  Device Type:      ConnectX6
  Part Number:      MCX653105A-HDA_HPE_Ax
  Description:      HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
  PSID:             MT_0000000451
  PCI Device Name:  0000:87:00.0
  Base GUID:        9440c9ffffb33b60
  Versions:         Current        Available
     FW             20.35.1012     N/A
     PXE            3.6.0804       N/A
     UEFI           14.28.0015     N/A

  Status:           No matching image found
  ----

Setting InfiniBand Device Static IP Address

Before you assign an IP address or edit the ONBOOT setting for an InfiniBand interface, it will show up like the ib1 entry in the ip a output below. After you've assigned an IP address and netmask and set the card to be enabled on boot, it will show up like the ib0 entry.

[root@mawenzi-06 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens10f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 14:02:ec:da:9e:50 brd ff:ff:ff:ff:ff:ff
    inet 10.214.133.192/21 brd 10.214.135.255 scope global dynamic noprefixroute ens10f0
       valid_lft 70351sec preferred_lft 70351sec
    inet6 fe80::1602:ecff:feda:9e50/64 scope link
       valid_lft forever preferred_lft forever
3: ens10f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
    link/ether 14:02:ec:da:9e:51 brd ff:ff:ff:ff:ff:ff
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:94:40:c9:ff:ff:b3:3b:60 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
    inet 192.168.0.106/24 brd 192.168.0.255 scope global noprefixroute ib0
       valid_lft forever preferred_lft forever
    inet6 fe80::9640:c9ff:ffb3:3b60/64 scope link noprefixroute
       valid_lft forever preferred_lft forever
5: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
    link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:94:40:c9:ff:ff:88:dd:98 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff

To do this, make sure the ib_ipoib module is installed and loaded. It handles the IP over InfiniBand (IPoIB) protocol in the kernel:

modprobe ib_ipoib

If you want this module to be loaded on every boot by default:

echo ib_ipoib > /etc/modules-load.d/ipoib.conf

Then, edit the /etc/sysconfig/network-scripts/ifcfg-ib1 interface configuration file. Before editing, it should look something like this:

[root@mawenzi-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib1
# Generated by parse-kickstart
TYPE="Infiniband"
DEVICE="ib1"
UUID="4707d11c-af1e-4981-9814-fb5d621de178"
ONBOOT="no"
BOOTPROTO="dhcp"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"

Set the following fields:

  • ONBOOT=yes : Enables the card on boot

  • BOOTPROTO=none : Tells the card not to use DHCP on boot, since we’re doing a static IP address assignment

  • IPADDR=192.168.0.106 : The IP address you want the card to have. You may want to create a private subnet for this.

  • NETMASK=255.255.255.0 : Netmask according to the subnet the card is on.
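
The field changes listed above can be applied with a small helper instead of editing by hand. This is a sketch only: set_ifcfg_key is a hypothetical function that replaces an existing KEY= line (quoted or not) or appends the key when it is missing. It does not escape special characters, so it is only suitable for simple values like the ones here.

```shell
# Set KEY=VALUE in an ifcfg-style file: replace the line if the key is
# already present, otherwise append it. Values containing "|" or "&"
# are not handled.
set_ifcfg_key() {
    file=$1 key=$2 value=$3
    if grep -q "^${key}=" "$file"; then
        tmp=$(mktemp) || return 1
        sed "s|^${key}=.*|${key}=${value}|" "$file" > "$tmp" && cat "$tmp" > "$file"
        rm -f "$tmp"
    else
        printf '%s=%s\n' "$key" "$value" >> "$file"
    fi
}

# Possible usage (not executed here):
#   f=/etc/sysconfig/network-scripts/ifcfg-ib1
#   set_ifcfg_key "$f" ONBOOT yes
#   set_ifcfg_key "$f" BOOTPROTO none
#   set_ifcfg_key "$f" IPADDR 192.168.0.106
#   set_ifcfg_key "$f" NETMASK 255.255.255.0
```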

Here’s an example of what the ib0 card network script file looks like from the above example:

[root@mawenzi-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
# Generated by parse-kickstart
TYPE=InfiniBand
DEVICE=ib0
UUID=4819df4c-37ef-4aed-b6db-3c19a82c6201
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.0.106
NETMASK=255.255.255.0
IPV6INIT=yes
IPV6_AUTOCONF=yes
CONNECTED_MODE=no
PROXY_METHOD=none
BROWSER_ONLY=no
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME="System ib0"

Alternatively, you can set the IP address via ip addr. Addresses set this way do not persist across reboots:

ip addr add 192.168.0.103/24 dev ib0

Then, enable the device using ip link:

ip link set dev ib0 up
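
The ifcfg file takes a dotted netmask (NETMASK=255.255.255.0) while ip addr takes a prefix length (/24). The conversion is simple arithmetic: each octet contributes its number of leading one bits. A minimal sketch follows. The function name netmask_to_prefix is made up, and it does not check that the one bits are contiguous across octets.

```shell
# Convert a dotted IPv4 netmask to a prefix length, e.g. 255.255.255.0 -> 24.
# Each octet must be a valid mask octet; contiguity across octets is not
# verified.
netmask_to_prefix() {
    prefix=0
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the mask into its four octets
    IFS=$old_ifs
    for octet in "$@"; do
        case $octet in
            255) prefix=$((prefix + 8)) ;;
            254) prefix=$((prefix + 7)) ;;
            252) prefix=$((prefix + 6)) ;;
            248) prefix=$((prefix + 5)) ;;
            240) prefix=$((prefix + 4)) ;;
            224) prefix=$((prefix + 3)) ;;
            192) prefix=$((prefix + 2)) ;;
            128) prefix=$((prefix + 1)) ;;
            0)   ;;
            *)   echo "invalid netmask octet: $octet" >&2; return 1 ;;
        esac
    done
    echo "$prefix"
}

# Possible usage (not executed here):
#   ip addr add "192.168.0.103/$(netmask_to_prefix 255.255.255.0)" dev ib0
```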