InfiniBand
Details and notes for IB management, commands, notes, configuration, and installation.
Subnet Manager
The Subnet Manager is an entity running either on a managed switch, node, or somewhere in the fabric, and is responsible for discovering and configuring all the InfiniBand fabric devices to enable traffic flow between those devices.
OpenSM
opensm
is an InfiniBand compliant Subnet Manager and Subnet Administrator that runs on top of the Mellanox OFED stack.
Installing opensm
In order for opensm
to function, you need to first enable ib_umad
so opensm
can use Userspace MADs:
modprobe ib_umad
Then, install opensm
via dnf
:
dnf install -y opensm
And start the daemon via systemctl
:
systemctl start opensm
Verify it’s running:
[root@mawenzi-03 ~]# systemctl status opensm
● opensmd.service - LSB: Activates/Deactivates InfiniBand Subnet Manager
Loaded: loaded (/etc/rc.d/init.d/opensmd; generated)
Active: active (running) since Fri 2023-06-30 20:28:27 MDT; 6s ago
Docs: man:systemd-sysv-generator(8)
Process: 16797 ExecStart=/etc/rc.d/init.d/opensmd start (code=exited, status=0/SUCCESS)
Main PID: 16806 (opensm)
Tasks: 144 (limit: 821428)
Memory: 15.6M
CGroup: /system.slice/opensmd.service
├─16806 /usr/sbin/opensm --daemon
└─16809 osm_crashd
Jun 30 20:28:26 mawenzi-03 systemd[1]: Starting LSB: Activates/Deactivates InfiniBand Subnet Manager...
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: /var/log/opensm.log log file opened
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: OpenSM 5.11.0.MLNX20220418.fd3d650
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: Entering DISCOVERING state
Jun 30 20:28:26 mawenzi-03 OpenSM[16806]: Entering STANDBY state
Jun 30 20:28:27 mawenzi-03 opensmd[16797]: Starting IB Subnet Manager.
Jun 30 20:28:27 mawenzi-03 opensmd[16952]: Starting IB Subnet Manager.
Jun 30 20:28:27 mawenzi-03 opensmd[16952]: hich: no ibdiagm.sh in (/sbi
Jun 30 20:28:27 mawenzi-03 opensmd[16797]: hich: no
Jun 30 20:28:27 mawenzi-03 systemd[1]: Started LSB: Activates/Deactivates InfiniBand Subnet Manager.
Mellanox OFED (MOFED)
MOFED Utilities
-
mlxfwmanager
: Firmware Update and Query Tool -
mlxlink
: The mlxlink tool is used to check and debug link status and issues related to them. The tool can be used on different links and cables (passive, active, transceiver and backplane). -
mlxconfig
: Allows the user to change some of the device configurations without having to create and burn a new firmware -
mlxfwreset
: The tool provides the following functionality in order to load new firmware:-
Query the device for the supported reset-level and reset-type
-
Perform reset operation on the device
-
-
mlxcables
: Mellanox Cables Tool
MOFED Kernel Module Relationships
This describes the various modules of MLNX_OFED relations with the other Linux Kernel modules.
MOFED Installation
In this example we’ll be installing MOFED for Rocky Linux 8.6.
Go to Nvidia Mellanox Download center, and download the .iso
for Rocky 8.6:
Once you have it on your target system, mount it to /mnt
mount -o ro,loop MLNX_OFED_LINUX-5.4-3.7.5.0-rhel8.6-x86_64.iso /mnt
Then, gather all the needed dependencies for the install script
dnf install perl gcc-gfortran python36 tk lsof tcl tcsh pkgconf-pkg-config pciutils
Finally, run the install script itself
/mnt/mlnxofedinstall
Once that has finished installing, restart the openibd
service
/etc/init.d/openibd restart
Verifying Installation
Install the Infiniband Diagnostics utility package
dnf install infiniband-diags
Make sure all the right modules are loaded with lsmod
[root@mawenzi-06 ~]# lsmod | grep -P "(ib_|_ib|mlx|rdma)"
rdma_ucm 32768 0
rdma_cm 118784 1 rdma_ucm
iw_cm 53248 1 rdma_cm
ib_ipoib 151552 0
ib_cm 57344 2 rdma_cm,ib_ipoib
ib_umad 28672 0
mlx5_ib 430080 0
mlx5_core 1789952 1 mlx5_ib
mlxdevm 176128 1 mlx5_core
ib_uverbs 151552 2 rdma_ucm,mlx5_ib
ib_core 421888 8 rdma_cm,ib_ipoib,iw_cm,ib_umad,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm
mlx_compat 16384 11 rdma_cm,ib_ipoib,mlxdevm,iw_cm,ib_umad,ib_core,rdma_ucm,ib_uverbs,mlx5_ib,ib_cm,mlx5_core
psample 20480 1 mlx5_core
mlxfw 28672 1 mlx5_core
tls 102400 1 mlx5_core
pci_hyperv_intf 16384 1 mlx5_core
nft_fib_inet 16384 1
nft_fib_ipv4 16384 1 nft_fib_inet
nft_fib_ipv6 16384 1 nft_fib_inet
nft_fib 16384 3 nft_fib_ipv6,nft_fib_ipv4,nft_fib_inet
nf_tables 180224 235 nft_ct,nft_reject_inet,nft_fib_ipv6,nft_fib_ipv4,nft_chain_nat,nf_tables_set,nft_reject,nft_fib,nft_fib_inet
Run ibstat
to view local card info
[root@mawenzi-06 ~]# ibstat
CA 'mlx5_0'
CA type: MT4123
Number of ports: 1
Firmware version: 20.35.2000
Hardware version: 0
Node GUID: 0x9440c9ffffb33b60
System image GUID: 0x9440c9ffffb33b60
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 8
LMC: 0
SM lid: 1
Capability mask: 0xa659e848
Port GUID: 0x9440c9ffffb33b60
Link layer: InfiniBand
CA 'mlx5_1'
CA type: MT4123
Number of ports: 1
Firmware version: 20.35.2000
Hardware version: 0
Node GUID: 0x9440c9ffff88dd98
System image GUID: 0x9440c9ffff88dd98
Port 1:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xa659e848
Port GUID: 0x9440c9ffff88dd98
Link layer: InfiniBand
Here we can see 2 single-port CX-6 cards, one that’s disconnected (mlx5_1
) and doesn’t have anything plugged in, and one that is fully
connected (mlx5_0
) to the InfiniBand switch. We can also see the Local ID (LID) of the port, 8
, and the Subnet Manager (SM) LID of 1
.
Next, we can run iblinkinfo
to view information about the whole InfiniBand fabric. Note our own node, mawenzi-06
, at the bottom.
[root@mawenzi-06 ~]# iblinkinfo
CA: mawenzi-05 mlx5_0:
0x9440c9ffffb33bdc 7 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 9[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-07 mlx5_0:
0x9440c9ffffb32bd4 6 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 13[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-01 mlx5_0:
0x9440c9ffffb34bd0 1 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 1[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-04 mlx5_0:
0x9440c9ffffb31bc4 5 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 7[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-03 mlx5_0:
0x9440c9ffffb35b44 2 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 5[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-02 mlx5_0:
0x9440c9ffffb34bf4 4 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 3[ ] "SwitchIB Mellanox Technologies" ( )
Switch: 0x248a07030074dd50 SwitchIB Mellanox Technologies:
3 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 1 1[ ] "mawenzi-01 mlx5_0" ( )
3 2[ ] ==( Down/ Polling)==> [ ] "" ( )
3 3[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 4 1[ ] "mawenzi-02 mlx5_0" ( )
3 4[ ] ==( Down/ Polling)==> [ ] "" ( )
3 5[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 2 1[ ] "mawenzi-03 mlx5_0" ( )
3 6[ ] ==( Down/ Polling)==> [ ] "" ( )
3 7[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 5 1[ ] "mawenzi-04 mlx5_0" ( )
3 8[ ] ==( Down/ Polling)==> [ ] "" ( )
3 9[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 7 1[ ] "mawenzi-05 mlx5_0" ( )
3 10[ ] ==( Down/ Polling)==> [ ] "" ( )
3 11[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 8 1[ ] "mawenzi-06 HCA-1" ( )
3 12[ ] ==( Down/ Polling)==> [ ] "" ( )
3 13[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 6 1[ ] "mawenzi-07 mlx5_0" ( )
3 14[ ] ==( Down/ Polling)==> [ ] "" ( )
3 15[ ] ==( Down/ Polling)==> [ ] "" ( )
3 16[ ] ==( Down/ Polling)==> [ ] "" ( )
3 17[ ] ==( Down/ Polling)==> [ ] "" ( )
3 18[ ] ==( Down/ Polling)==> [ ] "" ( )
3 19[ ] ==( Down/ Polling)==> [ ] "" ( )
3 20[ ] ==( Down/ Polling)==> [ ] "" ( )
3 21[ ] ==( Down/ Polling)==> [ ] "" ( )
3 22[ ] ==( Down/ Polling)==> [ ] "" ( )
3 23[ ] ==( Down/ Polling)==> [ ] "" ( )
3 24[ ] ==( Down/ Polling)==> [ ] "" ( )
3 25[ ] ==( Down/ Polling)==> [ ] "" ( )
3 26[ ] ==( Down/ Polling)==> [ ] "" ( )
3 27[ ] ==( Down/ Polling)==> [ ] "" ( )
3 28[ ] ==( Down/ Polling)==> [ ] "" ( )
3 29[ ] ==( Down/ Polling)==> [ ] "" ( )
3 30[ ] ==( Down/ Polling)==> [ ] "" ( )
3 31[ ] ==( Down/ Polling)==> [ ] "" ( )
3 32[ ] ==( Down/ Polling)==> [ ] "" ( )
3 33[ ] ==( Down/ Polling)==> [ ] "" ( )
3 34[ ] ==( Down/ Polling)==> [ ] "" ( )
3 35[ ] ==( Down/ Polling)==> [ ] "" ( )
3 36[ ] ==( Down/ Polling)==> [ ] "" ( )
CA: mawenzi-06 HCA-1:
0x9440c9ffffb33b60 8 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 11[ ] "SwitchIB Mellanox Technologies" ( )
Refer to most InfiniBand utilities or MOFED utilities for other diagnostic utilities.
Card Configuration
Here we’ll be using a Mellanox ConnectX-6 card for this set of examples. Make sure that you’ve installed MOFED and have loaded all the required modules.
Enable Card on Boot
Rocky Linux 8.6
For Rocky 8.6, we’ll be using the network-scripts ifcfg
configuration file to persist card configuration.
Edit /etc/sysconfig/network-scripts/ifcfg-ib0
, enabling ONBOOT
and disabling DHCP as boot protocol
sed -i -e 's/ONBOOT=no/ONBOOT=yes/g' -e 's/BOOTPROTO=dhcp/BOOTPROTO=none/g' /etc/sysconfig/network-scripts/ifcfg-ib0
Now, reboot
the node.
Rocky Linux 9.1
For Rocky 9.X onwards, everything is done using the newer
NetworkManager
system. You can still convert your old ifcfg
files to the new format, by using nmcli connection migrate
.
Update Firmware
Find PCI ID using lspci
:
[root@mawenzi-06 ~]# lspci | grep Mellanox
03:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
87:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
The 03:00.0
and 87:00.0
are the PCI device names of the two cards we have on the system.
HPE-Branded Firmware Updates
Check if the cards are HPE-branded, using lspci
in verbose mode with selected device.
Under Vital Product Data
, note the entry: Product Name: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
. This means that we can’t do a firmware update using generic files downloaded from Mellanox website; instead we’ll
have to use ones from HPE. Use the product info to find the right fabric firmware image here:
Ctrl+F for the Part number: P24250-001
that comes from the following lspci
output:
[root@mawenzi-04 ~]# lspci -vv -s 85:00.0
85:00.0 Infiniband controller: Mellanox Technologies MT28908 Family [ConnectX-6]
Subsystem: Mellanox Technologies Device 0068
Physical Slot: 1
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr+ Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR- INTx-
Latency: 0, Cache Line Size: 64 bytes
Interrupt: pin A routed to IRQ 157
NUMA node: 0
IOMMU group: 28
Region 0: Memory at ac000000 (64-bit, prefetchable) [size=32M]
Expansion ROM at ab400000 [virtual] [disabled] [size=1M]
Capabilities: [60] Express (v2) Endpoint, MSI 00
DevCap: MaxPayload 512 bytes, PhantFunc 0, Latency L0s unlimited, L1 unlimited
ExtTag+ AttnBtn- AttnInd- PwrInd- RBE+ FLReset+ SlotPowerLimit 75.000W
DevCtl: CorrErr- NonFatalErr+ FatalErr+ UnsupReq-
RlxdOrd+ ExtTag+ PhantFunc- AuxPwr- NoSnoop+ FLReset-
MaxPayload 512 bytes, MaxReadReq 4096 bytes
DevSta: CorrErr+ NonFatalErr- FatalErr- UnsupReq+ AuxPwr- TransPend-
LnkCap: Port #0, Speed 16GT/s, Width x16, ASPM not supported
ClockPM- Surprise- LLActRep- BwNot- ASPMOptComp+
LnkCtl: ASPM Disabled; RCB 64 bytes, Disabled- CommClk+
ExtSynch- ClockPM- AutWidDis- BWInt- AutBWInt-
LnkSta: Speed 16GT/s (ok), Width x16 (ok)
TrErr- Train- SlotClk+ DLActive- BWMgmt- ABWMgmt-
DevCap2: Completion Timeout: Range ABC, TimeoutDis+ NROPrPrP- LTR-
10BitTagComp+ 10BitTagReq- OBFF Not Supported, ExtFmt- EETLPPrefix-
EmergencyPowerReduction Not Supported, EmergencyPowerReductionInit-
FRS- TPHComp- ExtTPHComp-
AtomicOpsCap: 32bit- 64bit- 128bitCAS-
DevCtl2: Completion Timeout: 50us to 50ms, TimeoutDis- LTR- OBFF Disabled,
AtomicOpsCtl: ReqEn+
LnkCap2: Supported Link Speeds: 2.5-16GT/s, Crosslink- Retimer+ 2Retimers+ DRS-
LnkCtl2: Target Link Speed: 16GT/s, EnterCompliance- SpeedDis-
Transmit Margin: Normal Operating Range, EnterModifiedCompliance- ComplianceSOS-
Compliance De-emphasis: -6dB
LnkSta2: Current De-emphasis Level: -6dB, EqualizationComplete+ EqualizationPhase1+
EqualizationPhase2+ EqualizationPhase3+ LinkEqualizationRequest-
Retimer- 2Retimers- CrosslinkRes: unsupported
Capabilities: [48] Vital Product Data
Product Name: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
Read-only fields:
[PN] Part number: P24250-001
[EC] Engineering changes: A5
[V2] Vendor specific: P24250-001
[SN] Serial number: IL203002KT
[V3] Vendor specific: 60c190dc0ccdea1180009440c9b31bc4
[VA] Vendor specific: MLX:MN=MLNX:CSKU=V2:UUID=V3:PCI=V0:MODL=CX653105A
[V0] Vendor specific: PCIeGen4 x16
[VU] Vendor specific: IL203002KTMLNXS0D0F0
[RV] Reserved: checksum good, 1 byte(s) reserved
End
Capabilities: [9c] MSI-X: Enable+ Count=64 Masked-
Vector table: BAR=0 offset=00002000
PBA: BAR=0 offset=00003000
Capabilities: [c0] Vendor Specific Information: Len=18 <?>
Capabilities: [40] Power Management version 3
Flags: PMEClk- DSI- D1- D2- AuxCurrent=375mA PME(D0-,D1-,D2-,D3hot-,D3cold+)
Status: D0 NoSoftRst+ PME-Enable- DSel=0 DScale=0 PME-
Capabilities: [100 v1] Advanced Error Reporting
UESta: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UEMsk: DLP- SDES- TLP- FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC- UnsupReq- ACSViol-
UESvrt: DLP- SDES- TLP+ FCP- CmpltTO- CmpltAbrt- UnxCmplt- RxOF- MalfTLP- ECRC+ UnsupReq- ACSViol-
CESta: RxErr- BadTLP- BadDLLP- Rollover- Timeout- AdvNonFatalErr+
CEMsk: RxErr+ BadTLP+ BadDLLP+ Rollover+ Timeout+ AdvNonFatalErr+
AERCap: First Error Pointer: 08, ECRCGenCap+ ECRCGenEn+ ECRCChkCap+ ECRCChkEn+
MultHdrRecCap- MultHdrRecEn- TLPPfxPres- HdrLogCap-
HeaderLog: 00000000 00000000 00000000 00000000
Capabilities: [150 v1] Alternative Routing-ID Interpretation (ARI)
ARICap: MFVC- ACS-, Next Function: 0
ARICtl: MFVC- ACS-, Function Group: 0
Capabilities: [1c0 v1] Secondary PCI Express
LnkCtl3: LnkEquIntrruptEn- PerformEqu-
LaneErrStat: 0
Capabilities: [320 v1] Lane Margining at the Receiver <?>
Capabilities: [370 v1] Physical Layer 16.0 GT/s <?>
Capabilities: [420 v1] Data Link Feature <?>
Kernel driver in use: mlx5_core
Kernel modules: mlx5_core
Go to the Firmware page, track down the latest GA directory, and get the .bin
firmware file. Example.
Once you have a file like fw-ConnectX6-rel-20_37_1700-MCX653105A-HDA_HPE_Ax-UEFI-14.30.13-FlexBoot-3.7.102.signed.bin
in place
in the current working directory, run mlxfwmanager
. This will detect any cards and available firmware updates:
[root@mawenzi-04 ~]# mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX6
Part Number: MCX653105A-HDA_HPE_Ax
Description: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
PSID: MT_0000000451
PCI Device Name: 0000:85:00.0
Base GUID: 9440c9ffffb31bc4
Versions: Current Available
FW 20.35.1012 20.37.1700
PXE 3.6.0804 3.7.0102
UEFI 14.28.0015 14.30.0013
Status: Update required
---------
Found 1 device(s) requiring firmware update. Please use -u flag to perform the update.
Run mlxfwmanager -u
in the directory with the .bin
firmware image file to update the card(s):
[root@mawenzi-04 ~]# mlxfwmanager -u
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX6
Part Number: MCX653105A-HDA_HPE_Ax
Description: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
PSID: MT_0000000451
PCI Device Name: 0000:85:00.0
Base GUID: 9440c9ffffb31bc4
Versions: Current Available
FW 20.35.1012 20.37.1700
PXE 3.6.0804 3.7.0102
UEFI 14.28.0015 14.30.0013
Status: Update required
---------
Found 1 device(s) requiring firmware update...
Perform FW update? [y/N]: y
Device #1: Updating FW ...
FSMST_INITIALIZE - OK
Writing Boot image component - OK
Done
Restart needed for updates to take effect.
Reboot once the update has succeeded.
InfiniBand Utilities
You may need to modprobe ib_umad
before using some of these tools.
iblinkinfo
will show info about all of the links on the fabric. Local IDs (LIDs), speeds, etc.
-
Comes from the
infiniband-diags
repo.
[root@mawenzi-01 ~]# iblinkinfo
CA: mawenzi-06 HCA-1:
0x9440c9ffffb33b60 8 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 11[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-05 mlx5_0:
0x9440c9ffffb33bdc 7 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 9[ ] "SwitchIB Mellanox Technologies" ( )
CA: localhost mlx5_0:
0x9440c9ffffb31bc4 5 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 7[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-03 mlx5_0:
0x9440c9ffffb35b44 2 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 5[ ] "SwitchIB Mellanox Technologies" ( )
CA: mawenzi-02 mlx5_0:
0x9440c9ffffb34bf4 4 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 3[ ] "SwitchIB Mellanox Technologies" ( )
Switch: 0x248a07030074dd50 SwitchIB Mellanox Technologies:
3 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 1 1[ ] "mawenzi-01 mlx5_0" ( )
3 2[ ] ==( Down/ Polling)==> [ ] "" ( )
3 3[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 4 1[ ] "mawenzi-02 mlx5_0" ( )
3 4[ ] ==( Down/ Polling)==> [ ] "" ( )
3 5[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 2 1[ ] "mawenzi-03 mlx5_0" ( )
3 6[ ] ==( Down/ Polling)==> [ ] "" ( )
3 7[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 5 1[ ] "localhost mlx5_0" ( )
3 8[ ] ==( Down/ Polling)==> [ ] "" ( )
3 9[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 7 1[ ] "mawenzi-05 mlx5_0" ( )
3 10[ ] ==( Down/ Polling)==> [ ] "" ( )
3 11[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 8 1[ ] "mawenzi-06 HCA-1" ( )
3 12[ ] ==( Down/ Polling)==> [ ] "" ( )
3 13[ ] ==( Down/ Polling)==> [ ] "" ( )
3 14[ ] ==( Down/ Polling)==> [ ] "" ( )
3 15[ ] ==( Down/ Polling)==> [ ] "" ( )
3 16[ ] ==( Down/ Polling)==> [ ] "" ( )
3 17[ ] ==( Down/ Polling)==> [ ] "" ( )
3 18[ ] ==( Down/ Polling)==> [ ] "" ( )
3 19[ ] ==( Down/ Polling)==> [ ] "" ( )
3 20[ ] ==( Down/ Polling)==> [ ] "" ( )
3 21[ ] ==( Down/ Polling)==> [ ] "" ( )
3 22[ ] ==( Down/ Polling)==> [ ] "" ( )
3 23[ ] ==( Down/ Polling)==> [ ] "" ( )
3 24[ ] ==( Down/ Polling)==> [ ] "" ( )
3 25[ ] ==( Down/ Polling)==> [ ] "" ( )
3 26[ ] ==( Down/ Polling)==> [ ] "" ( )
3 27[ ] ==( Down/ Polling)==> [ ] "" ( )
3 28[ ] ==( Down/ Polling)==> [ ] "" ( )
3 29[ ] ==( Down/ Polling)==> [ ] "" ( )
3 30[ ] ==( Down/ Polling)==> [ ] "" ( )
3 31[ ] ==( Down/ Polling)==> [ ] "" ( )
3 32[ ] ==( Down/ Polling)==> [ ] "" ( )
3 33[ ] ==( Down/ Polling)==> [ ] "" ( )
3 34[ ] ==( Down/ Polling)==> [ ] "" ( )
3 35[ ] ==( Down/ Polling)==> [ ] "" ( )
3 36[ ] ==( Down/ Polling)==> [ ] "" ( )
CA: mawenzi-01 mlx5_0:
0x9440c9ffffb34bd0 1 1[ ] ==( 4X 25.78125 Gbps Active/ LinkUp)==> 3 1[ ] "SwitchIB Mellanox Technologies" ( )
ibswitches
: Shows information about the InfiniBand switches on the fabric
-
Comes from the
infiniband-diags
repo.
[root@mawenzi-01 ~]# ibswitches
Switch : 0x248a07030074dd50 ports 36 "SwitchIB Mellanox Technologies" base port 0 lid 3 lmc 0
ibstat
: Shows information about the local InfiniBand devices, or rather NICs:
[root@mawenzi-01 ~]# ibstat
CA 'mlx5_0'
CA type: MT4123
Number of ports: 1
Firmware version: 20.37.1700
Hardware version: 0
Node GUID: 0x9440c9ffffb34bd0
System image GUID: 0x9440c9ffffb34bd0
Port 1:
State: Active
Physical state: LinkUp
Rate: 100
Base lid: 1
LMC: 0
SM lid: 2
Capability mask: 0xa651e848
Port GUID: 0x9440c9ffffb34bd0
Link layer: InfiniBand
CA 'mlx5_1'
CA type: MT4123
Number of ports: 1
Firmware version: 20.37.1700
Hardware version: 0
Node GUID: 0x9440c9ffffb35b4c
System image GUID: 0x9440c9ffffb35b4c
Port 1:
State: Down
Physical state: Disabled
Rate: 10
Base lid: 65535
LMC: 0
SM lid: 0
Capability mask: 0xa651e848
Port GUID: 0x9440c9ffffb35b4c
Link layer: InfiniBand
Tasks
Show information about a Mellanox card link
[root@mawenzi-01 ~]# mlxlink -d mlx5_0
Operational Info
----------------
State : Active
Physical state : LinkUp
Speed : IB-EDR
Width : 4x
FEC : Standard LL RS-FEC - RS(271,257)
Loopback Mode : No Loopback
Auto Negotiation : ON
Supported Info
--------------
Enabled Link Speed : 0x00000027 (EDR,QDR,DDR,SDR)
Supported Cable Speed : 0x0000003f (EDR,FDR,FDR10,QDR,DDR,SDR)
Troubleshooting Info
--------------------
Status Opcode : 0
Group Opcode : N/A
Recommendation : No issue was observed.
Tool Information
----------------
Firmware Version : 20.37.1700
amBER Version : 2.02
MFT Version : mft 4.21.0-102
Query Mellanox HCA configuration
[root@mawenzi-06 ~]# mlxconfig -d 87:00.0 query
Device #1:
----------
Device type: ConnectX6
Name: MCX653105A-HDA_HPE_Ax
Description: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
Device: 87:00.0
Configurations: Next Boot
MEMIC_BAR_SIZE 0
MEMIC_SIZE_LIMIT _256KB(1)
HOST_CHAINING_MODE DISABLED(0)
HOST_CHAINING_CACHE_DISABLE False(0)
HOST_CHAINING_DESCRIPTORS Array[0..7]
HOST_CHAINING_TOTAL_BUFFER_SIZE Array[0..7]
FLEX_PARSER_PROFILE_ENABLE 0
FLEX_IPV4_OVER_VXLAN_PORT 0
ROCE_NEXT_PROTOCOL 254
ESWITCH_HAIRPIN_DESCRIPTORS Array[0..7]
ESWITCH_HAIRPIN_TOT_BUFFER_SIZE Array[0..7]
PF_BAR2_SIZE 0
...
View HCA link type (IB or ETH)
[root@mawenzi-06 ~]# mlxconfig -d 87:00.0 query | grep LINK_TYPE
LINK_TYPE_P1 IB(1)
Flip HCA from InfiniBand to Ethernet
IB is 1 , ETH is 2
|
yes | mlxconfig -d 87:00.0 set LINK_TYPE_P1=2
Use Mellanox Firmware Manager to query device firmware
[root@mawenzi-06 ~]# mlxfwmanager
Querying Mellanox devices firmware ...
Device #1:
----------
Device Type: ConnectX6
Part Number: MCX653105A-HDA_HPE_Ax
Description: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
PSID: MT_0000000451
PCI Device Name: 0000:03:00.0
Base GUID: 9440c9ffff88dd98
Versions: Current Available
FW 20.35.1012 N/A
PXE 3.6.0804 N/A
UEFI 14.28.0015 N/A
Status: No matching image found
Device #2:
----------
Device Type: ConnectX6
Part Number: MCX653105A-HDA_HPE_Ax
Description: HPE InfiniBand HDR/Ethernet 200Gb 1-port MCX653105A-HDAT QSFP56 x16 Adapter
PSID: MT_0000000451
PCI Device Name: 0000:87:00.0
Base GUID: 9440c9ffffb33b60
Versions: Current Available
FW 20.35.1012 N/A
PXE 3.6.0804 N/A
UEFI 14.28.0015 N/A
Status: No matching image found
----
Setting InfiniBand Device Static IP Address
Before you assign an IP address or edit the ONBOOT settings for the InfiniBand interfaces, they will show up like the ib1
entry below in the ip a
output.
After you’ve assigned an IP address, netmask, and set the card to be enabled on boot it will show up like the ib0
entry.
[root@mawenzi-06 ~]# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens10f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 14:02:ec:da:9e:50 brd ff:ff:ff:ff:ff:ff
inet 10.214.133.192/21 brd 10.214.135.255 scope global dynamic noprefixroute ens10f0
valid_lft 70351sec preferred_lft 70351sec
inet6 fe80::1602:ecff:feda:9e50/64 scope link
valid_lft forever preferred_lft forever
3: ens10f1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN group default qlen 1000
link/ether 14:02:ec:da:9e:51 brd ff:ff:ff:ff:ff:ff
4: ib0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 2044 qdisc mq state UP group default qlen 256
link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:94:40:c9:ff:ff:b3:3b:60 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
inet 192.168.0.106/24 brd 192.168.0.255 scope global noprefixroute ib0
valid_lft forever preferred_lft forever
inet6 fe80::9640:c9ff:ffb3:3b60/64 scope link noprefixroute
valid_lft forever preferred_lft forever
5: ib1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 4092 qdisc mq state DOWN group default qlen 256
link/infiniband 00:00:10:29:fe:80:00:00:00:00:00:00:94:40:c9:ff:ff:88:dd:98 brd 00:ff:ff:ff:ff:12:40:1b:ff:ff:00:00:00:00:00:00:ff:ff:ff:ff
To do this, you need to make sure you have the ib_ipoib
module installed and loaded, this handles the IP over InfiniBand protocol in the kernel.
modprobe ib_ipoib
If you want this module to be loaded on every boot by default:
echo ib_ipoib > /etc/modules-load.d/ipoib.conf
Then, edit the /etc/sysconfig/network-scripts/ifcfg-ib1
interface config script file. Before it should look something like:
[root@mawenzi-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib1
# Generated by parse-kickstart
TYPE="Infiniband"
DEVICE="ib1"
UUID="4707d11c-af1e-4981-9814-fb5d621de178"
ONBOOT="no"
BOOTPROTO="dhcp"
IPV6INIT="yes"
IPV6_AUTOCONF="yes"
Set the following fields:
-
ONBOOT=yes
: Enables the card on boot -
BOOTPROTO=none
: Tells the card not to use DHCP on boot, since we’re doing a static IP address assignment -
IPADDR=192.168.0.106
: The IP address you want the card to have. You may want to create a private subnet for this. -
NETMASK=255.255.255.0
: Netmask according the subnet the card is on.
Here’s an example of what the ib0
card network script file looks like from the above example:
[root@mawenzi-06 ~]# cat /etc/sysconfig/network-scripts/ifcfg-ib0
# Generated by parse-kickstart
TYPE=InfiniBand
DEVICE=ib0
UUID=4819df4c-37ef-4aed-b6db-3c19a82c6201
ONBOOT=yes
BOOTPROTO=none
IPADDR=192.168.0.106
NETMASK=255.255.255.0
IPV6INIT=yes
IPV6_AUTOCONF=yes
CONNECTED_MODE=no
PROXY_METHOD=none
BROWSER_ONLY=no
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
NAME="System ib0"
Alternatively, you can set the IP address via ip addr
:
ip addr add 192.168.0.103/24 dev ib0
then, enable the device using ip link
:
ip link set dev ib0 up