CX7 400G SR-IOV VM Network Test Optimization

During actual testing, the CX7 400G SR-IOV VM tests hit several performance bottlenecks, so a series of adjustments was made to improve test efficiency.
The expected performance target for VM networking is 80% of line rate (5%~10% below the target used for the physical NIC).

1. Optimizing the Test Environment Configuration

Host Configuration

1. VM Configuration - CPU Core Pinning

Pin the VM's vCPUs to physical CPU cores so that HT (hyper-threading) sibling cores are not used.

  • Determine the number and numbering of physical CPU cores.
# lscpu | grep "Core(s) per socket"
Core(s) per socket: 48
# lscpu | grep "CPU(s)"
CPU(s): 96

  • Check each CPU's number and which physical core it belongs to.
# lscpu -e
CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE MAXMHZ MINMHZ MHZ
0 0 0 0 0:0:0:0 yes 9999.8888 7777.0000 6666.0000
1 0 0 1 1:1:1:0 yes 9999.8888 7777.0000 6666.0000
2 0 0 2 2:2:2:0 yes 9999.8888 7777.0000 6666.0000
3 0 0 3 3:3:3:0 yes 9999.8888 7777.0000 6666.0000
4 0 0 4 4:4:4:0 yes 9999.8888 7777.0000 6666.0000
5 0 0 5 5:5:5:0 yes 9999.8888 7777.0000 6666.0000
6 0 0 6 32:32:32:4 yes 9999.8888 7777.0000 6666.0000
7 0 0 7 33:33:33:4 yes 9999.8888 7777.0000 6666.0000
8 0 0 8 34:34:34:4 yes 9999.8888 7777.0000 6666.0000
9 0 0 9 35:35:35:4 yes 9999.8888 7777.0000 6666.0000
...

48 0 0 0 0:0:0:0 yes 9999.8888 7777.0000 6666.0000
49 0 0 1 1:1:1:0 yes 9999.8888 7777.0000 6666.0000
50 0 0 2 2:2:2:0 yes 9999.8888 7777.0000 6666.0000
51 0 0 3 3:3:3:0 yes 9999.8888 7777.0000 6666.0000
52 0 0 4 4:4:4:0 yes 9999.8888 7777.0000 6666.0000
53 0 0 5 5:5:5:0 yes 9999.8888 7777.0000 6666.0000
54 0 0 6 32:32:32:4 yes 9999.8888 7777.0000 6666.0000
55 0 0 7 33:33:33:4 yes 9999.8888 7777.0000 6666.0000
56 0 0 8 34:34:34:4 yes 9999.8888 7777.0000 6666.0000
57 0 0 9 35:35:35:4 yes 9999.8888 7777.0000 6666.0000
...

  • The output shows that CPUs 0-47 are the physical cores and CPUs 48-95 are their HT siblings (note the repeated values in the CORE column).
  • Therefore, pin the VM to CPUs 0-39 in the VM configuration to avoid the HT siblings.
  • Edit the VM configuration file /etc/libvirt/qemu/rhel9.4.xml.
  • Add the following (this uses 40 of the 48 physical CPU cores):
<vcpu placement='static'>40</vcpu>
<cputune>
<vcpupin vcpu='0' cpuset='0'/>
<vcpupin vcpu='1' cpuset='1'/>
<vcpupin vcpu='2' cpuset='2'/>
<vcpupin vcpu='3' cpuset='3'/>
<vcpupin vcpu='4' cpuset='4'/>
<vcpupin vcpu='5' cpuset='5'/>
<vcpupin vcpu='6' cpuset='6'/>
<vcpupin vcpu='7' cpuset='7'/>
<vcpupin vcpu='8' cpuset='8'/>
<vcpupin vcpu='9' cpuset='9'/>
<vcpupin vcpu='10' cpuset='10'/>
<vcpupin vcpu='11' cpuset='11'/>
<vcpupin vcpu='12' cpuset='12'/>
<vcpupin vcpu='13' cpuset='13'/>
<vcpupin vcpu='14' cpuset='14'/>
<vcpupin vcpu='15' cpuset='15'/>
<vcpupin vcpu='16' cpuset='16'/>
<vcpupin vcpu='17' cpuset='17'/>
<vcpupin vcpu='18' cpuset='18'/>
<vcpupin vcpu='19' cpuset='19'/>
<vcpupin vcpu='20' cpuset='20'/>
<vcpupin vcpu='21' cpuset='21'/>
<vcpupin vcpu='22' cpuset='22'/>
<vcpupin vcpu='23' cpuset='23'/>
<vcpupin vcpu='24' cpuset='24'/>
<vcpupin vcpu='25' cpuset='25'/>
<vcpupin vcpu='26' cpuset='26'/>
<vcpupin vcpu='27' cpuset='27'/>
<vcpupin vcpu='28' cpuset='28'/>
<vcpupin vcpu='29' cpuset='29'/>
<vcpupin vcpu='30' cpuset='30'/>
<vcpupin vcpu='31' cpuset='31'/>
<vcpupin vcpu='32' cpuset='32'/>
<vcpupin vcpu='33' cpuset='33'/>
<vcpupin vcpu='34' cpuset='34'/>
<vcpupin vcpu='35' cpuset='35'/>
<vcpupin vcpu='36' cpuset='36'/>
<vcpupin vcpu='37' cpuset='37'/>
<vcpupin vcpu='38' cpuset='38'/>
<vcpupin vcpu='39' cpuset='39'/>
</cputune>
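As a convenience (not part of the original procedure), the 40 repetitive `<vcpupin>` lines above can be generated with a short loop instead of being typed by hand:

```shell
# Generate the 40 <vcpupin> lines for vCPUs 0-39, each pinned 1:1
# to the physical CPU of the same number.
for i in $(seq 0 39); do
  echo "  <vcpupin vcpu='$i' cpuset='$i'/>"
done
```

Paste the output between `<cputune>` and `</cputune>` in the domain XML.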

  • Verify that the pinning took effect (the VM must be shut down and started again first):
# virsh vcpuinfo rhel9.4
VCPU: 0
CPU: 0
State: running
CPU time: 7292.5s
CPU Affinity: y-----------------------------------------------------------------------------------------------

VCPU: 1
CPU: 1
State: running
CPU time: 7568.5s
CPU Affinity: -y----------------------------------------------------------------------------------------------

VCPU: 2
CPU: 2
State: running
CPU time: 3773.3s
CPU Affinity: --y---------------------------------------------------------------------------------------------

VCPU: 3
CPU: 3
State: running
CPU time: 7596.7s
CPU Affinity: ---y--------------------------------------------------------------------------------------------
...

2. Memory Allocation

Ensure the VM has enough memory allocated to meet the test requirements.
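The original leaves the amount unspecified; a typical libvirt memory configuration (the 64 GiB size and the hugepage backing are illustrative assumptions, not values from the original test) might look like:

```xml
<memory unit='GiB'>64</memory>
<currentMemory unit='GiB'>64</currentMemory>
<memoryBacking>
  <hugepages/>
</memoryBacking>
```

Hugepage backing additionally requires hugepages to be reserved on the host beforehand.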

3. IRQ Binding

  • Check the VF's IRQ numbers and their current CPU bindings; 0000:c1:00.1 is the VF's PCI address.
for i in $(grep "0000:c1:00.1" /proc/interrupts | awk -F: '{print $1}'); do
    echo -n "IRQ $i -> "
    cat /proc/irq/$i/smp_affinity_list
done
  • Output:
IRQ 560 -> 77
IRQ 561 -> 22
IRQ 562 -> 23
IRQ 563 -> 32
IRQ 564 -> 47
IRQ 565 -> 58
IRQ 566 -> 69
IRQ 567 -> 33
IRQ 568 -> 88
IRQ 569 -> 93
IRQ 570 -> 12
IRQ 571 -> 83

  • Bind the VF's IRQs to CPU cores 0-11 to improve performance.
  • Write a script, bind_vf_irqs.sh, that automatically binds the VF's IRQs to the given CPU cores.
# cat bind_vf_irqs.sh
#!/bin/bash
# Script to bind Mellanox VF MSI-X interrupts to a given CPU range
# Usage: ./bind_vf_irqs.sh <PCI_ADDR> <CPU_LIST>
# Example: ./bind_vf_irqs.sh 0000:c1:00.1 0-39

PCI_ADDR=$1
CPU_LIST=$2

if [[ -z "$PCI_ADDR" || -z "$CPU_LIST" ]]; then
    echo "Usage: $0 <PCI_ADDR> <CPU_LIST>"
    exit 1
fi

# Get all IRQs for this VF
IRQS=($(grep -i "$PCI_ADDR" /proc/interrupts | awk -F: '{gsub(" ","",$1); print $1}'))

# Convert CPU_LIST to an array (accepts a comma list "0,1,2" or a single range "a-b")
CPUS=()
for cpu in $(echo $CPU_LIST | tr ',' ' ' | sed 's/-/ /'); do
    CPUS+=($cpu)
done

# Expand a range like 0-39
if [[ ${#CPUS[@]} -eq 2 ]]; then
    START=${CPUS[0]}
    END=${CPUS[1]}
    CPUS=()
    for ((i=START; i<=END; i++)); do
        CPUS+=($i)
    done
fi

CPU_COUNT=${#CPUS[@]}
INDEX=0

echo "Binding VF $PCI_ADDR IRQs to CPUs: ${CPUS[*]}"

# Round-robin the IRQs across the CPU list
for irq in "${IRQS[@]}"; do
    cpu=${CPUS[$INDEX]}
    echo $cpu > /proc/irq/$irq/smp_affinity_list
    echo "IRQ $irq -> CPU $cpu"
    INDEX=$(( (INDEX + 1) % CPU_COUNT ))
done

echo "Done."

  • Run the script to bind the VF's IRQs. With this VF's 12 IRQs and the CPU list 0-39, the round-robin assignment lands them on CPU cores 0-11.
# chmod +x bind_vf_irqs.sh
# ./bind_vf_irqs.sh 0000:c1:00.1 0-39
Binding VF 0000:c1:00.1 IRQs to CPUs: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39
IRQ 560 -> CPU 0
IRQ 561 -> CPU 1
IRQ 562 -> CPU 2
IRQ 563 -> CPU 3
IRQ 564 -> CPU 4
IRQ 565 -> CPU 5
IRQ 566 -> CPU 6
IRQ 567 -> CPU 7
IRQ 568 -> CPU 8
IRQ 569 -> CPU 9
IRQ 570 -> CPU 10
IRQ 571 -> CPU 11
Done.
  • With the steps above, the VF's IRQs are bound to CPU cores 0-11, improving the VM's network performance.
  • How many IRQs there are to bind depends on the VF's combined channel setting (11 here; the one extra IRQ seen above is typically the device's async event interrupt). The combined count can be checked with ethtool -l enpxxxs0v0.

2. VM-Internal Optimization

  • Inside the VM, an important network-performance step is configuring RPS (Receive Packet Steering), which spreads receive processing across multiple CPU cores to raise throughput.
  • The following commands set the RPS CPU mask of the enp0s9 NIC to ffffffff, allowing 32 CPU cores (CPUs 0-31) to handle receive processing.
1
2
3
for i in /sys/class/net/enp0s9/queues/rx-*/rps_cpus; do
    echo ffffffff > $i
done
  • With the commands above, RPS is configured so that the enp0s9 NIC's receive processing is spread across multiple CPU cores, improving the VM's network performance.
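The mask ffffffff covers CPUs 0-31. For a VM with a different vCPU count, the corresponding hex mask can be computed with shell arithmetic (a small sketch; bash arithmetic is 64-bit, so this works for up to 62 CPUs):

```shell
# Compute the rps_cpus hex mask covering the first N CPUs.
# N=32 yields ffffffff, the value used above.
N=32
printf '%x\n' $(( (1 << N) - 1 ))
```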

3. Test Scripts

  • Server side:
#!/usr/bin/bash
core=32
for ((i=0; i<core; i++)); do
    let port=3000+$i
    let cpu=$i%$core
    echo "$port is listening on $cpu"
    numactl --physcpubind=$cpu iperf3 -s -D -p $port
done
echo "please use pkill iperf3 to stop server"
  • Client side:
#!/bin/bash
# Note: bash is required; the for ((...)) and let constructs below are not POSIX sh.
#core=`cat /proc/cpuinfo|grep "processor"|wc -l`
core=32
ip=$1
for ((i=0; i<core; i++)); do
    let port=3000+$i
    let cpu=$i%$core
    let time=3600
    echo "$port is connected on cpu $cpu"
    numactl --physcpubind=$cpu iperf3 -c $ip -p $port -t $time -P 3 > iperf-tcp-$i.log &
    sleep 0.3
done
echo "after running $time sec iperf3 client will stop"
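Both scripts use the same mapping: port 3000+i is pinned to CPU i%core, so with core=32 the client on port 3000+i always reaches the server instance pinned to the same CPU number. The mapping can be sanity-checked standalone:

```shell
# Print the port -> CPU mapping that the server and client scripts use
# (core=32 as in the scripts above).
core=32
for ((i=0; i<core; i++)); do
    echo "port $((3000+i)) -> cpu $((i % core))"
done
```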