I am experimenting with eBPF and network namespaces, because I eventually want to filter traffic between Kubernetes containers, but I find that it does not work. I am using the eBPF code and the user-space code from the usual test case at https://github.com/tjcw/xdp-tutorial/tree/master/ebpf-filter, and when I run https://github.com/tjcw/xdp-tutorial/blob/master/ebpf-filter/runns.sh I get the following failure
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libxdp: No bpffs found at /sys/fs/bpf
libxdp: Compatibility check for dispatcher program failed: No such file or directory
libxdp: Falling back to loading single prog without dispatcher
libbpf: specified path /sys/fs/bpf/accept_map is not on BPF FS
libbpf: map 'accept_map': failed to auto-pin at '/sys/fs/bpf/accept_map': -22
libbpf: map 'accept_map': failed to create: Invalid argument(-22)
libbpf: failed to load object './af_xdp_kern.o'
ERROR:xdp_program__attach returns -22
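For context: the -22 is EINVAL from libbpf's map auto-pinning. libbpf auto-pins maps declared with LIBBPF_PIN_BY_NAME, and pinning needs a bpffs mounted at the pin root (/sys/fs/bpf by default), which is exactly what the "is not on BPF FS" line is complaining about. A sketch of such a declaration, with illustrative key/value types rather than the tutorial's actual definition:

#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>

/* Sketch of a map using libbpf auto-pinning. With LIBBPF_PIN_BY_NAME,
 * libbpf pins the map at <pin_root_path>/accept_map (by default
 * /sys/fs/bpf/accept_map), and object load fails with -EINVAL when
 * no bpffs is mounted there. Types and size here are illustrative. */
struct {
    __uint(type, BPF_MAP_TYPE_HASH);
    __uint(max_entries, 64);
    __type(key, __u32);
    __type(value, __u32);
    __uint(pinning, LIBBPF_PIN_BY_NAME);
} accept_map SEC(".maps");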
This makes it look as though eBPF is incompatible with network namespaces with the Ubuntu 22.04 kernel; investigating the filesystem types shows
tjcw@tjcw-Standard-PC-Q35-ICH9-2009:~/workspace/xdp-tutorial/ebpf-filter$ sudo ip netns exec ns1 bash
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# cd /sys/fs
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs# ls
bpf cgroup ecryptfs ext4 fuse pstore
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs# df .
Filesystem 1K-blocks Used Available Use% Mounted on
ns1 0 0 0 - /sys
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs# cd bpf
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs/bpf# df .
Filesystem 1K-blocks Used Available Use% Mounted on
ns1 0 0 0 - /sys
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs/bpf# ls
root@tjcw-Standard-PC-Q35-ICH9-2009:/sys/fs/bpf#
inside the network namespace, and
tjcw@tjcw-Standard-PC-Q35-ICH9-2009:~/workspace/xdp-tutorial/ebpf-filter$ sudo bash
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# df /sys/fs
Filesystem 1K-blocks Used Available Use% Mounted on
sysfs 0 0 0 - /sys
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# df /sys/fs/bpf
Filesystem 1K-blocks Used Available Use% Mounted on
bpf 0 0 0 - /sys/fs/bpf
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter# ls -l /sys/fs/bpf
total 0
drwx------ 2 root root 0 Nov 11 12:30 snap
drwx------ 3 root root 0 Nov 11 15:17 xdp
root@tjcw-Standard-PC-Q35-ICH9-2009:/home/tjcw/workspace/xdp-tutorial/ebpf-filter#
in the root namespace. Is eBPF really incompatible with network namespaces, or have I misconfigured something, or misunderstood something? I am running in a VM, which in turn runs on my laptop. The Ubuntu 22.04 kernel is 5.15.0-52-generic.
It was suggested that I should mount the bpf filesystem in the namespace, so I made the start of my test script look like
#!/bin/bash -x
ip netns add ns1
ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
ip netns exec ns1 df /sys/fs/bpf
ip netns add ns2
ip netns exec ns2 mount -t bpf bpf /sys/fs/bpf
ip netns exec ns2 df /sys/fs/bpf
but this did not work for me; I got
+ ip netns add ns1
+ ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
+ ip netns exec ns1 df /sys/fs/bpf
Filesystem 1K-blocks Used Available Use% Mounted on
ns1 0 0 0 - /sys
+
It was also suggested that I look at https://github.com/cilium/cilium, https://isovalent.com/, and the tests in the Linux source tree under tools/testing/selftests/bpf for inspiration. That is what I will try next.
After trying the initial answer (below), I get some Destination Host Unreachable messages. My AF_XDP program loads, but the pings do not get through. Here is my test case script
ip netns delete ns1
ip netns delete ns2
sleep 2
ip netns add ns1
ip netns add ns2
ip link add veth1 type veth peer name vpeer1
ip link add veth2 type veth peer name vpeer2
ip link set veth1 up
ip link set veth2 up
ip link set vpeer1 netns ns1
ip link set vpeer2 netns ns2
ip link add br0 type bridge
ip link set br0 up
ip link set veth1 master br0
ip link set veth2 master br0
ip addr add 10.10.0.1/16 dev br0
iptables -P FORWARD ACCEPT
iptables -F FORWARD
ip netns exec ns2 ./runns2.sh &
ip netns exec ns1 ./runns1.sh
wait
with the helper script runns1.sh
#!/bin/bash -x
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip link set vpeer1 up
ip netns exec ns1 ip addr add 10.10.0.10/16 dev vpeer1
sleep 6
ip netns exec ns1 ping -c 10 10.10.0.20
and the helper script runns2.sh
#!/bin/bash -x
ip link set lo up
ip link set vpeer2 up
ip addr add 10.10.0.20/16 dev vpeer2
ip link set dev vpeer2 xdpgeneric off
ip tuntap add mode tun tun0
ip link set dev tun0 down
ip link set dev tun0 addr 10.10.0.30/24
ip link set dev tun0 up
mount -t bpf bpf /sys/fs/bpf
df /sys/fs/bpf
ls -l /sys/fs/bpf
rm -f /sys/fs/bpf/accept_map /sys/fs/bpf/xdp_stats_map
if [[ -z "${LEAVE}" ]]
then
export LD_LIBRARY_PATH=/usr/local/lib
./af_xdp_user -S -d vpeer2 -Q 0 --filename ./af_xdp_kern.o &
ns2_pid=$!
sleep 20
kill -INT ${ns2_pid}
fi
wait
which gives the output
+ ip netns delete ns1
+ ip netns delete ns2
+ sleep 2
+ ip netns add ns1
+ ip netns add ns2
+ ip link add veth1 type veth peer name vpeer1
+ ip link add veth2 type veth peer name vpeer2
+ ip link set veth1 up
+ ip link set veth2 up
+ ip link set vpeer1 netns ns1
+ ip link set vpeer2 netns ns2
+ ip link add br0 type bridge
RTNETLINK answers: File exists
+ ip link set br0 up
+ ip link set veth1 master br0
+ ip link set veth2 master br0
+ ip addr add 10.10.0.1/16 dev br0
RTNETLINK answers: File exists
+ iptables -P FORWARD ACCEPT
+ iptables -F FORWARD
+ ip netns exec ns1 ./runns1.sh
+ ip netns exec ns2 ./runns2.sh
+ ip netns exec ns1 ip link set lo up
+ ip link set lo up
+ ip netns exec ns1 ip link set vpeer1 up
+ ip link set vpeer2 up
+ ip addr add 10.10.0.20/16 dev vpeer2
+ ip netns exec ns1 ip addr add 10.10.0.10/16 dev vpeer1
+ ip link set dev vpeer2 xdpgeneric off
+ ip tuntap add mode tun tun0
+ sleep 6
+ ip link set dev tun0 down
+ ip link set dev tun0 addr 10.10.0.30/24
"10.10.0.30/24" is invalid lladdr.
+ ip link set dev tun0 up
+ mount -t bpf bpf /sys/fs/bpf
+ df /sys/fs/bpf
Filesystem 1K-blocks Used Available Use% Mounted on
bpf 0 0 0 - /sys/fs/bpf
+ ls -l /sys/fs/bpf
total 0
+ rm -f /sys/fs/bpf/accept_map /sys/fs/bpf/xdp_stats_map
+ [[ -z '' ]]
+ export LD_LIBRARY_PATH=/usr/local/lib
+ LD_LIBRARY_PATH=/usr/local/lib
+ ns2_pid=3266
+ sleep 20
+ ./af_xdp_user -S -d vpeer2 -Q 0 --filename ./af_xdp_kern.o
main cfg.filename=./af_xdp_kern.o
main Opening program file ./af_xdp_kern.o
libbpf: elf: skipping unrecognized data section(8) .xdp_run_config
libbpf: elf: skipping unrecognized data section(9) xdp_metadata
main xdp_prog=0x56161aa476b0
main bpf_object=0x56161aa44490
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
libbpf: elf: skipping unrecognized data section(7) xdp_metadata
+ ip netns exec ns1 ping -c 10 10.10.0.20
xsk_socket__create_shared_named_prog returns 0
bpf_map_update_elem(9,0x7ffef63436d0,0x7ffef63436e4,0)
bpf_map_update_elem returns 0
xsk_ring_prod__reserve returns 2048, XSK_RING_PROD__DEFAULT_NUM_DESCS is 2048
tun_read thread running
tun_read
0x0000 60 00 00 00 00 08 3a ff fe 80 00 00 00 00 00 00
0x0010 4c 45 17 e6 11 7c b7 4e ff 02 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 02 85 00 50 41 00 00 00 00
addr=0x1fff100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2049 free_count=0 frame=0x1fff100
addr=0x1ffe100 len=86 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2050 free_count=1 frame=0x1ffe100
addr=0x1ffd100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2051 free_count=2 frame=0x1ffd100
addr=0x1ffc100 len=86 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2052 free_count=3 frame=0x1ffc100
addr=0x1ffb100 len=130 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2053 free_count=4 frame=0x1ffb100
addr=0x1ffa100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2055 free_count=5 frame=0x1ffa100
addr=0x1ff9100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2055 free_count=6 frame=0x1ff9100
addr=0x1ff8100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2056 free_count=7 frame=0x1ff8100
addr=0x1ff7100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2057 free_count=8 frame=0x1ff7100
addr=0x1ff6100 len=110 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2058 free_count=9 frame=0x1ff6100
addr=0x1ff5100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2059 free_count=10 frame=0x1ff5100
addr=0x1ff4100 len=214 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2060 free_count=11 frame=0x1ff4100
addr=0x1ff3100 len=214 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2061 free_count=12 frame=0x1ff3100
addr=0x1ff2100 len=214 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2062 free_count=13 frame=0x1ff2100
addr=0x1ff1100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2064 free_count=14 frame=0x1ff1100
addr=0x1ff0100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2064 free_count=15 frame=0x1ff0100
addr=0x1fef100 len=202 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2065 free_count=16 frame=0x1fef100
addr=0x1fee100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2066 free_count=17 frame=0x1fee100
addr=0x1fed100 len=90 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2067 free_count=18 frame=0x1fed100
addr=0x1fec100 len=202 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2068 free_count=19 frame=0x1fec100
tun_read
0x0000 60 00 00 00 00 08 3a ff fe 80 00 00 00 00 00 00
0x0010 4c 45 17 e6 11 7c b7 4e ff 02 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 02 85 00 50 41 00 00 00 00
addr=0x1feb100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2069 free_count=20 frame=0x1feb100
addr=0x1fea100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2070 free_count=21 frame=0x1fea100
addr=0x1fe9100 len=202 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2071 free_count=22 frame=0x1fe9100
addr=0x1fe8100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2072 free_count=23 frame=0x1fe8100
addr=0x1fe7100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2073 free_count=24 frame=0x1fe7100
addr=0x1fe6100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2074 free_count=25 frame=0x1fe6100
addr=0x1fe5100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2075 free_count=26 frame=0x1fe5100
addr=0x1fe4100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocaPING 10.10.0.20 (10.10.0.20) 56(84) bytes of data.
From 10.10.0.10 icmp_seq=1 Destination Host Unreachable
From 10.10.0.10 icmp_seq=2 Destination Host Unreachable
From 10.10.0.10 icmp_seq=3 Destination Host Unreachable
From 10.10.0.10 icmp_seq=4 Destination Host Unreachable
From 10.10.0.10 icmp_seq=5 Destination Host Unreachable
From 10.10.0.10 icmp_seq=6 Destination Host Unreachable
From 10.10.0.10 icmp_seq=7 Destination Host Unreachable
From 10.10.0.10 icmp_seq=8 Destination Host Unreachable
From 10.10.0.10 icmp_seq=9 Destination Host Unreachable
From 10.10.0.10 icmp_seq=10 Destination Host Unreachable
--- 10.10.0.20 ping statistics ---
10 packets transmitted, 0 received, +10 errors, 100% packet loss, time 9209ms
pipe 4
+ wait
+ kill -INT 3266
+ wait
tion_count=2076 free_count=27 frame=0x1fe4100
addr=0x1fe3100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2077 free_count=28 frame=0x1fe3100
addr=0x1fe2100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2078 free_count=29 frame=0x1fe2100
tun_read
0x0000 60 00 00 00 00 08 3a ff fe 80 00 00 00 00 00 00
0x0010 4c 45 17 e6 11 7c b7 4e ff 02 00 00 00 00 00 00
0x0020 00 00 00 00 00 00 00 02 85 00 50 41 00 00 00 00
addr=0x1fe1100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2079 free_count=30 frame=0x1fe1100
addr=0x1fe0100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2080 free_count=31 frame=0x1fe0100
addr=0x1fdf100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2082 free_count=32 frame=0x1fdf100
addr=0x1fde100 len=70 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2082 free_count=33 frame=0x1fde100
addr=0x1fdd100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2083 free_count=34 frame=0x1fdd100
addr=0x1fdc100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2084 free_count=35 frame=0x1fdc100
addr=0x1fdb100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2085 free_count=36 frame=0x1fdb100
addr=0x1fda100 len=107 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2086 free_count=37 frame=0x1fda100
addr=0x1fd9100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2087 free_count=38 frame=0x1fd9100
addr=0x1fd8100 len=42 transmitted=0
xsk_free_umem_frame xsk=0x56161aa570f0 allocation_count=2088 free_count=39 frame=0x1fd8100
I think I have seen the script run as intended once (the first packet was lost because it was treated as a martian, and the following packets got through), but I have not been able to get that behaviour again, even after a reboot.
Thanks for any help you can give,
Posted on 2022-11-17 16:32:35
The "Destination Host Unreachable" messages were because I was not handling ARP packets in my eBPF kernel code. With that fixed, my code works as intended (the first packet of a flow is dropped, and all subsequent packets are passed). A sketch of the ARP handling is shown below, followed by my run script and helper scripts.
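The essential change is to let ARP frames through to the kernel stack rather than redirecting or dropping them, so that neighbour resolution can complete. A minimal sketch in C of that check, assuming an XDP program shaped like the tutorial's af_xdp_kern.c (the function name and the surrounding filter logic here are illustrative, not the actual source):

#include <linux/bpf.h>
#include <linux/if_ether.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

SEC("xdp")
int xdp_filter_func(struct xdp_md *ctx)
{
    void *data     = (void *)(long)ctx->data;
    void *data_end = (void *)(long)ctx->data_end;
    struct ethhdr *eth = data;

    /* Bounds-check the Ethernet header; the verifier requires this
     * before any field access. */
    if ((void *)(eth + 1) > data_end)
        return XDP_DROP;

    /* Pass ARP up to the kernel stack so neighbour resolution
     * completes; swallowing ARP in the XDP program is what produced
     * the Destination Host Unreachable errors. */
    if (eth->h_proto == bpf_htons(ETH_P_ARP))
        return XDP_PASS;

    /* ... remaining filter logic goes here, e.g. redirecting to the
     * AF_XDP socket with bpf_redirect_map() on an XSKMAP ... */
    return XDP_PASS;
}

char _license[] SEC("license") = "GPL";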
#!/bin/bash -x
ip netns delete ns1
ip netns delete ns2
sleep 2
ip netns add ns1
ip netns add ns2
ip link add veth1 type veth peer name vpeer1
ip link add veth2 type veth peer name vpeer2
ip link set veth1 up
ip link set veth2 up
ip link set vpeer1 netns ns1
ip link set vpeer2 netns ns2
ip link add br0 type bridge
ip link set br0 up
ip link set veth1 master br0
ip link set veth2 master br0
ip addr add 10.10.0.1/16 dev br0
iptables -P FORWARD ACCEPT
iptables -F FORWARD
ip netns exec ns2 ./runns2.sh &
ip netns exec ns1 ./runns1.sh
wait
runns1.sh:
#!/bin/bash -x
ip netns exec ns1 ip link set lo up
ip netns exec ns1 ip link set vpeer1 up
ip netns exec ns1 ip addr add 10.10.0.10/16 dev vpeer1
sleep 6
ip netns exec ns1 ping -c 10 10.10.0.20
runns2.sh:
#!/bin/bash -x
ip link set lo up
ip link set vpeer2 up
ip addr add 10.10.0.20/16 dev vpeer2
ip link set dev vpeer2 xdpgeneric off
ip tuntap add mode tun tun0
ip link set dev tun0 down
ip link set dev tun0 addr 10.10.0.30/24
ip link set dev tun0 up
mount -t bpf bpf /sys/fs/bpf
df /sys/fs/bpf
ls -l /sys/fs/bpf
rm -f /sys/fs/bpf/accept_map /sys/fs/bpf/xdp_stats_map
if [[ -z "${LEAVE}" ]]
then
export LD_LIBRARY_PATH=/usr/local/lib
./af_xdp_user -S -d vpeer2 -Q 0 --filename ./af_xdp_kern.o &
ns2_pid=$!
sleep 20
kill -INT ${ns2_pid}
fi
wait
Posted on 2022-11-16 13:33:12
An answer from a colleague:
> Something isn't working as I expect; it looks like the bpf file system does not mount in the network namespace. The start of my script is
> #!/bin/bash -x
> ip netns add ns1
> ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
> ip netns exec ns1 df /sys/fs/bpf
>
> and I think I should expect to see 'bpf' as the filesystem type for the 'df' command. However what I actually get is
> + ip netns add ns1
> + ip netns exec ns1 mount -t bpf bpf /sys/fs/bpf
> + ip netns exec ns1 df /sys/fs/bpf
> Filesystem 1K-blocks Used Available Use% Mounted on
> ns1 0 0 0 - /sys
> +
> and then my attempt to run the afxdp test case process fails as before. Any idea what I am doing wrong?
Well, the problem is that 'ip' sets up a new mount namespace every time
you do 'ip netns exec'. So the BPF mount doesn't stay across different
'exec' invocations.
This is a bit of an impedance mismatch between libxdp and 'ip netns'.
You can get around it by having multiple commands in a single script and
executing that script with 'ip netns exec', instead of doing multiple
'exec' commands.
One thing to be aware of here is that the fact that the mount goes away
also means all the pinned programs disappear; so if you load an XDP
program with libxdp, then exit the netns, and go back in, libxdp may
have trouble unloading the program. If you're running a single
application that uses AF_XDP, this shouldn't be much of an issue, though.
I guess we could also teach libxdp to try to mount the bpffs if it's not
already there...
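A minimal sketch of what that mount-if-absent step could look like in C; this illustrates the idea and is not actual libxdp code:

#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>
#include <sys/vfs.h>

#ifndef BPF_FS_MAGIC
#define BPF_FS_MAGIC 0xcafe4a11 /* value from linux/magic.h */
#endif

/* Mount a bpffs at /sys/fs/bpf in the current mount namespace,
 * unless one is already mounted there. */
static int ensure_bpffs(void)
{
    struct statfs st;

    if (statfs("/sys/fs/bpf", &st) == 0 && st.f_type == BPF_FS_MAGIC)
        return 0; /* already a bpffs */

    if (mount("bpf", "/sys/fs/bpf", "bpf", 0, NULL) != 0) {
        fprintf(stderr, "mount bpffs: %s\n", strerror(errno));
        return -1;
    }
    return 0;
}

int main(void)
{
    /* Call this before loading the XDP program / pinning maps. */
    return ensure_bpffs() ? 1 : 0;
}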
https://serverfault.com/questions/1115453