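While testing the VRRP feature recently, I ran into a problem: with the master and the backup both online, pinging the virtual gateway produces duplicate replies: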
root@learningvpp2:~# ping 192.168.90.1
PING 192.168.90.1 (192.168.90.1) 56(84) bytes of data.
64 bytes from 192.168.90.1: icmp_seq=1 ttl=64 time=0.697 ms
64 bytes from 192.168.90.1: icmp_seq=1 ttl=63 time=3.90 ms (DUP!)
64 bytes from 192.168.90.1: icmp_seq=1 ttl=64 time=3.90 ms (DUP!)
64 bytes from 192.168.90.1: icmp_seq=1 ttl=63 time=3.90 ms (DUP!)
64 bytes from 192.168.90.1: icmp_seq=2 ttl=64 time=6.56 ms
64 bytes from 192.168.90.1: icmp_seq=2 ttl=64 time=6.56 ms (DUP!)
64 bytes from 192.168.90.1: icmp_seq=2 ttl=63 time=6.56 ms (DUP!)
64 bytes from 192.168.90.1: icmp_seq=2 ttl=63 time=6.56 ms (DUP!)
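Searching online suggested that DUP means duplicate ICMP echo replies were received. A packet capture confirmed it: several ICMP reply packets came back for each request.
I suspect the cause is that the VMware virtual machines in the test environment are connected through a LAN segment, but I am not sure.
Another possible cause is promiscuous mode. In this VRRP setup, VPP cannot receive the ICMP request packets unless promiscuous mode is enabled. According to the author's description of the VRRP feature, the interface may need promiscuous mode, but with promiscuous mode on, the ping DUP problem appears. The feature description reads: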
VRRP virtual MAC address support:
- DPDK interfaces with PMD support for multiple MAC addresses
rte_eth_dev_mac_addr_add(),
rte_eth_dev_mac_addr_del()
- Other interfaces which are set in promiscuous mode may work
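The CLI counterpart of DPDK's multiple-MAC support is the 'set interface secondary-mac-address' command. It adds or removes additional MAC addresses on a given interface without changing the default MAC address, so packets sent to those addresses can be received without putting the interface into promiscuous mode. Not all interfaces support this operation; it is mainly hardware NICs that do, although virtio does as well.
VRRP relies on the same mechanism as this command, but on a VMware virtual machine the VPP interface does not receive the packets this way; after enabling promiscuous mode it does. I do not know whether hardware NICs have the same problem.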
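For reference, the secondary MAC that VRRP needs the NIC to accept is fully determined by the VRID. RFC 5798 section 7.3 defines it as 00-00-5E-00-01-{VRID} for IPv4 and 00-00-5E-00-02-{VRID} for IPv6. The sketch below only illustrates that rule; vrrp_virtual_mac is a hypothetical helper name, not a function from the VPP plugin.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Fill mac[] with the VRRP virtual router MAC for the given VRID
 * (RFC 5798 section 7.3). Hypothetical helper, not part of VPP. */
void
vrrp_virtual_mac (uint8_t vr_id, int is_ipv6, uint8_t mac[6])
{
  assert (mac != NULL);
  assert (vr_id != 0);		/* valid VRIDs are 1-255 */

  /* Fixed IANA prefix 00:00:5e:00 */
  mac[0] = 0x00;
  mac[1] = 0x00;
  mac[2] = 0x5e;
  mac[3] = 0x00;
  /* Fifth octet selects the address family, sixth is the VRID */
  mac[4] = is_ipv6 ? 0x02 : 0x01;
  mac[5] = vr_id;
}
```

With VRID 1 and IPv4 this yields 00:00:5e:00:01:01, which is the source MAC of the VRRP advertisements in the traces in this article.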
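The current test environment shows one more problem: when the master device switches from Backup state back to Master state, it sends a gratuitous ARP request, and the backup device still answers it with an ARP reply.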
04:27:59:333693: dpdk-input
GigabitEthernet2/4/0 rx queue 0
buffer 0x9721e: current data 0, length 60, buffer-pool 0, ref-count 1, trace handle 0x93
ext-hdr-valid
PKT MBUF: port 2, nb_segs 1, pkt_len 60
buf_len 2176, data_len 60, ol_flags 0x0, data_off 128, phys_addr 0x367c8800
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
IP4: 00:00:5e:00:01:01 -> 01:00:5e:00:00:12
VRRP: 192.168.90.100 -> 224.0.0.18
tos 0x00, ttl 255, length 32, checksum 0xc04e dscp CS0 ecn NON_ECN
fragment id 0x0000
04:27:59:333722: ethernet-input
frame: flags 0x3, hw-if-index 3, sw-if-index 3
IP4: 00:00:5e:00:01:01 -> 01:00:5e:00:00:12
04:27:59:333734: ip4-input-no-checksum
VRRP: 192.168.90.100 -> 224.0.0.18
tos 0x00, ttl 255, length 32, checksum 0xc04e dscp CS0 ecn NON_ECN
fragment id 0x0000
04:27:59:333741: vrrp4-accept-owner-input
IPv4 sw_if_index 3 192.168.90.100 -> 224.0.0.18
04:27:59:333758: vrrp4-input
VRRP: sw_if_index 3 IPv4
ver 3, type 1, VRID 1, prio 200, n_addrs 1, interval 100cs, csum 0x52f0
addresses: 192.168.90.1
04:27:59:334451: error-drop
rx:GigabitEthernet2/4/0
04:27:59:334471: drop
vrrp4-input: VRRP packets processed
Packet 149
04:27:59:333693: dpdk-input
GigabitEthernet2/4/0 rx queue 0
buffer 0x97245: current data 0, length 60, buffer-pool 0, ref-count 1, trace handle 0x94
ext-hdr-valid
PKT MBUF: port 2, nb_segs 1, pkt_len 60
buf_len 2176, data_len 60, ol_flags 0x0, data_off 128, phys_addr 0x367c91c0
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
ARP: 00:0c:29:a2:43:f5 -> ff:ff:ff:ff:ff:ff
request, type ethernet/IP4, address size 6/4
00:00:5e:00:01:01/192.168.90.1 -> 00:00:00:00:00:00/192.168.90.1
04:27:59:333722: ethernet-input
frame: flags 0x3, hw-if-index 3, sw-if-index 3
ARP: 00:0c:29:a2:43:f5 -> ff:ff:ff:ff:ff:ff
04:27:59:333737: arp-input
request, type ethernet/IP4, address size 6/4
00:00:5e:00:01:01/192.168.90.1 -> 00:00:00:00:00:00/192.168.90.1
04:27:59:333751: vrrp4-arp-input
address 0.1.8.0: vr_index 0 vr_id 1
04:27:59:334467: GigabitEthernet2/4/0-output
GigabitEthernet2/4/0
ARP: 00:0c:29:07:6f:b8 -> 00:0c:29:a2:43:f5
reply, type ethernet/IP4, address size 6/4
00:00:5e:00:01:01/192.168.90.1 -> 00:00:5e:00:01:01/192.168.90.1
04:27:59:334474: GigabitEthernet2/4/0-tx
GigabitEthernet2/4/0 tx queue 0
buffer 0x97245: current data 0, length 60, buffer-pool 0, ref-count 1, trace handle 0x94
ext-hdr-valid
l2-hdr-offset 0 l3-hdr-offset 14
PKT MBUF: port 2, nb_segs 1, pkt_len 60
buf_len 2176, data_len 60, ol_flags 0x0, data_off 128, phys_addr 0x367c91c0
packet_type 0x0 l2_len 0 l3_len 0 outer_l2_len 0 outer_l3_len 0
rss 0x0 fdir.hi 0x0 fdir.lo 0x0
ARP: 00:0c:29:07:6f:b8 -> 00:0c:29:a2:43:f5
reply, type ethernet/IP4, address size 6/4
00:00:5e:00:01:01/192.168.90.1 -> 00:00:5e:00:01:01/192.168.90.1
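In the second trace, the incoming ARP request carries 192.168.90.1 as both sender and target IP, which is what marks it as gratuitous. A minimal sketch of that test, and of a reply decision that skips gratuitous requests, could look like this. The struct and function names are simplified, hypothetical stand-ins, not VPP's ethernet_arp_header_t or the plugin's real logic.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified sender/target pair of an IPv4-over-Ethernet ARP packet. */
typedef struct
{
  uint8_t mac[6];
  uint32_t ip4;
} arp_endpoint_t;

typedef struct
{
  uint16_t opcode;
  arp_endpoint_t sender;
  arp_endpoint_t target;
} arp_msg_t;

typedef enum
{
  VR_STATE_BACKUP,
  VR_STATE_MASTER
} vr_state_t;

/* A gratuitous ARP announces an address: sender IP equals target IP. */
int
arp_is_gratuitous (const arp_msg_t * a)
{
  assert (a != NULL);
  return a->sender.ip4 == a->target.ip4;
}

/* Only a master answers for the virtual IP, and a gratuitous
 * announcement from a peer is never answered. */
int
vr_should_reply (vr_state_t state, const arp_msg_t * a)
{
  return state == VR_STATE_MASTER && !arp_is_gratuitous (a);
}
```

Under this rule the request in the trace would be dropped even if the receiving VR still believed itself to be master.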
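Because the master device is configured with preempt mode, it sends a VRRP advertisement and a gratuitous ARP request as soon as its interface comes up. On receiving them, the backup device switches from Master state back to Backup state. The code calls vl_api_rpc_call_main_thread to hand the advertisement to the main thread for processing, so the backup device may receive the gratuitous ARP request before it has actually left Master state, and it then answers with an ARP reply, as the trace above shows.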
/* signal main thread to process contents of packet */
args0.vr_index = vr0 - vmp->vrs;
args0.pkt = vrrp0; /* possibly a problem */
vl_api_rpc_call_main_thread (vrrp_input_process, (u8 *) &args0,
sizeof (args0));
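My feeling is that this is still a problem with multiple worker threads: vrrp0 is a pointer into the data area of a vlib_buffer_t. By the time the RPC message is processed on the main thread, the worker may already have freed that buffer.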
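One way to avoid depending on the buffer's lifetime would be to copy the advertisement bytes into the argument struct instead of storing a pointer. The sketch below assumes that the RPC call copies only the argument struct itself and processes it later; deferred_args_t, deferred_args_fill and ADV_COPY_MAX are hypothetical names for illustration, not code from the plugin.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Upper bound on the bytes we are willing to carry by value (assumption). */
#define ADV_COPY_MAX 64

typedef struct
{
  uint32_t vr_index;
  uint16_t len;
  uint8_t pkt[ADV_COPY_MAX];	/* owned copy, not a pointer into the buffer */
} deferred_args_t;

/* Snapshot the packet bytes so the consumer no longer depends on the
 * producer's buffer. Returns 0 on success, -1 if the packet is too long. */
int
deferred_args_fill (deferred_args_t * args, uint32_t vr_index,
		    const uint8_t * pkt, size_t len)
{
  assert (args != NULL && pkt != NULL);
  if (len > ADV_COPY_MAX)
    return -1;
  args->vr_index = vr_index;
  args->len = (uint16_t) len;
  memcpy (args->pkt, pkt, len);
  return 0;
}
```

After deferred_args_fill returns, the original buffer can be freed or reused without affecting what the deferred handler sees. The cost is a larger message per advertisement.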
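VPP also provides another command, set interface mac address, which changes the MAC address of an interface. Using this MAC address change function solves the problems above without enabling promiscuous mode. The patch is as follows: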
diff --git a/src/plugins/vrrp/node.c b/src/plugins/vrrp/node.c
index 7ba18c4f7..5ed52b4ad 100644
--- a/src/plugins/vrrp/node.c
+++ b/src/plugins/vrrp/node.c
@@ -343,6 +343,15 @@ vrrp_arp_nd_next (vlib_buffer_t * b, u32 * next_index, u32 * vr_index,
*next_index = VRRP_ARP_INPUT_NEXT_DROP;
return;
}
+ if (!vr || vr->runtime.state == VRRP_VR_STATE_MASTER)
+ {
+ if (arp->ip4_over_ethernet[0].ip4.as_u32 ==
+ arp->ip4_over_ethernet[1].ip4.as_u32)
+ {
clib_warning("VR may be switching to backup, not replying.");
+ *next_index = VRRP_ARP_INPUT_NEXT_DROP; return;
+ }
+ }
/* RFC 5798 section 6.4.3: Master "MUST respond" to ARP/ND. */
eth = ethernet_buffer_get_header (b);
diff --git a/src/plugins/vrrp/vrrp.c b/src/plugins/vrrp/vrrp.c
index 8461798e0..e4f0a94ec 100644
--- a/src/plugins/vrrp/vrrp.c
+++ b/src/plugins/vrrp/vrrp.c
@@ -117,13 +117,22 @@ vrrp_vr_transition_vmac (vrrp_vr_t * vr, vrrp_vr_state_t new_state)
/* enable only if current master vrs is 0, disable only if 0 or 1 */
if ((enable && !n_master_vrs) || (!enable && (n_master_vrs < 2)))
{
- clib_warning ("%s virtual MAC address %U on hardware interface %u",
+ clib_warning ("%s virtual MAC address %U %U on hardware interface %u",
(enable) ? "Adding" : "Deleting",
format_ethernet_address, vr->runtime.mac.bytes,
+ format_ethernet_address, vr->runtime.hmac.bytes,
hw->hw_if_index);
-
- error = vnet_hw_interface_add_del_mac_address
- (vnm, hw->hw_if_index, vr->runtime.mac.bytes, enable);
+ if (enable)
+ {
+ memcpy_s(vr->runtime.hmac.bytes, 6, hw->hw_address, 6);
+ error = vnet_hw_interface_change_mac_address
+ (vnm, hw->hw_if_index, vr->runtime.mac.bytes);
+ }
+ else
+ {
+ error = vnet_hw_interface_change_mac_address
+ (vnm, hw->hw_if_index, vr->runtime.hmac.bytes);
+ }
}
if (error)
diff --git a/src/plugins/vrrp/vrrp.h b/src/plugins/vrrp/vrrp.h
index c93259219..3ab8beca6 100644
--- a/src/plugins/vrrp/vrrp.h
+++ b/src/plugins/vrrp/vrrp.h
@@ -98,6 +98,7 @@ typedef struct vrrp_vr_runtime
u16 skew;
u16 master_down_int;
mac_address_t mac;
+ mac_address_t hmac;
f64 last_sent;
u32 timer_index;
} vrrp_vr_runtime_t;
diff --git a/src/plugins/vrrp/vrrp_format.c b/src/plugins/vrrp/vrrp_format.c
index df9bf930b..521146eea 100644
--- a/src/plugins/vrrp/vrrp_format.c
+++ b/src/plugins/vrrp/vrrp_format.c
@@ -107,6 +107,8 @@ format_vrrp_vr (u8 * s, va_list * args)
s = format (s, " virtual MAC %U\n", format_ethernet_address,
&vr->runtime.mac);
+ s = format (s, " hw MAC %U\n", format_ethernet_address,
+ &vr->runtime.hmac);
s = format (s, " addresses %U\n", format_vrrp_vr_addrs,
(vr->config.flags & VRRP_VR_IPV6) != 0, vr->config.vr_addrs);
diff --git a/src/plugins/vrrp/vrrp_packet.c b/src/plugins/vrrp/vrrp_packet.c
index 89a6ede60..5a79792a0 100644
--- a/src/plugins/vrrp/vrrp_packet.c
+++ b/src/plugins/vrrp/vrrp_packet.c
@@ -467,7 +467,7 @@ vrrp4_garp_pkt_build (vrrp_vr_t * vr, vlib_buffer_t * b, ip4_address_t * ip4)
arp->opcode = clib_host_to_net_u16 (ETHERNET_ARP_OPCODE_request);
arp->ip4_over_ethernet[0].mac = vr->runtime.mac;
arp->ip4_over_ethernet[0].ip4 = *ip4;
- arp->ip4_over_ethernet[1].mac = broadcast_mac;
+ // arp->ip4_over_ethernet[1].mac = broadcast_mac;
arp->ip4_over_ethernet[1].ip4 = *ip4;
}
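When the VR is master, the hardware MAC address of the VRRP interface is changed to the VRRP virtual MAC. When it switches to backup, the original address is restored.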
DBGvpp# show vrrp vr
[0] sw_if_index 3 VR ID 1 IPv4
state Master flags: preempt yes accept yes unicast no
priority: configured 200 adjusted 200
timers: adv interval 100 master adv 100 skew 21 master down 321
virtual MAC 00:00:5e:00:01:01
hw MAC 00:0c:29:a2:43:f5
addresses 192.168.90.1
peer addresses
tracked interfaces
DBGvpp# show hardware-interfaces GigabitEthernet2/4/0
Name Idx Link Hardware
GigabitEthernet2/4/0 3 up GigabitEthernet2/4/0
Link speed: 1 Gbps
RX Queues:
queue thread mode
0 main (0) polling
Ethernet address 00:00:5e:00:01:01
Intel 82540EM (e1000)
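The save-and-restore behaviour visible in the output above can be modelled in a few lines. This is a toy model under stated assumptions: the interface has a single primary MAC slot, broadcast and multicast are ignored, and hw_if_t, vr_t and the functions are hypothetical names rather than VPP types.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* An interface that filters on exactly one unicast MAC. */
typedef struct
{
  uint8_t mac[6];
} hw_if_t;

typedef struct
{
  uint8_t vmac[6];		/* VRRP virtual MAC */
  uint8_t saved_hw_mac[6];	/* original address, kept while master */
  int is_master;
} vr_t;

/* On becoming master: remember the hardware MAC, install the virtual MAC. */
void
vr_become_master (vr_t * vr, hw_if_t * hw)
{
  assert (vr != NULL && hw != NULL);
  if (vr->is_master)
    return;
  memcpy (vr->saved_hw_mac, hw->mac, 6);
  memcpy (hw->mac, vr->vmac, 6);
  vr->is_master = 1;
}

/* On becoming backup: put the remembered hardware MAC back. */
void
vr_become_backup (vr_t * vr, hw_if_t * hw)
{
  assert (vr != NULL && hw != NULL);
  if (!vr->is_master)
    return;
  memcpy (hw->mac, vr->saved_hw_mac, 6);
  vr->is_master = 0;
}

/* Unicast filter: only the single installed MAC is accepted. */
int
hw_accepts (const hw_if_t * hw, const uint8_t dst[6])
{
  return memcmp (hw->mac, dst, 6) == 0;
}
```

Because there is only one slot, frames addressed to the virtual MAC of any other VRID are rejected while this VR is master.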
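However, this approach has a drawback: a physical port can now carry only one VRRP instance, which is not what the author of the VRRP plugin intended. If anyone has a better solution, I would welcome a discussion.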
This article is shared from the WeChat official account DPDK VPP源码分析.