
[Network Engineer Case Library] Layer 2 Loop When Connecting a Data Center Site's Layer 2 Network to Cisco over Eth-Trunk

Author: Ponnie
Published: 2023-09-04 17:42:41
Column: 玉龙小栈

Original article: https://www.yuque.com/erik.zhao/trouble/lhhzukyw9zf8fxoc?singleDoc# (Layer 2 loop when connecting a data center site's Layer 2 network to Cisco over Eth-Trunk)

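Problem Description

A Layer 2 loop occurred when a data center site's Layer 2 network was connected to Cisco switches over Eth-Trunk.

[Network Overview]

CE12800 switches in an M-LAG deployment connect to the Cisco switches over Layer 2 Eth-Trunks, with PVST as the loop-prevention protocol.

[Network Topology Diagram]

[Fault Symptoms]

After the CE12800 was connected to the Cisco switch over Eth-Trunk, the member interfaces of Eth-Trunk17 on SHN-P-MCA-SW01 were found in the Unselected state. The member interfaces of Eth-Trunk17 are 10GE7/0/19 and 10GE7/0/20, and the peer Cisco interfaces are Gi1/14 and Gi1/15. At that time, Gi1/14 was physically up with the protocol state down, and Gi1/15 was physically down with the protocol state down. Investigation later showed that Gi1/15 had been cabled incorrectly. While checking the LACP negotiation state, the reseller removed the member ports from the Port-Channel on the Cisco switch one at a time. After the correctly connected interface was removed, services were interrupted.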

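Troubleshooting Process

1. Timeline

21:13:00: The Eth-Trunk member ports on MCA01 were found in the Unselected state, and root-cause analysis began.

21:13:00~21:19:00: The reseller continued removing member ports in an attempt to fix the Eth-Trunk interworking problem.

21:22:00: A continuous ping to the Loopback address of switch SA02 was found to be failing.

21:22:00: The team immediately started shutting down the interfaces that interconnect switch SA01 and the MCA switches.

21:23:20: Management was informed that the Layer 2 interconnection operation was affecting services.

21:22:00~21:24:30: The 8 interfaces interconnecting switch SA01 and the MCA switches were shut down one after another.

21:24:35: The continuous ping to the SA02 Loopback address recovered, and the network was restored.

The customer's business side reported that services in two zones were affected, including 6 UnionPay transactions.

2. Log analysis: CE logs (MCA01):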

# 20:50:33 Cables connected; Eth-Trunk17 and Eth-Trunk18 on MCA01 both came UP:
Nov  5 2022 20:50:33+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=Eth-Trunk18, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
Nov  5 2022 20:50:33+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=Eth-Trunk17, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk17)
# 20:53:19 LACP on MCA01 Eth-Trunk17 went down because the remote-interface information received on member port 10GE7/0/19 was inconsistent with that of the other members:
Nov  5 2022 20:53:19+08:00 SHN-P-MCA-SW01 %LACP/4/LACP_STATE_DOWN(l):CID=0x804804ba;The LACP state is down. (PortName=10GE7/0/19, TrunkName=Eth-Trunk17, LastReceivePacketTime=[2022-11-05 20:53:18:605+08:00], Reason=The remote portkey in the LACPDU received from this interface was different from other members. Please check the remote members bandwidths, duplex modes, or Eth-Trunk IDs.)
Nov  5 2022 20:53:19+08:00 SHN-P-MCA-SW01 %LACP/2/hwLacpNegotiateFailed_active(l):CID=0x807a0405-alarmID=0x09360000;The member of LAG negotiation failed. (TrunkIndex=5, PortIfIndex=77, TrunkId=17, TrunkName=Eth-Trunk17, PortName=10GE7/0/19, Reason=A link fault occurred or negotiation information synchronization failed.)
# 21:05:49 The reseller ran shutdown / no shutdown on Cisco port Gi1/14; CE12800 Eth-Trunk17 member port 10GE7/0/19 went physically DOWN and then came back UP:
Nov  5 2022 21:05:49+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=10GE7/0/19, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk17)
Nov  5 2022 21:05:47+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE7/0/19, AdminStatus=UP, OperStatus=DOWN, Reason=Interface physical link is down, mainIfname=Eth-Trunk17)
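Several of the excerpts in this case list the newer entry first (here the link-up at 21:05:49 is printed above the link-down at 21:05:47). A minimal sketch for putting lines back in chronological order follows. It assumes the timestamp layout seen in these excerpts (`Nov  5 2022 21:05:49+08:00`, optionally with a fractional second); the helper names `log_time` and `chronological` are hypothetical.

```python
import re
from datetime import datetime, timedelta, timezone
from typing import List

MONTHS = {name: i for i, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Assumed layout: "Nov  5 2022 21:13:18.713+08:00 <hostname> ..."
STAMP = re.compile(
    r"^(\w{3})\s+(\d{1,2})\s+(\d{4})\s+(\d{2}):(\d{2}):(\d{2})"
    r"(?:\.(\d{1,6}))?([+-])(\d{2}):(\d{2})\s"
)


def log_time(line: str) -> datetime:
    """Parse the leading timestamp of one log line into an aware datetime."""
    match = STAMP.match(line)
    if match is None:
        raise ValueError(f"no timestamp found in: {line!r}")
    mon, day, year, hh, mm, ss, frac, sign, off_h, off_m = match.groups()
    offset = timedelta(hours=int(off_h), minutes=int(off_m))
    if sign == "-":
        offset = -offset
    micro = int((frac or "0").ljust(6, "0"))  # ".713" becomes 713000 microseconds
    return datetime(int(year), MONTHS[mon], int(day),
                    int(hh), int(mm), int(ss), micro, tzinfo=timezone(offset))


def chronological(lines: List[str]) -> List[str]:
    """Return the log lines sorted oldest first (stable for equal timestamps)."""
    return sorted(lines, key=log_time)
```

Sorting the two lines of the excerpt above with `chronological` places the 21:05:47 link-down entry before the 21:05:49 link-up entry.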
# 21:12:04 The reseller ran shutdown / no shutdown on Cisco port Gi1/15; CE12800 Eth-Trunk17 member port 10GE7/0/20 went physically DOWN and then came back UP:
Nov  5 2022 21:12:04+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=10GE7/0/20, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk17)
Nov  5 2022 21:11:54+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE7/0/20, AdminStatus=UP, OperStatus=DOWN, Reason=Interface physical link is down, mainIfname=Eth-Trunk17)
# 21:13:19 After the reseller bounced the Eth-Trunk17 member ports again, LACP was still DOWN (it never came UP):
Nov  5 2022 21:13:19+08:00 SHN-P-MCA-SW01 %LACP/2/hwLacpNegotiateFailed_active(l):CID=0x807a0405-alarmID=0x09360000;The member of LAG negotiation failed. (TrunkIndex=5, PortIfIndex=77, TrunkId=17, TrunkName=Eth-Trunk17, PortName=10GE7/0/19, Reason=A link fault occurred or negotiation information synchronization failed.)
Nov  5 2022 21:13:18+08:00 SHN-P-MCA-SW01 %LACP/4/LACP_STATE_DOWN(l):CID=0x804804ba;The LACP state is down. (PortName=10GE7/0/19, TrunkName=Eth-Trunk17, LastReceivePacketTime=[2022-11-05 21:13:18:728+08:00], Reason=The remote interface was not selected. Please check the remote interface s status and configurations.)
Nov  5 2022 21:12:08+08:00 SHN-P-MCA-SW01 %LACP/2/hwLacpNegotiateFailed_clear(l):CID=0x807a0405-alarmID=0x09360000-clearType=service_resume;Link negotiation failure is resumed. (TrunkIndex=5, PortIfIndex=78, TrunkId=17, TrunkName=Eth-Trunk17, PortName=10GE7/0/20, Reason=The link fault was rectified and negotiation information was synchronized.)
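# LACP negotiation failed because the remote interface had not joined the aggregation (the Aggregation flag in the received actor state was 0):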
Nov  5 2022 21:13:18.713+08:00 SHN-P-MCA-SW01 %LACP/6/PDU_STE_CHANGE(D):CID=0x804804ba;The Actor_State of received PDU packets changed. (PortName=10GE7/0/19, ReceivedPDUActorstate=10000000, ReceivedPDUPartnerstate=10111100, LocalActorstate=10111100)
Nov  5 2022 21:13:18.713+08:00 SHN-P-MCA-SW01 %LACP/6/MUX_STE_CHANGE(D):CID=0x804804ba;The state in the MUX state machine changes. (TrunkName=Eth-trunk17, PortName=10GE7/0/19, MuxOldStatus=COLLECTING_DISTRIBUTING, MuxNewStatus=DETACHED)
Nov  5 2022 21:13:18.718+08:00 SHN-P-MCA-SW01 %LACP/7/LACP_SELECT_REASON(D):CID=0x807a0405;The state of Eth-trunk17  s port 10GE7/0/19 is changed from SELECTED to UNSELECTED for the reason of RemoteWontAgg.
Nov  5 2022 21:13:18.728+08:00 SHN-P-MCA-SW01 %LACP/6/PDU_STE_CHANGE(D):CID=0x804804ba;The Actor_State of received PDU packets changed. (PortName=10GE7/0/19, ReceivedPDUActorstate=01000000, ReceivedPDUPartnerstate=00101100, LocalActorstate=10100000)
Nov  5 2022 21:13:19.685+08:00 SHN-P-MCA-SW01 %OPS/6/OPS_DIAG_USERDEFINED_INFORMATION(D):CID=0x80c2272b;2022-11-05 21:13:18+08:00;LACP negotiation failed because the remote interface was not selected. Please check the remote interface s status and configurations. (Interface=10GE7/0/19, Eth-Trunk17) (user="_lacp_mtp.py", session=52)
# 21:14:33 The reseller shut down Cisco ports Gi2/14 and Gi2/15; CE12800 Eth-Trunk18 member ports 10GE8/0/19 and 10GE8/0/20 went physically DOWN:
Nov  5 2022 21:14:33+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE8/0/20, AdminStatus=UP, OperStatus=DOWN, Reason=Interface physical link is down, mainIfname=Eth-Trunk18)
Nov  5 2022 21:14:33+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE8/0/19, AdminStatus=UP, OperStatus=DOWN, Reason=Interface physical link is down, mainIfname=Eth-Trunk18)
# 21:19:39 The reseller ran no shutdown on Cisco ports Gi2/14 and Gi2/15; CE12800 Eth-Trunk18 member ports 10GE8/0/19 and 10GE8/0/20 came physically UP, and Eth-Trunk18 came UP:
Nov  5 2022 21:19:39+08:00 SHN-P-MCA-SW01 %LACP/2/hwLacpTotalLinkLoss_clear(l):CID=0x807a0405-alarmID=0x09360001-clearType=service_resume;Link bandwidth lost totally is resumed. (TrunkIndex=6, TrunkIfIndex=238, TrunkId=18, TrunkName=Eth-Trunk18, Reason=Link is selected.)
Nov  5 2022 21:19:38+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=Eth-Trunk18, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
Nov  5 2022 21:19:33+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=10GE8/0/20, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
Nov  5 2022 21:19:29+08:00 SHN-P-MCA-SW01 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=10GE8/0/19, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)

3. CE log analysis (MCA02):

# 20:50:41 Cables connected; Eth-Trunk17 and Eth-Trunk18 on MCA02 both came UP:
Nov  5 2022 20:50:41+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=Eth-Trunk17, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk17)
Nov  5 2022 20:50:41+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=Eth-Trunk18, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
# 21:14:33~21:19:33 The reseller shut down Cisco ports G1/14 and G1/5 and then ran no shutdown on both; Eth-Trunk18 subsequently came UP and its links were selected by LACP:
Nov  5 2022 21:14:33+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE8/0/20, AdminStatus=UP, OperStatus=DOWN, Reason=Interface physical link is down, mainIfname=Eth-Trunk18)
Nov  5 2022 21:14:33+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_active(l):CID=0x807a0405-alarmID=0x08520003;The interface status changes. (ifName=10GE8/0/19, AdminStatus=UP, OperStatus=DOWN, Reason=Interface physical link is down, mainIfname=Eth-Trunk18)
……
Nov  5 2022 21:19:33+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=10GE8/0/20, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
Nov  5 2022 21:19:33+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=10GE8/0/19, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
Nov  5 2022 21:19:38+08:00 SHN-P-MCA-SW02 %LACP/2/hwLacpTotalLinkLoss_clear(l):CID=0x807a0405-alarmID=0x09360001-clearType=service_resume;Link bandwidth lost totally is resumed. (TrunkIndex=6, TrunkIfIndex=236, TrunkId=18, TrunkName=Eth-Trunk18, Reason=Link is selected.)
Nov  5 2022 21:19:38+08:00 SHN-P-MCA-SW02 %IFNET/2/linkDown_clear(l):CID=0x807a0405-alarmID=0x08520003-clearType=service_resume;The interface status changes. (ifName=Eth-Trunk18, AdminStatus=UP, OperStatus=UP, Reason=Interface physical link is up, mainIfname=Eth-Trunk18)
# 21:20:49~21:22:00 MAC flapping occurred between Eth-Trunk17 and Eth-Trunk18 on MCA02 (the flapping MAC addresses all belong to Cisco devices):
Nov  5 2022 21:22:00+08:00 SHN-P-MCA-SW02 %FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f0485-alarmID=0x095e0012;MAC flapping detected, VlanId = 133, Original-Port = Eth-Trunk18, Flapping port 1 = Eth-Trunk17, port 2 = -. Check the network connected to the interface learning a flapping MAC address : 0000-0c07-ac01.
Nov  5 2022 21:22:00+08:00 SHN-P-MCA-SW02 %FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f0485-alarmID=0x095e0012;MAC flapping detected, VlanId = 131, Original-Port = Eth-Trunk18, Flapping port 1 = Eth-Trunk17, port 2 = -. Check the network connected to the interface learning a flapping MAC address : 0000-0c07-ac01.
Nov  5 2022 21:21:41+08:00 SHN-P-MCA-SW02 %FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f0485-alarmID=0x095e0012;MAC flapping detected, VlanId = 132, Original-Port = Eth-Trunk18, Flapping port 1 = Eth-Trunk17, port 2 = -. Check the network connected to the interface learning a flapping MAC address : 0000-0c07-ac01.
Nov  5 2022 21:21:40+08:00 SHN-P-MCA-SW02 %FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f04e4-alarmID=0x095e0012;MAC flapping detected, VlanId = 152, Original-Port = Eth-Trunk18, Flapping port 1 = Eth-Trunk17, port 2 = -. Check the network connected to the interface learning a flapping MAC address : 0000-0c07-ac01.
Nov  5 2022 21:20:49+08:00 SHN-P-MCA-SW02 %FEI_COMM/4/hwMflpVlanLoopAlarm_active(l):CID=0x807f0485-alarmID=0x095e0012;MAC flapping detected, VlanId = 130, Original-Port = Eth-Trunk18, Flapping port 1 = Eth-Trunk17, port 2 = -. Check the network connected to the interface learning a flapping MAC address : ac1f-6b22-b8b8.
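To see at a glance which VLANs each flapping MAC address touched, the hwMflpVlanLoopAlarm entries can be grouped by MAC. The following is a minimal sketch, assuming the message layout shown in the excerpt above; the regular expression and the helper name `vlans_by_mac` are illustrative, not part of any vendor tool.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, Set

# Pattern written against the message text shown in the excerpt above.
FLAP = re.compile(
    r"MAC flapping detected, VlanId = (?P<vlan>\d+), "
    r"Original-Port = (?P<orig>[^,]+), "
    r"Flapping port 1 = (?P<port1>[^,]+), .*?"
    r"MAC address : (?P<mac>[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4})"
)


def vlans_by_mac(lines: Iterable[str]) -> Dict[str, Set[int]]:
    """Group the VLAN IDs in MAC-flapping alarms by the flapping MAC address."""
    result: Dict[str, Set[int]] = defaultdict(set)
    for line in lines:
        match = FLAP.search(line)
        if match:  # lines that are not flapping alarms are ignored
            result[match["mac"]].add(int(match["vlan"]))
    return dict(result)
```

Applied to the five alarms above, this groups VLANs 131, 132, 133 and 152 under 0000-0c07-ac01 and VLAN 130 under ac1f-6b22-b8b8, all flapping between Eth-Trunk18 and Eth-Trunk17.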

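4. Findings

While troubleshooting the unselected Eth-Trunk interfaces, the reseller, relying on earlier experience, removed the member ports from the Port-Channel on the Cisco devices. Because those Layer 2 physical ports were still UP, they formed a loop in VLAN 1 with the original Layer 2 Eth-Trunk interface (VLAN 1 carried customer traffic). The operation triggered err-disable on the interfaces between the two Cisco switches.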
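The reasoning behind this finding can be illustrated with a cycle check on a toy graph. The sketch below is a deliberately simplified, hypothetical model, not the site's actual cabling: each tuple is one independently forwarding Layer 2 link, a bundled Port-Channel counts as a single link, and spanning tree is ignored. The switch names and the function `has_l2_loop` are assumptions made for illustration.

```python
from typing import Dict, Iterable, Tuple


def has_l2_loop(links: Iterable[Tuple[str, str]]) -> bool:
    """Return True if the undirected multigraph of forwarding links contains a cycle."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a == root_b:
            return True  # both ends already connected: this link closes a loop
        parent[root_a] = root_b
    return False


# Hypothetical two-switch example.
bundled = [("cisco", "ce")]                     # members aggregated: one logical link
unbundled = [("cisco", "ce"), ("cisco", "ce")]  # a member removed from the bundle but left UP

if __name__ == "__main__":
    print(has_l2_loop(bundled), has_l2_loop(unbundled))
```

In this toy model the bundled case has no cycle, while leaving an unbundled member port UP adds a second independent path between the same pair of switches, which is the kind of parallel path that can loop traffic in a VLAN carried on both.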

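Root Cause

While troubleshooting the Unselected interface state, the reseller, relying on experience from earlier changes, removed the member ports from the aggregation on the Cisco side one at a time without first shutting them down. This was outside the approved plan. The operation triggered err-disable on the interfaces between the two Cisco switches, which affected routing for the networks attached below the Cisco switches and ultimately interrupted customer services.

Solution

All interfaces involved in the change were shut down as an emergency measure, and services returned to normal.

Recommendations and Summary

1. VLAN 1 is a special VLAN; do not run business traffic on it.
2. When adding a member port to or removing one from an Eth-Trunk, shut down the member port first, then perform the add or remove operation.
3. During change (RFC) operations, do not perform any action outside the approved change plan.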
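Recommendation 2 can be expressed as an ordering rule that a change script is checked against before execution. The sketch below builds a Cisco IOS-style command sequence for removing one member port and verifies that `shutdown` comes before `no channel-group`. The helper names are hypothetical, the interface name is taken from this case only as an example, and the command list is an illustration to be validated against the actual platform and change plan, not a vetted procedure.

```python
from typing import List


def safe_member_removal(interface: str) -> List[str]:
    """Build an IOS-style sequence that shuts the port before unbundling it."""
    return [
        f"interface {interface}",
        "shutdown",          # take the port out of forwarding first
        "no channel-group",  # only then remove it from the Port-Channel
    ]


def shutdown_precedes_removal(commands: List[str]) -> bool:
    """Return True if any 'no channel-group' is preceded by a 'shutdown'."""
    if "no channel-group" not in commands:
        return True  # nothing is being unbundled, so the rule does not apply
    removal_index = commands.index("no channel-group")
    return "shutdown" in commands[:removal_index]


if __name__ == "__main__":
    plan = safe_member_removal("GigabitEthernet1/14")
    print("\n".join(plan))
    print(shutdown_precedes_removal(plan))
```

A plan that unbundles a live port, such as `["interface GigabitEthernet1/14", "no channel-group"]`, fails the check, which mirrors the operation that caused this incident.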

Originally published on 2023-06-12. Shared from the WeChat official account 玉龙网络新知社.
