This problem was a little unusual for a Cisco VSS setup. Last week, connectivity to some of the HP iLO, IPS, ASA and Cisco wireless devices was being lost intermittently. The loss affected only the management VLAN.
We had brought a new site online the previous week, but I did not think it was related to these issues.
All these devices were working fine, but management port reachability was an issue. We tracked the ports and found that they would stop responding to ping, sometimes within 5 minutes and sometimes only after working for a few hours.
Ping responses resumed the moment we did a clear arp int vlan, so based on my experience the solution seemed evident.
I logged in and checked: the interface VLAN ARP timeout was 4 hours and the mac-address aging timer was at its default of 5 minutes, so I changed the mac-address aging timer to 4 hours (14400 seconds).
After this we cleared the ARP again, but the issue persisted. Ping worked for 30-40 minutes and then connectivity was lost again.
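The reasoning behind the timer change can be sketched in a few lines of Python. The two timer values come from the text above; the function name `stale_window` is hypothetical, and the sketch assumes a host that stays silent after its ARP and MAC entries are learned at the same moment.

```python
# Illustrative only: how long can the router hold an ARP entry for a host
# whose MAC address has already aged out of the Layer 2 forwarding table?
# During that window, frames to the host are treated as unknown unicast.

ARP_TIMEOUT_S = 4 * 60 * 60    # interface VLAN ARP timeout from the text (4 hours)
DEFAULT_MAC_AGING_S = 5 * 60   # default mac-address aging timer from the text (5 minutes)


def stale_window(arp_timeout_s: int, mac_aging_s: int) -> int:
    """Seconds during which the ARP entry outlives the MAC table entry.

    Assumes the host sends no traffic after both entries are learned together.
    """
    return max(0, arp_timeout_s - mac_aging_s)


before = stale_window(ARP_TIMEOUT_S, DEFAULT_MAC_AGING_S)
after = stale_window(ARP_TIMEOUT_S, 14400)  # aging timer raised to 14400 seconds

print(f"window with default aging: {before} s")
print(f"window with aging equal to ARP timeout: {after} s")
```

Aligning the two timers removes that window, which is why it was a reasonable first attempt even though it did not resolve this case.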
We checked the DFC, SUP and module software versions against known bugs.
CORESW01#sh mod switch 1
Switch Number: 1 Role: Virtual Switch Active
---------------------- -----------------------------
Mod Ports Card Type Model Serial No.
--- ----- -------------------------------------- ------------------ -----------
1 8 CEF720 8 port 10GE with DFC WS-X6708-10GE SAL13442G92
3 48 CEF720 48 port 10/100/1000mb Ethernet WS-X6748-GE-TX SAL1425KZ9U
5 5 Supervisor Engine 720 10GE (Active) VS-S720-10G SAL1426LNQC
7 8 Intrusion Detection System WS-SVC-IDSM-2 SAL1423K01P
8 6 Firewall Module WS-SVC-FWM-1 SAL1419HLQ0
Mod MAC addresses Hw Fw Sw Status
--- ---------------------------------- ------ ------------ ------------ -------
1 0026.9925.bf58 to 0026.9925.bf5f 2.1 12.2(18r)S1 12.2(33)SXI4 Ok
3 c84c.7570.0fa0 to c84c.7570.0fcf 3.4 12.2(18r)S1 12.2(33)SXI4 Ok
5 0026.cb61.4b48 to 0026.cb61.4b4f 3.2 8.5(4) 12.2(33)SXI4 Ok
7 5475.d062.6160 to 5475.d062.6167 6.5 7.2(1) 7.0(4)E4 Ok
8 5475.d062.4bb8 to 5475.d062.4bbf 4.5 7.2(1) 4.0(12) Ok
Mod Sub-Module Model Serial Hw Status
---- --------------------------- ------------------ ----------- ------- -------
1 Distributed Forwarding Card WS-F6700-DFC3C SAL13442GEF 1.4 Ok
3 Distributed Forwarding Card WS-F6700-DFC3C SAL1426L9Y9 1.4 Ok
5 Policy Feature Card 3 VS-F6K-PFC3C SAL1426LM7Y 1.1 Ok
5 MSFC3 Daughterboard VS-F6K-MSFC3 SAL1426LMXY 5.0 Ok
7 IDS 2 accelerator board WS-SVC-IDSUPG 71100440010 2.5 Ok
Mod Online Diag Status
---- -------------------
1 Pass
3 Pass
5 Pass
7 Pass
8 Pass
Then we took a 10-minute Wireshark capture by mirroring the problematic interface to the interface our laptop was connected to.
After analyzing the capture, we found nothing suspicious; the host was sending ping replies as well.
We then checked on which modules and interfaces the MAC address of the device was being learned.
sh mac- address d8d3.8561.0991 all de
========================================
PI_E RM RMA Type Alw-Lrn Trap Modified Notify Capture Flood Mac Address Age Pvlan SWbits Index XTag
----+---+---+----+-------+----+--------+------+-------+------+--------------+----+------+------+------+----
switch 1 Module 1:
Yes Yes Yes DY No No Yes No No No d8d3.8561.0991 0x9F 100 0 0x102E 0
switch 1 Module 3:
Yes Yes Yes DY No No Yes No No No d8d3.8561.0991 0x1F 100 0 0x102E 0
Supervisor switch 1 Module 5
No Yes No DY No No Yes No No No d8d3.8561.0991 0xFB 100 0 0x102E 0
switch 2 Module 1:
Yes Yes Yes DY No No Yes No No No d8d3.8561.0991 0xAC 100 0 0x102E 0
switch 2 Module 3:
Yes Yes Yes DY No No Yes No No No d8d3.8561.0991 0xFF 100 0 0x102E 0
Supervisor switch 2 Module 5
No Yes No DY No No Yes No No No d8d3.8561.0991 0xFA 100 0 0x102E 0
We then ran the shorter form of the command. Its output showed that the age on Switch 2 Module 1 dropped to 0 whenever the issue occurred.
sh mac- add d8d3.8561.0991
Legend: * - primary entry
age - seconds since last seen
n/a - not available
vlan mac address type learn age ports
------+----------------+--------+-----+----------+--------------------------
switch 1 Module 1:
* 100 d8d3.8561.0991 dynamic Yes 250 Po200
switch 1 Module 3:
* 100 d8d3.8561.0991 dynamic Yes 200 Po200
switch 2 Module 1:
* 100 d8d3.8561.0991 dynamic Yes 0 Po200
switch 2 Module 3:
* 100 d8d3.8561.0991 dynamic Yes 175 Po200
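Checking the age per module by eye gets tedious when the command is repeated many times. A minimal Python sketch for pulling the ages out of output like the one above follows; the regular expressions and the function name `ages_by_module` are assumptions based only on the format shown here, not a general parser for this command.

```python
import re

# Sample copied from the output above (whitespace as captured).
SAMPLE = """\
switch 1 Module 1:
* 100 d8d3.8561.0991 dynamic Yes 250 Po200
switch 1 Module 3:
* 100 d8d3.8561.0991 dynamic Yes 200 Po200
switch 2 Module 1:
* 100 d8d3.8561.0991 dynamic Yes 0 Po200
switch 2 Module 3:
* 100 d8d3.8561.0991 dynamic Yes 175 Po200
"""

HEADER = re.compile(r"^switch (\d+) Module (\d+):")
# Columns: vlan, mac address, type, learn, age, ports
ENTRY = re.compile(r"^\*?\s*(\d+)\s+([0-9a-f.]{14})\s+(\w+)\s+(\w+)\s+(\d+)\s+(\S+)")


def ages_by_module(text):
    """Return {(switch, module): age_seconds} for each entry in the output."""
    ages = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        header = HEADER.match(line)
        if header:
            current = (int(header.group(1)), int(header.group(2)))
            continue
        entry = ENTRY.match(line)
        if entry and current is not None:
            ages[current] = int(entry.group(5))
    return ages


ages = ages_by_module(SAMPLE)
just_refreshed = [key for key, age in ages.items() if age == 0]
print(ages)
print("age 0 on:", just_refreshed)
```

According to the legend, age is the number of seconds since the address was last seen, so an age of 0 means that module had seen the address just before the command ran.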
We enabled the following command to find out whether the MAC address was moving within the switch.
mac address-table notification mac-move
CORESW01#sh sw vir
Switch mode : Virtual Switch
Virtual switch domain number : 100
Local switch number : 1
Local switch operational role: Virtual Switch Active
Peer switch number : 2
Peer switch operational role : Virtual Switch Standby
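After that we did a packet capture on the switch itself using ELAM, triggering on ICMP echo requests (type 0x8) destined for 10.1.1.100.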
service internal
show platform capture elam trigger dbus ipv4 if L3_PT=ICMP IP_DA=10.1.1.100 ICMP_TYPE=0x8
sh plat cap elam start
This showed that the destination MAC (DMAC) was 5475.d0e5.4500, which is not the MAC address of the device we were inspecting (d8d3.8561.0991).
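The comparison itself is trivial, but MAC addresses show up in several notations across tools, so a small helper can make the check less error-prone. The sketch below is illustrative; `normalize_mac`, `same_mac` and `oui` are hypothetical helper names, and the addresses are the ones quoted in this post.

```python
def normalize_mac(mac: str) -> str:
    """Strip separators and lowercase so differently formatted MACs compare equal."""
    hex_digits = "".join(ch for ch in mac.lower() if ch in "0123456789abcdef")
    if len(hex_digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    return hex_digits


def same_mac(a: str, b: str) -> bool:
    return normalize_mac(a) == normalize_mac(b)


def oui(mac: str) -> str:
    """First three octets of the address, which identify the vendor block."""
    return normalize_mac(mac)[:6]


HOST_MAC = "d8d3.8561.0991"       # affected device, from the MAC table output
CAPTURED_DMAC = "5475.d0e5.4500"  # destination MAC reported by the capture
FWSM_MAC = "5475.d062.4bb8"       # firewall module, from the sh mod output

print(same_mac(HOST_MAC, CAPTURED_DMAC))        # the frame is not addressed to the host
print(same_mac(HOST_MAC, "D8:D3:85:61:09:91"))  # same address, different notation
print(oui(CAPTURED_DMAC) == oui(FWSM_MAC))      # shares a prefix with the service modules
```

The shared prefix with the service modules in the module inventory is consistent with, though not proof of, the unexpected MAC belonging to another device from the same vendor.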
We also configured no ip redirects on the interface VLAN facing the issue.
When we tracked this MAC address, we found that it belonged to one of the interfaces of the ASA.
We logged in to the ASA and found the cause: the ASA was doing proxy ARP on this interface, so we disabled proxy ARP on the ASA management interface.
sysopt noproxyarp management
Bingo. Problem Solved.
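Why proxy ARP on the ASA would produce intermittent loss can be illustrated with a toy model. This is a sketch under assumptions, not a description of how IOS or the ASA implement ARP: it assumes the switch keeps whichever ARP reply it processed last, and the `resolve` and `make_responders` helpers are hypothetical. Only the IP address and the two MAC addresses are taken from the text.

```python
HOST_IP = "10.1.1.100"
HOST_MAC = "d8d3.8561.0991"
ASA_MAC = "5475.d0e5.4500"


def resolve(target_ip, responders, reply_order):
    """Return the MAC left in a toy ARP cache after all replies are processed.

    responders: {name: (mac, answers_for)} where answers_for(ip) -> bool.
    reply_order: names in the order their replies arrive; the last reply
    wins (an assumption of this model).
    """
    cached = None
    for name in reply_order:
        mac, answers_for = responders[name]
        if answers_for(target_ip):
            cached = mac
    return cached


def make_responders(asa_proxy_arp):
    return {
        "host": (HOST_MAC, lambda ip: ip == HOST_IP),
        # With proxy ARP enabled, the firewall may answer for addresses it does not own.
        "asa": (ASA_MAC, lambda ip: asa_proxy_arp),
    }


# Proxy ARP enabled: the cached MAC depends on which reply arrives last.
on = make_responders(asa_proxy_arp=True)
print(resolve(HOST_IP, on, ["host", "asa"]))  # ASA reply last: wrong MAC cached
print(resolve(HOST_IP, on, ["asa", "host"]))  # host reply last: correct MAC cached

# Proxy ARP disabled (the effect of sysopt noproxyarp): only the host answers.
off = make_responders(asa_proxy_arp=False)
print(resolve(HOST_IP, off, ["host", "asa"]))
```

In this model, clearing the ARP entry forces a fresh resolution, which would explain why connectivity came back for a while after each clear arp and then failed again.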