[chassis] 400g link goes down after reboot and never comes up #116

Open
arlakshm opened this issue Dec 16, 2024 · 1 comment

We are seeing an issue with the latest 202405 image: a 400G link goes down after a reboot and never recovers. When we load last week's image on the same linecard, the link comes up and remains stable after a reboot.

Details of optics

admin@str3-7800-lc3-1:/usr/share/sonic/device$ show vers

SONiC Software Version: SONiC.20240510.18
SONiC OS Version: 12
Distribution: Debian 12.6
Kernel: 6.1.0-22-2-amd64
Build commit: a29e50aa51
Build date: Fri Dec 13 23:23:26 UTC 2024
Built by: azureuser@f00fcd22c000000

Platform: x86_64-arista_7800r3a_36dm2_lc
HwSKU: Arista-7800R3A-36DM2-D36
ASIC: broadcom
ASIC Count: 2
Serial Number: SGD22203294
Model Number: 7800R3A-36DM2-LC
Hardware Revision: 2a.05
Uptime: 21:07:28 up 50 min,  1 user,  load average: 2.08, 2.20, 2.25
Date: Sat 14 Dec 2024 21:07:28

Docker images:
REPOSITORY                 TAG           IMAGE ID       SIZE
docker-mux                 20240510.18   d646f3fbf191   366MB
docker-mux                 latest        d646f3fbf191   366MB
docker-macsec              latest        67a06b13593b   345MB
docker-sonic-telemetry     20240510.18   86633a3ee686   380MB
docker-sonic-telemetry     latest        86633a3ee686   380MB
docker-dhcp-server         latest        338636407fd1   338MB
docker-sonic-gnmi          20240510.18   d488984c4198   380MB
docker-sonic-gnmi          latest        d488984c4198   380MB
docker-eventd              20240510.18   34566d6833fd   314MB
docker-eventd              latest        34566d6833fd   314MB
docker-gbsyncd-broncos     20240510.18   c082e85b7203   352MB
docker-gbsyncd-broncos     latest        c082e85b7203   352MB
docker-gbsyncd-credo       20240510.18   d4c1432ff740   326MB
docker-gbsyncd-credo       latest        d4c1432ff740   326MB
docker-dhcp-relay          latest        7e376aaa0f22   326MB
docker-orchagent           20240510.18   c88f7e5d87de   356MB
docker-orchagent           latest        c88f7e5d87de   356MB
docker-fpm-frr             20240510.18   c958e6ebb126   368MB
docker-fpm-frr             latest        c958e6ebb126   368MB
docker-snmp                20240510.18   223e75162bbc   354MB
docker-snmp                latest        223e75162bbc   354MB
docker-platform-monitor    20240510.18   b850018fc1b4   434MB
docker-platform-monitor    latest        b850018fc1b4   434MB
docker-teamd               20240510.18   05565e6b7aaa   343MB
docker-teamd               latest        05565e6b7aaa   343MB
docker-lldp                20240510.18   6467d5a324b1   360MB
docker-lldp                latest        6467d5a324b1   360MB
docker-acms                20240510.18   56c2a0603509   365MB
docker-acms                latest        56c2a0603509   365MB
docker-database            20240510.18   5f8c00648c37   323MB
docker-database            latest        5f8c00648c37   323MB
docker-sonic-restapi       20240510.18   5c5c3a03404b   333MB
docker-sonic-restapi       latest        5c5c3a03404b   333MB
docker-router-advertiser   20240510.18   6287ecc27e8d   315MB
docker-router-advertiser   latest        6287ecc27e8d   315MB
docker-syncd-brcm-dnx      20240510.18   93a61d588709   703MB
docker-syncd-brcm-dnx      latest        93a61d588709   703MB
k8s.gcr.io/pause           3.5           ed210e3e4a5b   683kB

admin@str3-7800-lc3-1:/usr/share/sonic/device$ show interface transceiver  presence  Ethernet176
Port         Presence
-----------  -----------
Ethernet176  Not present

admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show error-status
Port         Error Status
-----------  --------------
Ethernet0    OK
Ethernet8    OK
Ethernet16   OK
Ethernet24   OK
Ethernet32   OK
Ethernet40   OK
Ethernet48   OK
Ethernet56   OK
Ethernet64   OK
Ethernet72   OK
Ethernet80   OK
Ethernet88   OK
Ethernet96   OK
Ethernet104  OK
Ethernet112  OK
Ethernet120  OK
Ethernet128  OK
Ethernet136  OK
Ethernet144  OK
Ethernet152  OK
Ethernet160  OK
Ethernet168  OK
Ethernet176  OK
Ethernet184  OK
Ethernet192  OK
Ethernet200  OK
Ethernet208  OK
Ethernet216  OK
Ethernet224  OK
Ethernet232  OK
Ethernet240  OK
Ethernet248  OK
Ethernet256  OK
Ethernet264  OK
Ethernet272  Unplugged
Ethernet280  Unplugged
admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show error-status -p Ethernet176
Port         Error Status
-----------  --------------
Ethernet176
Ethernet176  OK
admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show error-status -p Ethernet176
Port         Error Status
-----------  --------------
Ethernet176
Ethernet176  OK
admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show error-status -p Ethernet0
Port       Error Status
---------  --------------
Ethernet0  OK
Ethernet0
admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show error-status -p Ethernet176
Port         Error Status
-----------  --------------
Ethernet176
Ethernet176  OK
admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show eeprom -p Ethernet176
Ethernet176: SFP EEPROM detected
        Active App Selection Host Lane 1: 1
        Active App Selection Host Lane 2: 1
        Active App Selection Host Lane 3: 1
        Active App Selection Host Lane 4: 1
        Active App Selection Host Lane 5: 1
        Active App Selection Host Lane 6: 1
        Active App Selection Host Lane 7: 1
        Active App Selection Host Lane 8: 1
        Application Advertisement: 400GAUI-8 C2M (Annex 120E) - Host Assign (0x1) - Active Cable assembly with BER < 5x10^-5 - Media Assign (0x1)
                                   CAUI-4 C2M (Annex 83E) - Host Assign (0x1) - Active Cable assembly with BER < 5x10^-5 - Media Assign (0x1)
        CMIS Revision: 4.0
        Connector: No separable connector
        Encoding: N/A
        Extended Identifier: Power Class 5 (10.0W Max)
        Extended RateSelect Compliance: N/A
        Hardware Revision: 1.0
        Host Electrical Interface: 400GAUI-8 C2M (Annex 120E)
        Host Lane Assignment Options: 1
        Host Lane Count: 8
        Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
        Length Cable Assembly(m): 5.0
        Media Interface Code: Active Cable assembly with BER < 5x10^-5
        Media Interface Technology: 850 nm VCSEL
        Media Lane Assignment Options: 1
        Media Lane Count: 8
        Nominal Bit Rate(100Mbs): 0
        Specification compliance: active_cable_media_interface
        Vendor Date Code(YYYY-MM-DD Lot): 2021-06-19
        Vendor Name: Cloud Light
        Vendor OUI: a8-bc-9c
        Vendor PN: 7123-G37-05
        Vendor Rev: 02
        Vendor SN: AJ21230500


admin@str3-7800-lc3-1:/usr/share/sonic/device$ sudo sfputil show eeprom -p Ethernet176 --dom
Ethernet176: SFP EEPROM detected
        Active App Selection Host Lane 1: 1
        Active App Selection Host Lane 2: 1
        Active App Selection Host Lane 3: 1
        Active App Selection Host Lane 4: 1
        Active App Selection Host Lane 5: 1
        Active App Selection Host Lane 6: 1
        Active App Selection Host Lane 7: 1
        Active App Selection Host Lane 8: 1
        Application Advertisement: 400GAUI-8 C2M (Annex 120E) - Host Assign (0x1) - Active Cable assembly with BER < 5x10^-5 - Media Assign (0x1)
                                   CAUI-4 C2M (Annex 83E) - Host Assign (0x1) - Active Cable assembly with BER < 5x10^-5 - Media Assign (0x1)
        CMIS Revision: 4.0
        Connector: No separable connector
        Encoding: N/A
        Extended Identifier: Power Class 5 (10.0W Max)
        Extended RateSelect Compliance: N/A
        Hardware Revision: 1.0
        Host Electrical Interface: 400GAUI-8 C2M (Annex 120E)
        Host Lane Assignment Options: 1
        Host Lane Count: 8
        Identifier: QSFP-DD Double Density 8X Pluggable Transceiver
        Length Cable Assembly(m): 5.0
        Media Interface Code: Active Cable assembly with BER < 5x10^-5
        Media Interface Technology: 850 nm VCSEL
        Media Lane Assignment Options: 1
        Media Lane Count: 8
        Nominal Bit Rate(100Mbs): 0
        Specification compliance: active_cable_media_interface
        Vendor Date Code(YYYY-MM-DD Lot): 2021-06-19
        Vendor Name: Cloud Light
        Vendor OUI: a8-bc-9c
        Vendor PN: 7123-G37-05
        Vendor Rev: 02
        Vendor SN: AJ21230500
        ChannelMonitorValues:
                RX1Power: 3.002dBm
                RX2Power: 2.806dBm
                RX3Power: 2.6dBm
                RX4Power: 3.178dBm
                RX5Power: 2.231dBm
                RX6Power: 2.382dBm
                RX7Power: 2.228dBm
                RX8Power: 2.299dBm
                TX1Bias: 6.736mA
                TX1Power: 1.046dBm
                TX2Bias: 6.612mA
                TX2Power: 1.187dBm
                TX3Bias: 6.694mA
                TX3Power: 1.515dBm
                TX4Bias: 6.674mA
                TX4Power: 1.382dBm
                TX5Bias: 7.116mA
                TX5Power: 1.328dBm
                TX6Bias: 7.25mA
                TX6Power: 1.25dBm
                TX7Bias: 7.05mA
                TX7Power: 1.104dBm
                TX8Bias: 7.228mA
                TX8Power: 1.617dBm
        ChannelThresholdValues:
                RxPowerHighAlarm  : 6.0dBm
                RxPowerHighWarning: 5.5dBm
                RxPowerLowAlarm   : -11.506dBm
                RxPowerLowWarning : -8.502dBm
                TxBiasHighAlarm   : 11.0mA
                TxBiasHighWarning : 10.0mA
                TxBiasLowAlarm    : 4.5mA
                TxBiasLowWarning  : 5.5mA
                TxPowerHighAlarm  : 6.0dBm
                TxPowerHighWarning: 5.5dBm
                TxPowerLowAlarm   : -12.007dBm
                TxPowerLowWarning : -10.0dBm
        ModuleMonitorValues:
                Temperature: 38.781C
                Vcc: 3.334Volts
        ModuleThresholdValues:
                TempHighAlarm  : 80.0C
                TempHighWarning: 70.0C
                TempLowAlarm   : -10.0C
                TempLowWarning : 0.0C
                VccHighAlarm   : 3.5Volts
                VccHighWarning : 3.465Volts
                VccLowAlarm    : 3.1Volts
                VccLowWarning  : 3.135Volts
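
As a quick sanity check on the DOM dump above, the readings can be compared against the warning thresholds the module itself reports. The sketch below hard-codes the values from the `--dom` output for Ethernet176; the helper name `find_violations` and the dictionary layout are illustrative assumptions, not part of any SONiC tool.

```python
# Readings copied from the `sfputil show eeprom -p Ethernet176 --dom` output.
readings = {
    "rx_power_dbm": [3.002, 2.806, 2.6, 3.178, 2.231, 2.382, 2.228, 2.299],
    "tx_power_dbm": [1.046, 1.187, 1.515, 1.382, 1.328, 1.25, 1.104, 1.617],
    "tx_bias_ma": [6.736, 6.612, 6.694, 6.674, 7.116, 7.25, 7.05, 7.228],
    "temperature_c": [38.781],
    "vcc_v": [3.334],
}

# (low warning, high warning) pairs copied from the same output.
warning_limits = {
    "rx_power_dbm": (-8.502, 5.5),
    "tx_power_dbm": (-10.0, 5.5),
    "tx_bias_ma": (5.5, 10.0),
    "temperature_c": (0.0, 70.0),
    "vcc_v": (3.135, 3.465),
}


def find_violations(readings, limits):
    """Return (metric, 1-based index, value) for every reading outside its limits."""
    violations = []
    for name, values in readings.items():
        low, high = limits[name]
        for index, value in enumerate(values, start=1):
            if not low <= value <= high:
                violations.append((name, index, value))
    return violations


violations = find_violations(readings, warning_limits)
print(violations or "all readings within warning thresholds")
```

With the values above, no reading falls outside its warning range, which suggests (but does not prove) that the optic itself is healthy and the fault lies elsewhere.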

We tried the following steps to recover the link, but none of them helped:

  • reboot the linecard
  • reboot the chassis
  • remove the media settings JSON and reboot
  • set enable_xcvrd_sff_mgr to false and reboot
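
The outputs above disagree about Ethernet176: `show interface transceiver presence` reports "Not present", while `sfputil show error-status` reports "OK" and the EEPROM is readable. Whether this is related to the link failure is not established here. A minimal sketch for spotting such mismatches from captured CLI text follows; `parse_table` is a hypothetical helper, and the two input strings are copied from the transcript.

```python
PRESENCE_OUTPUT = """
Port         Presence
-----------  -----------
Ethernet176  Not present
"""

ERROR_STATUS_OUTPUT = """
Port         Error Status
-----------  --------------
Ethernet176
Ethernet176  OK
"""


def parse_table(text):
    """Parse a two-column CLI table into {port: value}.

    Header, separator, and rows with an empty second column are skipped.
    """
    result = {}
    for line in text.strip().splitlines():
        parts = line.split(None, 1)
        if len(parts) < 2 or parts[0] == "Port" or parts[0].startswith("-"):
            continue
        result[parts[0]] = parts[1].strip()
    return result


presence = parse_table(PRESENCE_OUTPUT)
status = parse_table(ERROR_STATUS_OUTPUT)

# Ports reported absent by one command but healthy by the other.
mismatched = sorted(
    port for port, state in presence.items()
    if state == "Not present" and status.get(port) == "OK"
)
print(mismatched)
```

Running the same comparison on the output from last week's image would show whether the mismatch is specific to the failing build.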