
New RPI 256 SSD Kit Not found. brcm-pcie 1000110000.pcie: link down #6511

Open
odymaui opened this issue Dec 3, 2024 · 20 comments

Comments

@odymaui

odymaui commented Dec 3, 2024

Describe the bug

The new Raspberry Pi 256GB SSD kit is not recognized by the Raspberry Pi 5; the system does not see the new HAT or the NVMe drive.

The two primary ways I have tried to "see" the M.2 HAT and NVMe drive are lspci and lsblk; neither shows it. Reseating all of the cable and drive connections has not helped.
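
For reference, the checks look roughly like this (a minimal sketch; on a working setup the drive would typically appear as nvme0n1 in lsblk):

# List devices on the PCIe bus; with a working link the NVMe controller should appear here
lspci

# List block devices; a detected SSD would normally show up as nvme0n1
lsblk

# Kernel messages from the PCIe bring-up
sudo dmesg | grep -i pcie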

Assuming the problem was with the RPi SSD kit, I got a replacement kit, but it too is not seen by the system.

The only error I see in the system log is brcm-pcie 1000110000.pcie: link down.

You can see the output of sudo journalctl -b -g pci here: https://paste.debian.net/hidden/95dd02d6

Steps to reproduce the behaviour

There is a long thread in the RPI forum troubleshooting this issue: https://forums.raspberrypi.com/viewtopic.php?t=379680 without success.

Device(s)

Raspberry Pi 5

System

cat /etc/rpi-issue

Raspberry Pi reference 2024-11-19
Generated using pi-gen, https://github.com/RPi-Distro/pi-gen, 891df1e21ed2b6099a2e6a13e26c91dea44b34d4, stage4

vcgencmd version

2024/11/12 16:10:44
Copyright (c) 2012 Broadcom
version 4b019946 (release) (embedded)

uname -a
Linux haleakala 6.6.62+rpt-rpi-2712 #1 SMP PREEMPT Debian 1:6.6.62-1+rpt1 (2024-11-25) aarch64 GNU/Linux

Logs

Dec 02 19:47:46 haleakala kernel: Kernel command line: reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave smsc95xx.macaddr=D8:3A:DD:F7:80:F9 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000 console=ttyAMA10,115200 console=tty1 root=PARTUUID=c71db27b-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=US
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: host bridge /axi/pcie@110000 ranges:
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: No bus range found for /axi/pcie@110000, using [bus 00-ff]
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: MEM 0x1b80000000..0x1bffffffff -> 0x0080000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: MEM 0x1800000000..0x1b7fffffff -> 0x0400000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: Forcing gen 2
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: PCI host bridge to bus 0000:00
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000110000.pcie: link down
Dec 02 19:47:46 haleakala kernel: pcieport 0000:00:00.0: PME: Signaling with IRQ 38
Dec 02 19:47:46 haleakala kernel: pcieport 0000:00:00.0: AER: enabled with IRQ 38
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: host bridge /axi/pcie@120000 ranges:
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: No bus range found for /axi/pcie@120000, using [bus 00-ff]
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: MEM 0x1f00000000..0x1ffffffffb -> 0x0000000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: MEM 0x1c00000000..0x1effffffff -> 0x0400000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: IB MEM 0x1f00000000..0x1f003fffff -> 0x0000000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: Forcing gen 2
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: PCI host bridge to bus 0000:00
Dec 02 19:47:46 haleakala kernel: brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)

Additional context

Any suggestions for how to troubleshoot / resolve this issue would be greatly appreciated.

@P33M
Contributor

P33M commented Dec 3, 2024

Is this a cross-post from the forums, or are you seeing the same symptoms in a different context?

@odymaui
Author

odymaui commented Dec 3, 2024

It was suggested that I create this bug report because the forum post could not be resolved.

@pelwell
Contributor

pelwell commented Dec 3, 2024

Have you tried removing whatever that bit of hardware mounted underneath the Pi 5 is? It always helps to reduce a system to the simplest configuration when debugging a problem.

@odymaui
Author

odymaui commented Dec 3, 2024

If you look at the last entry of the post, the Pi and the HAT are the only two things on the system. I did a fresh install of the OS, so it should be as reduced as possible.

@6by9
Contributor

6by9 commented Dec 3, 2024

It's difficult to say for certain, but in the image attached to https://forums.raspberrypi.com/viewtopic.php?p=2273870#p2273870 it looks like the FFC connector on the M.2 HAT isn't closed. There appears to be a gap between the black of the clamp and the silver of the main connector.
The simplest test would be to pull on that cable gently and see if it comes out.

@odymaui
Author

odymaui commented Dec 3, 2024

I clamped it down tight and rebooted but no change. I still can't see the drive and the link down error message shows in the log.

I have tried fiddling with all the connections, etc., with no success.

@P33M
Contributor

P33M commented Dec 3, 2024

In that case the common denominator is your Pi 5. I would recommend removing the HAT and active cooler and carefully checking both sides of the board for missing or damaged components.

@odymaui
Author

odymaui commented Dec 3, 2024

I think everything looks okay in that I don't see an obvious difference compared to the rest of the board. I assume you are referring to the PCIe slot? Here are front and back views. Do you see something?
brd_1
brd_2

@pelwell
Contributor

pelwell commented Dec 3, 2024

The underside of the board is more vulnerable - any chance of an in-focus shot of that?

@odymaui
Author

odymaui commented Dec 3, 2024

Below is a shot of the bottom of the board. Also, I edited the firmware config.txt, removed the pci* dtparams and rebooted; the link down error message no longer appears.
brd_3

gramps@haleakala:~ $ sudo dmesg | grep -i pci
[ 0.000000] Kernel command line: reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave smsc95xx.macaddr=D8:3A:DD:F7:80:F9 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000 console=ttyAMA10,115200 console=tty1 root=PARTUUID=c71db27b-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=US
[ 0.036904] PCI: CLS 0 bytes, default 64
[ 0.259906] brcm-pcie 1000120000.pcie: host bridge /axi/pcie@120000 ranges:
[ 0.259910] brcm-pcie 1000120000.pcie: No bus range found for /axi/pcie@120000, using [bus 00-ff]
[ 0.259918] brcm-pcie 1000120000.pcie: MEM 0x1f00000000..0x1ffffffffb -> 0x0000000000
[ 0.259922] brcm-pcie 1000120000.pcie: MEM 0x1c00000000..0x1effffffff -> 0x0400000000
[ 0.259927] brcm-pcie 1000120000.pcie: IB MEM 0x1f00000000..0x1f003fffff -> 0x0000000000
[ 0.259931] brcm-pcie 1000120000.pcie: IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
[ 0.261128] brcm-pcie 1000120000.pcie: Forcing gen 2
[ 0.261326] brcm-pcie 1000120000.pcie: PCI host bridge to bus 0000:00
[ 0.261328] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.261331] pci_bus 0000:00: root bus resource [mem 0x1f00000000-0x1ffffffffb] (bus address [0x00000000-0xfffffffb])
[ 0.261333] pci_bus 0000:00: root bus resource [mem 0x1c00000000-0x1effffffff pref] (bus address [0x400000000-0x6ffffffff])
[ 0.261342] pci 0000:00:00.0: [14e4:2712] type 01 class 0x060400
[ 0.261363] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 0.261938] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.369123] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
[ 0.369143] pci 0000:01:00.0: [1de4:0001] type 00 class 0x020000
[ 0.369156] pci 0000:01:00.0: reg 0x10: [mem 0xffffc000-0xffffffff]
[ 0.369163] pci 0000:01:00.0: reg 0x14: [mem 0xffc00000-0xffffffff]
[ 0.369169] pci 0000:01:00.0: reg 0x18: [mem 0xffff0000-0xffffffff]
[ 0.369232] pci 0000:01:00.0: supports D1
[ 0.369234] pci 0000:01:00.0: PME# supported from D0 D1 D3hot D3cold
[ 0.381127] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 0.381136] pci 0000:00:00.0: BAR 8: assigned [mem 0x1f00000000-0x1f005fffff]
[ 0.381139] pci 0000:01:00.0: BAR 1: assigned [mem 0x1f00000000-0x1f003fffff]
[ 0.381142] pci 0000:01:00.0: BAR 2: assigned [mem 0x1f00400000-0x1f0040ffff]
[ 0.381146] pci 0000:01:00.0: BAR 0: assigned [mem 0x1f00410000-0x1f00413fff]
[ 0.381150] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.381152] pci 0000:00:00.0: bridge window [mem 0x1f00000000-0x1f005fffff]
[ 0.381156] pci 0000:00:00.0: Max Payload Size set to 256/ 512 (was 128), Max Read Rq 512
[ 0.381163] pci 0000:01:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 512
[ 0.381231] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[ 0.381257] pcieport 0000:00:00.0: PME: Signaling with IRQ 38
[ 0.381294] pcieport 0000:00:00.0: AER: enabled with IRQ 38
[ 0.872426] input: Microsoft Microsoft 3-Button Mouse with IntelliEye(TM) as /devices/platform/axi/1000120000.pcie/1f00300000.usb/xhci-hcd.1/usb3/3-2/3-2:1.0/0003:045E:0040.0001/input/input0
[ 0.887427] input: SIGMACHIP USB Keyboard as /devices/platform/axi/1000120000.pcie/1f00200000.usb/xhci-hcd.0/usb1/1-2/1-2:1.0/0003:1C4F:0016.0002/input/input1
[ 1.007389] input: SIGMACHIP USB Keyboard Consumer Control as /devices/platform/axi/1000120000.pcie/1f00200000.usb/xhci-hcd.0/usb1/1-2/1-2:1.1/0003:1C4F:0016.0003/input/input2
[ 1.065162] input: SIGMACH

@odymaui
Author

odymaui commented Dec 4, 2024

When I plug the HAT back in and restart, I get the link down error message again.

Does that indicate that I have a bad Raspberry Pi 5 board?

[ 0.000000] Kernel command line: reboot=w coherent_pool=1M 8250.nr_uarts=1 pci=pcie_bus_safe cgroup_disable=memory numa_policy=interleave smsc95xx.macaddr=D8:3A:DD:F7:80:F9 vc_mem.mem_base=0x3fc00000 vc_mem.mem_size=0x40000000 console=ttyAMA10,115200 console=tty1 root=PARTUUID=c71db27b-02 rootfstype=ext4 fsck.repair=yes rootwait quiet splash plymouth.ignore-serial-consoles cfg80211.ieee80211_regdom=US
[ 0.031894] PCI: CLS 0 bytes, default 64
[ 0.254482] brcm-pcie 1000110000.pcie: host bridge /axi/pcie@110000 ranges:
[ 0.254486] brcm-pcie 1000110000.pcie: No bus range found for /axi/pcie@110000, using [bus 00-ff]
[ 0.254494] brcm-pcie 1000110000.pcie: MEM 0x1b80000000..0x1bffffffff -> 0x0080000000
[ 0.254498] brcm-pcie 1000110000.pcie: MEM 0x1800000000..0x1b7fffffff -> 0x0400000000
[ 0.254502] brcm-pcie 1000110000.pcie: IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
[ 0.255798] brcm-pcie 1000110000.pcie: Forcing gen 2
[ 0.256012] brcm-pcie 1000110000.pcie: PCI host bridge to bus 0000:00
[ 0.256013] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.256016] pci_bus 0000:00: root bus resource [mem 0x1b80000000-0x1bffffffff] (bus address [0x80000000-0xffffffff])
[ 0.256018] pci_bus 0000:00: root bus resource [mem 0x1800000000-0x1b7fffffff pref] (bus address [0x400000000-0x77fffffff])
[ 0.256028] pci 0000:00:00.0: [14e4:2712] type 01 class 0x060400
[ 0.256049] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 0.256607] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.681085] brcm-pcie 1000110000.pcie: link down
[ 0.685763] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 0.685769] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.685773] pci 0000:00:00.0: Max Payload Size set to 512/ 512 (was 128), Max Read Rq 512
[ 0.685849] pcieport 0000:00:00.0: PME: Signaling with IRQ 38
[ 0.685889] pcieport 0000:00:00.0: AER: enabled with IRQ 38
[ 0.685951] pci_bus 0000:01: busn_res: [bus 01] is released
[ 0.685989] pci_bus 0000:00: busn_res: [bus 00-ff] is released
[ 0.686068] brcm-pcie 1000120000.pcie: host bridge /axi/pcie@120000 ranges:
[ 0.686070] brcm-pcie 1000120000.pcie: No bus range found for /axi/pcie@120000, using [bus 00-ff]
[ 0.686076] brcm-pcie 1000120000.pcie: MEM 0x1f00000000..0x1ffffffffb -> 0x0000000000
[ 0.686080] brcm-pcie 1000120000.pcie: MEM 0x1c00000000..0x1effffffff -> 0x0400000000
[ 0.686085] brcm-pcie 1000120000.pcie: IB MEM 0x1f00000000..0x1f003fffff -> 0x0000000000
[ 0.686088] brcm-pcie 1000120000.pcie: IB MEM 0x0000000000..0x0fffffffff -> 0x1000000000
[ 0.687247] brcm-pcie 1000120000.pcie: Forcing gen 2
[ 0.687268] brcm-pcie 1000120000.pcie: PCI host bridge to bus 0000:00
[ 0.687270] pci_bus 0000:00: root bus resource [bus 00-ff]
[ 0.687272] pci_bus 0000:00: root bus resource [mem 0x1f00000000-0x1ffffffffb] (bus address [0x00000000-0xfffffffb])
[ 0.687274] pci_bus 0000:00: root bus resource [mem 0x1c00000000-0x1effffffff pref] (bus address [0x400000000-0x6ffffffff])
[ 0.687281] pci 0000:00:00.0: [14e4:2712] type 01 class 0x060400
[ 0.687296] pci 0000:00:00.0: PME# supported from D0 D3hot
[ 0.687834] pci 0000:00:00.0: bridge configuration invalid ([bus 00-00]), reconfiguring
[ 0.793088] brcm-pcie 1000120000.pcie: link up, 5.0 GT/s PCIe x4 (!SSC)
[ 0.793105] pci 0000:01:00.0: [1de4:0001] type 00 class 0x020000
[ 0.793117] pci 0000:01:00.0: reg 0x10: [mem 0xffffc000-0xffffffff]
[ 0.793123] pci 0000:01:00.0: reg 0x14: [mem 0xffc00000-0xffffffff]
[ 0.793128] pci 0000:01:00.0: reg 0x18: [mem 0xffff0000-0xffffffff]
[ 0.793190] pci 0000:01:00.0: supports D1
[ 0.793191] pci 0000:01:00.0: PME# supported from D0 D1 D3hot D3cold
[ 0.805092] pci_bus 0000:01: busn_res: [bus 01-ff] end is updated to 01
[ 0.805097] pci 0000:00:00.0: BAR 8: assigned [mem 0x1f00000000-0x1f005fffff]
[ 0.805099] pci 0000:01:00.0: BAR 1: assigned [mem 0x1f00000000-0x1f003fffff]
[ 0.805103] pci 0000:01:00.0: BAR 2: assigned [mem 0x1f00400000-0x1f0040ffff]
[ 0.805107] pci 0000:01:00.0: BAR 0: assigned [mem 0x1f00410000-0x1f00413fff]
[ 0.805111] pci 0000:00:00.0: PCI bridge to [bus 01]
[ 0.805113] pci 0000:00:00.0: bridge window [mem 0x1f00000000-0x1f005fffff]
[ 0.805116] pci 0000:00:00.0: Max Payload Size set to 256/ 512 (was 128), Max Read Rq 512
[ 0.805123] pci 0000:01:00.0: Max Payload Size set to 256/ 256 (was 128), Max Read Rq 512
[ 0.805160] pcieport 0000:00:00.0: enabling device (0000 -> 0002)
[ 0.805177] pcieport 0000:00:00.0: PME: Signaling with IRQ 39
[ 0.805215] pcieport 0000:00:00.0: AER: enabled with IRQ 39
[ 1.311173] input: SIGMACHIP USB Keyboard as /devices/platform/axi/1000120000.pcie/1f00200000.usb/xhci-hcd.0/usb1/1-2/1-2:1.0/0003:1C4F:0016.0002/input/input0
[ 1.312116] input: Microsoft Microsoft 3-Button Mouse with IntelliEye(TM) as /devices/platform/axi/1000120000.pcie/1f00300000.usb/xhci-hcd.1/usb3/3-2/3-2:1.0/0003:045E:0040.0001/input/input1
[ 1.431136] input: SIGMACHIP USB Keyboard Consumer Control as /devices/platform/axi/1000120000.pcie/1f00200000.usb/xhci-hcd.0/usb1/1-2/1-2:1.1/0003:1C4F:0016.0003/input/input2

@pelwell
Contributor

pelwell commented Dec 5, 2024

Given that the Pi 5, M.2 HAT+ and NVME drive are known good designs, and that you've tried a different M.2 HAT, a hardware fault - Pi 5, NVME or cable - does seem the likeliest explanation.

@timg236
Contributor

timg236 commented Dec 5, 2024

"Does that indicate that I have a bad Raspberry Pi 5 board?"

Not necessarily. With the default firmware and config.txt settings, this tells us that the firmware was able to detect the presence of the HAT via the DET_WAKE GPIO. The previous forum post shows the power LED on the HAT illuminated, so the power-en GPIO is working.

If the firmware does not detect a HAT then Linux won't attempt to bring up the link unless the dtparam=pciex1 parameter is present.
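
For reference, forcing that bring-up looks roughly like the following in /boot/firmware/config.txt (a minimal sketch; the gen-speed line is optional and shown only as an illustration):

# Force the external PCIe connector on even if no HAT is detected
dtparam=pciex1
# Optionally pin the link speed while debugging (Gen 2 is the default)
#dtparam=pciex1_gen=2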

So ... the system knows that there is a HAT there but is unable to initialize the PCIe link, which could be the Pi 5, the cable or the HAT.

It might be worth trying vanilla RPi OS on the SD-card but that probably won't make a difference.

@odymaui
Author

odymaui commented Dec 5, 2024

Do you know if there is a log, or a way to enable additional logging, to get more information on why the system thinks the link is down? The RPi 5 has just sat on a desk and has taken little to no wear/abuse.

I have now tried two new SSD kits with the same issue, so I am wondering how to get more information to narrow down and identify the problem, and whether it lies with the RPi 5.

Would you lean towards a software or a hardware issue? It sounds like you are saying hardware. Any advice on how I can narrow this down and isolate the issue? Unfortunately, I don't have another new RPi 5.

@pelwell
Contributor

pelwell commented Dec 5, 2024

I have now tried two new SSD kits with the same issue

When you say "SSD kits", do you mean the M.2 HAT, the SSD and the ribbon cable to connect to the Pi 5? If that's the case, the only common element is the Pi 5 and its power supply.

@odymaui
Author

odymaui commented Dec 5, 2024

The SSD kit is this: https://www.raspberrypi.com/products/ssd-kit/ . I have the 256GB version and it has everything needed to plug it in, except it doesn't work :(.

@pelwell
Contributor

pelwell commented Dec 5, 2024

I think we can rule out the power supply, given that it's our 27W PSU, so the only thing left is the Pi 5 itself. I can't know how it has been treated - you say it's "taken little to no wear/abuse" - but it doesn't require great carelessness to short something out when the Pi is uncased.

@odymaui
Author

odymaui commented Dec 5, 2024

OK. So do you know if there are logging or other options to get more details about the issue, other than just "link down"?

@pelwell
Contributor

pelwell commented Dec 5, 2024

Link down is the symptom - the driver configures everything, waits 100ms, then checks for the status bits to indicate that the link has been established, retrying every 5ms up to 100 times. The error shows that there was no response. If there was some other error or indication that something was wrong then the kernel would have displayed it, but there isn't - it's supposed to just work.

@pelwell
Contributor

pelwell commented Dec 6, 2024

If this is the hardware fault that we think it might be, this is the first such instance we're aware of. As such, we would like to inspect the board. Email me - [email protected] - and we can arrange an exchange.
