A320 hangs on 4.12rc1 and up #12

Open
montvid opened this issue Oct 10, 2017 · 6 comments
Comments

@montvid

montvid commented Oct 10, 2017

I am using freedreno on an ARM apq8064 Nexus 7 2013 WiFi (flo). I need the following patches to make freedreno work on 4.11 and 4.11.12; without them I get a blue/black screen:

Make_of_dma_deconfigure()_public
https://git.linaro.org/people/john.stultz/flo.git/commit/?h=flo-v4.11&id=6dd93436bf11ccd9fef459b8dc9f58c8635a6230

Split_of_configure_dma()_into_mask_and_ops_configuration
https://git.linaro.org/people/john.stultz/flo.git/commit/?h=flo-v4.11&id=b3c0228dd8468ab9a9da3b70b2a011ffb855091b

Configure_dma_operations_at_probe_time
https://git.linaro.org/people/john.stultz/flo.git/commit/?h=flo-v4.11&id=4d4d7186492648fe2517178b983fcd5a55e3d575

Handle_IOMMU_lookup_failure_with_deferred_probing_or_error
https://git.linaro.org/people/john.stultz/flo.git/commit/?h=flo-v4.11&id=c0648aa6e6063285cec2a15fecd360a3b4394af4

Kernels 4.12rc1 and up include these patches upstream in a heavily edited form. With those kernels the screen works, but the A320 GPU hangs constantly and is unusable, so the problem was introduced in kernel 4.12rc1.
The patch as upstreamed in 4.12rc1:
Handle_IOMMU_lookup_failure_with_deferred_probing_or_error
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=7b07cbefb68d486febf47e13b570fed53d9296b4

Is there a way to fix this on the freedreno side, or should I ask Laurent Pinchart about his commits?
Please contact me if I can help test a possible fix.
My dmesg: https://paste.gnome.org/pw0kyldbz

[ 105.339331] msm 5100000.mdp: A320: using IOMMU
[ 124.084217] msm 5100000.mdp: A320: hangcheck detected gpu lockup!
[ 124.084235] msm 5100000.mdp: A320: completed fence: 1
[ 124.084246] msm 5100000.mdp: A320: submitted fence: 29
[ 124.084391] msm 5100000.mdp: A320: hangcheck recover!
[ 124.084418] msm 5100000.mdp: A320: offending task: Xwayland
[ 124.098105] revision: 320 (3.2.0.2)
[ 124.098122] fence: 1/29
[ 124.098136] rptr: 40
[ 124.098150] wptr: 0
[ 124.098163] rb wptr: 510
[ 124.098179] CP_SCRATCH_REG0: 0
[ 124.098194] CP_SCRATCH_REG1: 0
[ 124.098209] CP_SCRATCH_REG2: 2
[ 124.098224] CP_SCRATCH_REG3: 0
[ 124.098239] CP_SCRATCH_REG4: 0
[ 124.098254] CP_SCRATCH_REG5: 1
[ 124.098269] CP_SCRATCH_REG6: 21
[ 124.098284] CP_SCRATCH_REG7: 3
[ 125.634299] [drm:a3xx_idle] ERROR A320: timeout waiting for GPU to idle!
[ 126.634276] [drm:adreno_idle] ERROR A320: timeout waiting to drain ringbuffer!
[ 127.744210] [drm:adreno_idle] ERROR A320: timeout waiting to drain ringbuffer!
[ 128.844285] [drm:adreno_idle] ERROR A320: timeout waiting to drain ringbuffer!
[ 129.944292] [drm:adreno_idle] ERROR A320: timeout waiting to drain ringbuffer!
[ 131.044162] [drm:adreno_idle] ERROR A320: timeout waiting to drain ringbuffer!
[ 131.044267] msm 5100000.mdp: A320: hangcheck detected gpu lockup!
[ 131.044299] msm 5100000.mdp: A320: completed fence: 61
[ 131.044322] msm 5100000.mdp: A320: submitted fence: 67
[ 131.044593] msm 5100000.mdp: A320: hangcheck recover!
[ 131.044605] msm 5100000.mdp: A320: offending task: Xwayland
[ 131.044758] revision: 320 (3.2.0.2)
[ 131.044768] fence: 61/67
[ 131.044777] rptr: 0
[ 131.044784] wptr: 0
[ 131.044792] rb wptr: 38
[ 131.044800] CP_SCRATCH_REG0: 0
[ 131.044807] CP_SCRATCH_REG1: 0
[ 131.044815] CP_SCRATCH_REG2: 61
[ 131.044822] CP_SCRATCH_REG3: 0
[ 131.044829] CP_SCRATCH_REG4: 0
[ 131.044837] CP_SCRATCH_REG5: 1339
[ 131.044844] CP_SCRATCH_REG6: 1346
[ 131.044852] CP_SCRATCH_REG7: 1341
[ 134.084235] msm 5100000.mdp: A320: hangcheck detected gpu lockup!
[ 134.084295] msm 5100000.mdp: A320: completed fence: 70
[ 134.084317] msm 5100000.mdp: A320: submitted fence: 71
[ 134.084869] msm 5100000.mdp: A320: hangcheck recover!
[ 134.084897] msm 5100000.mdp: A320: offending task: weston
[ 134.084979] revision: 320 (3.2.0.2)
[ 134.084994] fence: 70/71
[ 134.085007] rptr: 24
[ 134.085021] wptr: 0
[ 134.085034] rb wptr: 46
[ 134.085052] CP_SCRATCH_REG0: 0
[ 134.085067] CP_SCRATCH_REG1: 0
[ 134.085082] CP_SCRATCH_REG2: 70
[ 134.085097] CP_SCRATCH_REG3: 0
[ 134.085112] CP_SCRATCH_REG4: 0
[ 134.085126] CP_SCRATCH_REG5: 1657
[ 134.085141] CP_SCRATCH_REG6: 1680
[ 134.085156] CP_SCRATCH_REG7: 1682
[ 171.044266] msm 5100000.mdp: A320: hangcheck detected gpu lockup!
[ 171.044326] msm 5100000.mdp: A320: completed fence: 73
[ 171.044349] msm 5100000.mdp: A320: submitted fence: 74
[ 171.044908] msm 5100000.mdp: A320: hangcheck recover!
[ 171.044933] msm 5100000.mdp: A320: offending task: weston
[ 171.045004] revision: 320 (3.2.0.2)
[ 171.045020] fence: 73/74
[ 171.045033] rptr: 24
[ 171.045046] wptr: 0
[ 171.045059] rb wptr: 46
[ 171.045075] CP_SCRATCH_REG0: 0
[ 171.045090] CP_SCRATCH_REG1: 0
[ 171.045104] CP_SCRATCH_REG2: 73
[ 171.045119] CP_SCRATCH_REG3: 0
[ 171.045134] CP_SCRATCH_REG4: 0
[ 171.045149] CP_SCRATCH_REG5: 559
[ 171.045163] CP_SCRATCH_REG6: 684
[ 171.045178] CP_SCRATCH_REG7: 686
[ 409.844371] dsi_cmds2buf_tx: cmd dma tx failed, type=0x39, data0=0x51, len=8
[ 410.184346] msm 5100000.mdp: vblank time out, crtc=0
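
For anyone comparing logs between kernels, the hangcheck reports above can be reduced to one line per event. The following is only a minimal sketch, not part of the driver or of any existing tool: the names `parse_hangs`, `summarize` and `SAMPLE` are made up for illustration, and the regular expressions assume nothing beyond the two dmesg layouts quoted in this thread.

```python
import re

# Timestamp at the start of a dmesg line, e.g. "[ 124.084217]".
TS = re.compile(r"^\[\s*(\d+\.\d+)\]")
# "A320: completed fence: 1" (older layout) or
# "*ERROR* A320:     completed fence: 2" (newer layout).
FENCE = re.compile(r"A320:\s*(completed|submitted) fence:\s*(\d+)")
TASK = re.compile(r"A320:\s*offending task:\s*(\S+)")
LOCKUP = "hangcheck detected gpu lockup"


def parse_hangs(dmesg_text):
    """Return one dict per hangcheck report found in dmesg_text."""
    events = []
    current = None
    for line in dmesg_text.splitlines():
        if LOCKUP in line:
            ts = TS.match(line)
            current = {
                "time": float(ts.group(1)) if ts else None,
                "completed": None,
                "submitted": None,
                "task": None,
            }
            events.append(current)
            continue
        if current is None:
            continue  # ignore everything before the first report
        m = FENCE.search(line)
        if m:
            current[m.group(1)] = int(m.group(2))
            continue
        m = TASK.search(line)
        if m:
            current["task"] = m.group(1)
    return events


def summarize(events):
    """Format each event as time, task and number of unfinished fences."""
    out = []
    for e in events:
        if e["completed"] is None or e["submitted"] is None:
            pending = "?"
        else:
            pending = e["submitted"] - e["completed"]
        out.append(f"t={e['time']} task={e['task']} pending_fences={pending}")
    return out


# Lines copied from the logs in this thread (both layouts).
SAMPLE = """\
[ 124.084217] msm 5100000.mdp: A320: hangcheck detected gpu lockup!
[ 124.084235] msm 5100000.mdp: A320: completed fence: 1
[ 124.084246] msm 5100000.mdp: A320: submitted fence: 29
[ 124.084391] msm 5100000.mdp: A320: hangcheck recover!
[ 124.084418] msm 5100000.mdp: A320: offending task: Xwayland
[  123.105108] msm 5100000.mdp: [drm:hangcheck_handler] *ERROR* A320: hangcheck detected gpu lockup rb 0!
[  123.105133] msm 5100000.mdp: [drm:hangcheck_handler] *ERROR* A320:     completed fence: 2
[  123.105152] msm 5100000.mdp: [drm:hangcheck_handler] *ERROR* A320:     submitted fence: 3
[  123.105388] msm 5100000.mdp: [drm:recover_worker] *ERROR* A320: offending task: Xorg (/usr/libexec/Xorg :0)
"""

if __name__ == "__main__":
    for row in summarize(parse_hangs(SAMPLE)):
        print(row)
```

Feeding it the output of `dmesg` from a working and a non-working kernel would show how often the hangs occur and which process triggered them.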

@robclark
Collaborator

My suspicion is that the iommu dma-ops are getting wired up after these patches, so that msm_gem.c's use of dma_{map,unmap}_sg() causes unintended iommu map/unmap operations that conflict with our management of the iova space. I haven't had time yet to look again at 8064, but it is on my todo list.
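
To illustrate what such a conflict could look like, here is a toy model in Python. It is not kernel code and it does not reproduce the msm or IOMMU implementation: `ToyIommu`, `BumpAllocator`, `demo` and all addresses are hypothetical. It only assumes that two parties allocate IOVAs independently while sharing one translation table; whether that is what actually happens on 8064 is not established in this thread.

```python
class ToyIommu:
    """One translation table shared by everyone mapping through this domain."""

    def __init__(self):
        self.table = {}  # iova -> physical address

    def map(self, iova, phys):
        self.table[iova] = phys

    def unmap(self, iova):
        self.table.pop(iova, None)

    def translate(self, iova):
        # None stands in for a translation fault.
        return self.table.get(iova)


class BumpAllocator:
    """Hands out page-sized IOVAs; knows nothing about other allocators."""

    def __init__(self, base, page=0x1000):
        self.next = base
        self.page = page

    def alloc(self):
        iova = self.next
        self.next += self.page
        return iova


def demo():
    iommu = ToyIommu()
    # Hypothetical: the driver's own IOVA manager.
    driver_iovas = BumpAllocator(base=0x1000)
    # Hypothetical: a second manager (standing in for dma-ops) on the same domain.
    dmaops_iovas = BumpAllocator(base=0x1000)

    # The driver maps a buffer that the GPU will access.
    gpu_iova = driver_iovas.alloc()
    iommu.map(gpu_iova, 0xA000)

    # A map/unmap pair issued through the second manager picks the same IOVA.
    tmp_iova = dmaops_iovas.alloc()
    iommu.map(tmp_iova, 0xB000)              # overwrites the driver's entry
    while_mapped = iommu.translate(gpu_iova)
    iommu.unmap(tmp_iova)                    # removes it entirely
    after_unmap = iommu.translate(gpu_iova)
    return gpu_iova, while_mapped, after_unmap


if __name__ == "__main__":
    print(demo())
```

In this model the driver's address first resolves to the wrong page and then to nothing at all, which is the kind of outcome that unintended map/unmap calls on a shared IOVA space would produce.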

@montvid
Author

montvid commented Jan 22, 2018

The problem persists in 4.14.14. The device boots without problems, but the GPU throws the same errors when trying to load a GUI. We at postmarketOS are trying to use your freedreno driver to run KDE Plasma Mobile, LuneOS and other Linux GUIs such as XFCE4 and MATE on older phones. Many of them use an A3xx GPU, so we depend on you and your freedreno driver support. As I said before, 4.11.12 is the last kernel on which everything worked, and I am glad I can at least use that.

@robclark
Collaborator

Sorry that I haven't had time to look at ifc6410 yet. If the issue is what I think it is (with the iommu), it should at least be restricted to the msm_iommu+mdp4+a3xx combo, so it wouldn't apply to anything 8x74 and later. Basically it would only affect 8064.

I guess if the older kernel is working, sticking with that is a reasonable short-term solution.

@robclark
Collaborator

"freedreno" is mainly the userspace part; it normally uses the upstream drm/msm kernel module. There is some support for an ancient downstream kgsl kernel driver, but it is nearly impossible to support all the different versions of the downstream kgsl driver on various/random android kernel branches. Most of my development work focuses on community boards (ifc6540/db410c/db820c/etc) since they are easier to work with.

For folks who want to use freedreno on phones/tablets, especially devices with SoCs that have reasonable support upstream, I tend to encourage getting an upstream kernel working. The downside is low-level bring-up work: digging through the downstream kernel and converting things to upstream DT bindings, plus writing a dsi panel driver. The upside is that you avoid the old kgsl kernel support, which was a complete hack and pretty fragile.

The good news is qcom seems to move closer to upstream drivers (clk/genpd/etc) in newer things so hopefully this problem goes away eventually ;-)

@okias

okias commented Mar 10, 2020

Any progress on this? I am also having issues with a Nexus 7 2013 flo (kernel 5.6-rc4-next).

[1] https://github.com/apq8064-mainline/linux/commits/qcom-apq8064-next

@sylencecc

sylencecc commented Nov 27, 2024

I am encountering the same issue while trying to revive a few Nexus 7 tablets. The device now struggles with its vendor 4.11 kernel, which no longer works with recent versions of the freedreno mesa/gallium driver. On the other hand, newer kernels suffer from the issue described here. Is there anything I could do to help resolve this? Any info I could provide from the device, or something I could test?
I'm currently running kernel 5.10.98-r2 (from the link [1] posted above) on a Nexus 7 2013 flo. When running X, there is valid graphical output, but it is extremely unresponsive. dmesg fills with:

[  123.105108] msm 5100000.mdp: [drm:hangcheck_handler] *ERROR* A320: hangcheck detected gpu lockup rb 0!
[  123.105133] msm 5100000.mdp: [drm:hangcheck_handler] *ERROR* A320:     completed fence: 2
[  123.105152] msm 5100000.mdp: [drm:hangcheck_handler] *ERROR* A320:     submitted fence: 3
[  123.105242] msm 5100000.mdp: [drm:recover_worker] *ERROR* A320: hangcheck recover!
[  123.105388] msm 5100000.mdp: [drm:recover_worker] *ERROR* A320: offending task: Xorg (/usr/libexec/Xorg :0 -seat seat0 -auth /var/run/lightdm/root/:0 -nolisten tcp vt7 -novtswitch)
[  123.107931] revision: 320 (3.2.0.2)
[  123.107954] rb 0: fence:    2/3
[  123.107975] rptr:     44
[  123.107994] rb wptr:  44
[  123.108015] CP_SCRATCH_REG0: 0
[  123.108036] CP_SCRATCH_REG1: 0
[  123.108056] CP_SCRATCH_REG2: 2
[  123.108077] CP_SCRATCH_REG3: 0
[  123.108099] CP_SCRATCH_REG4: 0
[  123.108119] CP_SCRATCH_REG5: 0
[  123.108140] CP_SCRATCH_REG6: 0
[  123.108161] CP_SCRATCH_REG7: 0

5 participants
@robclark @okias @sylencecc @montvid and others