Add linux-drm-syncobj-v1 protocol #1356

Merged
merged 4 commits into from
Sep 24, 2024

Conversation

ids1024
Member

@ids1024 ids1024 commented Mar 8, 2024

So far just exposes the protocol but doesn't do anything. #1327 does indeed make that part mostly painless instead of having impl bounds to deal with and a delegate macro.

@ids1024
Member Author

ids1024 commented Apr 16, 2024

Reading a bit more about IN_FENCE_FD and OUT_FENCE_PTR, I think we need to use both if we want DRM scanout to not use implicit sync?

https://github.com/ValveSoftware/gamescope/blob/master/src/drm.cpp uses a g_nAlwaysSignalledSyncFile as the IN_FENCE_FD, which is probably what we want as well (for direct scanout) if we only commit the buffer when we have already waited for it to be ready.

Then we can poll the fence produced by OUT_FENCE_PTR, and not signal release points for buffers used in direct scanout until then.
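The release-point bookkeeping described above can be sketched roughly as follows. This is a minimal model, not Smithay code: `ReleasePoint`, `ScanoutTracker`, and the commit sequence numbers are hypothetical names, and it assumes that the out-fence of a commit signalling means buffers from earlier commits are no longer being scanned out.

```rust
use std::collections::VecDeque;

/// Hypothetical stand-in for a client buffer's release timeline point.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ReleasePoint(pub u64);

/// Tracks buffers handed to the display hardware for direct scanout.
#[derive(Default)]
pub struct ScanoutTracker {
    /// (commit sequence number, release point), oldest commit first.
    pending: VecDeque<(u64, ReleasePoint)>,
}

impl ScanoutTracker {
    /// Record that commit `seq` put a buffer with this release point on a plane.
    pub fn submitted(&mut self, seq: u64, release: ReleasePoint) {
        self.pending.push_back((seq, release));
    }

    /// Call once the fence obtained through OUT_FENCE_PTR for commit `seq`
    /// has signalled. Returns the release points that may now be signalled.
    /// Assumption: only buffers from commits older than `seq` are free;
    /// the buffer of commit `seq` itself is still on screen.
    pub fn out_fence_signalled(&mut self, seq: u64) -> Vec<ReleasePoint> {
        let mut released = Vec::new();
        while let Some(&(s, point)) = self.pending.front() {
            if s >= seq {
                break;
            }
            released.push(point);
            self.pending.pop_front();
        }
        released
    }
}
```

In this model, nothing is released when the first commit's out-fence signals; the first buffer's release point only comes back once the out-fence of the following commit has signalled.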

@Drakulix
Member

> Reading a bit more about IN_FENCE_FD and OUT_FENCE_PTR, I think we need to use both if we want DRM scanout to not use implicit sync?

I am not sure if you can use OUT_FENCE_PTR without IN_FENCE_FD. It wouldn't really make sense for the driver to support one but not the other. We should probably just set them both all the time.

> https://github.com/ValveSoftware/gamescope/blob/master/src/drm.cpp uses a g_nAlwaysSignalledSyncFile as the IN_FENCE_FD, which is probably what we want as well (for direct scanout) if we only commit the buffer when we have already waited for it to be ready.

Yes, if we do CPU latching (that is, polling the fence), then this seems to be the right approach for direct scanout.

Though I would still just send the fence for composited frames, although we obviously could apply the principle there as well, as I tried here: pop-os/cosmic-comp#291

However, I don't see a benefit outside of cases where the driver supports polling fences but not IN_FENCE_FD. In those, we could skip a CPU-blocking glFinish, which also becomes a non-issue once we have a thread per surface.

Perhaps there are some advantages for VRR use cases, but I don't think that is worth the trouble for a first iteration.

> Then we can poll the fence produced by OUT_FENCE_PTR, and not signal release points for buffers used in direct scanout until then.

👍

@ids1024 ids1024 force-pushed the drm-syncobj branch 2 times, most recently from 2fb6a6d to fbb5ca6 Compare May 13, 2024 16:50
@ids1024 ids1024 force-pushed the drm-syncobj branch 6 times, most recently from a4246d7 to 37c2c7e Compare May 21, 2024 19:16
@codecov-commenter

codecov-commenter commented May 21, 2024

Codecov Report

Attention: Patch coverage is 92.30769%, with 2 lines in your changes missing coverage. Please review.

Project coverage is 20.29%. Comparing base (c982040) to head (a17585c).
Report is 63 commits behind head on master.

Files                                          Patch %   Lines
src/backend/renderer/utils/wayland.rs          93.75%    1 Missing ⚠️
src/wayland/shell/xdg/handlers/positioner.rs   0.00%     1 Missing ⚠️

Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1356      +/-   ##
==========================================
+ Coverage   20.27%   20.29%   +0.01%     
==========================================
  Files         161      162       +1     
  Lines       26041    26351     +310     
==========================================
+ Hits         5281     5348      +67     
- Misses      20760    21003     +243     
Flag                 Coverage Δ
wlcs-buffer          17.72% <92.30%> (+0.04%) ⬆️
wlcs-core            17.39% <92.30%> (+0.03%) ⬆️
wlcs-output          7.08% <0.00%> (-0.15%) ⬇️
wlcs-pointer-input   19.25% <88.46%> (+<0.01%) ⬆️


@ids1024 ids1024 force-pushed the drm-syncobj branch 3 times, most recently from e5b0a4b to e1a602f Compare May 23, 2024 01:35
@ids1024
Member Author

ids1024 commented May 24, 2024

> Given commits could be blocking, this might permanently lock up a display without a timeout. It might be reasonable to never send any user-provided fences and instead require adding a Blocker, if one chooses to implement linux-drm-syncobj.

Apparently fences have timeouts? Since the kernel wants to prevent an indefinite lock like this: https://docs.kernel.org/driver-api/dma-buf.html#indefinite-dma-fences

Presumably the timeout behavior would be similar to implicit sync without a DmabufBlocker? In which case that's just consistent with the expected problems when not using a blocker.

So that does sound like something that could be a viable optimization for things like fullscreen applications, even if a general-purpose compositor probably wants to use blockers under most circumstances (for implicit or explicit sync). But it's an optimization that can be handled later, with more testing.

> We kinda do that already? SyncPoints can be awaited via the GPU driver in smithay currently and we use that for synchronizing multi-gpu copies and block for compositing before submitting.

Ah right, we already have support for that in the renderer. So if we want to use the acquire_fence without a CPU wait, we'll want to pass the fence to EGL there. But that can be handled later.

>   • Never pass user-provided fences to IN_FENCE_FD, only for compositing like we do now (and denylist the nvidia driver here for now).
>   • Pass a signalled fence for direct scanout, if downstream signals that it is using blockers (either automagically or with a flag on the DrmCompositor).
>   • Otherwise just fall back to implicit sync for direct scanout.

I'm not sure if we can automagically fall back to implicit sync. We could just not pass an IN_FENCE_FD, and ignore the acquire fence, but I don't think ignoring the explicit fence is guaranteed to work (with all drivers)?

I think we need to either:

  1. Document that, at least for now, use of DrmSyncPointBlocker is required with the Smithay syncobj-v1 protocol implementation.
    • If drmSyncobjEventfd isn't supported by the kernel, the protocol shouldn't be exposed.
    • I think this matches what Mutter is doing.
  2. If a blocker isn't used, use IN_FENCE_FD, or import the fence into EGL
    • Since IN_FENCE_FD doesn't work on Nvidia, for now, DrmSyncPointBlocker is the only way to get proper synchronization with the Nvidia driver?
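As I read the quoted bullets together with option 1, the choice of IN_FENCE_FD could be modelled roughly like this. It is only a sketch of the decision table: `FramePath`, `InFence`, `in_fence_for`, and `expose_syncobj_protocol` are hypothetical names rather than Smithay API, and the nvidia denylist is not modelled.

```rust
/// How the frame reaches the display (hypothetical names for illustration).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FramePath {
    /// The compositor rendered the frame itself.
    Composited,
    /// A client buffer is placed directly on a plane.
    DirectScanout,
}

/// What gets attached as IN_FENCE_FD on the atomic commit.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum InFence {
    /// The fence produced by the compositor's own rendering.
    RendererFence,
    /// An already-signalled fence; readiness was awaited via a blocker.
    AlreadySignalled,
    /// No IN_FENCE_FD at all, leaving the driver to implicit sync.
    NoneImplicitSync,
}

/// The user-provided acquire fence is never forwarded to IN_FENCE_FD here.
pub fn in_fence_for(path: FramePath, compositor_uses_blockers: bool) -> InFence {
    match (path, compositor_uses_blockers) {
        (FramePath::Composited, _) => InFence::RendererFence,
        (FramePath::DirectScanout, true) => InFence::AlreadySignalled,
        (FramePath::DirectScanout, false) => InFence::NoneImplicitSync,
    }
}

/// Option 1: only advertise the protocol when the kernel supports
/// drmSyncobjEventfd, since the blocker depends on it.
pub fn expose_syncobj_protocol(kernel_supports_syncobj_eventfd: bool) -> bool {
    kernel_supports_syncobj_eventfd
}
```

Under option 1 the `NoneImplicitSync` arm would not be reached for syncobj clients, because use of the blocker is documented as required.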

@Drakulix
Member

Drakulix commented May 24, 2024

> So that does sound like something that could be a viable optimization for things like fullscreen applications

Right. For fullscreen surfaces, we probably want to just pass through the fence, not for normal desktop usage though. I agree this is a later optimization.

> Ah right, we already have support for that in the renderer. So if we want to use the acquire_fence without a CPU wait, we'll want to pass the fence to EGL there. But that can be handled later.

Yes, but we kinda don't want that (except for the fullscreen case again), because then we would block compositing for an undefined amount of time. The goal is to render as late as possible and use the latest ready buffer.

> We could just not pass an IN_FENCE_FD, and ignore the acquire fence, but I don't think ignoring the explicit fence is guaranteed to work (with all drivers)?

Yes, we can absolutely do that; that is what we are doing right now. We would only block drivers not supporting implicit sync (nvidia). Anyone serious about nvidia support just needs to implement the protocol, and I don't want to mandate that.

So strong preference for 1.
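The "render as late as possible and use the latest ready buffer" goal mentioned above can be illustrated with a small queue model. This is a sketch under assumptions: `PendingCommit` and `latch_latest_ready` are hypothetical names, the `acquire_signalled` flag stands in for whatever the blocker learns from the acquire point, and commits are assumed to be applied strictly in order, so an unready commit holds back the ones queued behind it.

```rust
use std::collections::VecDeque;

/// A surface commit waiting for its acquire point (hypothetical model).
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct PendingCommit {
    pub buffer_id: u32,
    /// Whether the commit's acquire fence has already signalled.
    pub acquire_signalled: bool,
}

/// Called right before compositing. Applies every commit at the front of the
/// queue whose fence has signalled and returns the newest one applied, so the
/// compositor never waits on a fence that has not signalled yet.
pub fn latch_latest_ready(queue: &mut VecDeque<PendingCommit>) -> Option<PendingCommit> {
    let mut latest = None;
    while queue.front().map_or(false, |c| c.acquire_signalled) {
        // Older ready commits are superseded by newer ready ones.
        latest = queue.pop_front();
    }
    latest
}
```

If nothing new is ready, the function returns `None` and the compositor keeps presenting the previously latched buffer.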

@DemiMarie
Contributor

DemiMarie commented May 29, 2024

My understanding is that DRM syncobjs are, at least in the future, not guaranteed to ever signal. This is because future drivers may support Userspace Memory Fences (UMFs), and userspace can deadlock itself with them. Therefore, I think Smithay needs to do a (maybe asynchronous) explicit wait (possibly with a timeout) so that the display cannot be locked up forever.
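The bounded wait suggested here can be sketched with standard-library primitives only. `ModelFence` is a hypothetical in-process stand-in for a syncobj point, not a DRM or Smithay type; the point is just that the waiter always gets control back, either because the fence signalled or because the timeout expired.

```rust
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

/// Outcome of a bounded wait.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum WaitResult {
    Signalled,
    TimedOut,
}

/// In-process model of a fence that may never signal.
#[derive(Clone, Default)]
pub struct ModelFence {
    inner: Arc<(Mutex<bool>, Condvar)>,
}

impl ModelFence {
    /// Mark the fence as signalled and wake all waiters.
    pub fn signal(&self) {
        let (lock, cvar) = &*self.inner;
        *lock.lock().unwrap() = true;
        cvar.notify_all();
    }

    /// Wait until the fence signals, but never longer than `timeout`.
    pub fn wait_timeout(&self, timeout: Duration) -> WaitResult {
        let (lock, cvar) = &*self.inner;
        let guard = lock.lock().unwrap();
        // Keeps waiting while the flag is still false, up to `timeout`.
        let (guard, _) = cvar
            .wait_timeout_while(guard, timeout, |signalled| !*signalled)
            .unwrap();
        if *guard {
            WaitResult::Signalled
        } else {
            WaitResult::TimedOut
        }
    }
}
```

On `TimedOut`, a compositor could keep showing the previous buffer instead of stalling the output; what to do with the stuck commit afterwards is a policy question this sketch leaves open.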

@ids1024 ids1024 force-pushed the drm-syncobj branch 3 times, most recently from 3c50274 to 4f13b5f Compare June 30, 2024 02:49
@ids1024 ids1024 force-pushed the drm-syncobj branch 3 times, most recently from bad4088 to ad6fb25 Compare July 15, 2024 03:25
@ids1024 ids1024 changed the title from "WIP linux-drm-syncobj-v1" to "Add linux-drm-syncobj-v1 protocol" Sep 19, 2024
@ids1024
Member Author

ids1024 commented Sep 19, 2024

I've rebased this (and the cosmic-comp PR) and made a few small improvements to the doc comments.

I believe this reflects how we agreed the implementation/API should work, and as far as I can tell works correctly. Though it's hard to test thoroughly. The latest version of nvidia/egl-wayland should fix the issues that NVIDIA drivers were having on certain clients with compositors that supported explicit sync.

If no one else sees issues with this, it should be good to merge.

@ids1024 ids1024 marked this pull request as ready for review September 19, 2024 01:52
@cmeissl
Collaborator

cmeissl commented Sep 19, 2024

Thanks for working on this!

I will try to review and also test it on a few different devices during the next few days.

src/backend/renderer/utils/wayland.rs (outdated review thread, resolved)
@PolyMeilex
Member

Tested a few clients that make use of syncobj on my AMD systems, everything works as expected. 👌

Collaborator

@cmeissl cmeissl left a comment

Okay, so while I would like to see a more generic solution allowing control over passing fences directly to IN_FENCE_FD, I agree that we can do that later. For overlay planes, especially when there are multiple, probably no one will want to just pass the fences. So the only real use case would be fullscreen direct scan-out. As most compositors outside of kiosk use cases will have to handle blockers anyway, there is no real point in special-casing fullscreen.

src/wayland/drm_syncobj/mod.rs (outdated review thread, resolved)
@Drakulix Drakulix merged commit 66236e3 into Smithay:master Sep 24, 2024
12 of 13 checks passed
6 participants