Skip to content
This repository has been archived by the owner on Aug 23, 2024. It is now read-only.

Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7

Merged
merged 24 commits into from
Jun 29, 2023
Merged

Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7

merged 24 commits into from
Jun 29, 2023

Conversation

SubiyaCryolite
Copy link

@SubiyaCryolite SubiyaCryolite commented Jun 12, 2023

Hi Everyone.

This PR is a followup to a comment I left a few months ago, visible here memononen#614 (comment)

I made some optimizations that have doubled the frame-rate of the example_vulkan demo on my machine. These changes do not break compatibility with the older implementations either, specifically with example_vulkan_min_no_glfw.

The changes I made are listed below:

  • Support for multiple command buffers, specifically one per swap-chain image / buffer
  • Caching of descriptor sets, removing the need to create new ones per draw func or to call vkResetDescriptorPool per frame.
  • Under example_vulkan, using vkWaitForFences to control rendering as opposed to vkQueueWaitIdle (seems to be the biggest perf booster). This change also has implied "multiple frames in flight" as dictated by the swap-chain image count.
  • Using persisted mapped buffers via VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT instead of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT , allows to skip calls to vkMap/UnmapMemory per frame

The best way to observe the differences in performance is to run example_vulkan and example_vulkan_min_no_glfw separately. On my system ( RX 6800, Ryzen R7 7700 ) the example_vulkan demo has an average frame time of 0.80 ms (1250 fps) as opposed to the example_vulkan_min_no_glfw demo with 1.98 ms (505 fps). For reference example-gl3 runs at 0.55 ms (1818 fps), but this is a big improvement as example_vulkan is now just 31% slower as opposed to 72% slower for example_vulkan_min_no_glfw.

The main difference in terms of setup is:

  • Treating cmd_buffer as an array
  • Setting swapchainImageCount and * currentBuffer within the VKNVGCreateInfo init.

Open for comments and feedback, particular around tests on Nvidia, Intel and Apple Silicon hardware.

Screenshot 2023-06-12 115445

Screenshot 2023-06-12 115528

@SubiyaCryolite
Copy link
Author

Doing additional tests after enabling validations. Caught a few bugs and working on them, however it's clear that a lot of the performance issues come down to synchronization and having multiple frames in flight/processing. Rendering works with no issues, however I feel the validation warnings need to be addressed.

@SubiyaCryolite
Copy link
Author

Happy to say all validation issues and bugs have been addressed. Performance is still maintained at 0.80 to 0.78ms locally. Feel free to provide feedback and thanks for laying the foundations :)

@danilw
Copy link
Owner

danilw commented Jun 13, 2023

Absolutely crazy amount of effort! Wow!
Thank you for doing it!

il check it latter today/tomorrow

@danilw
Copy link
Owner

danilw commented Jun 19, 2023

Sorry for delay, I said "will check" I still did not, will do during this week probably next days.

@danilw
Copy link
Owner

danilw commented Jun 23, 2023

These changes do not break compatibility with the older implementations either, specifically with example_vulkan_min_no_glfw.

on Linux I can not build your repo after cloning, there few mistake in code example/example_vulkan_min_no_glfw.c, I think you change and not tested Linux version

can you change your example/example_vulkan_min_no_glfw.c to this fixed, in attachment
there 3 changes at 450-480 lines arroung VK_USE_PLATFORM_XCB_KHR
example_vulkan_min_no_glfw.zip

@danilw
Copy link
Owner

danilw commented Jun 23, 2023

About performance:

Nvidia - absolutely no change in performance, your updated repo - it is exact same to this my original without your changes, I opened multiple window, make them fullscreen, pressed Space to test - everything is exact same.
And Nvidia performance with this is "same bad" as before about 50% slower than OpenGL. Nvidia drop below 60FSP on fullscreen with Space-clicked.
But maybe this is because I use Wayland with Nvidia being second GPU and Nvidia performance just "downgraded" because Nvidia driver being bad...

AMD - huge improvement on this repository-examples. from ~300fps original to 1200fps your improved.
it is even 2x more fps than "multiple frames in flight" example.

Now il check "multiple frames in flight"...

@danilw
Copy link
Owner

danilw commented Jun 23, 2023

If I understand correctly - you integrated "multiple frames in flight" to nanovg_vk.h but you did it more correctly than I did in my example, you did it with cache and and memory optimizations. Looks good.
You also keep compatibility with "minimal example that use single frame".

I need to remake my external examples with integration, il do it latter.

Il wait for you reply.

Confirm if you do this fixes or I will after accepting your pull request:

  1. Linux fix in Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7 (comment)
  2. I also saw g++ complaining about this:
    https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/src/nanovg_vk.h#L1504 arguments of vknvg_UpdateBuffer :
    VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
    when:
    static void vknvg_UpdateBuffer(... VkMemoryPropertyFlagBits memory_type ...)

I think it may work incorrectly because expected VkMemoryPropertyFlagBits and VkMemoryPropertyFlagBits is typedef enum VkMemoryPropertyFlagBits
https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkMemoryPropertyFlagBits.html

And your value result of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT may not go correctly.

I think this need to be fixed, by replacing VkMemoryPropertyFlagBits memory_type with uint32_t type or VkFlags type in chain down to vknvg_memory_type_from_properties.

I have not tested it on Windows, I assume it works.
I tested with validation layers on Nvidia and AMD - no errors. (there is 1 error when you close example_vulkan something is not cleaned - but it is same as in my original, I just keep it as it is this is not important)

I also saw bug with resize - crash on resize very common, on AMD GPU only, but my minimal "vulkan shader app" that is source of "no_glfw" example does not crash here, so this need investigation...
There no validation errors on crash it just segfault, crash message is this:

example-vk: /home/danil/2021_vulkan_projects/not_clean/nanovg_vulkan/example/example_vulkan.c:66: prepareFrame: Assertion `res == VK_SUCCESS' failed.

or

example-vk: /home/danil/2021_vulkan_projects/not_clean/nanovg_vulkan/example/example_vulkan.c:171: submitFrame: Assertion `res == VK_SUCCESS' failed.

But same bug happening in this my original version, not your bug.

@SubiyaCryolite
Copy link
Author

If I understand correctly - you integrated "multiple frames in flight" to nanovg_vk.h but you did it more correctly than I did in my example, you did it with cache and and memory optimizations. Looks good. You also keep compatibility with "minimal example that use single frame".

I need to remake my external examples with integration, il do it latter.

Il wait for you reply.

Confirm if you do this fixes or I will after accepting your pull request:

1. Linux fix in [Optimizations > Cached Descriptor Sets, Implied Multiple Frames in Flight, Fencing for faster perf 2X #7 (comment)](https://github.com/danilw/nanovg_vulkan/pull/7#issuecomment-1604170006)

2. I also saw g++ complaining about this:
   https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/src/nanovg_vk.h#L1504 arguments of `vknvg_UpdateBuffer` :
   `VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT` 
   when:
   `static void vknvg_UpdateBuffer(...` **`VkMemoryPropertyFlagBits memory_type`** `...)`

I think it may work incorrectly because expected VkMemoryPropertyFlagBits and VkMemoryPropertyFlagBits is typedef enum VkMemoryPropertyFlagBits https://registry.khronos.org/vulkan/specs/1.3-extensions/man/html/VkMemoryPropertyFlagBits.html

And your value result of VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT | VK_MEMORY_PROPERTY_HOST_COHERENT_BIT may not go correctly.

I think this need to be fixed, by replacing VkMemoryPropertyFlagBits memory_type with uint32_t type or VkFlags type in chain down to vknvg_memory_type_from_properties.

I have not tested it on Windows, I assume it works. I tested with validation layers on Nvidia and AMD - no errors. (there is 1 error when you close example_vulkan something is not cleaned - but it is same as in my original, I just keep it as it is this is not important)

I also saw bug with resize - crash on resize very common, on AMD GPU only, but my minimal "vulkan shader app" that is source of "no_glfw" example does not crash here, so this need investigation... There no validation errors on crash it just segfault, crash message is this:

example-vk: /home/danil/2021_vulkan_projects/not_clean/nanovg_vulkan/example/example_vulkan.c:66: prepareFrame: Assertion `res == VK_SUCCESS' failed.

or

example-vk: /home/danil/2021_vulkan_projects/not_clean/nanovg_vulkan/example/example_vulkan.c:171: submitFrame: Assertion `res == VK_SUCCESS' failed.

But same bug happening in this my original version, not your bug.

Thanks for the feedback. I'll be sure to look into these issues in the upcoming days.

Unfortunately I only have access to an AMD GPU (Windows) and my M1 Macbook Pro. So I may not be able to test on Nvidia anytime soon.

Thanks again.

@danilw
Copy link
Owner

danilw commented Jun 24, 2023

Unfortunately I only have access to an AMD GPU (Windows) and my M1 Macbook Pro. So I may not be able to test on Nvidia anytime soon.

I mean - its fine, il test Linux and Nvidia

@danilw
Copy link
Owner

danilw commented Jun 25, 2023

I see your updates, will check tomorrow.

@danilw
Copy link
Owner

danilw commented Jun 28, 2023

@SubiyaCryolite you missed one Linux fix:
https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/example/example_vulkan_min_no_glfw.c#L486
https://github.com/SubiyaCryolite/nanovg_vulkan/blob/feature/use-persistent-mapped-memory/example/example_vulkan_min_no_glfw.c#L506

change fb->current_frame to fb.current_frame

I also checked dynamic UI - seems work, so only crash with resize - il fix it after, its this project bug.

I will accept this pull request after you fix this 2 lines I mention above.

@SubiyaCryolite
Copy link
Author

Just addressed your last comment, thanks for catching those issues.

@danilw
Copy link
Owner

danilw commented Jun 29, 2023

Thank you again for all this changes, huge improvement and huge effort from you. Much appreciated!

For future real-time communication if you need - you can join my discord, link in project description.

@danilw danilw merged commit 10d5211 into danilw:master Jun 29, 2023
@SubiyaCryolite SubiyaCryolite deleted the feature/use-persistent-mapped-memory branch June 29, 2023 14:42
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants