Lens flare effects #15923
Comments
Arctic Edge |
Thanks, added to the list. |
Syphon Filter Logan's Shadow |
I don't know how the AetherSX2 PS2 emulator does it, but it gets accurate, full-speed readback emulation using OpenGL on Android. I can play Need for Speed: Hot Pursuit 2 without underclocking the emulator, with accurate readbacks turned on, and maintain full speed with no slowdowns, and the sun hides when it's supposed to. And I think the PS2 has double the resolution of the PSP. I do cheat a little, though: I keep all my core frequencies maxed out and the GPU set at 3/4 speed on my rooted phone (SD 855+). The phone doesn't get too hot, about 147° F on average. So I know PPSSPP wouldn't be too demanding with accurate readbacks. I think what helps them is a CPU affinity option that keeps the heaviest threads on the biggest cores of the phone. |
AetherSX2 is the PCSX2 mobile port, btw. |
An alternative to readbacks for the games that peek the Z-buffer using the CPU, as commented elsewhere by @unknownbrackets, would be to run both the software and hardware renderers side by side. That way we'll always have accurate depth in CPU-accessible memory, at the right time. This is expensive though, and to make it less so, it would be possible to have the software renderer only render depth buffers and just ignore color. Depth is a lot less complex, so I think this would be way faster than running the full software renderer. This wouldn't work for cases where games reinterpret color and Z like Kurohyou, but I don't think that applies to any of these cases. Also gonna have to look into what PCSX2 does. Maybe AetherSX2 does something special on top, hard to say given it's closed source. |
I will say, the loop to interpolate triangle data is the slowest part of the software renderer now, I think. Texture sampling is still fairly slow as well. We'd still need to texture (because of alpha tests/color tests), but we could skip alpha blend and logicops. Skipping blending would save time, but I don't think it'd make a huge difference overall. Maybe we could have a "fast and loose" mode where it ignores color and alpha tests, though, or at least skip sampling/etc. when they're not enabled (which would be safe.) That would also allow us to skip lighting which is quite expensive. -[Unknown] |
Yeah, I think we can go very fast and loose for Z-only. Texturing only needs to be done when we know there's alpha. And we could skip filtering and mipmapping, for example. |
Right. My biggest concern would be "depth boxes" from alpha testing. For example, if some far away trees or clouds were drawn to cover the sun, but without alpha testing they cover the entire thing. If we can safely skip alpha testing, it probably helps the potential speed a lot, because it cuts out many, many things. We might end up in a place where we're using heuristics to skip alpha testing, though. For example, it's probably mainly an issue with flat Z - models probably don't need alpha testing for depth to be correct. -[Unknown] |
Socom US Navy Seal: Tactical Strike is also affected. |
Thanks, added to list. |
Burnout Dominator sun flares are glitchy using a recent build of PPSSPP. |
Yeah, I'll have to take a look at those again. |
Resistance: Retribution GE dump. Edit: fixed by the [ReadbackDepth] compatibility setting, but it makes the game slower and makes my opponent invisible :( |
Alright, by implementing a new Z-only rasterizer, I have confirmed the theories above, that rasterizing depth-only can be made really really fast. I haven't found a single case yet in the affected games where alpha testing would have a noticeable effect on the lens flare blocking. But of course, it's possible we'll find one. |
Lens flare issues, categorized:
CPU peeking into the depth buffer to check coverage
Framebuffer->CLUT tricks
Framebuffer alpha accumulation tricks:
Not yet investigated in detail:
Graphical bugs in Colin McRae Rally 2005 #7810 Colin McRae 2005? (the sun is rendered wrong)
References:
https://github.com/hrydgard/ppsspp/commits/c3bb9437669a4a (old PR for framebuffer CLUTs)
Lens flares are a typical problematic effect on GPUs of the PSP's generation. They are supposed to be drawn only when the sun (or other light source) is visible, but there are no occlusion queries that could determine this directly on the GPU. Nor is it practical to copy the texel to an image and then use multitexturing to blend the lens effect texture with the copied texel, since multitexturing is not a thing.
So games make use of a variety of dirty tricks.
Let's start with Wipeout Pure, #13344. I started by hacking the interpreter to log out CPU reads from VRAM. For some reason there are a whole bunch that happen every frame, but these stand out:
In the EUR version of Wipeout, the lhu instruction doing these reads is at 0888c16c (function starting at 0888C0A8), then there are some additional reads being done by 0881e0c0 (function starting at 0881E098, no idea what it's doing).
It's using lhu instructions (load 16-bit) and it looks to me like it's sampling a 4x4 rectangle around the sun's screen position from the depth buffer, skipping every other pixel. The depth buffer is situated at 110000 in VRAM, which starts at 04000000, plus the 600000 deswizzle offset that's needed to linearize the depth buffer in 8888 mode. It treats a zero value as sky, that is, the sun is not occluded and it will draw the lens flare. As expected, as the sun slides across the image when the camera moves, these addresses, which are read from every frame, change accordingly. The game must be synchronized here since the depth buffer is not double-buffered.
For this to work correctly, we have to read back the depth buffer every frame to emulated PSP VRAM, which introduces a massive sync point between the GPU and CPU. This is not really desirable (although we should implement it as an option), so I've been thinking about ways to get around it:
Anyway, I think the first step will be to create the correct-ish but slow solution of doing hard-synced readbacks to PSP VRAM. The question is when exactly in the frame we should do these. "When finished rendering the main depth buffer" is presumably the best option, but there's no clear way to detect that. Maybe just do it when the main framebuffer is displayed, or something.
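To make the Wipeout depth peek described above more concrete, here is a rough C++ sketch of the address arithmetic. Only the three hex constants (04000000, 110000, 600000) and the 16-bit read size come from the notes above. The 512-pixel stride, the reading of "4x4, skipping every other pixel" as a 4x4 grid of samples two pixels apart, and all function names are assumptions made for illustration; this is not code from the game or from PPSSPP.

```cpp
#include <cstdint>
#include <vector>

// Values taken from the notes above.
constexpr uint32_t kVramBase = 0x04000000;        // start of VRAM
constexpr uint32_t kDepthOffset = 0x00110000;     // depth buffer offset in VRAM
constexpr uint32_t kDeswizzleOffset = 0x00600000; // linearized depth mirror

// Assumptions, not confirmed by the notes.
constexpr uint32_t kStridePixels = 512;  // assumed depth buffer stride
constexpr uint32_t kBytesPerDepth = 2;   // 16-bit depth, matching the lhu reads

// Address of the 16-bit depth value at pixel (x, y).
constexpr uint32_t DepthAddress(uint32_t x, uint32_t y) {
    return kVramBase + kDeswizzleOffset + kDepthOffset +
           (y * kStridePixels + x) * kBytesPerDepth;
}

// Addresses of a 4x4 grid of samples starting at (x0, y0),
// stepping two pixels at a time (assumed sampling pattern).
std::vector<uint32_t> SunSampleAddresses(uint32_t x0, uint32_t y0) {
    std::vector<uint32_t> out;
    for (uint32_t j = 0; j < 4; ++j)
        for (uint32_t i = 0; i < 4; ++i)
            out.push_back(DepthAddress(x0 + i * 2, y0 + j * 2));
    return out;
}

// Counts samples whose depth is zero, which the game treats as sky.
// readU16 is a hypothetical callback that reads emulated memory.
template <typename ReadU16>
int CountSkySamples(uint32_t x0, uint32_t y0, ReadU16 readU16) {
    int sky = 0;
    for (uint32_t addr : SunSampleAddresses(x0, y0))
        if (readU16(addr) == 0) ++sky;
    return sky;
}
```

With these assumptions, pixel (0, 0) of the depth buffer lands at 04710000. Whatever keeps emulated VRAM up to date (a readback or a CPU-side depth rasterizer) only has to be correct at the handful of addresses the game actually reads.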
Syphon Filter: Dark Mirror flares
I think this will be a good candidate to test a fast-and-loose Z rasterizer, along with Wipeout.
We basically just need to rasterize static meshes (walls), we can ignore skinned characters.
So I think the way to go will be to write a hyper optimized Z-only rasterizer. It can be very loose, maybe even render at half resolution in one or both dimensions.
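As a rough illustration of how little a loose Z-only rasterizer needs to do, here is a minimal, unoptimized C++ sketch using edge functions. Everything in it is hypothetical: the type and function names, the `scale` parameter (0.5 for half resolution), and the choice of a greater-or-equal depth test on a buffer cleared to zero, which was picked only because zero reads as sky in the Wipeout case. It is not the rasterizer in PPSSPP, and a real one would use SIMD and binning.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

struct ScreenVert { float x, y, z; };  // screen-space position, z in [0, 65535]

struct DepthBuffer {
    int width, height;
    std::vector<uint16_t> z;  // cleared to 0 (assumed to mean "sky")
    DepthBuffer(int w, int h) : width(w), height(h), z(size_t(w) * h, 0) {}
    uint16_t At(int x, int y) const { return z[size_t(y) * width + x]; }
};

// Twice the signed area of (a, b, p); the sign says which side of edge ab p is on.
static float Edge(const ScreenVert &a, const ScreenVert &b, float px, float py) {
    return (b.x - a.x) * (py - a.y) - (b.y - a.y) * (px - a.x);
}

// Rasterizes depth only: no color, texturing, alpha test, or blending.
// scale maps full-resolution coordinates into the buffer (0.5 = half-res).
void RasterizeDepth(DepthBuffer &db, ScreenVert v0, ScreenVert v1, ScreenVert v2, float scale) {
    auto scaled = [scale](ScreenVert v) { v.x *= scale; v.y *= scale; return v; };
    v0 = scaled(v0); v1 = scaled(v1); v2 = scaled(v2);

    float area = Edge(v0, v1, v2.x, v2.y);
    if (area == 0.0f) return;                              // degenerate triangle
    if (area < 0.0f) { std::swap(v1, v2); area = -area; }  // accept either winding

    int minX = std::max(0, (int)std::floor(std::min({v0.x, v1.x, v2.x})));
    int maxX = std::min(db.width - 1, (int)std::ceil(std::max({v0.x, v1.x, v2.x})));
    int minY = std::max(0, (int)std::floor(std::min({v0.y, v1.y, v2.y})));
    int maxY = std::min(db.height - 1, (int)std::ceil(std::max({v0.y, v1.y, v2.y})));

    for (int y = minY; y <= maxY; ++y) {
        for (int x = minX; x <= maxX; ++x) {
            float px = x + 0.5f, py = y + 0.5f;  // sample at pixel centers
            float w0 = Edge(v1, v2, px, py);
            float w1 = Edge(v2, v0, px, py);
            float w2 = Edge(v0, v1, px, py);
            if (w0 < 0.0f || w1 < 0.0f || w2 < 0.0f) continue;  // outside
            float zf = (w0 * v0.z + w1 * v1.z + w2 * v2.z) / area;
            uint16_t zi = (uint16_t)std::min(65535.0f, std::max(0.0f, zf));
            uint16_t &dst = db.z[size_t(y) * db.width + x];
            if (zi >= dst) dst = zi;  // assumed GEQUAL depth test
        }
    }
}
```

Since the game only asks whether a few pixels near the sun are covered, small errors at triangle edges from the reduced resolution seem tolerable, which is what makes the loose approach attractive.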
Syphon Filter walls are drawn with this setup:
Wipeout uses similar vertices, also s16 positions. Strips and indexed lists
So for a proof of concept we need to handle the above. "Just" do a custom vertex decoder that picks the positions out and collect into a triangle list on the side, then bin and raster on each framebuffer switch (or stall?) using a custom rasterizer. Something like Intel's.
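The "pick the positions out and collect into a triangle list" step could look roughly like the C++ sketch below. The vertex layout is passed in as a hypothetical `stride` and `posOffset`, since the real values depend on the vertex format each draw uses. The function names are made up for illustration, and the code assumes a little-endian host so the s16 values can be copied out directly.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

struct Pos16 { int16_t x, y, z; };  // s16 position, as used by the wall meshes

// Copies only the s16 position out of each vertex in a strided buffer,
// ignoring UVs, colors, and normals. stride and posOffset are in bytes.
std::vector<Pos16> ExtractPositions(const uint8_t *verts, size_t count,
                                    size_t stride, size_t posOffset) {
    std::vector<Pos16> out(count);
    for (size_t i = 0; i < count; ++i)
        std::memcpy(&out[i], verts + i * stride + posOffset, sizeof(Pos16));
    return out;
}

// Expands a triangle strip (given as vertex indices) into a triangle list,
// swapping two indices on odd triangles to keep the winding consistent.
std::vector<uint16_t> StripToList(const std::vector<uint16_t> &strip) {
    std::vector<uint16_t> list;
    for (size_t i = 0; i + 2 < strip.size(); ++i) {
        uint16_t a = strip[i], b = strip[i + 1], c = strip[i + 2];
        if (i & 1) std::swap(b, c);
        list.push_back(a);
        list.push_back(b);
        list.push_back(c);
    }
    return list;
}
```

Indexed lists need no expansion, so both primitive types end up as plain index triples that can be appended to one side buffer and rasterized in a batch later.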
.... To be continued