Depth raster TODO list #19757

hrydgard · 2024-12-22T00:03:41Z

This is about #19748, which solves a number of lens flare issues across various games, at the cost of running an extra Z-only software renderer.

Ideally other games should run and render good depth buffers too, so we get the bugs out of the system, even when they don't have any use for them.

Problematic things (done)

Quake II (homebrew), Suicide Barbie - generate a 0xFFFF depth buffer
Tekken 6 hangs in a broken display list (I guess we write some memory we shouldn't)

Features:

Add a setting to control it under Speed Hacks

Planned optimizations:

Hierarchical rasterization for large triangles
Raster the screen in tiles, across multiple threads. Ideally we should do binning, although on the other hand, triangle raster time is so dominant that maybe it's ok to just send all draws to all threads.
SIMD-ify triangle setup, do four triangles at a time (all the way from the clip function)
Queue up draws, run them "in the background" on the threads and only flush on render target switches.

fp64 · 2024-12-22T14:15:07Z

Not sure if this comment belongs here, but.
Regarding triangle setup, just had a thought: wouldn't it be possible to just use _mm_madd_epi16 for integer multiplication for edge functions?
Possibly zeroing out the high 16 bits of each 32-bit lane of one of the arguments (no need to patch both) - a "sign-retract", if you will.
This assumes all vertex coords are in [-32768;+32767], but you want that regardless, to avoid overflow.

hrydgard · 2024-12-22T17:33:53Z

Yes, that will probably be fine, because both multiplicands are small. Certainly better than the horror of the workaround function :)

Or maybe it's okay to do the triangle setup in float? Although, I'm sure Fabian has a good reason to stick to int..

Btw, in your https://rextester.com/GDHNO44482 , for the hierarchical traversal, I'm pretty sure that you don't have to do four tests like in your test_rect, it should be possible to bias the edge functions instead and do a single test even at the upper level. Though, have not tried that :)

fp64 · 2024-12-23T00:51:28Z

You can even do:

#include <emmintrin.h> // SSE2

// Returns (a-b)*(c-d)-(e-f)*(g-h) per int32 lane,
// assuming all (...)'s fit into int16.
static __m128i edge_function(__m128i a, __m128i b, __m128i c, __m128i d, __m128i e, __m128i f, __m128i g, __m128i h)
{
    __m128i p = _mm_sub_epi32(a, b);
    __m128i q = _mm_sub_epi32(c, d);
    __m128i r = _mm_sub_epi32(e, f);
    __m128i s = _mm_sub_epi32(h, g); // flipped order, since _mm_madd_epi16 is p*q+r*s, not p*q-r*s.
    // Pack p (resp. q) into the low 16 bits and r (resp. s) into the high 16 bits of each lane.
    __m128i x = _mm_or_si128(_mm_and_si128(p, _mm_set1_epi32(0xFFFF)), _mm_slli_epi32(r, 16));
    __m128i y = _mm_or_si128(_mm_and_si128(q, _mm_set1_epi32(0xFFFF)), _mm_slli_epi32(s, 16));
    return _mm_madd_epi16(x, y); // per lane: p*q + r*s
}

Tested it, seems to work fine.
You can also have a 4-argument version (just pq-rs), though that needs one extra negation.
The win over the naive SSE2 version seems surprisingly small though (~1.33x), and it's ~1.4 times worse than a straightforward SSE4 version.
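
For reference, the 4-argument variant mentioned above could look roughly like the sketch below. The name edge_function4 is made up for illustration, and it assumes p, q, r fit in int16 and s is in [-32767;+32767], since negating -32768 does not fit in int16.

```c
#include <emmintrin.h> // SSE2

// Hypothetical 4-argument variant: returns p*q - r*s per int32 lane.
// Assumes p, q, r fit into int16 and s is in [-32767, 32767].
static __m128i edge_function4(__m128i p, __m128i q, __m128i r, __m128i s)
{
    const __m128i lo16 = _mm_set1_epi32(0xFFFF);
    // The extra negation: _mm_madd_epi16 adds the two products, so feed it -s.
    __m128i neg_s = _mm_sub_epi32(_mm_setzero_si128(), s);
    // Low 16 bits of each lane hold p (resp. q), high 16 bits hold r (resp. -s).
    __m128i x = _mm_or_si128(_mm_and_si128(p, lo16), _mm_slli_epi32(r, 16));
    __m128i y = _mm_or_si128(_mm_and_si128(q, lo16), _mm_slli_epi32(neg_s, 16));
    return _mm_madd_epi16(x, y); // per lane: p*q + r*(-s)
}
```

Negating r instead of s would work equally well, with the same caveat moved to r.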

Triangle setup/rasterization are done in int pretty much for reasons of exactness: you want to make sure pixels on common edge of 2 triangles are rendered exactly once.
The sameself ryg mentioned somewhere (Twitter, I think, so good luck searching that) that you can use float in certain circumstances: you only care about exactness around where the edge function can change sign in the first place (and float gives you much better range - 2^126 vs 2^31, though not precision). But that needs care, and he didn't go into detail.
Normally, float32 products are exact up to 24 bits, so you are worse off than int32.
Now, if you don't care about exactness (as evidenced by not doing the top-left rule, as well as skipping small triangles) - perhaps float is alright. I seem to recall that doing the rasterizer in float doesn't produce many visible artifacts in practice.
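
To make the 24-bit point concrete, here is a small check one could run. It is only an illustration (the helper name is made up): it compares a float32 product against the exact integer product.

```c
#include <stdint.h>

// Returns 1 if a*b computed in float32 equals the exact integer product.
// Intended for |a|, |b| <= 32767, so the product fits comfortably in int64.
static int float_product_is_exact(int32_t a, int32_t b)
{
    // volatile forces rounding to float32 at each step
    // (guards against excess precision on some compilers).
    volatile float fa = (float)a;
    volatile float fb = (float)b;
    volatile float prod = fa * fb;
    return (int64_t)prod == (int64_t)a * (int64_t)b;
}
```

With this, 4096*4096 (exactly 2^24) is still exact, while 4097*4097 = 16785409 is odd and above 2^24, so it cannot be represented in float32 and the check fails. Products of 16-bit coordinates reach about 2^30, which int32 holds exactly.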

You mean computing edge function at rect center, and comparing it to sum of absolute values of increments for rect half-sides? Oh, yeah, that should work, nice.
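
A scalar sketch of that center-plus-bias test, as I understand it. The names (Edge, classify_rect) and the convention that "inside" means E >= 0 are assumptions for illustration, and everything is doubled to keep the rect center in integers.

```c
#include <stdint.h>
#include <stdlib.h>

// Edge function E(x, y) = A*x + B*y + C; a sample is "inside" when E >= 0.
typedef struct { int32_t A, B, C; } Edge;

typedef enum { RECT_OUTSIDE, RECT_PARTIAL, RECT_INSIDE } RectClass;

// Classifies the closed rect [x0, x0+w] x [y0, y0+h] against one edge.
// Works with 2*E(center) and 2*bias so that odd sizes need no fractions.
static RectClass classify_rect(Edge e, int32_t x0, int32_t y0, int32_t w, int32_t h)
{
    // 2*E(center), where center = (x0 + w/2, y0 + h/2)
    int64_t center2 = 2 * ((int64_t)e.A * x0 + (int64_t)e.B * y0 + e.C)
                    + (int64_t)e.A * w + (int64_t)e.B * h;
    // 2*(|A|*w/2 + |B|*h/2): the most E can move away from its center value
    int64_t bias2 = llabs((int64_t)e.A) * w + llabs((int64_t)e.B) * h;

    if (center2 + bias2 < 0)  return RECT_OUTSIDE; // max of E over the rect is negative
    if (center2 - bias2 >= 0) return RECT_INSIDE;  // min of E over the rect is non-negative
    return RECT_PARTIAL;
}
```

A rect can be dropped as soon as any one of the three edges reports RECT_OUTSIDE, and it is fully covered only if all three report RECT_INSIDE.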

hrydgard · 2024-12-23T10:12:02Z

Nice! That'll come in handy. I'll stick to integer...

By the way, I was driving today and thinking of rasterization hehe. I had two thoughts:

In your 8x8 raster, if there is a block to the left of the current one, you can reuse the top right and bottom right samples as the new top left and bottom left.

But also, I don't understand how your 8x8 method with checking corners doesn't miss small triangles, like a tiny one entirely enclosed by a block...

So it feels like rastering at 8x8 centres with bias, and then, in "inside" blocks, checking corners with the 1x1 biases to see if a block is full or partial would be the way to go?

fp64 · 2024-12-23T12:39:43Z

The block test looks at 12 bits of data: signs of 3 edge functions at block corners.
The block is considered "empty" iff at least one edge function has negative signs at all 4 corners, i.e. the block is entirely in the "outside" half-plane (since half-plane is a convex shape).
This is conservative: there may be some blocks that aren't discarded as "empty", but are, in fact, empty.
However, blocks that are discarded are really empty.

So a tiny triangle enclosed by a block poses no problem: the signs of individual edge functions at the corners would be different.
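
In scalar form, the test described above amounts to something like the following sketch. The names (EdgeFn, block_is_empty) are illustrative only, not the ones used in the linked code, and "inside" is assumed to mean E >= 0.

```c
#include <stdbool.h>
#include <stdint.h>

// Edge function E(x, y) = A*x + B*y + C; "inside" is assumed to be E >= 0.
typedef struct { int32_t A, B, C; } EdgeFn;

static int64_t eval_edge(EdgeFn e, int32_t x, int32_t y)
{
    return (int64_t)e.A * x + (int64_t)e.B * y + e.C;
}

// Returns true if the block with corners (x0, y0) .. (x0+size, y0+size) can be
// discarded: at least one edge function is negative at all 4 corners.
// Conservative: a false result does not guarantee the block has coverage.
static bool block_is_empty(const EdgeFn edges[3], int32_t x0, int32_t y0, int32_t size)
{
    for (int i = 0; i < 3; i++) {
        bool all_negative =
            eval_edge(edges[i], x0,        y0)        < 0 &&
            eval_edge(edges[i], x0 + size, y0)        < 0 &&
            eval_edge(edges[i], x0,        y0 + size) < 0 &&
            eval_edge(edges[i], x0 + size, y0 + size) < 0;
        if (all_negative)
            return true; // whole block lies in this edge's "outside" half-plane
    }
    return false;
}
```

For a tiny triangle with vertices (3,3), (5,3), (3,5) inside the 8x8 block at (0,0), every edge has mixed signs at the corners, so the block is kept. The neighbouring block at (8,0) is also kept even though it contains no part of the triangle, which is the conservative case.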

hrydgard · 2024-12-24T10:49:08Z

Ahhh ok, I understand now :) With reuse, checking the corners may practically be as fast as using a rect-center biased check I guess, since with that method we still need to check the corners to see if a block is fully in or partial..

I'm going to play around with this later.

hrydgard added this to the v1.19.0 milestone Dec 22, 2024

hrydgard added the "GE emulation" (Backend-independent GPU issues) label Dec 22, 2024

hrydgard mentioned this issue Dec 22, 2024

Enable software depth raster for Wipeout, Midnight Club LA, Resistance, Syphon Filter: Dark Mirror #19759

Merged

hrydgard mentioned this issue Dec 22, 2024

Enable depth raster for Armored Core by default, minor speedup #19761

Merged
