Should we use lw/sw in push pop when we used ILP32, whether it's RV32 or RV64 #382

Liaoshihua · 2023-05-19T04:41:10Z

While implementing ILP32 on RV64 (see the patches), we found that the rv64 calling convention increases stack usage under the ILP32 ABI. So, should we use lw/sw for register push/pop whenever the ABI is ILP32, whether on RV32 or RV64?

@guoren83 @kito-cheng @palmer-dabbelt @asb @jrtc27

Details:

On a 64-bit ISA, the callee cannot determine the width of the value actually held in a register, so it saves the maximum register width of the ISA, i.e., XLEN bits. This rule is inherited from rv64lp64, and we found the same rule in x86-x32, mips-n32, and aarch64ilp32.

This has two downsides:

1. It makes the stack frame differ from that of 32ilp32, while 64ilp32 reuses the 32ilp32 software stack. As a result, many additional compatibility problems would arise when porting software to 64ilp32.
2. It increases stack usage.
<setup_vm>:
auipc a3,0xff3fb
add a3,a3,1234 # c0000000
li a5,-1
lui a4,0xc0000
addw sp,sp,-96
srl a5,a5,0x20
subw a4,a4,a3
auipc a2,0x111a
add a2,a2,1212 # c1d1f000
sd s0,80(sp)  ----+
sd s1,72(sp)      |
sd s2,64(sp)      |
sd s7,24(sp)      |
sd s8,16(sp)      |
sd s9,8(sp)       |-> All <= 32b widths, but occupy 64b
sd ra,88(sp)      |   stack space.
sd s3,56(sp)      |   Affect memory footprint & cache
sd s4,48(sp)      |   performance.
sd s5,40(sp)      |
sd s6,32(sp)      |
sd s10,0(sp)  ----+
sll a1,a4,0x20
subw a2,a2,a3
and a4,a4,a5
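
As a rough check of the stack cost in the listing above: the prologue saves 12 registers (ra and s0 to s10). The Python sketch below uses a hypothetical helper, `save_area_bytes`, to compare 8-byte slots (sd) with 4-byte slots (sw). It assumes the 16-byte stack alignment of the standard RISC-V psABI and a frame that holds nothing but saved registers.

```python
# Hypothetical helper, not taken from the patches: size of a frame that
# only holds saved registers, rounded up to the 16-byte stack alignment
# that the standard RISC-V psABI requires.
STACK_ALIGN = 16

def save_area_bytes(num_regs: int, slot_bytes: int) -> int:
    raw = num_regs * slot_bytes
    return (raw + STACK_ALIGN - 1) // STACK_ALIGN * STACK_ALIGN

# Registers saved in the setup_vm prologue: ra plus s0..s10.
SAVED = ["ra"] + [f"s{i}" for i in range(11)]

frame_sd = save_area_bytes(len(SAVED), 8)  # current rv64 convention (sd)
frame_sw = save_area_bytes(len(SAVED), 4)  # 32-bit slots (sw)
print(frame_sd, frame_sw)
```

The sd figure matches the `addw sp,sp,-96` in the listing. With 4-byte slots, the same save area would take half the space.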

So here is a proposal for the RISC-V 64ilp32 ABI:

Have the compiler avoid keeping values wider than 32 bits in callee-saved registers, so that the callee can save and restore them with sw/lw. (Open question: we need to measure how much this affects 64-bit values that are live across function calls.)
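
To see why the proposal has to keep values wider than 32 bits out of callee-saved registers, here is a small Python model of a save/restore round trip. It assumes only the architectural behavior of RV64 `sw` (stores the low 32 bits) and `lw` (sign-extends the loaded word to 64 bits). The function names are made up for illustration.

```python
MASK64 = (1 << 64) - 1

def sext32(value: int) -> int:
    """Sign-extend the low 32 bits to a 64-bit register image."""
    low = value & 0xFFFFFFFF
    return (low | ~0xFFFFFFFF) & MASK64 if low & 0x80000000 else low

def roundtrip_sd_ld(reg: int) -> int:
    # sd/ld save and restore all 64 bits unchanged.
    return reg & MASK64

def roundtrip_sw_lw(reg: int) -> int:
    # sw keeps only bits 31..0; lw sign-extends them on reload.
    return sext32(reg)

int_value = sext32(-5)                 # a 32-bit int as held in an RV64 register
long_long_value = 0x0000000100000002   # a 64-bit value that uses the upper half

assert roundtrip_sw_lw(int_value) == int_value              # survives sw/lw
assert roundtrip_sw_lw(long_long_value) != long_long_value  # upper half is lost
assert roundtrip_sd_ld(long_long_value) == long_long_value  # sd/ld is always safe
```

A sign-extended 32-bit value survives the sw/lw round trip, while a 64-bit value loses its upper half. That is why the callee can use sw/lw only if such values never sit in callee-saved registers across a call.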

aswaterman · 2023-05-19T05:14:10Z

My sense is that this will be slightly preferable for most code, but there is risk that instruction-count increase in less-common cases will outweigh the benefits. Especially since this ABI decision would differ from x86 x32, I'd advise against doing it without quantitative justification. If you do SPEC CPU runs with both ABI designs, you'll see whether the perf boost from reduced D$ miss rate on some programs is outweighed by the instruction-count increase for 64b values that are live across calls.

kito-cheng · 2023-05-19T06:39:40Z

That more or less means there is no real callee-saved register. One concern of mine is that this is relatively easy for the compiler, but hard to maintain and debug in assembly code. It also seems to rely on the assumption that the HW/MMU ignores the upper 32 bits of an address; otherwise ra and sp might get into trouble.
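
The ra/sp concern can be illustrated with a tiny model. This is only a sketch, assuming that `lw` sign-extends on RV64; the addresses are arbitrary examples, not taken from any real system, and `restore_after_sw` is a hypothetical name.

```python
MASK64 = (1 << 64) - 1

def restore_after_sw(reg: int) -> int:
    """Model `sw` then `lw` on RV64: keep the low 32 bits, sign-extend on reload."""
    low = reg & 0xFFFFFFFF
    if low & 0x80000000:
        return low | (MASK64 ^ 0xFFFFFFFF)
    return low

ra_low = 0x0000000040001000    # below 2 GiB: bit 31 clear
ra_high = 0x00000000C0001000   # zero-extended address with bit 31 set

print(hex(restore_after_sw(ra_low)))   # 0x40001000
print(hex(restore_after_sw(ra_high)))  # 0xffffffffc0001000
```

The second address comes back with its upper 32 bits set. So the scheme works only if addresses are kept sign-extended in registers, or if the hardware ignores the upper bits.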

I also have a big question: the main advantage of ilp32/rv64 is that we get native 64-bit operations, so it can be faster than ilp32/rv32, yet this design seems likely to hurt the performance of 64-bit operations. That feels contradictory to me, so why not just use ilp32/rv32?

Of course, there is always the option of having both ABI flavors: ilp32/rv64 as the ordinary ABI and ilp32x/rv64 as in this proposal. But until there is more data and evidence that maintaining such a new ABI variant is really worthwhile, I would prefer sticking to the ordinary ilp32/rv64 ABI.

So let's get some benchmark data, such as SPEC CPU AND EEMBC, since SPEC CPU scores might not be representative of the embedded world.

guoren83 · 2023-05-22T13:23:02Z

On Fri, May 19, 2023 at 2:39 PM Kito Cheng wrote:

> the main advantage of ilp32/rv64 is that we get native 64-bit operations, so it can be faster than ilp32/rv32, yet this design seems likely to hurt the performance of 64-bit operations

In my view, the long long type is rare in ilp32 code. The main performance advantage of ilp32 comes from cache utilization, not from 64-bit operations.

> why not just use ilp32/rv32?

We don't have RVA2*S32, and we only have SoCs with an rv64 MMU.

> So let's get some benchmark data, such as SPEC CPU *AND* EEMBC

Okay.

--
Best Regards
Guo Ren
