New instruction schema #285

dhower-qc · 2024-11-19T17:22:52Z

This PR is a WIP to discuss the structure of ELF relocations in the database. Eventually, we will tie these to the variables in instruction encodings.

AFOliveira · 2024-11-19T19:37:40Z

I just have a couple questions on this:

Couldn't ABI version be a parameter so that we would allow more than 1 ABI?
Couldn't "Not 0" be a parameter, or should the immediate have this defined elsewhere?

dhower-qc · 2024-11-19T19:51:00Z

Good questions. I think both of these can be addressed in how this relocation data gets tied to an encoding. For example, if we had something like:

[common/inst_variable/itype_imm.yaml]

$schema: inst_variable_schema.json
kind: instruction variable
location: 31-20
# ...
elf_psabi_relocation: { $ref: relocation/R_RISCV_PCREL_LO12_I.yaml# }
pe_abi_relocation: # { ... }  future Windows ABI

[inst/addi.yaml]

# ...
encoding:
  fields:
    imm: { $ref: common/inst_variable/itype_imm.yaml# }

Then we can tie as many relocations as needed to an instruction variable. We do probably need to add an abi key to the relocation object.

This also handles the "not 0" case, since that's tied to the instruction variable.

apazos · 2024-11-20T02:31:23Z

We should document only the new relocations and skip whatever is already documented in the RISC-V psABI (https://github.com/riscv-non-isa/riscv-elf-psabi-doc).

We should follow the psABI document format for recording the info, so that the resulting PDF has the same look and feel as the RISC-V psABI.

A table with Vendor relocation types, their vendor enum, and their descriptions (See Table 13. Relocation types in the RISC-V psABI)
Examples of code sequences
Relaxation Types with Target Relocation, Description, Condition, Extensions Required, Relaxation result

dhower-qc · 2024-11-22T20:04:39Z

I've added a new attempt at an instruction schema. Here is an example:

$schema: "inst_schema.json#"
kind: instruction
name: lb
long_name: Load byte
format: { $ref: inst_format/itype.yaml# } # the instruction format 'I-type'
# ...
assembly: xd, offset(xs1)
encoding:
  match: -----------------000-----0000011
  variables:
  - field: { $ref: inst_format/itype.yaml#/fields/name=imm }
    name: offset
  - field: { $ref: inst_format/itype.yaml#/fields/name=rs1 }
    name: xs1
  - field: { $ref: inst_format/itype.yaml#/fields/name=rd }
    name: xd
# ...

and itype:

$schema: inst_format.json#
kind: instruction format
name: I-type
size: 32
fields:
- location: 31-20
  name: imm
  kind: immediate
  sign_extend: true
  relocations:
  - $ref: relocation/R_RISCV_PCREL_LO12_I.yaml#
  - $ref: relocation/R_RISCV_TPREL_LO12_I.yaml#
  - $ref: relocation/R_RISCV_TLSDESC_ADD_LO12.yaml#
  - $ref: relocation/R_RISCV_LO12_I.yaml#
- location: 19-15
  name: rs1
  kind: x source register
- location: 14-12
  name: funct3
  kind: opcode
- location: 11-7
  name: rd
  kind: x destination register
- location: 6-0
  name: opcode
  kind: opcode
  opcode: true
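
To make the two snippets above concrete, here is a minimal Python sketch of how a tool might consume them. It is not part of the PR: the helper names (`extract_field`, `sign_extend`, `matches`) are hypothetical, and it assumes that `location: hi-lo` is an inclusive bit range, that the leftmost character of `match` is bit 31, and that `-` means "don't care".

```python
# Field locations copied from the I-type format example above.
ITYPE_FIELDS = {
    "imm": "31-20",
    "rs1": "19-15",
    "funct3": "14-12",
    "rd": "11-7",
    "opcode": "6-0",
}

# The `match` string from the lb example: 17 don't-care bits, funct3=000,
# 5 don't-care bits, opcode=0000011.
LB_MATCH = "-" * 17 + "000" + "-" * 5 + "0000011"


def extract_field(word: int, location: str) -> int:
    """Return the unsigned value of the bits named by a 'hi-lo' location."""
    hi, lo = (int(part) for part in location.split("-"))
    width = hi - lo + 1
    return (word >> lo) & ((1 << width) - 1)


def sign_extend(value: int, width: int) -> int:
    """Interpret a width-bit unsigned value as two's complement."""
    sign_bit = 1 << (width - 1)
    return (value ^ sign_bit) - sign_bit


def matches(word: int, pattern: str) -> bool:
    """Check a 32-bit word against a match string (assumed MSB first)."""
    if len(pattern) != 32:
        raise ValueError("match pattern must be 32 characters")
    for index, char in enumerate(pattern):
        if char == "-":
            continue  # don't-care bit
        if (word >> (31 - index)) & 1 != int(char):
            return False
    return True


# Encoding of `lb a1, -4(a0)`: imm=-4, rs1=x10, funct3=000, rd=x11.
word = 0xFFC50583
decoded = {name: extract_field(word, loc) for name, loc in ITYPE_FIELDS.items()}
offset = sign_extend(decoded["imm"], 12)
```

With these assumptions, `matches(word, LB_MATCH)` is true and the decoded fields are `rs1 = 10`, `rd = 11`, and `offset = -4`.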

lenary · 2024-12-03T18:25:37Z

Sorry I missed this PR when it was opened.

I have been thinking about this on-and-off since I commented on #256 and we discussed this problem. I am coming around to the view that, for the ABI:

Relocations act on fields in the instruction encoding, not whole instruction types
You can match a relocatable field without matching the rest of the instruction type.
You can match an instruction type without all fields being relocatable. For example, qc.swmi is closest to I-type in which bits are immediate, fixed, or register, but you cannot use an I-type relocation on it: it doesn't use the 12 immediate bits as a single, unshifted immediate; it uses those bits as two immediates.
Relocations on instructions have an implicit connection to the pseudocode, inasmuch as the relocation rewrites bits to get a specific effect. For example, R_RISCV_BRANCH is specifically trying to rewrite a PC-relative offset, and you cannot use R_RISCV_PCREL_LO12_I on xori, because xori doesn't use the immediate in the way that relocation expects.

So I have come around to the view that there are different kinds of imm fields in I-type instructions:

those that use bits 31-20 as a single 12-bit signed immediate (xori)
those that use bits 31-20 as a single 12-bit signed immediate that is added to rs1 (and are therefore compatible with several LO_I relocations in the psABI) (addi, loads, stores)
those that use bits 31-20 in their own specific way (arguably not I-type, but close)

Another good example might be the U-type immediate:

those that use bits 31-12 as a single 20-bit signed+shifted absolute value (lui) which can be relocated in one specific way
those that use bits 31-12 as a single 20-bit signed+shifted pc-relative value (auipc) which can be relocated in another specific way
those that use bits 31-12 as a single 20-bit signed+swizzled pc-relative value (jal)
those that use bits 31-12 in a completely different way (I'm not sure we have any of these)
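
As an illustration of why the relocatable field depends on how the instruction uses it, here is a minimal Python sketch of the usual hi20/lo12 split for a `lui` + `addi` pair. It is an assumption-laden sketch rather than anything from this PR: the function names are hypothetical, and it only handles 32-bit absolute values. The `+ 0x800` rounding is needed precisely because `addi` sign-extends its 12-bit immediate before adding it to rs1.

```python
def split_hi_lo(value: int) -> tuple[int, int]:
    """Split a 32-bit value into a 20-bit upper part and a signed 12-bit lower part."""
    hi20 = ((value + 0x800) >> 12) & 0xFFFFF  # round up when lo12 will be negative
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000  # the I-type immediate is sign-extended
    return hi20, lo12


def patch_utype(word: int, hi20: int) -> int:
    """Rewrite bits 31-12 of a U-type instruction word."""
    return (word & 0x00000FFF) | ((hi20 & 0xFFFFF) << 12)


def patch_itype(word: int, lo12: int) -> int:
    """Rewrite bits 31-20 of an I-type instruction word."""
    return (word & 0x000FFFFF) | ((lo12 & 0xFFF) << 20)


target = 0x12345FFC
hi20, lo12 = split_hi_lo(target)

lui_a0 = patch_utype(0x00000537, hi20)   # lui  a0, 0     with the upper part filled in
addi_a0 = patch_itype(0x00050513, lo12)  # addi a0, a0, 0 with the lower part filled in
```

Here `hi20` is `0x12346` and `lo12` is `-4`, so `(hi20 << 12) + lo12` reproduces the target. Applying the same lower-part rewrite to an instruction that does not add the immediate to rs1 would patch the same bits but would not reconstruct the address.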

This means I like the "this instruction inherits this field from this definition we have elsewhere" approach, and then we can relate the relocations to that field specifically. Given things we discussed, like needing to use unused corners of the encoding space eventually, I'm ok with not assigning an overall type to every instruction.

I'm not sure how we can verify that the instruction is using the field correctly, which feels like something we'd like to be able to do.

I also haven't worked out how that would let us get linker relaxations defined in this document, and implemented correctly, but I don't think that should block this PR.

Edited: muli doesn't exist, but the same applies to xori, so I updated the example.

AFOliveira · 2024-12-06T10:57:33Z

@lenary
I understand your approach, and indeed different relocation parameters can (and I think will) be the way to describe every field.

However, I don't think I understand this part:

I'm ok with not assigning an overall type to every instruction.
I'm not sure how we can verify that the instruction is using the field correctly, which feels like something we'd like to be able to do.

All those fields can be different in terms of what they happen to do, i.e. represent different values, but they all have the same encodings which we can use for verification, if we know what instruction type they match, right?

lenary · 2024-12-09T15:07:12Z

@lenary I understand your approach, and indeed different relocation parameters can (and I think will) be the way to describe every field.

I'm ok with not assigning an overall type to every instruction.
I'm not sure how we can verify that the instruction is using the field correctly, which feels like something we'd like to be able to do.

All those fields can be different in terms of what they happen to do, i.e. represent different values, but they all have the same encodings which we can use for verification, if we know what instruction type they match, right?

I think you're maybe tripping over what one could mean by "verify".

Implementers, when executing a binary, never have to care about more than the encoding. They will ignore, for example, any information about relocations in a specification; all they care about is the bits going in and the right actions being done by the core itself.

But if we're going to have relocation info in the riscv-unified-db, then it would be good to be able to use that to check that any relocatable objects have the right relocations applied to the right instructions. The relocations absolutely rely on how a specific instruction is using its operands, which is why we need to differentiate between e.g. the 12-bit immediate on an xori, which should never have a relocation, vs the 12-bit immediate in an addi, which frequently has relocations.
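
A minimal Python sketch of what such a check could look like, assuming each instruction points at a shared field definition that lists its permitted relocations. Everything here is hypothetical: the `FieldDef` class, the field names, the instruction table, and the `(offset, mnemonic, relocation)` record shape are illustrations, not the database's actual schema or an object-file parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldDef:
    """A shared field definition that instructions can inherit (hypothetical)."""
    name: str
    location: str
    relocations: frozenset = frozenset()


# Same bits, different use: a plain 12-bit immediate versus one added to rs1.
IMM_PLAIN = FieldDef("imm12", "31-20")
IMM_ADDED_TO_RS1 = FieldDef(
    "imm12_plus_rs1",
    "31-20",
    frozenset({
        "R_RISCV_LO12_I",
        "R_RISCV_PCREL_LO12_I",
        "R_RISCV_TPREL_LO12_I",
    }),
)

# Which field definition each instruction's immediate inherits (illustrative subset).
INSTRUCTION_FIELDS = {
    "xori": IMM_PLAIN,
    "addi": IMM_ADDED_TO_RS1,
    "lb": IMM_ADDED_TO_RS1,
}


def check_relocations(records):
    """Return one message per relocation applied to an incompatible instruction."""
    problems = []
    for offset, mnemonic, reloc in records:
        field = INSTRUCTION_FIELDS.get(mnemonic)
        if field is None:
            problems.append(f"{offset:#x}: no field definition for {mnemonic}")
        elif reloc not in field.relocations:
            problems.append(f"{offset:#x}: {reloc} is not valid on {mnemonic}")
    return problems


records = [
    (0x0, "addi", "R_RISCV_PCREL_LO12_I"),  # fine: immediate is added to rs1
    (0x4, "xori", "R_RISCV_PCREL_LO12_I"),  # flagged: xori never takes a relocation
]
problems = check_relocations(records)
```

Both fields occupy bits 31-20, so an encoding-only check cannot tell them apart; the distinction lives entirely in which field definition the instruction references.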

Mock-up of a relocation

537ef5d

dhower-qc requested review from ayosher and AFOliveira November 19, 2024 17:24

AFOliveira marked this pull request as ready for review November 19, 2024 19:27

AFOliveira marked this pull request as draft November 19, 2024 19:28

Attempt #2 at new instruction schema

8d2b279

dhower-qc changed the title ~~Mock-up of a relocation~~ New instruction schema Nov 22, 2024

dhower-qc requested a review from drom November 22, 2024 20:01

dhower-qc linked an issue Nov 22, 2024 that may be closed by this pull request

Adding instruction type as a parameter for instruction definition #256

Open

lenary mentioned this pull request Dec 3, 2024

Adding instruction type as a parameter for instruction definition #256

Open
