New instruction schema #285

Draft: wants to merge 2 commits into main


dhower-qc
Collaborator

This PR is a WIP to discuss the structure of ELF relocations in the database. Eventually, we will tie these to the variables in instruction encodings.

@AFOliveira AFOliveira marked this pull request as ready for review November 19, 2024 19:27
@AFOliveira AFOliveira marked this pull request as draft November 19, 2024 19:28
@AFOliveira
Collaborator

I just have a couple of questions on this:

  • Couldn't the ABI version be a parameter, so that we would allow more than one ABI?
  • Couldn't "Not 0" be a parameter, or should the immediate have this defined elsewhere?

@dhower-qc
Collaborator Author

dhower-qc commented Nov 19, 2024

Good questions. I think both of these can be addressed in how this relocation data gets tied to an encoding. For example, if we had something like:

[common/inst_variable/itype_imm.yaml]

$schema: inst_variable_schema.json
kind: instruction variable
location: 31-20
# ...
elf_psabi_relocation: { $ref: relocation/R_RISCV_PCREL_LO12_I.yaml# }
pe_abi_relocation: # { ... }  future Windows ABI

[inst/addi.yaml]

# ...
encoding:
  fields:
    imm: { $ref: common/inst_variable/itype_imm.yaml# }

Then we can tie as many relocations as needed to an instruction variable. We do probably need to add an abi key to the relocation object.

This also handles the "not 0" case, since that's tied to the instruction variable.
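
To illustrate the lookup described above, here is a minimal Python sketch of how a tool might pick the relocations for one ABI once each relocation object carries an abi key. The class names, the `abi` attribute, and the `"elf_psabi"` / `"pe_abi"` labels are assumptions made for illustration; they are not part of the schema in this PR.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Relocation:
    name: str  # e.g. "R_RISCV_PCREL_LO12_I"
    abi: str   # hypothetical key naming the ABI that defines the relocation


@dataclass
class InstructionVariable:
    location: str  # bit range, written as in the YAML: "31-20"
    relocations: list[Relocation] = field(default_factory=list)

    def relocations_for(self, abi: str) -> list[str]:
        """Return the names of the relocations defined by the given ABI."""
        return [r.name for r in self.relocations if r.abi == abi]


# Hypothetical I-type immediate carrying two ELF psABI relocations.
itype_imm = InstructionVariable(
    location="31-20",
    relocations=[
        Relocation("R_RISCV_PCREL_LO12_I", abi="elf_psabi"),
        Relocation("R_RISCV_LO12_I", abi="elf_psabi"),
    ],
)
```

With this shape, `itype_imm.relocations_for("elf_psabi")` returns both names, and an ABI with no entries simply yields an empty list.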

@apazos

apazos commented Nov 20, 2024

We should document the new relocations only. What is already documented in the RISC-V psABI (https://github.com/riscv-non-isa/riscv-elf-psabi-doc), we should skip.

We should follow the psABI document format for recording the info, so that the resulting PDF has the same look and feel as the RISC-V psABI.

  • A table with Vendor relocation types, their vendor enum, and their descriptions (See Table 13. Relocation types in the RISC-V psABI)
  • Examples of code sequences
  • Relaxation Types with Target Relocation, Description, Condition, Extensions Required, Relaxation result

@dhower-qc dhower-qc changed the title Mock-up of a relocation New instruction schema Nov 22, 2024
@dhower-qc dhower-qc requested a review from drom November 22, 2024 20:01
@dhower-qc
Collaborator Author

I've added a new attempt at an instruction schema. Here is an example:

$schema: "inst_schema.json#"
kind: instruction
name: lb
long_name: Load byte
format: { $ref: inst_format/itype.yaml# } # the instruction format 'I-type'
# ...
assembly: xd, offset(xs1)
encoding:
  match: -----------------000-----0000011
  variables:
  - field: { $ref: inst_format/itype.yaml#/fields/name=imm }
    name: offset
  - field: { $ref: inst_format/itype.yaml#/fields/name=rs1 }
    name: xs1
  - field: { $ref: inst_format/itype.yaml#/fields/name=rd }
    name: xd
# ...

and itype:

$schema: inst_format.json#
kind: instruction format
name: I-type
size: 32
fields:
- location: 31-20
  name: imm
  kind: immediate
  sign_extend: true
  relocations:
  - $ref: relocation/R_RISCV_PCREL_LO12_I.yaml#
  - $ref: relocation/R_RISCV_TPREL_LO12_I.yaml#
  - $ref: relocation/R_RISCV_TLSDESC_ADD_LO12.yaml#
  - $ref: relocation/R_RISCV_LO12_I.yaml#
- location: 19-15
  name: rs1
  kind: x source register
- location: 14-12
  name: funct3
  kind: opcode
- location: 11-7
  name: rd
  kind: x destination register
- location: 6-0
  name: opcode
  kind: opcode
  opcode: true
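
As a rough illustration of how a consumer could use the `match` string and the `location` ranges above, the following Python sketch checks a 32-bit word against the `lb` pattern and extracts fields by bit range. The function names are hypothetical and the sketch assumes `location` is always a single `hi-lo` range.

```python
def extract_field(word: int, location: str, sign_extend: bool = False) -> int:
    """Extract the bits named by a 'hi-lo' location string from a 32-bit word."""
    hi, lo = (int(part) for part in location.split("-"))
    width = hi - lo + 1
    value = (word >> lo) & ((1 << width) - 1)
    if sign_extend and value >> (width - 1):
        value -= 1 << width  # top bit set: interpret as a negative number
    return value


def matches(word: int, pattern: str) -> bool:
    """Compare a word with a match string (MSB first); '-' is a don't-care bit."""
    bits = format(word & 0xFFFFFFFF, "032b")
    return all(p == "-" or p == b for p, b in zip(pattern, bits))


LB_MATCH = "-----------------000-----0000011"
word = 0xFFC50283  # lb x5, -4(x10)

offset = extract_field(word, "31-20", sign_extend=True)
xs1 = extract_field(word, "19-15")
xd = extract_field(word, "11-7")
```

Keeping the bit ranges in the format file and only the names in the instruction file means this decoding logic never needs per-instruction knowledge of where a field lives.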

@dhower-qc dhower-qc linked an issue Nov 22, 2024 that may be closed by this pull request
@lenary
Collaborator

lenary commented Dec 3, 2024

Sorry I missed this PR when it was opened.

I have been thinking about this on-and-off since I commented on #256 and we discussed this problem. I am coming around to the view that, for the ABI:

  • Relocations act on fields in the instruction encoding, not whole instruction types
  • You can match a relocatable field, without matching the rest of the instruction type.
  • You can match an instruction type without all of its fields being relocatable. For example, qc.swmi is closest to I-type in how its bits divide into immediate, fixed, and register fields, but an I-type relocation cannot be used on it: it does not use the 12 immediate bits as a single, unshifted immediate; it uses them as two immediates.
  • Relocations on instructions have an implicit connection to the pseudocode, inasmuch as a relocation rewrites bits to get a specific effect. For example, R_RISCV_BRANCH specifically rewrites a PC-relative offset, and R_RISCV_PCREL_LO12_I cannot be used on xori because xori does not use the immediate in the way that relocation expects.
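
The first point, that a relocation acts on a field rather than on a whole instruction, can be sketched in Python as a bit-range patch. This is only an illustration: `patch_field` is a hypothetical helper, and a real linker computes the value from the symbol and addend before writing it.

```python
def patch_field(word: int, location: str, value: int) -> int:
    """Overwrite only the bits in the 'hi-lo' range with the low bits of value."""
    hi, lo = (int(part) for part in location.split("-"))
    mask = ((1 << (hi - lo + 1)) - 1) << lo
    return (word & ~mask & 0xFFFFFFFF) | ((value << lo) & mask)


# addi x5, x5, 0: the immediate is left as zero for the linker to fill in.
addi_placeholder = 0x00028293

# Writing a 12-bit value into bits 31-20 leaves rs1, funct3, rd and opcode untouched.
patched = patch_field(addi_placeholder, "31-20", 0x678)
```

Nothing in the helper knows that the word is an `addi`; it only needs the field's location, which is why the relocation can be attached to the field definition.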

So I have come around to the view that there are different kinds of imm fields in I-type instructions:

  • those that use bits 31-20 as a single 12-bit signed immediate (xori)
  • those that use bits 31-20 as a single 12-bit signed immediate that is added to rs1 (and are therefore compatible with several LO_I relocations in the psABI) (addi, loads, stores)
  • those that use bits 31-20 in their own specific way (arguably not I-type, but close)

Another good example might be the U-type immediate:

  • those that use bits 31-12 as a single 20-bit signed+shifted absolute value (lui) which can be relocated in one specific way
  • those that use bits 31-12 as a single 20-bit signed+shifted pc-relative value (auipc) which can be relocated in another specific way
  • those that use bits 31-12 as a single 20-bit signed+swizzled pc-relative value (jal)
  • those that use bits 31-12 in a completely different way (I'm not sure we have any of these)
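
To show why the upper and lower immediates have to be relocated with knowledge of each other, here is a small Python sketch of the usual arithmetic for splitting a 32-bit value between a 20-bit upper part (as in lui or auipc) and a sign-extended 12-bit lower part (as in addi). The function name is hypothetical; for lui the value would be an absolute address, and for auipc it would be an offset from the pc.

```python
def split_hi_lo(value: int) -> tuple[int, int]:
    """Split a 32-bit value into a 20-bit upper part and a signed 12-bit lower part."""
    lo12 = value & 0xFFF
    if lo12 >= 0x800:
        lo12 -= 0x1000  # the 12-bit immediate is sign-extended when it is added
    hi20 = ((value - lo12) >> 12) & 0xFFFFF  # compensate for a negative lower part
    return hi20, lo12


hi20, lo12 = split_hi_lo(0x12345FFF)
rebuilt = ((hi20 << 12) + lo12) & 0xFFFFFFFF
```

Because the lower part is sign-extended, the upper part is rounded up whenever bit 11 of the value is set, so the two fields cannot be filled in independently.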

This means I like the "this instruction inherits this field from this definition we have elsewhere" approach, and then we can relate the relocations to that field specifically. Given things we discussed, like needing to use unused corners of the encoding space eventually, I'm ok with not assigning an overall type to every instruction.

I'm not sure how we can verify that the instruction is using the field correctly, which feels like something we'd like to be able to do.

I also haven't worked out how that would let us get linker relaxations defined in this document, and implemented correctly, but I don't think that should block this PR.

Edited: muli doesn't exist, but the same applies to xori so updated the example.

@AFOliveira
Collaborator

@lenary
I understand your approach and indeed different relocation parameters can (and I think will) be the way to describe every field.

However, I don't think I understand this part:

I'm ok with not assigning an overall type to every instruction.
I'm not sure how we can verify that the instruction is using the field correctly, which feels like something we'd like to be able to do.

All those fields can be different in terms of what they happen to do, i.e. represent different values, but they all have the same encodings which we can use for verification, if we know what instruction type they match, right?

@lenary
Collaborator

lenary commented Dec 9, 2024

@lenary I understand your approach and indeed different relocation parameters can (and I think will) be the way to describe every field.

I'm ok with not assigning an overall type to every instruction.
I'm not sure how we can verify that the instruction is using the field correctly, which feels like something we'd like to be able to do.

All those fields can be different in terms of what they happen to do, i.e. represent different values, but they all have the same encodings which we can use for verification, if we know what instruction type they match, right?

I think you're maybe tripping over what one could mean by "verify".

Implementers, when executing a binary, never have to care about more than the encoding. They will ignore, for example, any information about relocations in a specification: they care only about the bits going in and the core performing the right actions.

But if we're going to have relocation info in the riscv-unified-db, then it would be good to be able to use that to check that any relocatable objects have the right relocations applied to the right instructions. The relocations absolutely rely on how a specific instruction is using its operands, which is why we need to differentiate between e.g. the 12-bit immediate on an xori, which should never have a relocation, vs the 12-bit immediate in an addi, which frequently has relocations.
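
A check of this kind could look roughly like the Python sketch below. Both tables are hypothetical: the usage labels and the way relocations are grouped under them are assumptions for illustration, not data from riscv-unified-db or a complete list from the psABI.

```python
# Hypothetical classification of how each instruction uses bits 31-20.
FIELD_USAGE = {
    "addi": "imm12_added_to_rs1",
    "lb": "imm12_added_to_rs1",
    "xori": "imm12_plain",
}

# Hypothetical table of the relocations each kind of use permits.
ALLOWED_RELOCATIONS = {
    "imm12_added_to_rs1": {
        "R_RISCV_LO12_I",
        "R_RISCV_PCREL_LO12_I",
        "R_RISCV_TPREL_LO12_I",
    },
    "imm12_plain": set(),
}


def relocation_is_valid(mnemonic: str, relocation: str) -> bool:
    """Return True if the relocation may be applied to the named instruction."""
    usage = FIELD_USAGE.get(mnemonic)
    return relocation in ALLOWED_RELOCATIONS.get(usage, set())
```

Here `addi` and `xori` share the same encoding for the immediate, yet only `addi` accepts `R_RISCV_LO12_I`; the difference comes entirely from the usage label, which is the extra information the encoding alone does not provide.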

Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

Adding instruction type as a parameter for instruction definition
4 participants