-
Notifications
You must be signed in to change notification settings - Fork 171
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add human-readable instruction name annotations #520
base: master
Are you sure you want to change the base?
Conversation
Unless the intent is to move the entirety of riscv-isa-manual's instruction semantics descriptions into sail-riscv (which is doable, it's what we do for cheri-specification with sail-cheri-riscv, but I doubt will happen here), I struggle to see how this achieves anything other than duplicate existing descriptions? |
This would be very useful for some of the documentation things I would like to do, so I am quite strongly in favour of adding this metadata. |
Yeah I would also like more documentation metadata here for a RISC-V VSCode extension I'm working on. I think there's quite a bit of demand for machine readable metadata & doc type stuff (c.f. Qualcomm's other ISA specification language), so we should probably start actually doing it. I think at least for the names there isn't really any risk from keeping stuff in sync, and it also helps understand for people who don't know what the ridiculously terse mnemonics mean. How do these name annotations get exposed? Do they end up in the doc JSON generated by Sail? |
3779501
to
39f3115
Compare
There's no room for iterative improvement?
To what "existing descriptions" are you referring? I found just a few in the vector code, but little else of actual use. The goal is to give helpful human-readable names to the mnemonics. Even the current AsciiDoc source doesn't do a consistently good job here. |
Currently, the net effect will be to add the annotations to the associated node in the AST. The current backends ignore the new content. Our out-of-tree JSON backend pulls the annotations into the JSON, associated with the respective mnemonic:
|
I've fixed the issue identified by @jrtc27 (by changing "store fence" to "supervisor memory-management fence" as found in Volume II: RISC-V Privileged Architectures V20211203, section 4.2.1 "Supervisor Memory-Management Fence Instruction"). I'm removing the "draft" tag and marking "ready for review". |
I do not believe this should be here. These names are not official, they are invented, yet they are being put in the official model. Naming the instructions is not the responsibility of the model but of the documentation.
Not if the end goal isn't to avoid duplication.
Words in the ISA manual. That's where the descriptions currently belong
Why is that the Sail model's responsibility? If the AsciiDoc sucks, fix the AsciiDoc. |
That is a valid concern. There should be canonical human-readable names associated with each instruction mnemonic. These should be enumerable and easily associated with the mnemonics, and the best place to represent things like that is in the Sail code. I looked through the PR to see if the names came directly from the ISA documentation or were "invented".
Note that any "invented" instruction names were due to the lack of a clear instruction name in the ISA doc.
I can see your point, but I disagree. If there is to be complete separation between the model and the documentation, then there is a lot that needs to be removed from the documentation. Since humans are writing the model, other humans should be able to read it. What's being added here is annotations, essentially comments that end up in the AST, and otherwise ignored. They make the Sail code better, and easier to understand and interpret.
I'll disagree here, too. I think the model needs to stand pretty well on it's own. You mentioned earlier the "sail-cheri-riscv" effort to merge the documentation with the model, which sounds quite useful. Why is it OK there, but not OK here? Does it have to be all-or-nothing?
I think we should address inadequacies in the Sail code as well. |
It'd be no bad thing for RISC-V to have standardised long names, and if it
did then it'd be useful to include in the model, but:
- I don't think the formal modelling group should unilaterally invent them,
- they should be made to appear uniformly and at least semi-automatically
in the docs, with a scheme for doing that agreed up-front, and
- having them requires one to agree and freeze exactly what is named, that
would need some thought and agreement. There are many choices in what one
deems to be a single "instruction" (with multiple variants) vs multiple
distinct "instructions". The long names might or might not simply
correspond to the mnemonics. In some cases one wants to model to handle a
family of instructions uniformly (eg all the ITYPE instructions together,
as it does now, following the original prose spec) and then in the
documentation either show that uniformity or split them apart - as in the
"one page per instruction" view that some people have asked for, that
Alasdair prototyped.
Peter
…On Mon, 12 Aug 2024 at 21:24, Paul Clarke ***@***.***> wrote:
I do not believe this should be here. These names are not official, they
are invented, yet they are being put in the official model.
That is a valid concern. There should be canonical human-readable names
associated with each instruction mnemonic. These should be enumerable and
easily associated with the mnemonics, and the best place to represent
things like that is in the Sail code. I looked through the PR to see if the
names came directly from the ISA documentation or were "invented".
- Pulled from ISA doc: lui, auipc, jal, jalr, "conditional branch",
slti, "load", "store", ecall, ebreak, wfi, sfence.vma
- "Invented", but with strong analogs/hints in the ISA doc: addi,
andi, ori, xori, sltiu, add, slt, sltu, and, or, xor, sll, srl, sub, sra,
addiw, addw, subw, sllw, srlw, sraw, slliw, srliw, sraiw
- "Invented", but analogs in the ISA doc: slli, srli, srai (*actually,
I see these aren't in the PR*. These should be added.)
- "Invented": fence, fence.tso, mret, sret
Note that any "invented" instruction names were due to the lack of a clear
instruction name in the ISA doc.
Naming the instructions is not the responsibility of the model but of the
documentation.
I can see your point, but I disagree. If there is to be complete
separation between the model and the documentation, then there is a lot
that needs to be removed from the documentation. Since humans are writing
the model, other humans should be able to read it. What's being added here
is annotations, essentially comments that end up in the AST, and otherwise
ignored. They make the Sail code better, and easier to understand and
interpret.
To what "existing descriptions" are you referring? I found just a few in
the vector code, but little else of actual use.
Words in the ISA manual. That's where the descriptions currently belong
I'll disagree here, too. I think the model needs to stand pretty well on
it's own. You mentioned earlier the "sail-cheri-riscv" effort to merge the
documentation with the model, which sounds quite useful. Why is it OK
there, but not OK here? Does it have to be all-or-nothing?
The goal is to give helpful human-readable names to the mnemonics. Even
the current AsciiDoc source doesn't do a consistently good job here.
Why is that the Sail model's responsibility? If the AsciiDoc sucks, fix
the AsciiDoc.
I think we should address inadequacies in the Sail code as well.
—
Reply to this email directly, view it on GitHub
<#520 (comment)>,
or unsubscribe
<https://github.com/notifications/unsubscribe-auth/ABFMZZXU2G4SMYJLVZT5OVTZREKYRAVCNFSM6AAAAABMHTEU66VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZDEOBUHA2DGNRSGQ>
.
You are receiving this because you are subscribed to this thread.Message
ID: ***@***.***>
|
While I agree we shouldn't invent names here that are not official, I do think having names at least for the ones where they are known is a useful step towards being able to generate per-instruction documentation from the sail model. I don't think this change has any effect on the readability (if anything it makes some of the more obscure mnemonics more obvious without having to find them in the ISA reference) so I'd be fine with this change (at least for the ones with existing long names). |
39f3115
to
0b5ab31
Compare
This PR was discussed in the Tech Golden Model meeting today. What I heard was a general consensus that this additional information is needed, to improve the readability of the Sail model and to facilitate uses downstream. The concern about adding names which do not come from the ISA Specification should be addressed in parallel: add the names to Sail, since they are ready to go (although see below), and propose adding the same names to the ISA Specification. If those additions result in changed names, then the names in the Sail code can easily be changed to match. There was discussion about Sail-to-Asciidoc that I did not follow. The current implementation covers those mnemonics for which the mnemonic is not "constructed", but appears as a fully-specified continuous string. In contrast, there are instruction groups where the set of mnemonics is constructed, like "LOAD":
Here, the mnemonic is constructed from "l" + { "", "u" } + { "", ".aq" } + { "", ".rl" }. Looking just now, it is easy to add annotations conveying the intent of each constructive element (".aq" == "acquire"; ".rl" == "release"). I'll do that now, and push new changes here. (My out-of-tree Sail JSON backend will need to add code to glue the pieces together appropriately in order to construct each full human-readable instruction name, but that's a simple matter of programming.) |
Some instruction groups combinatorically construct mnemonics with various suffixes: ``` mapping clause assembly = LOAD(imm, rs1, rd, is_unsigned, size, aq, rl) <-> "l" ^ size_mnemonic(size) ^ maybe_u(is_unsigned) ^ maybe_aq(aq) ^ maybe_rl(rl) ^ spc() ^ reg_name(rd) ^ sep() ^ hex_bits_signed_12(imm) ^ "(" ^ reg_name(rs1) ^ ")" ``` Here, the mnemonic is constructed from "l" + { "b", "h", "w", "d" } + { "", "u" } + { "", ".aq" } + { "", ".rl" }. These mnemonic suffixes should be annotated inline for readability. In addition human-readable names can be similarly constructed, as in "load word unsigned acquire release", by concatenating the notes.
I think the suggestion was that the Asciidoc backend (which actually outputs JSON) would include the annotations you've added, so you (maybe) wouldn't need a custom JSON backend. |
I disagree with the path forwards. There should be agreement from riscv-isa-manual people before committing to this in the Sail model. Otherwise you'll have to rip them out again if they disagree. |
We had people from the RISC-V documentation side of things at the meeting today, and we don't think that will be an issue.
It actually already does, but I don't think the Asciidoctor plugin has any commands that use them, e.g.
will generate:
|
Is there a reason why attributes is an array of strings or pairs, rather than being a map from name to (optional) value (with something like true or {} as the value for attributes with no value)? |
No particular reason, it just seemed like a fairly simple and lossless JSON encoding. A default value of true would erase the difference between |
Or null, if you're concerned about being able to differentiate those two. I struggle to see when you would, but null vs not present seems even more unlikely to matter. |
Null would work in the JSON output as the Sail attributes are essentially JSON without null, and a slightly more Sail-ish syntax ('=' rather than ':', and unquoted keys). The array output does preserve the order, which most JSON parsers would not guarantee for an object, and it also allows the same attribute to appear twice, which is legal but of dubious value. |
If you need to preserve order or support multiple instances of the same output, then yeah, you can't use an object. But if neither of those are requirements then an object is much easier to work with. |
I could do |
I don't think that really helps, the big pain point is having to iterate to find anything (e.g. trying to extract attributes in jq is likely messy). |
Not sure that making it easier to use with jq is worth losing the ordering information from the source, and jq does have ways to search lists in queries (I think |
FYI, I've added a commit here which adds annotations for the constructed parts of some instruction groups (like LOAD, STORE), as described in my last comment. I think this at least addresses concerns about "full coverage" for the base instructions. Whether these names need to go first into the ISA Spec is the only remaining discussion point to my knowledge. |
I find the content of this PR really interesting and I have a question on this: |
If you are referring to human-readable instruction names, there are no human-readable instruction names in the RISC-V Sail code. This PR provides one way of doing so. Further, there aren't even human-readable instruction names in the RISC-V ISA Specification consistently. This is one of the current objections to this PR, that by adding those names in this way, a standard is being implicitly created without whatever proper review would be warranted. Although, it's not completely clear what review is necessary... is a full new ratified version of the spec with instruction names required before adding names to the Sail model, even though it has also been argued that the names are not part of the model proper. Next steps with this PR are unclear to me.
The current in-tree backends of the Sail parser, to my knowledge, have no interest in human-readable instruction names. (Or if they do, are starving for lack of them.) |
What I was mentioning was a way to describe instructions in which we know what instruction we are concretely mentioning. Your approach would definitely solve this problem for humans, but being long format wouldn't it hurt machine readibility, at least from a parsing stand point? What I was trying to question is if there are concrete tags in the sail models that are easy to refer to, take this examples:
Do this names follow any standard? Because I am not seing which standard they are following and I believe that if they could be at least EXT_INSTRUCTIONNAME it would be easier to digest the sail code and use it for different things. |
The only purpose for the "long format" is just for human-readable names. They are annotations, like comments but they end up in the AST after parsing, so the backends can see them and make use of them (or not).
These were established long before my involvement, so I am not the right person to answer, but I think the answer is an easy "no".
In a previous PR (#506), I changed the way that instructions are "tagged" by their containing extension to hopefully make it easier for a parser backend to determine that. If that is insufficient, let's talk about ways to improve it further. As for "INSTRUCTIONNAME", that's not so easy given the grouping for similar instructions. For example, both "lui" and "auipc" are in the "UTYPE" definition. What should the single "instruction name" be? :-) |
I now understand the proper reason of this PR, sorry for my misunderstanding. Although, I still think that my point on instruction naming is valid, I also definitely do not have a proper solution for that, I think I'll take some time to better understand if there is a way to make this better and then file an issue or PR on that separate topic to get more opinions on this. All in all, thank you for your explanation, it was really useful and I totally agree with the need for a proper long naming standard on RISC-V instruction documentation. |
@ThinkOpenly It looks like the riscv-unified-db project has also been adding |
Or that should be your source of truth for the instruction names, not the Sail model, if there's already such a thing... |
Thanks for the pointer @jordancarlin. I'm aware of the project, and getting more connected with it. The question is, where are those names coming from?
I don't understand how that's a better source for names than what I've already proposed. As there is no canonical single source for names, we're all making them up as we go. FYI, I did pitch a list of names for the base instructions to riscv-isa-manual: riscv/riscv-isa-manual#1638, presuming that is the preferred route. I supposed I should add another column to the table in that issue for the names used by riscv-unified-db, because there are certainly differences. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this should be merged once we have agreement on the isa-manual issue that these names are "official"
A simple implementation is proposed to annotate each instruction with its human-readable name.
Two methods are used, given the mildly diverse instruction implementations in Sail:
union clause ast
for "simple" instructions that are not defined as a group:Other methods may be required for more complicated mnemonics, especially those with varied suffixes, but these methods easily cover most instructions. This, at least, is a start.
These annotations allow a parser/backend to have relevant human-readable information for such purposes as creating documentation.
This PR is currently marked as a draft because (1) it is incomplete, and (2) we seek feedback on the approach before continuing the work. (Thanks!)