Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

IR traits #21

Merged
merged 5 commits into from
Jan 19, 2024
Merged

IR traits #21

merged 5 commits into from
Jan 19, 2024

Conversation

urso
Copy link
Contributor

@urso urso commented Jan 16, 2024

Backport from #20

This PR adds the Typed and Qualified traits for IR objects.

The Typed traits is added to each internal IR object that must have a type.
The Typed trait is used by the typed assembler format directive.

I opted to make Attributes not Typed, as it is up to the implementation of the attribute of the attribute has a type or not.

The Operation is not marked as Typed as well, although that would make sense. We might consider to use the built-in FunctionType for operations.

The Qualified trait describes IR objects that have a mnemonic. In pliron we have TypeID and AttrID as Qualifiers. The Qualified trait is used by the qualified and qualifier printing directives in #20.

The idea of the qualified printer is similar to MLIR, where for attributes/types the qualifier is normally not printed by default. In some cases (e.g. generic types) one might need the qualifier to decide on parsing.

For example in #20 I have a vec type that accepts another type as parameter:

#[def_type]
#[type_name = "testing.vec"]
#[derive(Hash, PartialEq, Eq, Debug, Printable)]
#[asm_format = "`vec ` qualified($elem)"]
pub struct VecType {
    elem: Ptr<TypeObj>,
}

This produces the output:

vec testing.integer int32<4,true>

@vaivaswatha
Copy link
Owner

where for attributes/types the qualifier is normally not printed by default.

How would you parse types or attributes if their typeid or attrid isn't printed? I had gone for the approach that this is (should be) always printed, so that they can always be parsed.

@urso
Copy link
Contributor Author

urso commented Jan 16, 2024

How would you parse types or attributes if their typeid or attrid isn't printed? I had gone for the approach that this is (should be) always printed, so that they can always be parsed.

I see. I originally did print them as well, but I did find the output quite noise and somewhat repetitive.

The parsing issue is where the Parsable trait becomes valuable.

The idea is that the IR will become fully typed. That is in case of Type/Attribute objects we want the parameters to be typed. For object types we require their attributes/outputs/inputs to be typed.

For some Type/Attribute values this is implicitly the case thanks to us using Rust structures. For example

#[def_type]
#[type_name = "testing.integer"]
#[derive(Hash, PartialEq, Eq, Debug, Printable, Parsable)]
#[asm_format = "`int` $width `<` $align `,` $sign `>`"]
pub struct IntegerType {
    width: u32,
    sign: bool,
    align: u32,
}

The parameters width, sign, align are pure Rust types that we know how to parse.

In the case of an attribute referencing a type we only have Ptr<TypeObj>... Unfortunately this representation is untyped on the Rust side, but within the IR we still might want to accept a single kind of Type only. Like:

#[def_attribute]
#[attr_name = "testing.int"]
#[derive(PartialEq, Eq, Debug, Clone, Printable)]
#[asm_format = "format(`0x{:x}`, $val) `: ` $ty"]
pub struct IntegerAttr {
    val: ApInt,
    ty: Ptr<TypeObj>,
}

In order to guarantee that we can only write "safe" code we would have to introduce properly typed setters and getters and choose the right parser manually when implementing Parsable for IntegerAttr. This is somewhat unfortunate because internally we know the ty must be IntegerType.

Ideally we would be able to make ty properly typed in Rust, e.g. by writing:

#[def_attribute]
#[attr_name = "testing.int"]
#[derive(PartialEq, Eq, Debug, Clone, Printable)]
#[asm_format = "format(`0x{:x}`, $val) `: ` $ty"]
pub struct IntegerAttr {
    val: ApInt,
    ty: Param<IntegerType>,
}

Now ty is typed and we could have Param provide us setters, getters and access to Printable/Parsable for IntegerType. For example Param could be a wrapper struct like so:

struct Param<T>{
  ptr: Ptr<TypeObj>,
  marker: PhantomMarker<T>,
}

impl<T> Param<T> { ... }

impl<T: Parsable> Parsable for Param<T> {
  ...
}

Alternatively(or in addition) to introducing the Param wrapper one might consider code generation similar to MLIR TableGen to give parameters a name. Although I might prefer the Param wrapper to reduce the amount of magic code being generated.

With properly typed Type and Attribute definitions finding the right parser is trivial because Param implements Parsable for us.

At least Param will function if we have concrete types and can properly type all attributes. One might introduce types/attributes that are parameterized over other types or parameters. For example array of <elem>. In this case <elem> can be any other type and we can not tell how to parse that one.
Example code for a generic Vec type:

#[def_type]
#[type_name = "testing.vec"]
#[derive(Hash, PartialEq, Eq, Debug, Printable, NotParsableType)]
#[asm_format = "`vec [` qualified($elem) `]` "]
pub struct VecType {
    elem: Ptr<TypeObj>,
}

Here we need elem to be Ptr<TypeObj> because we allow any type to be used with Vec. In order to make it parsable we must print the type ID. In this case the output will be vec [testing.int <...>].

To summarize Ptr<TypeObject will implement Printable/Parsable such that the TypeID will be present. And Param<T> will not need the type id. The Parsable implementation for Ptr will use the qualifier to find the right parser functions in the parsers table.

For Op I'm not fully sure yet on how to express this in code. This is also a challenge for custom generated printer/parser from assembly format directives.

But borrowing from MILR TableGen I'm thinking on extending the def_op macro so we can write (the codegen could automatically implement some builtin traits):

#[def_op {
  arguments = (
    symbol_name: StringAttr
    ...
  );

  results = (
    res1 = ...
    res2 = ...
  );
}]

The Printer/Parser code gen will get meta-data about that definition passed in from def_op. This will make $symbol_name, $res... directly accessible.

Not exactly sure how this will look in code though. maybe something like:

impl MyOp {
  fn symbol_name() -> OperandMeta<StringAttr>
  fn get_symbol_name(&self, ctx &Context) -> StringAttr { ... }
  
  fn res1() -> OpResultMeta<StringAttr>
  fn res2() -> OpResultMeta<StringAttr>

  fn value() -> AttributeMeta<...>
}

here OperandMeta and OpResultMeta are mostly descriptive. They do not store pointers as Operand or OpResult do. Although they carry enough meta-data that we can use deref(&Operation, &Context) to access the actual field in the Operation? Having those we would have asm_format translate $symbol_name into MyOp::symbol_name().parse(...).

@vaivaswatha
Copy link
Owner

vaivaswatha commented Jan 17, 2024

IMO, this is getting needlessly complicated. But specifically,

pub struct IntegerAttr {
    val: ApInt,
    ty: Param<IntegerType>,
}

shouldn't exist. Since Types are uniqued, we want to handle them only via the lightweight Ptr<TypeObj>, and we don't want them to exist independently as above (which makes it non-unique).

but within the IR we still might want to accept a single kind of Type only. Like:

This is a valid concern too, but I think letting the verifier ensure it (as it is doing already) should be sufficient.

Let's keep it as it is w.r.t printing and parsing types and attributes, i.e., they're always led by a qualifier (typeid, attrid).

output quite noise and somewhat repetitive.

This is a genuine concern, and we can solve it (in the future) the way MLIR does it.

  1. builtin types (assuming they are common) will not need a builtin. prefix, and the type_parse function in Type.rs will manually parse them all. This is exactly what MLIR does in the function parseNonFunctionType. When none of the shorthand builtin types match, it hands it over to parseExtendedType (in DialectSymbolParser.cpp), which does what our type_parse currently does, which is to lookup the qualifier and fetch the right parser for parsing it.
  2. Introduce type-aliases for other dialect types that may occur often enough to not warrant writing a qualifier everytime.

But either way, neither of this is critical at the moment. So let's just go with compulsory qualifier for all qualified IR entities.

Copy link
Owner

@vaivaswatha vaivaswatha left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The PR is mostly ok. I just have a couple of minor comments.

src/parsable.rs Outdated
@@ -429,7 +429,7 @@ impl NameTracker {
// Check if there are any unresolved forward label references.
for (id, op) in label_scope {
if matches!(op, LabelRef::ForwardRef(_)) {
input_err!(loc.clone(), UnresolvedReference(id.clone()))?
return input_err!(loc.clone(), UnresolvedReference(id.clone()))?;
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ooops, I wanted to clean that up. I had a conflict with the commit introducing locations when rebasing

src/type.rs Outdated
}
}

/// Trait for objects that have a direct type.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

object IR entity. Here and the next line.

error::Result,
};

// Objects that have a qualified name within a dialect.
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Objects IR entities

@urso
Copy link
Contributor Author

urso commented Jan 17, 2024

pub struct IntegerAttr {
    val: ApInt,
    ty: Param<IntegerType>,
}

shouldn't exist. Since Types are uniqued, we want to handle them only via the lightweight Ptr, and we don't want them to exist independently as above (which makes it non-unique).

I'm not sure I fully understand your concern. The Param type is basically just a wrapper over Ptr<TypeObj> and provide the same methods. It still must be uniqued. You can only construct a Param<T> from a Ptr<TypeObject.

Alternatively to Param maybe something like PtrOf<TypeObj, IntegerType> to make this more clear?

In case IntegerAttr is supposed to support multiple integer types, even from different dialects we would need Ptr<TypeObj> and must print the mnemonic. If we want to restrict the attribute only to IntegerType types of the current dialect we would use PtrOf<TypeObj, IntegerType>.

This is a genuine concern, and we can solve it (in the future) the way MLIR does it.

I did take inspiration from MLIR actually. At first I was not sure if omitting the type mnemonic is a good idea, but then I checked MLIR and the idea did grow on me. See assembly format directives for attributes and types (The qualified directive is not available in Ops and for ops we would print the mnemonic for now).

In code generated by MLIR the fields of attributes/types are by default printed via printStrippedAttrOrType.

The parsing of fields (for code generated from TableGen) is implemented via FieldParser (used for specialization), which will call parseCustomXWithFallback. If the mnemonic is present (e.g. by checking for #) it will use the generic parser. Otherwise use the result type given to FieldParser<T> in order to call the T::parse method defined for the type/attribute given.
MLIR will use generic format (with special rules for builtin types) if the attribute/type does not implement print directly.

Having said this, we will continue to print attributes/types using mnemonics for the time being. This will be guaranteed by implementing Printable for Ptr<TypeObj> and Ptr<AttrObj>. The code change that I have in mind is that similar to MLIR we will not require types/attributes to print their mnemonic anymore.

Currently the builtin IntegerAttr printer is implemented like this:

impl Printable for IntegerAttr {
    fn fmt(&self, ctx: ..., _state: ..., f: ...) -> core::fmt::Result {
        write!(
            f,
            "{} <0x{:x}: {}>",
            Self::get_attr_id_static(),
            self.val,
            self.ty.disp(ctx)
        )
    }
}

And parse like this:

impl Parsable for IntegerAttr {
    type Arg = ();
    type Parsed = AttrObj;

    fn parse<'a>(state_stream: ..., _arg: Self::Arg) -> ParseResult<'a, Self::Parsed> {
        between(
            token('<'),
            token('>'),
            string("0x")
                .with(many1::<String, _, _>(hex_digit()))
                .skip(token(':'))
                .and(type_parser()),
        )
        .parse_stream(state_stream)
        .map(|(digits, ty)| IntegerAttr::create(ty, ApInt::from_str_radix(16, digits).unwrap()))
        .into()
    }
}

The implementations for Printable and Parsable are not symmetric (which kinda bugs me). With the change for Printable we will end up with:

impl Printable for IntegerAttr {
    fn fmt(&self, ctx: ..., _state: ..., f: ...) -> core::fmt::Result {
        write!(f, "<0x{:x}: {}>", self.val, self.ty.disp(ctx))
    }
}

I think this makes it easier (more error proof) to create the proper printer for the attributes.
Thanks to the Ptr<AttrObj> we will continue printing the mnemonic and we still have the option to introduce something like Param or PtrOf<...> in the future if we want to.

@urso
Copy link
Contributor Author

urso commented Jan 17, 2024

Q: Do you want to keep the Qualified trait? In the printers code gen it is used for the quoted directive from MLIR. If we don't want to support that yet then I don't think we need that trait.
I was also wondering if that trait would be a good alternative to get_type_id for example.
The Type trait might become pub trait Type: Printable + Verify + Downcast + Sync + Debug + Qualified<Qualifier=TypeId>.

@vaivaswatha
Copy link
Owner

I'm not sure I fully understand your concern. The Param type is basically just a wrapper over Ptr and provide the same methods. It still must be uniqued. You can only construct a Param from a Ptr<TypeObject.

You're right. I had misunderstood this.

Having said this, we will continue to print attributes/types using mnemonics for the time being.

Okay good. I don't want to fully go the MLIR way for legibility, where the mnemonics are optional. In fact, I'm now thinking we shouldn't make special cases for builtin types either. The right way to improve legibility is via type aliases. So type_parse will parse either a typeid or an identifier (which is the alias). Based on alias registration, it'll know what to do. For builtin types, we just define aliases ourselves. So no special casing required.

Do you want to keep the Qualified trait? In the printers code gen it is used for the quoted directive from MLIR. If we don't want to support that yet then I don't think we need that trait.
I was also wondering if that trait would be a good alternative to get_type_id for example.
The Type trait might become pub trait Type: Printable + Verify + Downcast + Sync + Debug + Qualified<Qualifier=TypeId>.

If we don't need it at the moment, let's not add it. I think the explicit type_id and attr_id are easier to understand than calling them commonly as a qualifier.

With the change for Printable we will end up with write!(f, "<0x{:x}: {}>", self.val, self.ty.disp(ctx))
think this makes it easier (more error proof) to create the proper printer for the attributes.
Thanks to the Ptr we will continue printing the mnemonic and we still have the option to introduce something like Param or PtrOf<...> in the future if we want to.

Sounds good. So when you make this change, you'll do ☝️ also for all existing (already defined) types and attributes right?

@urso
Copy link
Contributor Author

urso commented Jan 19, 2024

Based on alias registration, it'll know what to do. For builtin types, we just define aliases ourselves. So no special casing required.

I like the idea about using aliases for builtins. This would reduce the need for custom printing/parsing functions. Like int and float types.

If we don't need it at the moment, let's not add it. I think the explicit type_id and attr_id are easier to understand than calling them commonly as a qualifier.

Removed.

Sounds good. So when you make this change, you'll do ☝️ also for all existing (already defined) types and attributes right?

yes.

@vaivaswatha vaivaswatha merged commit e6456f1 into vaivaswatha:master Jan 19, 2024
5 checks passed
@@ -170,6 +170,12 @@ impl Value {
}
}

impl Typed for Value {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you see any problem if I just replace this function with the contents of Value::get_type(..) and remove that impl method?

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i.e., get_type(..) won't be a method on Value anymore, but will provide that functionality by implementing the Typed trait.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem with that. I was thinking to propose this as a follow up :)

@urso urso mentioned this pull request Jan 21, 2024
vaivaswatha added a commit that referenced this pull request Jan 28, 2024
Backport from #20 

This PR introduces the `irfmt/printers` module with common printer functions.

In #20 we implement `asm_format` directives as macros. These macros will  use the printer functions defined here (and more to come). As secondary goal we want to make it easier for downstream developers to implement
their own printers, while trying to keep some kind of standardized format. For this reason we also implement Printable for some more IR objects in pliron.

The change also removes the `PrintableIter` trait in favor of the `list_with_sep` and `iter_with_sep` printer functions in the irfmt package. This change ensures that we have a single trait for printing in the library.

As discussed in #21 we do not require developers to add the type_id or
attribute_id manually to the printer anymore. `pliron` now implements a
printer for TypeObj/AttribObj that will always print the IDs.

---------

Co-authored-by: urso <steffen.siering at gmail.com>
Co-authored-by: Vaivaswatha N <[email protected]>
@urso urso deleted the ir_interfaces branch February 18, 2024 00:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants