From 64a300a811140b92f8e3fcf48e9c531147641a4e Mon Sep 17 00:00:00 2001 From: Eli Orona Date: Sun, 25 Feb 2024 00:44:14 -0800 Subject: [PATCH 1/5] Start working on codecs --- src/lib/translations/en/wiki.json | 3 + wiki/misc/codecs/+page.yml | 1 + wiki/misc/codecs/en.md | 171 ++++++++++++++++++++++++++++++ 3 files changed, 175 insertions(+) create mode 100644 wiki/misc/codecs/+page.yml create mode 100644 wiki/misc/codecs/en.md diff --git a/src/lib/translations/en/wiki.json b/src/lib/translations/en/wiki.json index e34cdf2d..af9f840d 100644 --- a/src/lib/translations/en/wiki.json +++ b/src/lib/translations/en/wiki.json @@ -90,6 +90,9 @@ }, "misc": { "title": "Misc", + "codecs": { + "title": "Codecs" + }, "commands": { "title": "Adding Commands" }, diff --git a/wiki/misc/codecs/+page.yml b/wiki/misc/codecs/+page.yml new file mode 100644 index 00000000..b918ee8c --- /dev/null +++ b/wiki/misc/codecs/+page.yml @@ -0,0 +1 @@ +title: wiki.misc.codecs.title diff --git a/wiki/misc/codecs/en.md b/wiki/misc/codecs/en.md new file mode 100644 index 00000000..7b721b48 --- /dev/null +++ b/wiki/misc/codecs/en.md @@ -0,0 +1,171 @@ +# Codecs + +WARNING: This tutorial expects a strong understanding of both Java basics and generics. + +The `Codec` class from [DataFixerUpper](https://github.com/Mojang/DataFixerUpper) is the backbone of content serialization and deserialization. +It provides an abstraction layer between Java Objects and serialization types, such as `json`, `nbt`, and more. +Each `Codec` is made of a `Encoder` and a `Decoder`, but you rarely need to create a raw `Codec` from scratch. +Let's start off with the primative `Codec`s. + + +
+## Primative Codecs + +Mojang thankfully builds in many default `Codec` implementations, making our lives easier as most objects are composed of these few types. +We will cover building `Codec`s composed of other `Codec`s later on. +It'll be important to understand the basics. + +A non-exhaustive list of `Codec`s: +- `Codec.BOOL`: A `boolean` codec. +- `Codec.BYTE`: A `byte` codec. +- `Codec.SHORT`: A `short` codec. +- `Codec.INT`: An `int` codec. + - `Codec<Integer> Codec.intRange(int min, int maxInc)`: An `int` codec with an inclusive range. +- `Codec.LONG`: A `long` codec. +- `Codec.FLOAT`: A `float` codec. + - `Codec<Float> Codec.floatRange(float min, float maxInc)`: A `float` codec with an inclusive range. +- `Codec.DOUBLE`: A `double` codec. + - `Codec<Double> Codec.doubleRange(double min, double maxInc)`: A `double` codec with an inclusive range. +- `Codec.STRING`: A `String` codec. + +"Ok", you tell me, "That's cool. But... I still don't know what codecs are for or how to use them". + +### Basic Codec Example +Let's go over a very basic example: + +```java +boolean bool = + Codec.BOOL.decode( + JsonOps.INSTANCE, + new JsonPrimitive(true) + ) + .result() + .get() + .getFirst(); + +assert bool; +``` + +WOAH! That doesn't look simple *at all*. What happened? + +Well, using `Codec`s is fairly verbose, but that means you get a lot of useful information to help with errors and such, which is important for Mojang to provide in their library since we want to know why Minecraft failed to load something, not just that it failed. + +Now, let's break this down into a couple of sections. + +First off: +```java +Codec.BOOL.decode( + JsonOps.INSTANCE, // DynamicOps<T> ops + new JsonPrimitive(true) // T input +) // DataResult<Pair<A, T>> +... +``` +The `decode` method on a codec takes two values, an `ops` and an `input`. +As shown in the comments above, the type of the input and a generic parameter on `ops` must match. +This is because the `ops` needs to know about how the `input` functions.
+In this example, we use `com.mojang.serialization.JsonOps.INSTANCE`, which operates on json elements from `gson`. +We then pass in a `JsonPrimative` with a value of `true` for this example. + +Finally, the `com.mojang.serialization.DataResult>` type allows us to encode more information than just the result. +First off, the `A` type is the output of the `Codec`, which is `Boolean` in this case, and the `T` is the same as the input. + +Let's look more into the `DataResult`: +```java +... +.result() // Option> +... +``` + +Ok, this starts to make more sense. +`DataResult` has a lot of associated methods on it, but for now let's only cover two: `result` and `error`. +`error` returns a `PartialResult`, which allows you to both recover a decode, and to get the error message for why the decode failed. Right now, the `result` method is more important to us. +`result` returns an `Option>`, which makes sure that we know for sure if we have a result, otherwise we could just get null. + +Finally, we get to the last two lines: +```java +... +.get() // Pair +.getFirst(); // Boolean +``` + +We use `get` to unbox the `Option`. Generally this is unsafe to do, an IntelliJ even gives a warning. +In this case we know that it is safe due to the simplicity of the example. +Then finally, we call `getFirst` on `com.mojang.datafixers.util.Pair` to get the first half of the pair + +Wow. That sure was a lot. +Now, I know this may seem like the `Codec` system is complicated right now, but unfortunately we have only scratched the surface. +Yep. That's right, it gets so much worse. +Let's step back and look at some more `Codec` types. +
+ +## Collection Codecs + +While the primative `Codec`s are the most basic building blocks for `Codec`s, we need to we able to put them together to be able to fully represent serializable objects. +These collection `Codec`s are fairly straight forward, and each has a constructor which takes a `Codec` parameter for each associated type with the collection. + + + +- `ListCodec`: A codec for `List`. You can also make a list by calling `listOf` on a `Codec`. +- `SimpleMapCodec`: A codec for `Map` with a known set of `K`s. This known set is an additional parameter. +- `UnboundedMapCodec`: A codec for `Map`. +- `PairCodec`: A codec of a `Pair`. +- `EitherCodec`: A codec of `Either`. + +## The `RecordCodecBuilder` +Oh no. There is an explicit type name in a header. This is going to get crazy. + +Let's start it off simple: a `RecordCodecBuilder` creates a `Codec` that can directly serialize and deserialize a Java object. +While it has the name `Record` in it, this isnt specific to the `record`s in Java, but it's often a good idea to use `record`s. +Going over a basic example will probably be the clearest here. + +```java +record Foo(int bar, List baz, String qux) { + public static final Codec CODEC = + RecordCodecBuilder.create( + instance -> + instance.group( + Codec.INT.fieldOf("bar").forGetter(Foo::bar), + Codec.BOOL.listOf().fieldOf("baz").forGetter(Foo::baz), + Codec.STRING.optionalFieldOf("qux", "default string").forGetter(Foo::qux) + ).apply(instance, Foo::new) + ); +} +``` + +Ok, thats not too bad. +One nice thing about this is that Mojang did a lot of magic behind the scene to make this feel nice. +Trust me, I (OroArmor) once wrote a similar library and partially gave up on doing the right thing. + +Now, `RecordBuilder.create` takes a lambda, providing an `instance` parameter. +The main bulk of this lambda is the `group` method. +You can pass up to 16 different `Codec`s turned into fields through this method. 
+ +Turning a `Codec` into a field follows a fairly simple pattern. + +First, you start with the `Codec` (`Codec.INT`, `Codec.BOOL.listOf()`, and `Codec.STRING`). + +Then, you can call one of two methods: + - `fieldOf`, which takes a string parameter for the serialized field name. + - `optionalFieldOf`, also takes the same name parameter. + By default this represents an `Optional`, with `T` being the `Codec` type. + There is a method overload, like used in the example, which allows you to provide a default value and not have to store an `Optional`. + +Finally, you call `forGetter`, which takes a `Function`, with `O` being the object you are trying to serialize, and `T` being the type of the field on the object. + +Now, let's see a serialized `new Foo(8, List.of(true, false, true), "string")` in json: +```json +{ + "bar": 8, + "baz": [true, false, true], + "qux": "string" +} +``` + +Now, since we had an optional field, let's see what this json looks like: +```json +{ + "bar": -2, + "baz": [] +} +``` +Once deserialized, we get an object equal to `new Foo(-2, List.of(), "default string")` \ No newline at end of file From 0bf85f41f2ff2aff5552c5d80840c0bde7b6a08f Mon Sep 17 00:00:00 2001 From: Eli Orona Date: Thu, 1 Aug 2024 11:04:20 -0700 Subject: [PATCH 2/5] Apply suggestions from code review Co-authored-by: Pyrofab --- wiki/misc/codecs/en.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/wiki/misc/codecs/en.md b/wiki/misc/codecs/en.md index 7b721b48..af107c9d 100644 --- a/wiki/misc/codecs/en.md +++ b/wiki/misc/codecs/en.md @@ -5,11 +5,11 @@ WARNING: This tutorial expects a strong understanding of both Java basics and ge The `Codec` class from [DataFixerUpper](https://github.com/Mojang/DataFixerUpper) is the backbone of content serialization and deserialization. It provides an abstraction layer between Java Objects and serialization types, such as `json`, `nbt`, and more. 
Each `Codec` is made of a `Encoder` and a `Decoder`, but you rarely need to create a raw `Codec` from scratch. -Let's start off with the primative `Codec`s. +Let's start off with the primitive `Codec`s.
-## Primative Codecs +## Primitive Codecs Mojang thankfully builds in many default `Codec` implementations, making our lives easier as most objects are composed of these few types. We will cover building `Codec`s composed of other `Codec`s later on. @@ -64,7 +64,7 @@ The `decode` method on a codec takes two values, an `ops` and an `input`. As shown in the comments above, the type of the input and a generic parameter on `ops` must match. This is because the `ops` needs to know about how the `input` functions. In this example, we use `com.mojang.serialization.JsonOps.INSTANCE`, which operates on json elements from `gson`. -We then pass in a `JsonPrimative` with a value of `true` for this example. +We then pass in a `JsonPrimitive` with a value of `true` for this example. Finally, the `com.mojang.serialization.DataResult>` type allows us to encode more information than just the result. First off, the `A` type is the output of the `Codec`, which is `Boolean` in this case, and the `T` is the same as the input. @@ -72,7 +72,7 @@ First off, the `A` type is the output of the `Codec`, which is `Boolean` in this Let's look more into the `DataResult`: ```java ... -.result() // Option> +.result() // Optional> ... ``` From 96939996523abe99131d49207333ef19a34137d4 Mon Sep 17 00:00:00 2001 From: Eli Orona Date: Thu, 1 Aug 2024 15:03:59 -0700 Subject: [PATCH 3/5] Clean up what has been done --- wiki/misc/codecs/en.md | 57 +++++++++++++++++++++++++++++++----------- 1 file changed, 43 insertions(+), 14 deletions(-) diff --git a/wiki/misc/codecs/en.md b/wiki/misc/codecs/en.md index af107c9d..9a7b5f6a 100644 --- a/wiki/misc/codecs/en.md +++ b/wiki/misc/codecs/en.md @@ -1,6 +1,6 @@ # Codecs -WARNING: This tutorial expects a strong understanding of both Java basics and generics. +**WARNING**: This tutorial expects a strong understanding of both Java basics and generics. 
The `Codec` class from [DataFixerUpper](https://github.com/Mojang/DataFixerUpper) is the backbone of content serialization and deserialization. It provides an abstraction layer between Java Objects and serialization types, such as `json`, `nbt`, and more. @@ -63,7 +63,7 @@ Codec.BOOL.decode( The `decode` method on a codec takes two values, an `ops` and an `input`. As shown in the comments above, the type of the input and a generic parameter on `ops` must match. This is because the `ops` needs to know about how the `input` functions. -In this example, we use `com.mojang.serialization.JsonOps.INSTANCE`, which operates on json elements from `gson`. +In this example, we use `com.mojang.serialization.JsonOps.INSTANCE`, which operates on JSON elements from `gson`. We then pass in a `JsonPrimitive` with a value of `true` for this example. Finally, the `com.mojang.serialization.DataResult>` type allows us to encode more information than just the result. @@ -92,9 +92,11 @@ We use `get` to unbox the `Option`. Generally this is unsafe to do, an IntelliJ In this case we know that it is safe due to the simplicity of the example. Then finally, we call `getFirst` on `com.mojang.datafixers.util.Pair` to get the first half of the pair -Wow. That sure was a lot. -Now, I know this may seem like the `Codec` system is complicated right now, but unfortunately we have only scratched the surface. -Yep. That's right, it gets so much worse. +Wow. That sure was a lot. Fortunately, most of time you only need to provide the `Codec`, and Minecraft will do the (de)serialization for you. + + +Now, this may seem like the `Codec` system is complicated right now, and you would be right. We have only scratched the surface of how powerful codecs are. However, I hope you are beginning to see the masterpiece that they are. + Let's step back and look at some more `Codec` types.
@@ -105,14 +107,37 @@ These collection `Codec`s are fairly straight forward, and each has a constructo -- `ListCodec`: A codec for `List`. You can also make a list by calling `listOf` on a `Codec`. -- `SimpleMapCodec`: A codec for `Map` with a known set of `K`s. This known set is an additional parameter. -- `UnboundedMapCodec`: A codec for `Map`. -- `PairCodec`: A codec of a `Pair`. -- `EitherCodec`: A codec of `Either`. +### `ListCodec` +A codec for a `List`. +You can make a `ListCodec` by calling: +- `listOf()` on an instance of `Codec`. +- `Codec.list(Codec)` with the codec for the element type. + +There are also methods that allow you to set a minimum and maximum size for the list. + +### `SimpleMapCodec` +A codec for a `Map` with a known set of keys of type `K`. This known set is an additional parameter. Because of this, we usually recommend using `UnboundedMapCodec` instead. + +You create a `SimpleMapCodec` by calling `Codec.simpleMap(Codec, Codec, Keyable)`. + +### `UnboundedMapCodec` +A codec for a `Map`. + +You create an `UnboundedMapCodec` by calling `Codec.unboundedMap(Codec, Codec)`. + +### `PairCodec` +A codec of a `Pair`. This is fairly rare as it isn't too often you have just two values you want to serialize together without names for the fields. + +You create a `PairCodec` by calling `Codec.pair(Codec, Codec)`. + +### `EitherCodec` +A codec of an `Either`. This is part of the strength of `Codec`s. This allows you to represent one value with multiple different serializers, and it will choose the correct one based on the type. For example, if you want something to serialize to either a `String` or an `Integer`, you would use an `EitherCodec`. We will cover this more in depth later on, as there are still a few more concepts to go over before the full use of `EitherCodec` becomes apparent. + +You create an `EitherCodec` by calling `Codec.either(Codec, Codec)`. + ## The `RecordCodecBuilder` -Oh no. There is an explicit type name in a header.
This is going to get crazy. +Oh no. There is a full type name from DFU in a header. This is going to get crazy. Let's start it off simple: a `RecordCodecBuilder` creates a `Codec` that can directly serialize and deserialize a Java object. While it has the name `Record` in it, this isnt specific to the `record`s in Java, but it's often a good idea to use `record`s. @@ -126,7 +151,7 @@ record Foo(int bar, List baz, String qux) { instance.group( Codec.INT.fieldOf("bar").forGetter(Foo::bar), Codec.BOOL.listOf().fieldOf("baz").forGetter(Foo::baz), - Codec.STRING.optionalFieldOf("qux", "default string").forGetter(Foo::qux) + Codec.STRING.optionalFieldOf("qux", "default").forGetter(Foo::qux) ).apply(instance, Foo::new) ); } @@ -134,7 +159,7 @@ record Foo(int bar, List baz, String qux) { Ok, thats not too bad. One nice thing about this is that Mojang did a lot of magic behind the scene to make this feel nice. -Trust me, I (OroArmor) once wrote a similar library and partially gave up on doing the right thing. +Trust us, there have been a few Quilt developers who have tried making a similar library (OroArmor) and gave up on doing the right thing. Now, `RecordBuilder.create` takes a lambda, providing an `instance` parameter. The main bulk of this lambda is the `group` method. @@ -152,6 +177,10 @@ Then, you can call one of two methods: Finally, you call `forGetter`, which takes a `Function`, with `O` being the object you are trying to serialize, and `T` being the type of the field on the object. +Once you have finished calling group, the next thing to do is chain an `apply` call. The first parameter is always `instance`, and the second parameter is usually the method handle to constructor for the object you are making the `Codec` for. One thing that is important is to make sure that the order of the constructor parameters and the order of the fields in the group match, as this could cause either runtime or compile time errors.
+ + +### Serialization Now, let's see a serialized `new Foo(8, List.of(true, false, true), "string")` in json: ```json { @@ -168,4 +197,4 @@ Now, since we had an optional field, let's see what this json looks like: "baz": [] } ``` -Once deserialized, we get an object equal to `new Foo(-2, List.of(), "default string")` \ No newline at end of file +Once deserialized, we get an object equal to `new Foo(-2, List.of(), "default")` \ No newline at end of file From 6db38ac36bc5779e77e29705eb6ad46cba9dbbe0 Mon Sep 17 00:00:00 2001 From: Eli Orona Date: Thu, 1 Aug 2024 16:03:51 -0700 Subject: [PATCH 4/5] Codec.dispatch --- wiki/misc/codecs/en.md | 61 +++++++++++++++++++++++++++++++++++++++--- 1 file changed, 58 insertions(+), 3 deletions(-) diff --git a/wiki/misc/codecs/en.md b/wiki/misc/codecs/en.md index 9a7b5f6a..25a9bae6 100644 --- a/wiki/misc/codecs/en.md +++ b/wiki/misc/codecs/en.md @@ -153,7 +153,7 @@ record Foo(int bar, List baz, String qux) { Codec.BOOL.listOf().fieldOf("baz").forGetter(Foo::baz), Codec.STRING.optionalFieldOf("qux", "default").forGetter(Foo::qux) ).apply(instance, Foo::new) - ); + ).codec(); } ``` @@ -175,10 +175,11 @@ Then, you can call one of two methods: By default this represents an `Optional`, with `T` being the `Codec` type. There is a method overload, like used in the example, which allows you to provide a default value and not have to store an `Optional`. -Finally, you call `forGetter`, which takes a `Function`, with `O` being the object you are trying to serialize, and `T` being the type of the field on the object. +Next, you call `forGetter`, which takes a `Function`, with `O` being the object you are trying to serialize, and `T` being the type of the field on the object. Once you have finished calling group, the next thing to do is chain an `apply` call. The first parameter is always `instance`, and the second parameter is usually the method handle to constructor for the object you are making the `Codec` for. 
One thing that is important is to make sure that the order of the constructor parameters and the order of the fields in the group match, as this could cause either runtime or compile time errors. +Finally, you call `.codec()` on the returned value, since the returned value would be a `MapCodec` otherwise. Even though it sounds like it, a `MapCodec` isn't like a Java `Map`. While we won't cover `MapCodec`s much, they do have their uses which will be explained later. ### Serialization Now, let's see a serialized `new Foo(8, List.of(true, false, true), "string")` in json: @@ -197,4 +198,58 @@ Now, since we had an optional field, let's see what this json looks like: "baz": [] } ``` -Once deserialized, we get an object equal to `new Foo(-2, List.of(), "default")` \ No newline at end of file +Once deserialized, we get an object equal to `new Foo(-2, List.of(), "default")` + +## `Codec.dispatch` + +Dispatched `Codec`s are probably the most complex feature of `Codec`, but also the most elegant and powerful. What if I told you that something you already deserialized could change the rest of the deserialization? An example will probably be the best way to start off: + +```java +interface Dispatched { + String type(); + + Map<String, MapCodec<? extends Dispatched>> TYPES = Map.of( + "a", A.CODEC, + "b", B.CODEC + ); + + Codec<Dispatched> CODEC = Codec.STRING.dispatch( + Dispatched::type, + TYPES::get + ); +} + +record A(int a) implements Dispatched { + public static MapCodec<A> CODEC = + RecordCodecBuilder.mapCodec( + instance -> + instance.group( + Codec.INT.fieldOf("a").forGetter(A::a) + ).apply(instance, A::new) + ); + + public String type() { return "a"; } +} + +record B(String b) implements Dispatched { + public static MapCodec<B> CODEC = + RecordCodecBuilder.mapCodec( + instance -> + instance.group( + Codec.STRING.fieldOf("b").forGetter(B::b) + ).apply(instance, B::new) + ); + + public String type() { return "b"; } +} +``` + +Alright, that's a *lot* of code, but most of it is fairly straightforward.
First we have an interface that defines one method, `String type()`. This is so that we can know the types of any implementing classes. We then have a `TYPES` map which is a map of the type of object to its `Codec`. Next, we have the dispatch `Codec`. Since our type's type is `String`, we start off with `Codec.STRING`, since this is how to serialize/deserialize the type. Then we call `dispatch`. The first parameter is a `Function` that takes in the object (a `Dispatched`), and returns the type (a `String`), and here we use the method reference. The second parameter is another `Function`, but takes in a `String` and returns a `MapCodec` (explained shortly). Here we use a method reference to `TYPES.get(s)` to keep the code cleaner. + +Finally, we have two different records implementing `Dispatched` with their own `MapCodec`s. Now, `MapCodec`s can be thought of like `Map`, where the keys are field names. While it's certainly more flexible than that, this covers 90+% of use cases. The reason we use a `MapCodec` is because a `MapCodec` can be inlined into a larger object. I can't put a `Codec.BOOL` into an object without some form of name for it. 
+ +Now, let's look at some serializations: +| Java | JSON | +|:---:|:---:| +| `new A(10)` | `{"type": "a", "a": 10}` | +| `new B("str")` | `{"type": "b", "b": "str"}` | \ No newline at end of file From 74d48bb3140e208298b8c2704c7f9059d981536c Mon Sep 17 00:00:00 2001 From: Eli Orona Date: Thu, 5 Sep 2024 14:34:09 -0700 Subject: [PATCH 5/5] Update en.md --- wiki/misc/codecs/en.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wiki/misc/codecs/en.md b/wiki/misc/codecs/en.md index 25a9bae6..1902ea85 100644 --- a/wiki/misc/codecs/en.md +++ b/wiki/misc/codecs/en.md @@ -252,4 +252,4 @@ Now, let's look at some serializations: | Java | JSON | |:---:|:---:| | `new A(10)` | `{"type": "a", "a": 10}` | -| `new B("str")` | `{"type": "b", "b": "str"}` | \ No newline at end of file +| `new B("str")` | `{"type": "b", "b": "str"}` |
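
The dispatch idea from the `Codec.dispatch` section can be sketched without DataFixerUpper at all. What follows is a minimal, standard-library-only illustration, not the DFU API: the `DispatchSketch` class, the `DECODERS` map, and the `decode` method are names invented here, and plain `Map<String, Object>` values stand in for JSON objects. It shows only the core step, which is reading the `type` field first and then using it to choose how the remaining fields are read.

```java
import java.util.Map;
import java.util.function.Function;

public class DispatchSketch {
    // Hypothetical stand-ins for the tutorial's Dispatched, A, and B types.
    public interface Dispatched {
        String type();
    }

    public record A(int a) implements Dispatched {
        public String type() { return "a"; }
    }

    public record B(String b) implements Dispatched {
        public String type() { return "b"; }
    }

    // Plays the role of the TYPES map: the already-decoded "type" value
    // selects the decoder for the rest of the fields.
    // Assumption: fields are present and well-typed; a real codec would
    // report a missing or mistyped field instead of throwing.
    static final Map<String, Function<Map<String, Object>, Dispatched>> DECODERS = Map.of(
        "a", fields -> new A((Integer) fields.get("a")),
        "b", fields -> new B((String) fields.get("b"))
    );

    public static Dispatched decode(Map<String, Object> fields) {
        // Step 1: decode the discriminator on its own.
        if (!(fields.get("type") instanceof String type)) {
            throw new IllegalArgumentException("Missing or non-string \"type\" field");
        }
        // Step 2: let the discriminator pick the decoder for everything else.
        Function<Map<String, Object>, Dispatched> decoder = DECODERS.get(type);
        if (decoder == null) {
            throw new IllegalArgumentException("Unknown type: " + type);
        }
        return decoder.apply(fields);
    }

    public static void main(String[] args) {
        System.out.println(decode(Map.of("type", "a", "a", 10)));
        System.out.println(decode(Map.of("type", "b", "b", "str")));
    }
}
```

Running `main` decodes the two rows from the serialization table, with maps in place of JSON objects. With a real dispatched `Codec` the same lookup happens inside the codec, which also covers encoding and reports failures through `DataResult` rather than exceptions.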