# Device path code generation

This module of `xtask` generates code for reading and building UEFI
device paths. The command is `cargo xtask gen-code`.

There are a large number of device path nodes; the UEFI Specification
devotes some 40 pages to describing them all. These definitions are
specified in Rust-like code in [`spec.rs`], and the code generator
produces [`src/proto/device_path/device_path_gen.rs`] containing the
final Rust code. We check this generated file into the git repo, so
there's no need for a `build.rs`.

For each device path node, we generate a packed struct and a builder
struct. The packed struct corresponds almost exactly to the node
structure in the UEFI Specification and is used for read-only access to
a node. The builder struct is used to create new nodes.
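As a rough illustration of the two shapes, here is a hand-written sketch for a PCI-style node. The names `PackedPci`, `BuildPci`, `from_bytes`, and `build` are hypothetical and are not taken from the generated file; the real generated code is more thorough.

```rust
use core::mem::size_of;

/// Hypothetical packed struct: the 4-byte node header followed by two
/// one-byte fields, laid out as the node is described in the specification.
#[repr(C, packed)]
pub struct PackedPci {
    device_type: u8,
    sub_type: u8,
    length: u16,
    function: u8,
    device: u8,
}

impl PackedPci {
    /// View a byte array as a node without copying.
    pub fn from_bytes(bytes: &[u8; 6]) -> &PackedPci {
        // SAFETY: `PackedPci` is `repr(C, packed)`, so it has alignment 1
        // and a size of exactly 6 bytes, and every bit pattern is valid
        // for its integer fields.
        unsafe { &*(bytes.as_ptr() as *const PackedPci) }
    }

    // Getters return fields by value, because references into a packed
    // struct may be unaligned.
    pub fn full_type(&self) -> (u8, u8) {
        (self.device_type, self.sub_type)
    }
    pub fn length(&self) -> u16 {
        u16::from_le(self.length)
    }
    pub fn function(&self) -> u8 {
        self.function
    }
    pub fn device(&self) -> u8 {
        self.device
    }
}

/// Hypothetical builder struct: only the payload fields, so it can be
/// created with ordinary struct syntax.
pub struct BuildPci {
    pub function: u8,
    pub device: u8,
}

impl BuildPci {
    /// Write the node out in packed form.
    pub fn build(&self) -> [u8; 6] {
        let length = (size_of::<PackedPci>() as u16).to_le_bytes();
        // 0x01 / 0x01 are the hardware device type and the PCI sub-type.
        [0x01, 0x01, length[0], length[1], self.function, self.device]
    }
}
```

The builder side never exposes the header; the type, sub-type, and length are filled in when the node is written out.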

## `spec.rs`

The `spec.rs` file is the input that describes each node. The code in
this file is syntactically valid Rust code, but it's not included
directly with a `mod` statement anywhere. Instead, the code is parsed
with [`syn`] and processed in various ways.

The file is organized with modules, one for each [`DeviceType`]. Within
each module are the node definitions. Each node is a `struct` marked
with a `#[node(...)]` attribute, which can contain the following
properties:
* `static_size = <N>` (required): Specifies the expected static size (in
  bytes) of the node. This excludes dynamically-sized fields. This is
  compared against the internally-calculated size of the struct to help
  validate that the node definition is correct. The UEFI Specification
  usually says what this value is when describing the node, although a
  few are missing or incorrect.
* `sub_type` (optional): Sets the [`DeviceSubType`]. This is usually
  inferred from the node's name and the module it's in, but there are a
  few edge cases where it needs to be manually specified.
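The `static_size` check can be pictured roughly as follows. This is only a sketch of the idea, assuming field types are available as strings; the real generator works on the [`syn`] syntax tree, and the helper names here (`fixed_size`, `is_dst`, `static_size`, `check_static_size`) are hypothetical.

```rust
/// Size in bytes of the header that starts every device path node
/// (type: u8, sub-type: u8, length: u16).
const HEADER_SIZE: usize = 4;

/// Size of a fixed-size field type, or `None` if the type is unknown.
/// Hypothetical lookup table covering only a few types.
fn fixed_size(ty: &str) -> Option<usize> {
    match ty {
        "u8" => Some(1),
        "u16" => Some(2),
        "u32" => Some(4),
        "u64" => Some(8),
        "Guid" => Some(16),
        _ => None,
    }
}

/// Slice types such as `[u8]` are dynamically sized. Arrays such as
/// `[u8; 6]` contain a `;` and are not treated as DSTs.
fn is_dst(ty: &str) -> bool {
    ty.starts_with('[') && ty.ends_with(']') && !ty.contains(';')
}

/// Sum the header and all fixed-size fields, skipping DST fields.
fn static_size(field_types: &[&str]) -> Option<usize> {
    let mut total = HEADER_SIZE;
    for ty in field_types {
        if is_dst(ty) {
            continue;
        }
        total += fixed_size(ty)?;
    }
    Some(total)
}

/// Compare the declared `static_size` against the calculated one.
fn check_static_size(declared: usize, field_types: &[&str]) -> Result<(), String> {
    match static_size(field_types) {
        Some(actual) if actual == declared => Ok(()),
        Some(actual) => Err(format!(
            "declared static_size = {}, but calculated {}",
            declared, actual
        )),
        None => Err("unknown field type".to_string()),
    }
}
```

Under these assumptions, a unit struct has a static size of 4 (just the header), and a node with a `Guid` field followed by a `[u8]` field has a static size of 20.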
  
A node struct can be a unit struct, or contain some number of fields. By
default, fields are used unchanged in both the packed and builder
structs. Fields can optionally be marked with a `#[node(...)]` attribute
to alter the code generation, with the following optional properties:
* `no_get_func`: No getter will be generated for this field.
* `custom_get_impl`: A getter will be generated for this field, but the
  autogenerated implementation will be replaced with a call to
  `self.get_<field_name>`.
* `build_type = <false|"string">`: If set to `false`, no field will be
  generated in the builder struct. If set to a string, the contents of
  the string will be parsed as a type to use for the build field.
* `custom_build_impl`: When building a node, the autogenerated
  implementation for this field will be replaced with a call to
  `self.build_<field_name>`. If the field is a DST, the destination
  buffer will be passed in. Otherwise, the type is copyable and the
  function will just return the value directly.
* `custom_build_size_impl`: When calculating the size of a node before
  building it, the autogenerated implementation for this field will be
  replaced with a call to `self.build_size_<field_name>`.
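To make the custom build hooks more concrete, here is a sketch of how they might fit together for a DST field named `data` whose build type is `&[u16]`. Everything in it is an assumption for illustration: the struct name, the placeholder type and sub-type values, the exact hook signatures, and the `build` method standing in for generated code. It also uses `Vec` for brevity, which the real `no_std` code would not necessarily do.

```rust
/// Placeholder header values, not real UEFI type or sub-type numbers.
const EXAMPLE_TYPE: u8 = 0x7e;
const EXAMPLE_SUB_TYPE: u8 = 0x01;

/// Hypothetical builder struct. The packed node stores `data` as raw
/// bytes, but the builder field uses a different type (`&[u16]`).
pub struct BuildExample<'a> {
    pub data: &'a [u16],
}

impl<'a> BuildExample<'a> {
    /// Hand-written hook in the style of `build_size_<field_name>`:
    /// number of bytes `data` occupies in the packed node.
    fn build_size_data(&self) -> usize {
        self.data.len() * 2
    }

    /// Hand-written hook in the style of `build_<field_name>`. The field
    /// is a DST, so the destination buffer is passed in.
    fn build_data(&self, out: &mut [u8]) {
        for (chunk, value) in out.chunks_exact_mut(2).zip(self.data.iter()) {
            chunk.copy_from_slice(&value.to_le_bytes());
        }
    }

    /// Stand-in for the generated code that calls the two hooks.
    pub fn build(&self) -> Vec<u8> {
        let size = 4 + self.build_size_data();
        // The node length is stored in a `u16` header field.
        assert!(size <= u16::MAX as usize, "node too large");
        let mut out = vec![0u8; size];
        out[0] = EXAMPLE_TYPE;
        out[1] = EXAMPLE_SUB_TYPE;
        out[2..4].copy_from_slice(&(size as u16).to_le_bytes());
        self.build_data(&mut out[4..]);
        out
    }
}
```

The size hook runs first so the buffer can be allocated and the header length filled in before the field itself is written.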
  
Any items in a module that are not node structs will be passed through
unmodified to the generated output file. An item can be annotated with a
`#[build]` attribute to put it in the corresponding build module,
otherwise it will go in the corresponding packed module.

## Design notes

### Why have two structs for each node type?

Having two structs for each node type, a packed struct and a builder
struct, is motivated primarily by DST nodes. Many nodes end in a
dynamically-sized slice, which prevents the normal struct construction
syntax from being used. One option would be to generate a construction
function that takes an argument for each field, but that can negatively
impact readability since there's no named-argument syntax. Having a
separate builder struct allows us to use the normal struct construction
syntax. DST fields in the builder are replaced with slice references.
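A minimal sketch of what this looks like for a vendor-style node with a fixed 16-byte GUID and a trailing dynamically-sized payload. The names `BuildVendor`, `size_in_bytes`, and `write` are hypothetical, and the GUID is represented as a plain byte array to keep the example self-contained.

```rust
/// Hypothetical builder for a node that ends in a DST. The trailing
/// `[u8]` of the packed struct becomes a slice reference here.
pub struct BuildVendor<'a> {
    pub vendor_guid: [u8; 16],
    pub vendor_defined_data: &'a [u8],
}

impl<'a> BuildVendor<'a> {
    /// Total node size: 4-byte header + 16-byte GUID + payload.
    /// Returns `None` if the size does not fit the `u16` length field.
    pub fn size_in_bytes(&self) -> Option<u16> {
        let size = 20usize.checked_add(self.vendor_defined_data.len())?;
        if size > u16::MAX as usize {
            None
        } else {
            Some(size as u16)
        }
    }

    /// Write the packed node into `out`, returning the number of bytes
    /// written, or `None` if `out` is too small or the node is too large.
    pub fn write(&self, out: &mut [u8]) -> Option<usize> {
        let size = self.size_in_bytes()?;
        let len = usize::from(size);
        let out = out.get_mut(..len)?;
        out[0] = 0x01; // hardware device type
        out[1] = 0x04; // vendor sub-type
        out[2..4].copy_from_slice(&size.to_le_bytes());
        out[4..20].copy_from_slice(&self.vendor_guid);
        out[20..].copy_from_slice(self.vendor_defined_data);
        Some(len)
    }
}
```

With named fields, a call site such as `BuildVendor { vendor_guid: guid, vendor_defined_data: &payload }` stays readable in a way that a positional `new(guid, &payload)` would not once a node has several fields of the same type.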

### Why code generation?

With the need for two structs per node type established, the need for
some kind of code generation becomes clear: writing everything out by
hand twice would be tedious and bug-prone.

Code generation also helps with the code that writes builder nodes out
in packed form, and with generating other functions such as debug and
conversion impls.

### Why this type of code generation?

Rust offers a few built-in options for code generation: declarative
macros, proc macros, and `build.rs`.

Declarative macros become hard to read once the logic gets complicated.
There are a fair number of idiosyncratic node types, so a declarative
macro covering all of them would almost certainly end up in that
category.

A proc macro, on the other hand, would certainly work for this use
case. It can read arbitrary Rust syntax and produce arbitrary Rust
code. The macro itself is fairly normal Rust code and hence can be quite
readable. However, there are a couple of drawbacks. First, it makes it
harder for the `uefi` crate to ever stop having a hard dependency on
`uefi-macros` in the future. Many crates try to make proc macros
optional to improve compilation time, so it would be nice to keep that
option open. Second, the generated code is invisible without special
compilation flags. For a big complicated macro, that makes it more
challenging to get the code generation correct in the first place, and
also makes it harder to provide good errors to end-user code that uses
the generated items, since the error message will point to the input to
the macro, not the implicit generated output.

Next up there's `build.rs`, which is run automatically as part of the
build and can generate arbitrary output files. Using `build.rs` we could
use [`syn`] and [`quote`] just like a proc macro to specify nodes in a
convenient format and generate code in a real file. That would solve the
"invisible generated code" problem of proc macros, but it has the same
compilation-time drawbacks. It also introduces a new problem: `build.rs`
may not integrate well with non-cargo build systems.

That brings us to the solution actually implemented here, which is to
use [`syn`] and [`quote`] like a proc macro, but to do so "offline" with
an `xtask` command, and store the result in the git repo. This solves
all the previous problems, and the only drawback is that it's possible
to forget to run the command to update the generated code. However, a CI
job verifies that the generated code is up to date so such mistakes
won't make it into `main`.

[`quote`]: https://docs.rs/quote
[`spec.rs`]: ./spec.rs
[`syn`]: https://docs.rs/syn
[`src/proto/device_path/device_path_gen.rs`]: ../../../src/proto/device_path/device_path_gen.rs
[`DeviceType`]: https://docs.rs/uefi/latest/uefi/proto/device_path/struct.DeviceType.html
[`DeviceSubType`]: https://docs.rs/uefi/latest/uefi/proto/device_path/struct.DeviceSubType.html
