specification – coding like a boss

Migrating from Text Templating to Source Generator

I recently ported Kafka.Protocol’s source code generation functionality from Text Templating (T4) to a Source Generator, and I thought I would share my experience of how they differ and what to expect.

Text Templating

Text Templating has been around since 2005 and is available on .NET Framework using C# 6. Mono.TextTemplating, which supports .NET and C# 10, has been around for a couple of years, and Visual Studio 2022 recently started to ship with a revamped CLI tool for text templating.

A text template combines text blocks and directives to generate source code. In a way it is reminiscent of classic ASP, or of Razor templates. Here’s an example borrowed from Microsoft:

<#@ output extension=".cs" #>
<#@ assembly name="System.Xml" #>
<#
 System.Xml.XmlDocument configurationData = ...; // Read a data file here.
#>
namespace Fabrikam.<#= configurationData.SelectSingleNode("jobName").Value #>
{
  ... // More code here.
}

This is a design-time template: it runs when the text template file is saved and writes its output to a separate file. There are also run-time templates, which generate code from other code at run time. That can simplify splitting generated code into multiple files, but it requires some other code to run in order to do so. A design-time template produces a single file per text template. There is tooling that can get around this limitation, but it has its own limitations.

Source Generator

Source generators were first introduced in .NET 5 and run at compile time. A generator has to target .NET Standard 2.0 but can use any C# version. It can generate source code from input such as a data model specification, or based on the code being compiled. Generated code is added to the compilation, meaning that hand-written code and generated code get compiled together into the same assembly. Unlike with T4 templates, the generated code isn’t written to any files by default; it goes directly into the output assembly.
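
To make this concrete, here is a minimal sketch of an incremental generator that adds one fixed class to the compilation. It assumes the generator project targets .NET Standard 2.0 and references the Microsoft.CodeAnalysis.CSharp package; the names HelloGenerator, Generated.Hello and Hello.g.cs are made up for illustration and have nothing to do with Kafka.Protocol.

```csharp
using System.Text;
using Microsoft.CodeAnalysis;
using Microsoft.CodeAnalysis.Text;

// Hypothetical generator: the type, namespace and hint names are illustrative only.
[Generator]
public sealed class HelloGenerator : IIncrementalGenerator
{
    public void Initialize(IncrementalGeneratorInitializationContext context)
    {
        // Post-initialization output does not depend on any input,
        // so the same source is added to every compilation.
        context.RegisterPostInitializationOutput(postInitializationContext =>
        {
            const string source = @"// <auto-generated />
namespace Generated
{
    public static class Hello
    {
        public const string Message = ""Hello from the generator"";
    }
}
";
            // The hint name identifies the generated source within the compilation.
            postInitializationContext.AddSource(
                "Hello.g.cs", SourceText.From(source, Encoding.UTF8));
        });
    }
}
```

A project that references this generator can use Generated.Hello.Message like any hand-written constant, since both end up in the same compilation.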

Writing Generated Code to Files

Emitting generated code to files can be enabled with a couple of project properties:

<EmitCompilerGeneratedFiles>true</EmitCompilerGeneratedFiles>
<CompilerGeneratedFilesOutputPath>Generated</CompilerGeneratedFilesOutputPath>

Make sure to exclude any emitted files from compilation. Remember, the generated source code is already part of the compilation!

<ItemGroup>
  <Compile Remove="$(CompilerGeneratedFilesOutputPath)/**/*.cs" />
</ItemGroup>

There are no limitations on how many files a generator can produce or where they are written. Storing generated source code in files is great if you want to track in source control how changes in the source generator affect the generated code, especially if you are generating code from a specification rather than from content in the compilation.

Using a Generator

To use a source generator, add a reference to it from a project:

<ItemGroup>
  <ProjectReference Include="..\Path\To\Generator\Generator.csproj" 
    OutputItemType="Analyzer" 
    ReferenceOutputAssembly="false" />
</ItemGroup>

…or a NuGet package:

<PackageReference Include="My.Generator" Version="1.2.3" PrivateAssets="all" />

Note the OutputItemType="Analyzer" attribute on the project reference. It tells the compiler to treat the project as an analyzer instead of a runtime reference. Output from an analyzer can be found under Dependencies in Visual Studio, and that’s where we find the generated types.

Note that they appear as files even though they are not; the file name is just an identifier. To find them in Visual Studio you need to search for the type they contain or navigate to them in the Solution Explorer.

It would be possible to include the emitted files in a project and exclude them from compilation, which would make them searchable like any other file. But since they aren’t part of the compilation they lack some analysis, which disables functionality such as symbol navigation. I recommend keeping emitted files solely for source control purposes.

Limitations

Source generators run as analyzers and have limited exception handling. Every exception thrown by a source generator is wrapped in a standard error message that contains very little information about what the problem is.

CSC : warning CS8785: Generator 'SourceGenerator' failed to generate source. It will not contribute to the output and compilation errors may occur as a result. Exception was of type 'NullReferenceException' with message 'Object reference not set to an instance of an object.'.

It’s possible to export the full exception, including the stack trace, by setting the ErrorLog property, which writes it to a SARIF-formatted file. This isn’t great to work with, as you’d want proper diagnostic feedback directly from the compiler. A workaround is to construct an error diagnostic manually and include the stack trace, but neither multi-line messages nor the description property is written to the output, so everything needs to be packed into a single-line message. Locations from stack traces are also problematic with incremental source generators: if a generator reruns with no code changes, the stack frame location is gone.
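
The manual diagnostic workaround could be sketched roughly like this. The descriptor id GEN001, its title and category, and the " | " separator are assumptions made for the example; only DiagnosticDescriptor, Diagnostic.Create and Location.None come from Roslyn itself.

```csharp
using System;
using Microsoft.CodeAnalysis;

public static class GeneratorDiagnostics
{
    // Hypothetical descriptor: the id, title and category are made up for illustration.
    private static readonly DiagnosticDescriptor UnhandledException = new DiagnosticDescriptor(
        id: "GEN001",
        title: "Source generator failed",
        messageFormat: "{0}",
        category: "SourceGenerator",
        defaultSeverity: DiagnosticSeverity.Error,
        isEnabledByDefault: true);

    // Collapses line breaks so the exception, including its stack trace,
    // fits in a single-line diagnostic message.
    public static string ToSingleLine(Exception exception) =>
        exception.ToString().Replace("\r\n", " | ").Replace("\n", " | ");

    // No source location is attached, since stack frame locations are unreliable here.
    public static Diagnostic Create(Exception exception) =>
        Diagnostic.Create(UnhandledException, Location.None, ToSingleLine(exception));
}
```

Inside a generator, the body of the source output callback would be wrapped in a try/catch, and the catch block would pass GeneratorDiagnostics.Create(exception) to the context’s ReportDiagnostic method instead of letting the exception escape.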

Diagnostic reporting only works with errors and warnings; other severities are ignored. There is a proposal on how informational diagnostic output should work.

Generated Code and Analyzers

Assemblies containing generated source code have had issues with not being properly analyzed, because analyzers were not reloaded when the generated code in an assembly changed. Working around it required deleting the .vs cache directory and restarting Visual Studio to force a reload. This was resolved in the Visual Studio 2022 17.12 release.

Conclusion

Even though source generators still have a few quirks, they are much easier to work with than T4 text templates. They enable unit testing and file splitting, and do not require running under Windows. They are also easier to distribute, as they can be packed in NuGet packages. And I’ve barely mentioned content-based generators, which open up a whole other world of opportunities! Check out the Source Generators Cookbook to get started.

Parsing OpenAPI Style Parameters

Path Parameters

Path parameters are parameters defined in a path, for example id in the path /user/{id}. Path parameters can be described using the label, matrix or simple styles, which are all defined by RFC 6570, URI Template.

Here are some examples using the JSON primitive value 1 for a parameter named id:

Simple: /user/1
Matrix: /user/;id=1
Label: /user/.1

It’s also possible to describe arrays. Using the same parameter as above with two JSON primitive values, 1 and 2, it gets serialized as:

Simple: /user/1,2
Matrix: /user/;id=1,2
Label: /user/.1.2

Given a parameter named user whose value is the JSON object { "id": 1, "name": "foo" }, it becomes:

Simple: /user/id,1,name,foo
Matrix: /user/;user=id,1,name,foo
Label: /user/.id.1.name.foo

The explode modifier can be used to enforce composite values (name/value pairs). It has no effect on primitive values, nor on arrays in the label and simple styles. For the examples and styles above where explode does have an effect, here are the exploded equivalents:

Arrays

Matrix: /user/;id=1;id=2

Objects

Simple: /user/id=1,name=foo
Matrix: /user/;id=1;name=foo
Label: /user/.id=1.name=foo
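
The path style rules above can be summed up in a small serializer. This is only a sketch that reproduces the examples in this post: PathStyleSerializer is a hypothetical helper, not the API of any library, values are assumed to be strings already, and percent-encoding is left out.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class PathStyleSerializer
{
    // A primitive can be passed as a single-item array; it serializes the same way.
    public static string SerializeArray(
        string style, bool explode, string name, IEnumerable<string> items)
    {
        switch (style)
        {
            case "simple":
                return string.Join(",", items);
            case "label":
                return "." + string.Join(".", items);
            case "matrix":
                // Only matrix arrays change when exploded: the name is repeated per item.
                return explode
                    ? string.Concat(items.Select(item => $";{name}={item}"))
                    : $";{name}=" + string.Join(",", items);
            default:
                throw new ArgumentException($"Unknown path style '{style}'.", nameof(style));
        }
    }

    public static string SerializeObject(
        string style, bool explode, string name,
        IEnumerable<(string Name, string Value)> properties)
    {
        // Exploded properties are joined as name=value; otherwise name and value
        // are separated by the same delimiter as the properties themselves.
        string Join(string delimiter) => string.Join(
            delimiter,
            properties.Select(p => p.Name + (explode ? "=" : delimiter) + p.Value));

        switch (style)
        {
            case "simple":
                return Join(",");
            case "label":
                return "." + Join(".");
            case "matrix":
                return explode ? ";" + Join(";") : $";{name}=" + Join(",");
            default:
                throw new ArgumentException($"Unknown path style '{style}'.", nameof(style));
        }
    }
}
```

For example, SerializeArray("matrix", true, "id", new[] { "1", "2" }) gives ;id=1;id=2, and SerializeObject("label", true, "user", new[] { ("id", "1"), ("name", "foo") }) gives .id=1.name=foo, which is what follows /user/ in the examples above.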

Query Parameters

Query parameters can be described with the form, space delimited, pipe delimited or deep object styles. The form style is defined by RFC 6570; the rest are defined by OpenAPI.

Primitives

Using the examples from path parameters, a serialized user, ?user={user}, or user id, ?id={id}, defined as a query parameter would look like:

Form: id=1

Note that the examples in the OpenAPI specification don’t describe any primitive values for the pipe and space delimited styles, even though they are quite similar to the simple style.

Arrays

Form: id=1,2
SpaceDelimited: id=1%202
PipeDelimited: id=1|2

Objects

Form: user=id,1,name,foo
SpaceDelimited: user=id%201%20name%20foo
PipeDelimited: user=id|1|name|foo

Note that the examples in the specification lack the parameter name for arrays and objects; this has been corrected in OpenAPI 3.1.1.

The deepObject style is not applicable without explode and, as with the path styles, explode has no effect on primitive values.

Exploded pipe and space delimited parameters are not described in the specification’s examples, even though they are similar to form. Do note, though, that neither of them would be possible to parse, as the parameter name cannot be inferred.

With all this in mind here are the respective explode examples:

Arrays

Form: id=1&id=2
SpaceDelimited: id=1%202
PipeDelimited: id=1|2

Objects

Form: id=1&name=foo
SpaceDelimited: id=1%20name=foo
PipeDelimited: id=1|name=foo
DeepObject: user[id]=1&user[name]=foo
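
As a sketch of the object cases, the following hypothetical helper reproduces the query examples above. It deliberately rejects exploded space and pipe delimited objects, since those could not be parsed back anyway, and it leaves out percent-encoding apart from the %20 delimiter. QueryStyleSerializer is an illustrative name, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class QueryStyleSerializer
{
    public static string SerializeObject(
        string style, bool explode, string name,
        IEnumerable<(string Name, string Value)> properties)
    {
        // Non-exploded objects are flattened to name, value, name, value, ...
        string Flatten(string delimiter) => string.Join(
            delimiter, properties.Select(p => p.Name + delimiter + p.Value));

        switch (style)
        {
            case "form" when explode:
                // The parameter name is lost here; only the property names remain.
                return string.Join("&", properties.Select(p => $"{p.Name}={p.Value}"));
            case "form":
                return $"{name}=" + Flatten(",");
            case "spaceDelimited" when !explode:
                return $"{name}=" + Flatten("%20");
            case "pipeDelimited" when !explode:
                return $"{name}=" + Flatten("|");
            case "deepObject" when explode:
                return string.Join("&", properties.Select(p => $"{name}[{p.Name}]={p.Value}"));
            default:
                throw new NotSupportedException(
                    $"Style '{style}' with explode={explode} is not supported for objects.");
        }
    }
}
```

Calling SerializeObject("deepObject", true, "user", new[] { ("id", "1"), ("name", "foo") }) gives user[id]=1&user[name]=foo, the only style here where an exploded object keeps its parameter name.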

Header Parameters

Header parameters can only be described using the simple style.

Given the headers user: {user} and id: {id}, the respective header parameter values with simple style would look like:

Primitives

Simple: 1

Arrays

Simple: 1,2

Objects

Simple: id,1,name,foo

As with the other parameters described, explode has no effect on primitive values, nor on arrays. For objects it would look like:

Simple: id=1,name=foo
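
Going the other way, parsing a simple style header value is mostly a matter of splitting on commas. The sketch below assumes that values contain no commas or equals signs of their own and returns plain strings rather than JSON; SimpleStyleParser is a hypothetical name used only for this example.

```csharp
using System;
using System.Collections.Generic;

public static class SimpleStyleParser
{
    // Arrays look the same with and without explode.
    public static IReadOnlyList<string> ParseArray(string value) =>
        value.Split(',');

    public static IReadOnlyDictionary<string, string> ParseObject(string value, bool explode)
    {
        var tokens = value.Split(',');
        var result = new Dictionary<string, string>();

        if (explode)
        {
            // Exploded: each token is a name=value pair.
            foreach (var token in tokens)
            {
                var parts = token.Split(new[] { '=' }, 2);
                if (parts.Length != 2)
                    throw new FormatException($"Expected name=value but got '{token}'.");
                result[parts[0]] = parts[1];
            }
            return result;
        }

        // Not exploded: tokens alternate between property name and property value.
        if (tokens.Length % 2 != 0)
            throw new FormatException("Expected an even number of comma separated tokens.");
        for (var i = 0; i < tokens.Length; i += 2)
            result[tokens[i]] = tokens[i + 1];
        return result;
    }
}
```

Both ParseObject("id,1,name,foo", false) and ParseObject("id=1,name=foo", true) produce the same two entries, id and name. Turning the string values into the right JSON types would still require the parameter’s schema.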

Cookie Parameters

A cookie parameter can only be described with the form style, and is represented in a similar way to query parameters. Using the examples Cookie: id={id} and Cookie: user={user}, a cookie parameter value would look like:

Primitives

Form: id=1

Arrays

Form: id=1,2

Objects

Form: user=id,1,name,foo

As with the other parameters described, explode has no effect on primitive values. For arrays and objects it looks like:

Arrays

Form: id=1&id=2

Objects

Form: id=1&name=foo

Note that exploded objects for cookie parameters have the same problem as query parameters; the parameter name cannot be inferred.
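
One way to illustrate the problem: the serialized value id=1&name=foo says nothing about a parameter called user, so a parser has to be told which property names belong to the object. The sketch below assumes those names are taken from the parameter’s schema, which is my own assumption and not something the specification prescribes. ExplodedFormObjectParser is a hypothetical helper.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ExplodedFormObjectParser
{
    // Picks out the name=value pairs that belong to one object parameter.
    // propertyNames is assumed to come from the parameter's schema; it is the only
    // thing that ties a pair such as "id=1" back to the object it belongs to.
    public static IReadOnlyDictionary<string, string> Parse(
        string serialized, ISet<string> propertyNames)
    {
        return serialized
            .Split(new[] { '&' }, StringSplitOptions.RemoveEmptyEntries)
            .Select(pair => pair.Split(new[] { '=' }, 2))
            .Where(parts => parts.Length == 2 && propertyNames.Contains(parts[0]))
            .ToDictionary(parts => parts[0], parts => parts[1]);
    }
}
```

Given id=1&name=foo&page=2 and the property names id and name, this returns the two user properties and skips page. Without the property names there would be no way to tell the three pairs apart, and two object parameters that share a property name would still be ambiguous.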

Object Complexity

Theoretically an object can have an endlessly deep property structure, where each property is an object that also has properties that are objects, and so on. Neither RFC 6570 nor OpenAPI 3.1 defines how deep a structure can be, but it would be difficult to define array items and object properties as objects in most styles.

As OpenAPI provides media type objects as a complement for complex parameters, it’s advisable to use those instead in such scenarios.

OpenAPI.ParameterStyleParsers

To support parsing and serialization of style-defined parameters, I’ve created a .NET library, OpenAPI.ParameterStyleParsers. It parses style-serialized parameters into the corresponding JSON instance and vice versa. It supports all examples defined in the OpenAPI 3.1 specification, corrected for the inconsistencies described earlier. It only supports arrays with primitive item values and objects with primitive property values; for more complex scenarios, use media type objects. The JSON instance type that a parameter gets parsed into is determined by the schema’s type keyword. If no type information is defined, it falls back on a best-effort guess based on the parameter style.
