Canoto

Canoto is a serialization format designed to be:

  1. Fast
  2. Compact
  3. Canonical
  4. Backwards compatible
  5. Read compatible with Protocol Buffers.

Install

go install github.com/StephenButtolph/canoto/canoto@latest

Define Messages

Canoto messages are defined as ordinary Go structs:

type ExampleStruct0 struct {
	Int32              int32           `canoto:"int,1"`
	Int64              int64           `canoto:"int,2"`
	Uint32             uint32          `canoto:"uint,3"`
	Uint64             uint64          `canoto:"uint,4"`
	Sfixed32           int32           `canoto:"fint32,5"`
	Fixed32            uint32          `canoto:"fint32,6"`
	Sfixed64           int64           `canoto:"fint64,7"`
	Fixed64            uint64          `canoto:"fint64,8"`
	Bool               bool            `canoto:"bool,9"`
	String             string          `canoto:"string,10"`
	Bytes              []byte          `canoto:"bytes,11"`
	OtherStruct        ExampleStruct1  `canoto:"value,12"`
	OtherStructPointer *ExampleStruct1 `canoto:"pointer,13"`

	canotoData canotoData_ExampleStruct0
}

type ExampleStruct1 struct {
	Int32 int32 `canoto:"int,536870911"`

	canotoData canotoData_ExampleStruct1
}

All structs must include a field called canotoData that will cache the results of calculating the size of the struct.

The type canotoData_${structName} is automatically generated by Canoto.

For a given Struct, Canoto automatically implements the Message interface:

// Message defines a type that can be a stand-alone Canoto message.
type Message interface {
	Field
	// MarshalCanoto returns the Canoto representation of this message.
	//
	// It is assumed that this message is ValidCanoto.
	MarshalCanoto() []byte
	// UnmarshalCanoto unmarshals a Canoto-encoded byte slice into the message.
	UnmarshalCanoto(bytes []byte) error
}

// Field defines a type that can be included inside of a Canoto message.
type Field interface {
	// CanotoSpec returns the specification of this canoto message.
	//
	// If there is not a valid specification of this type, it returns nil.
	CanotoSpec(types ...reflect.Type) *Spec
	// MarshalCanotoInto writes the field into a [Writer] and returns the
	// resulting [Writer].
	//
	// It is assumed that CalculateCanotoCache has been called since the last
	// modification to this field.
	//
	// It is assumed that this field is ValidCanoto.
	MarshalCanotoInto(w Writer) Writer
	// CalculateCanotoCache populates internal caches based on the current
	// values in the struct.
	CalculateCanotoCache()
	// CachedCanotoSize returns the previously calculated size of the Canoto
	// representation from CalculateCanotoCache.
	//
	// If CalculateCanotoCache has not yet been called, or the field has been
	// modified since the last call to CalculateCanotoCache, the returned size
	// may be incorrect.
	CachedCanotoSize() uint64
	// UnmarshalCanotoFrom populates the field from a [Reader].
	UnmarshalCanotoFrom(r Reader) error
	// ValidCanoto validates that the field can be correctly marshaled into the
	// Canoto format.
	ValidCanoto() bool
}

Generate

In order to generate canoto information for all of the structs in a file, simply run the canoto command with one or more files.

canoto example0.go example1.go

The above example will generate example0.canoto.go and example1.canoto.go.

The corresponding proto file for a canoto file can also be generated by adding the --proto flag.

canoto --proto example.go

The above example will generate example.canoto.go and example.proto.

go:generate

To automatically generate the .canoto.go version of a file, it is recommended to use go:generate. By using go run inside the go:generate, the version of the command being run will be taken from the local go.mod.

Place

//go:generate go run github.com/StephenButtolph/canoto/canoto $GOFILE

at the top of a file to update the .canoto.go version of the file every time go generate ./... is run.

Best Practices

canoto only inspects a single Go file at a time, so it is recommended to define nested messages in the same file so that the most useful proto file can be generated.

Additionally, while fully supported in the canoto output, type aliases and generic types will result in proto files with default types. It is still guaranteed for the generated proto file to be able to parse canoto data, but the types may not be as specific as they could be.

If type aliases are needed, it may make sense to modify the generated proto file to specify the most specific proto type possible.

Generics

Canoto supports generic structs through the canoto.FieldPointer[T] constraint. To guarantee safe usage of a struct, type constraints can be used to implement a struct with a generic field T. Canoto inspects the generic types, so the struct must include a type parameter of canoto.FieldPointer[T]. For example:

type GenericField[T any, _ canoto.FieldPointer[T]] struct {
	Value   T  `canoto:"value,1"`
	Pointer *T `canoto:"pointer,2"`

	canotoData canotoData_GenericField
}

If canoto.FieldPointer is aliased to a different type or is otherwise re-implemented, Canoto will not be able to correctly tie the type constraints together.

Pass-By-Value Messages

The auto-generated canotoData struct uses atomic reads and writes so that MarshalCanoto can safely be called from multiple threads at the same time. Because MarshalCanoto appears to be a read-only method, concurrent calls to it are to be expected.

However, concurrently marshalling and passing messages by value may cause race detection failures. To explicitly disallow passing messages by value, the canoto:"nocopy" tag can be added to the canotoData field. This results in the NoCopy pattern being used to warn against incorrect usage.

For example:

type NotPassableByValue struct {
	Int int64 `canoto:"int,1"`

	canotoData canotoData_NotPassableByValue `canoto:"nocopy"`
}
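
The NoCopy pattern mentioned above is a general Go idiom rather than something specific to Canoto. As a minimal sketch (the noCopy and guarded types and the show function are hypothetical, and the code Canoto actually generates may differ), a zero-size marker type with Lock and Unlock methods lets the copylocks check in go vet flag by-value copies of any struct that contains it:

```go
package main

import "fmt"

// noCopy is a zero-size marker type. Because it has pointer-receiver Lock and
// Unlock methods, the copylocks check in `go vet` reports by-value copies of
// any struct that contains it.
type noCopy struct{}

func (*noCopy) Lock()   {}
func (*noCopy) Unlock() {}

// guarded is a hypothetical struct used for illustration only; it is not
// generated by Canoto.
type guarded struct {
	_     noCopy
	Value int64
}

// show takes a pointer, so the guarded value is never copied.
func show(g *guarded) string {
	return fmt.Sprintf("value=%d", g.Value)
}

func main() {
	g := &guarded{Value: 7}
	fmt.Println(show(g)) // value=7

	// Uncommenting the next line compiles, but `go vet` would warn that the
	// assignment copies a lock value.
	// copied := *g
}
```

The pattern only produces a warning from static analysis; the compiler itself does not reject the copy.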

Standalone Implementations

In some instances, it may be desirable for the generated code to avoid introducing a dependency on this repo into the go.mod file, for example when an application must support multiple versions of canoto at the same time.

There are two CLI flags that enable using canoto without impacting the go.mod.

  1. --library, when specified, generates the canoto library in the provided folder. For example, --library="./internal" generates the canoto library in the ./internal/canoto package.
  2. --import specifies the canoto library to depend on in any generated code.

For example:

canoto --library="./internal" --import="github.com/StephenButtolph/canoto/internal/canoto" ./canoto.go

This generates the canoto library in ./internal/canoto and imports "github.com/StephenButtolph/canoto/internal/canoto" rather than the default "github.com/StephenButtolph/canoto" when generating ./canoto.canoto.go.

Custom Identifiers

Six CLI flags control the naming of generated Go identifiers:

| Flag | Default | Scope | Description |
| --- | --- | --- | --- |
| --format-cache | canotoData_{struct} | Message | Name of the generated cache struct type |
| --format-number | canotoNumber_{cStruct}__{cField} | Field | Name of generated field number constants |
| --format-tag | canotoTag_{cStruct}__{cField} | Field | Name of generated field tag constants |
| --format-oneof-type | canotoOneOfType_{cStruct}__{cOneOf} | OneOf | Name of generated oneOf type aliases |
| --format-oneof-unset | canotoOneOfUnset_{cStruct}__{cOneOf} | OneOf | Name of generated unset oneOf constants |
| --format-oneof-field | canotoOneOf_{cStruct}__{cField} | Field | Name of generated oneOf field constants |

Each flag accepts a template string. The available variables depend on the flag's scope, and each scope extends the one above it:

| Scope | Variables |
| --- | --- |
| Message | {struct}, {cStruct} |
| OneOf | {struct}, {cStruct}, {oneOf}, {cOneOf} |
| Field | {struct}, {cStruct}, {oneOf}, {cOneOf}, {field}, {cField} |

| Variable | Description |
| --- | --- |
| {struct} | Original struct name (e.g. My_Struct) |
| {cStruct} | Canonicalized struct name: _ replaced with _1 (e.g. My_1Struct) |
| {oneOf} | Original oneOf name |
| {cOneOf} | Canonicalized oneOf name: _ replaced with _1 |
| {field} | Original field name |
| {cField} | Canonicalized field name: _ replaced with _1 |
Replacing _ with _1 prevents ambiguity with __, which can be used as a separator between variables.
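
To see why the canonicalized variables matter, here is a minimal sketch of the rule as described in this section (every _ becomes _1). The canonicalize and expand helpers are hypothetical and are not Canoto's implementation; expand only handles the struct and field variables.

```go
package main

import (
	"fmt"
	"strings"
)

// canonicalize applies the rule described in this section: "_" becomes "_1".
func canonicalize(name string) string {
	return strings.ReplaceAll(name, "_", "_1")
}

// expand fills a template such as "canotoNumber_{cStruct}__{cField}".
// Hypothetical helper: only the struct and field variables are supported.
func expand(template, structName, fieldName string) string {
	return strings.NewReplacer(
		"{struct}", structName,
		"{cStruct}", canonicalize(structName),
		"{field}", fieldName,
		"{cField}", canonicalize(fieldName),
	).Replace(template)
}

func main() {
	fmt.Println(expand("canotoNumber_{cStruct}__{cField}", "My_Struct", "Int32"))
	// canotoNumber_My_1Struct__Int32

	// With the raw names, two different (struct, field) pairs collide.
	fmt.Println(expand("{struct}__{field}", "A_", "B")) // A___B
	fmt.Println(expand("{struct}__{field}", "A", "_B")) // A___B

	// With the canonicalized names, the results stay distinct.
	fmt.Println(expand("{cStruct}__{cField}", "A_", "B")) // A_1__B
	fmt.Println(expand("{cStruct}__{cField}", "A", "_B")) // A___1B
}
```

Because every underscore in a canonicalized name is followed by 1, a double underscore in the output can only come from the template's separator.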

For example:

canoto \
  --format-cache="cache_{struct}" \
  --format-number="fieldNum_{cStruct}__{cField}" \
  --format-tag="fieldTag_{cStruct}__{cField}" \
  --format-oneof-type="{struct}{oneOf}" \
  --format-oneof-unset="{struct}{oneOf}__Unset" \
  --format-oneof-field="{struct}{oneOf}__{field}" \
  example.go

For a struct Foo with a OneOf Type and a field Bar, this would generate:

type cache_Foo struct { ... }

const (
    fieldNum_Foo__Bar = 1
    fieldTag_Foo__Bar = "\x0a"
)

type FooType uint32

const (
    FooType__Unset FooType = 0
    FooType__Bar   FooType = fieldNum_Foo__Bar
)

Supported Types

| go type | canoto type | proto type | wire type |
| --- | --- | --- | --- |
| int8 | int | sint32 | varint |
| int16 | int | sint32 | varint |
| int32 | int | sint32 | varint |
| int64 | int | sint64 | varint |
| uint8 | uint | uint32 | varint |
| uint16 | uint | uint32 | varint |
| uint32 | uint | uint32 | varint |
| uint64 | uint | uint64 | varint |
| int32 | fint32 | sfixed32 | i32 |
| uint32 | fint32 | fixed32 | i32 |
| int64 | fint64 | sfixed64 | i64 |
| uint64 | fint64 | fixed64 | i64 |
| bool | bool | bool | varint |
| string | string | string | len |
| []byte | bytes | bytes | len |
| [x]byte | fixed bytes | bytes | len |
| T Message | value | message | len |
| *T Message | pointer | message | len |
| []int8 | repeated int | repeated sint32 | len |
| []int16 | repeated int | repeated sint32 | len |
| []int32 | repeated int | repeated sint32 | len |
| []int64 | repeated int | repeated sint64 | len |
| []uint8 | repeated uint | repeated uint32 | len |
| []uint16 | repeated uint | repeated uint32 | len |
| []uint32 | repeated uint | repeated uint32 | len |
| []uint64 | repeated uint | repeated uint64 | len |
| []int32 | repeated fint32 | repeated sfixed32 | len |
| []uint32 | repeated fint32 | repeated fixed32 | len |
| []int64 | repeated fint64 | repeated sfixed64 | len |
| []uint64 | repeated fint64 | repeated fixed64 | len |
| []bool | repeated bool | repeated bool | len |
| []string | repeated string | repeated string | len |
| [][]byte | repeated bytes | repeated bytes | len |
| [][x]byte | repeated fixed bytes | repeated bytes | len |
| []T Message | repeated value | repeated message | len |
| []*T Message | repeated pointer | repeated message_pointer | len |
| [x]int8 | fixed repeated int | repeated sint32 | len |
| [x]int16 | fixed repeated int | repeated sint32 | len |
| [x]int32 | fixed repeated int | repeated sint32 | len |
| [x]int64 | fixed repeated int | repeated sint64 | len |
| [x]uint8 | fixed repeated uint | repeated uint32 | len |
| [x]uint16 | fixed repeated uint | repeated uint32 | len |
| [x]uint32 | fixed repeated uint | repeated uint32 | len |
| [x]uint64 | fixed repeated uint | repeated uint64 | len |
| [x]int32 | fixed repeated fint32 | repeated sfixed32 | len |
| [x]uint32 | fixed repeated fint32 | repeated fixed32 | len |
| [x]int64 | fixed repeated fint64 | repeated sfixed64 | len |
| [x]uint64 | fixed repeated fint64 | repeated fixed64 | len |
| [x]bool | fixed repeated bool | repeated bool | len |
| [x]string | fixed repeated string | repeated string | len |
| [x][]byte | fixed repeated bytes | repeated bytes | len |
| [x][y]byte | fixed repeated fixed bytes | repeated bytes | len |
| [x]T Message | fixed repeated value | repeated message | len |
| [x]*T Message | fixed repeated pointer | repeated message_pointer | len |
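
The proto type and wire type columns describe how a field appears to a Protocol Buffers reader. As a rough illustration based only on the publicly documented Protocol Buffers wire format (not on Canoto's generated code), a field tag is the varint of (field number << 3) | wire type, and sint32/sint64 values are ZigZag encoded before being written as varints. The fieldTag and zigZag helpers below are hypothetical names chosen for this sketch.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// Wire type numbers from the Protocol Buffers encoding documentation.
const (
	wireVarint uint64 = 0
	wireI64    uint64 = 1
	wireLen    uint64 = 2
	wireI32    uint64 = 5
)

// fieldTag returns the varint-encoded tag for a field number and wire type.
func fieldTag(fieldNumber, wireType uint64) []byte {
	return binary.AppendUvarint(nil, fieldNumber<<3|wireType)
}

// zigZag maps signed integers to unsigned ones so that values with a small
// magnitude stay small: 0 -> 0, -1 -> 1, 1 -> 2, -2 -> 3, and so on.
func zigZag(v int64) uint64 {
	return uint64(v<<1) ^ uint64(v>>63)
}

func main() {
	fmt.Printf("%x\n", fieldTag(1, wireLen))    // 0a
	fmt.Printf("%x\n", fieldTag(1, wireVarint)) // 08

	// 536870911 (2^29 - 1) is the largest Protocol Buffers field number;
	// its tag needs five bytes.
	fmt.Println(len(fieldTag(536870911, wireVarint))) // 5

	fmt.Println(zigZag(-1), zigZag(1), zigZag(-2)) // 1 2 3
}
```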

OneOf Fields

OneOfs allow message definitions to declare that a set of fields are mutually exclusive. When serialized, Canoto ensures that at most one field in each OneOf group appears on the wire.

Each OneOf has a group name, specified as an additional option after the field number in the struct tag:

type OneOf struct {
	Int  int64 `canoto:"int,1,Type"`
	Bool bool  `canoto:"bool,2,Type"`

	canotoData canotoData_OneOf
}

For every OneOf group, the generated code includes a helper method that lets you quickly determine which field was populated.

In the example above, the method CachedWhichOneOfType will be generated, and it returns a generated typed enum for the Type OneOf.

All OneOf accessor methods follow the naming pattern:

CachedWhichOneOf<GroupName>

By default, that enum uses collision-resistant canoto... identifiers. If you want the enum type and its constants exported, customize --format-oneof-type, --format-oneof-unset, and --format-oneof-field.

The enum's underlying values still match the generated field number constants, so exported number constants can be used in comparisons.

After the cache has been initialized by calling one of UnmarshalCanoto, UnmarshalCanotoFrom, or CalculateCanotoCache, the method returns the populated field's enum value. If no field in the OneOf group was set, the method returns 0.
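
The typical way to consume such an enum is a switch statement. The sketch below does not use generated code: oneOfType, its constants, and whichType are hand-written, hypothetical stand-ins for what the canoto command would generate, and it assumes that a field counts as populated when it holds a non-zero value.

```go
package main

import "fmt"

// Hypothetical stand-ins for identifiers that the canoto command would
// generate; nothing in this file is produced by Canoto itself.
type oneOfType uint32

const (
	typeUnset oneOfType = 0
	typeInt   oneOfType = 1 // matches the field number of Int
	typeBool  oneOfType = 2 // matches the field number of Bool
)

// OneOf mirrors the example struct, without the generated canotoData field.
type OneOf struct {
	Int  int64
	Bool bool
}

// whichType is a hand-written stand-in for the generated CachedWhichOneOfType
// method. ASSUMPTION: a field counts as populated when it is non-zero.
// The second result is false when more than one field is populated.
func whichType(m *OneOf) (oneOfType, bool) {
	which := typeUnset
	populated := 0
	if m.Int != 0 {
		which, populated = typeInt, populated+1
	}
	if m.Bool {
		which, populated = typeBool, populated+1
	}
	return which, populated <= 1
}

// describeOneOf shows the switch a caller would typically write.
func describeOneOf(m *OneOf) string {
	which, ok := whichType(m)
	if !ok {
		return "invalid: multiple fields set"
	}
	switch which {
	case typeInt:
		return fmt.Sprintf("int=%d", m.Int)
	case typeBool:
		return "bool=true"
	default:
		return "unset"
	}
}

func main() {
	fmt.Println(describeOneOf(&OneOf{Int: 5}))             // int=5
	fmt.Println(describeOneOf(&OneOf{Bool: true}))         // bool=true
	fmt.Println(describeOneOf(&OneOf{}))                   // unset
	fmt.Println(describeOneOf(&OneOf{Int: 5, Bool: true})) // invalid: multiple fields set
}
```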

Non-standard encoding

It is valid to define a Field that implements a non-standard format. However, this format should still be canonical and the corresponding Proto file should report opaque bytes.

Why not Proto?

Proto is a fast, compact encoding format with extensive language support. However, Proto is not canonical.

Proto is designed to be forwards-compatible. Almost by definition, a forwards-compatible serialization format cannot be canonical. The format of a field cannot be validated to be canonical if the expected type of the field is not known during decoding.

Why is being canonical important?

Non-canonical serialization formats can be subtle to work with, for example when the hash of the serialized data is important or when the serialized data is cryptographically signed.

With such a format, decoding a message and serializing it again may produce different bytes. To ensure that the hash of the serialized data does not change, it is important to carefully avoid re-serializing a message that was previously serialized.

For canonical serialization formats, the hash of the serialized data is guaranteed never to change. Every correct implementation of the format will produce the same hash.
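
A small illustration of the problem, using only the Go standard library rather than Canoto or Proto: Go's encoding/binary accepts a padded varint that decodes to the same value as the minimal one, yet the two byte strings hash differently. The isCanonicalUvarint helper is a hypothetical name for this sketch and shows one way a decoder could reject the padded form.

```go
package main

import (
	"bytes"
	"crypto/sha256"
	"encoding/binary"
	"fmt"
)

// isCanonicalUvarint reports whether b is exactly the minimal varint encoding
// of the value it decodes to.
func isCanonicalUvarint(b []byte) bool {
	v, n := binary.Uvarint(b)
	if n <= 0 || n != len(b) {
		return false // truncated, overflowing, or followed by extra bytes
	}
	return bytes.Equal(b, binary.AppendUvarint(nil, v))
}

func main() {
	minimal := []byte{0x01}
	padded := []byte{0x81, 0x00} // non-minimal encoding of the same value

	v1, _ := binary.Uvarint(minimal)
	v2, _ := binary.Uvarint(padded)
	fmt.Println(v1 == v2) // true: both decode to 1

	// The hashes differ even though the decoded values are equal.
	fmt.Println(sha256.Sum256(minimal) == sha256.Sum256(padded)) // false

	fmt.Println(isCanonicalUvarint(minimal), isCanonicalUvarint(padded)) // true false
}
```

A canonical format rejects the padded input during decoding, so any two accepted byte strings that decode to the same message are identical.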

Why be read compatible with Proto?

By being read compatible with Proto, users of the Canoto format inherit some of Proto's cross-language support.

If an application only needs to read Canoto messages, but not write them, it can simply treat the Canoto message as a Proto message.

Is Canoto Fast?

Canoto is typically more performant for both serialization and deserialization than Proto. However, Proto does not typically validate that fields are canonical. If a field is expensive to inspect, it's possible Canoto can be slightly slower.

Canoto is optimized to perform no unnecessary memory allocations, so careful management to ensure messages are stack allocated can significantly improve performance over Proto.

Is Canoto Forwards Compatible?

No. Canoto chooses to be a canonical serialization format rather than being forwards compatible.
