Abhinav Gupta

How to write flexible YAML shapes in Go

Introduction

With gopkg.in/yaml.v2, you can evolve a YAML shape in a backwards-compatible manner by implementing the yaml.Unmarshaler interface on your type.

Consider the following shape.

type Config struct {
  Users []string `yaml:"users"`
}

Valid YAML input for this shape takes the following form.

users:
  - alice
  - bob
  - carol
  - dave

Suppose that the program evolves over time, and there is now a need to optionally specify a role for each user. Valid roles are: User, Mod, and Admin. User is the default role.

type Role int

const (
  RoleUser Role = iota
  RoleMod
  RoleAdmin
)

func (r Role) String() string {
  switch r {
  case RoleUser:
    return "user"
  case RoleMod:
    return "mod"
  case RoleAdmin:
    return "admin"
  default:
    // unknown role
    return fmt.Sprintf("Role(%d)", int(r))
  }
}

Tip: Prefer to order iota-based enumerations so that the default comes first and gets the value zero. For example, since User is the default here, its value is zero.

If there is no obvious default, start enumerations at one.
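As a minimal sketch of both halves of this tip: Role below repeats the enumeration above, while Color and Account are hypothetical names introduced only for illustration. An unset Role field reads as RoleUser, and an unset Color field stays distinguishable from every valid color.

```go
package main

import "fmt"

// Role repeats the enumeration above: User is the default,
// so it takes the zero value.
type Role int

const (
	RoleUser Role = iota // 0: what an unset Role field holds
	RoleMod
	RoleAdmin
)

// Color is a hypothetical enumeration with no obvious default.
// Starting at one leaves zero free to mean "not set".
type Color int

const (
	ColorRed Color = iota + 1 // 1
	ColorGreen
	ColorBlue
)

// Account is a hypothetical struct used only to show zero values.
type Account struct {
	Role  Role
	Color Color
}

func main() {
	var a Account // no fields set

	fmt.Println(a.Role == RoleUser) // true: the zero value is the default role
	fmt.Println(a.Color == 0)       // true: zero matches no declared color
}
```

With this layout, code that receives an Account can treat a zero Color as missing input, while a zero Role needs no special handling.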

False start

One way to quickly hack in support for roles is by adding a separate roles section.

roles:
  admins: [alice]
  mods: [dave]

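This is brittle, complex, and inflexible: user names are repeated across two sections that must be kept in sync, and every new role needs its own list.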

Evolve the schema

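A better approach is to evolve the YAML schema so that each item in users can be one of two things: a plain string holding the user's name (the old form, with the default role), or an object with a name and a role (the new form).

The earlier input will now take the following form.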

users:
  - name: alice
    role: admin
  - bob
  - carol
  - name: dave
    role: mod

The entries for Bob and Carol use the old form, and the entries for Alice and Dave use the new form.

Compare this with the False start.

Implement it

To implement this with gopkg.in/yaml.v2, switch the old Config.Users field to a list of newly-declared User structs.

 type Config struct {
-  Users []string `yaml:"users"`
+  Users []*User  `yaml:"users"`
 }
+
+type User struct {
+  Name string `yaml:"name"`
+  Role Role   `yaml:"role"`
+}

Teach the YAML library how to parse Role values from YAML, by implementing encoding.TextUnmarshaler for Role.

// UnmarshalText specifies how to parse a Role from a string.
func (r *Role) UnmarshalText(bs []byte) error {
  switch string(bs) {
  case "user":
    *r = RoleUser
  case "mod":
    *r = RoleMod
  case "admin":
    *r = RoleAdmin
  default:
    return fmt.Errorf("unknown role %q", bs)
  }
  return nil
}

Tip: You can use dmarkham/enumer with the -text flag to generate this automatically instead of writing it by hand.

This gets the implementation as far as supporting objects with a name and role.

users:
  - name: alice
    role: admin
  - name: dave
    role: mod
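The parsing logic can be exercised on its own, without any YAML involved. The sketch below repeats Role and UnmarshalText from above and adds parseRole, a hypothetical helper that exists only for this illustration. The blank assignment is a compile-time check that *Role satisfies encoding.TextUnmarshaler.

```go
package main

import (
	"encoding"
	"fmt"
)

type Role int

const (
	RoleUser Role = iota
	RoleMod
	RoleAdmin
)

// Compile-time check: *Role implements encoding.TextUnmarshaler.
var _ encoding.TextUnmarshaler = (*Role)(nil)

// UnmarshalText specifies how to parse a Role from a string.
func (r *Role) UnmarshalText(bs []byte) error {
	switch string(bs) {
	case "user":
		*r = RoleUser
	case "mod":
		*r = RoleMod
	case "admin":
		*r = RoleAdmin
	default:
		return fmt.Errorf("unknown role %q", bs)
	}
	return nil
}

// parseRole is a hypothetical helper that feeds a string
// through UnmarshalText and returns the parsed Role.
func parseRole(s string) (Role, error) {
	var r Role
	err := r.UnmarshalText([]byte(s))
	return r, err
}

func main() {
	for _, s := range []string{"user", "mod", "admin", "owner"} {
		r, err := parseRole(s)
		fmt.Printf("%q -> %d, err=%v\n", s, int(r), err)
	}
}
```

The last input, "owner", is not a valid role, so parseRole returns the "unknown role" error and leaves the Role at its zero value.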

Make it flexible

To make the new format fully backwards-compatible, it needs to support plain strings for users whose role is user.

Do this by implementing the yaml.Unmarshaler interface for User. The interface expects the following method:

func (*User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error

When decoding a User, the YAML library will call this method with a reference to a function called unmarshal. This function, when invoked with a pointer to a value, will attempt to decode the underlying YAML into that value. For example,

var s string
err := unmarshal(&s)

The key feature here is this: you can call unmarshal any number of times.

This lets User.UnmarshalYAML attempt to decode its YAML data into a string (for the old form), and if that fails, into a full User object.

func (u *User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error {
  var name string
  if err := unmarshal(&name); err == nil {
    // The old format was used. Only the name was specified.
    // For example,
    //
    // - bob
    //
    // Set just the name.
    u.Name = name
    return nil
  }

  // The new format was used. A full object was specified.
  // For example,
  //
  // - name: dave
  //   role: mod
  //
  // Decode the whole object.
  type rawUser User
  if err := unmarshal((*rawUser)(u)); err != nil {
    return err
  }

  // Nothing to do. (*rawUser)(u) above hydrated *u.
  return nil
}

Note that the new rawUser type is critical. It has the same fields as User but none of its methods, so it does not implement UnmarshalYAML. Therefore, the YAML library decodes and fills its fields directly, which in turn fills User because both pointers refer to the same value. Without it, the second unmarshal call will recurse infinitely.

func (u *User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error {
  return unmarshal(u) // bug: infinite recursion!
}
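Why the extra type breaks the cycle can be checked without the YAML library. The sketch below is an illustration under stated assumptions: unmarshaler is a local copy of the method signature shown earlier, implementsUnmarshaler is a hypothetical helper, and User is simplified to plain string fields. It shows that a type declared as "type rawUser User" keeps the fields but starts with an empty method set, and that converting the pointer does not copy the value.

```go
package main

import "fmt"

// unmarshaler is a local stand-in with the same method signature
// as the yaml.v2 interface, so this sketch needs no third-party import.
type unmarshaler interface {
	UnmarshalYAML(unmarshal func(interface{}) error) error
}

// User is simplified for this sketch: Role is a plain string.
type User struct {
	Name string
	Role string
}

// UnmarshalYAML is a placeholder; only its presence matters here.
func (u *User) UnmarshalYAML(unmarshal func(interface{}) error) error {
	return nil
}

// rawUser has the same fields as User but none of its methods.
type rawUser User

// implementsUnmarshaler is a hypothetical helper that reports
// whether v carries an UnmarshalYAML method.
func implementsUnmarshaler(v interface{}) bool {
	_, ok := v.(unmarshaler)
	return ok
}

func main() {
	u := &User{Name: "dave"}

	fmt.Println(implementsUnmarshaler(u))             // true
	fmt.Println(implementsUnmarshaler((*rawUser)(u))) // false

	// The conversion yields a pointer to the same memory,
	// so writes through *rawUser are visible through *User.
	(*rawUser)(u).Name = "alice"
	fmt.Println(u.Name) // alice
}
```

Because a decoder finds no custom method on *rawUser, it falls back to field-by-field decoding instead of calling User.UnmarshalYAML again.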

What about yaml.v3?

gopkg.in/yaml.v3 includes a similar yaml.Unmarshaler interface.

type Unmarshaler interface {
  UnmarshalYAML(value *yaml.Node) error
}

Instead of an unmarshal function, yaml.v3 gives UnmarshalYAML a *yaml.Node. This object includes the following method.

// Decode decodes the node and stores its data
// into the value pointed to by v.
func (*Node) Decode(interface{}) error

This method is similar to the unmarshal function.

Change the parameter in User.UnmarshalYAML to a *yaml.Node and all unmarshal(x) function calls to value.Decode(x), and this will work with gopkg.in/yaml.v3.

 func (u *User) UnmarshalYAML(
-  unmarshal func(interface{}) error,
+  value *yaml.Node,
 ) error {
   var name string
-  if err := unmarshal(&name); err == nil {
+  if err := value.Decode(&name); err == nil {
     ...
   }

   ...
-  if err := unmarshal((*rawUser)(u)); err != nil {
+  if err := value.Decode((*rawUser)(u)); err != nil {
     return err
   }

JSON

The same method can be adapted for flexible JSON.

The "encoding/json" package from the Go standard library includes a json.Unmarshaler interface.

type Unmarshaler interface {
  UnmarshalJSON([]byte) error
}

Although it does not supply a handy unmarshal function, there is nothing stopping direct use of json.Unmarshal on the provided byte slice.

func (u *User) UnmarshalJSON(data []byte) error {
  var name string
  if err := json.Unmarshal(data, &name); err == nil {
    // {"users": ["carol"]}
    u.Name = name
    return nil
  }

  // {"users": [{"name": "alice", "role": "admin"}]}
  type rawUser User
  if err := json.Unmarshal(data, (*rawUser)(u)); err != nil {
    return err
  }

  return nil
}
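Putting the JSON pieces together, the following is a minimal end-to-end sketch that uses only the standard library. The json struct tags and the parseConfig helper are assumptions added for this illustration. It relies on encoding/json calling UnmarshalText for Role when the input is a JSON string.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Role int

const (
	RoleUser Role = iota
	RoleMod
	RoleAdmin
)

// UnmarshalText specifies how to parse a Role from a string.
func (r *Role) UnmarshalText(bs []byte) error {
	switch string(bs) {
	case "user":
		*r = RoleUser
	case "mod":
		*r = RoleMod
	case "admin":
		*r = RoleAdmin
	default:
		return fmt.Errorf("unknown role %q", bs)
	}
	return nil
}

// The json tags below are an assumption for this sketch.
type User struct {
	Name string `json:"name"`
	Role Role   `json:"role"`
}

type Config struct {
	Users []*User `json:"users"`
}

// UnmarshalJSON accepts either a plain string or a full object.
func (u *User) UnmarshalJSON(data []byte) error {
	var name string
	if err := json.Unmarshal(data, &name); err == nil {
		u.Name = name // old form: name only, default role
		return nil
	}

	// New form: rawUser has no methods, so this does not recurse.
	type rawUser User
	return json.Unmarshal(data, (*rawUser)(u))
}

// parseConfig is a hypothetical helper that decodes a whole Config.
func parseConfig(data []byte) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal(data, &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	input := `{"users": ["carol", {"name": "alice", "role": "admin"}]}`
	cfg, err := parseConfig([]byte(input))
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, u := range cfg.Users {
		fmt.Printf("%s role=%d\n", u.Name, int(u.Role))
	}
}
```

Both forms can be mixed in one list: carol decodes with the default role, and alice decodes as an admin. An object with an unrecognized role makes parseConfig return an error.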

Conclusion

Interfaces like yaml.Unmarshaler, its v3 variant, and json.Unmarshaler facilitate evolution of YAML and JSON shapes in a backwards-compatible way.

This pattern likely applies to libraries for other encoding formats too.

Written on 2021-02-24.