Abhinav Gupta | About

How to write flexible YAML shapes in Go

1. Introduction

With gopkg.in/yaml.v2, you can evolve a YAML shape in a backwards-compatible manner by implementing the yaml.Unmarshaler interface on your type.

Consider the following shape.

type Config struct {
  Users []string `yaml:"users"`
}

Valid YAML input for this shape takes the following form.

users:
  - alice
  - bob
  - carol
  - dave

Suppose that the program evolves over time, and there is now a need to optionally specify a role for each user. Valid roles are: User, Mod, and Admin. User is the default role.

type Role int

const (
  RoleUser Role = iota
  RoleMod
  RoleAdmin
)

func (r Role) String() string {
  switch r {
  case RoleUser:
    return "user"
  case RoleMod:
    return "mod"
  case RoleAdmin:
    return "admin"
  default:
    // unknown role
    return fmt.Sprintf("Role(%d)", int(r))
  }
}
💡 Tip

Prefer to order iota-based enumerations so that the default value is zero. For example, since User is the default here, RoleUser is declared first and its value is zero.

If there is no obvious default, start enumerations at one.
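
As a small illustration of why the zero default matters, the sketch below builds a struct without setting its role and checks what it ends up with. The User struct and the defaultRole helper are hypothetical names used only for this example.

```go
package main

import "fmt"

// Role mirrors the enumeration above: RoleUser is declared first, so it is zero.
type Role int

const (
	RoleUser Role = iota
	RoleMod
	RoleAdmin
)

// User is a hypothetical struct used only for this illustration.
type User struct {
	Name string
	Role Role
}

// defaultRole returns the Role of a User whose Role field was never set.
func defaultRole() Role {
	u := User{Name: "bob"}
	return u.Role
}

func main() {
	fmt.Println(defaultRole() == RoleUser) // true
}
```

Because Go zero-initializes fields, any user decoded without an explicit role gets RoleUser with no extra code.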

2. False start

One way to quickly hack in support for roles is by adding a separate roles section.

roles:
  admins: [alice]
  mods: [dave]

This is brittle, complex, and inflexible.

Brittle

End users must now duplicate names from the users section into the roles section. It is easy to get a name wrong with a bad copy-paste or a typo.

Complex

The implementation must now cross-validate entries between users and roles.

Inflexible

This leaves no room for more properties to be associated with users. Every new property will end up with its own similar section, further exacerbating the prior two issues.
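
To make the "Complex" point concrete, here is a rough sketch of the cross-validation that a separate roles section would require. The legacyConfig type and its validate method are hypothetical names for this illustration, not part of the final design.

```go
package main

import "fmt"

// legacyConfig is a hypothetical shape for the "separate roles section" approach.
type legacyConfig struct {
	Users []string
	Roles map[string][]string // for example, "admins" -> ["alice"]
}

// validate reports an error if any name listed under Roles is missing from Users.
func (c *legacyConfig) validate() error {
	known := make(map[string]bool, len(c.Users))
	for _, name := range c.Users {
		known[name] = true
	}
	for role, names := range c.Roles {
		for _, name := range names {
			if !known[name] {
				return fmt.Errorf("role %q refers to unknown user %q", role, name)
			}
		}
	}
	return nil
}

func main() {
	cfg := legacyConfig{
		Users: []string{"alice", "bob", "carol", "dave"},
		Roles: map[string][]string{
			"admins": {"alice"},
			"mods":   {"dav"}, // typo: should be "dave"
		},
	}
	fmt.Println(cfg.validate()) // role "mods" refers to unknown user "dav"
}
```

Even this sketch is incomplete: it does not check whether one user appears under two roles, which is another rule the implementation would have to invent and enforce.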

3. Evolve the schema

A better approach is to evolve the YAML schema so that each item in users can be one of two things:

  • a string specifying the name; this will behave like before

  • a user object specifying the name and other properties

The earlier example will now take the following form.

users:
  - name: alice  # (1)
    role: admin
  - bob  # (2)
  - carol
  - name: dave
    role: mod
  1. Alice and Dave use the new form, specifying an object with the fields name and role.

  2. Bob and Carol use the old form, specifying just the name as a string.

Compare this with the false start.

  • user names do not need to be duplicated

  • the implementation validates the shape once

  • the user object has room for new properties

4. Implement it

To implement this with gopkg.in/yaml.v2, switch the old Config.Users field to a list of newly-declared User structs.

 type Config struct {
-  Users []string `yaml:"users"`
+  Users []*User  `yaml:"users"`
 }
+
+type User struct {
+  Name string `yaml:"name"`
+  Role Role   `yaml:"role"`
+}

Teach the YAML library how to parse Role values from YAML, by implementing encoding.TextUnmarshaler for Role.

// UnmarshalText specifies how to parse a Role from a string.
func (r *Role) UnmarshalText(bs []byte) error {
  switch string(bs) {
  case "user":
    *r = RoleUser
  case "mod":
    *r = RoleMod
  case "admin":
    *r = RoleAdmin
  default:
    return fmt.Errorf("unknown role %q", bs)
  }
  return nil
}
💡 Tip

Use dmarkham/enumer with the -text flag to generate this automatically instead of writing it by hand.
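
The UnmarshalText method can be exercised on its own, without any YAML library involved. The sketch below repeats the method so that it compiles alone; parseRole is a hypothetical helper added for the example.

```go
package main

import "fmt"

type Role int

const (
	RoleUser Role = iota
	RoleMod
	RoleAdmin
)

// UnmarshalText is repeated from the article so this example is self-contained.
func (r *Role) UnmarshalText(bs []byte) error {
	switch string(bs) {
	case "user":
		*r = RoleUser
	case "mod":
		*r = RoleMod
	case "admin":
		*r = RoleAdmin
	default:
		return fmt.Errorf("unknown role %q", bs)
	}
	return nil
}

// parseRole is a hypothetical helper that parses a Role from a string.
func parseRole(s string) (Role, error) {
	var r Role
	err := r.UnmarshalText([]byte(s))
	return r, err
}

func main() {
	for _, s := range []string{"user", "mod", "admin", "Admin"} {
		r, err := parseRole(s)
		fmt.Println(int(r), err)
	}
}
```

Note that the match is case-sensitive as written, so "Admin" is rejected with an unknown role error while "admin" is accepted.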

This gets the implementation as far as supporting objects with a name and role.

users:
  - name: alice
    role: admin
  - name: dave
    role: mod

But it does not yet support the old form:

users:
  - carol  # ERROR

5. Make it flexible

To make the new format fully backwards-compatible, it needs to support plain strings for users whose role is user.

Do this by implementing the yaml.Unmarshaler interface for User. The interface expects the following method:

func (*User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error

When decoding a User, the YAML library will call this method with a reference to a function called unmarshal. This function, when invoked with a pointer to a value, will attempt to decode the underlying YAML into that value. For example,

var s string
err := unmarshal(&s)

The key feature here is this: you can call unmarshal any number of times.

This lets User.UnmarshalYAML attempt to decode its YAML data into a string (for the old form), and if that fails, into a full User object.

func (u *User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error {
  var name string
  if err := unmarshal(&name); err == nil {
    // The old format was used. Only the name was specified.
    // For example,
    //
    // - bob
    //
    // Set just the name.
    u.Name = name
    return nil
  }

  // The new format was used. A full object was specified.
  // For example,
  //
  // - name: dave
  //   role: mod
  //
  // Decode the whole object.
  type rawUser User
  if err := unmarshal((*rawUser)(u)); err != nil {
    return err
  }

  // Nothing to do. (*rawUser)(u) above hydrated *u.
  return nil
}
📢 Important

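The new rawUser type is critical. It has the same fields as User, but it does not inherit User's methods, so it does not implement UnmarshalYAML. The YAML library decodes into its fields directly, and because (*rawUser)(u) points at the same value as u, this fills User. Without it, the second unmarshal call would call UnmarshalYAML again and recurse without end.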

func (u *User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error {
  return unmarshal(u) // bug: infinite recursion!
}
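
The reason rawUser breaks the recursion can be checked directly: a type defined with `type rawUser User` keeps the fields but not the methods. The sketch below uses a stub UnmarshalYAML and a local yamlUnmarshaler interface, declared here as a copy of the yaml.v2 method signature so that the example needs only the standard library. implementsUnmarshaler is a hypothetical helper.

```go
package main

import "fmt"

type User struct {
	Name string
}

// UnmarshalYAML is a stub with the yaml.v2 method signature.
// Its body is irrelevant to this check.
func (u *User) UnmarshalYAML(unmarshal func(interface{}) error) error {
	return nil
}

// rawUser has the same fields as User but none of its methods.
type rawUser User

// yamlUnmarshaler is a local stand-in for the yaml.v2 Unmarshaler interface.
type yamlUnmarshaler interface {
	UnmarshalYAML(unmarshal func(interface{}) error) error
}

// implementsUnmarshaler reports whether v has an UnmarshalYAML method.
func implementsUnmarshaler(v interface{}) bool {
	_, ok := v.(yamlUnmarshaler)
	return ok
}

func main() {
	u := &User{}
	fmt.Println(implementsUnmarshaler(u))             // true
	fmt.Println(implementsUnmarshaler((*rawUser)(u))) // false
}
```

Since *rawUser does not satisfy the interface, the library falls back to plain field-by-field decoding for it instead of calling back into User.UnmarshalYAML.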

6. What about yaml.v3?

gopkg.in/yaml.v3 includes a similar yaml.Unmarshaler interface.

type Unmarshaler interface {
  UnmarshalYAML(value *yaml.Node) error
}

Instead of an unmarshal function, yaml.v3 gives UnmarshalYAML a *yaml.Node. This object includes the following method, which behaves similarly to the unmarshal function in yaml.v2.

// Decode decodes the node and stores its data
// into the value pointed to by v.
func (*Node) Decode(interface{}) error

Change the parameter in User.UnmarshalYAML to a *yaml.Node and all unmarshal(x) function calls to value.Decode(x). This now works with gopkg.in/yaml.v3.

 func (u *User) UnmarshalYAML(
-  unmarshal func(interface{}) error,
+  value *yaml.Node,
 ) error {
   var name string
-  if err := unmarshal(&name); err == nil {
+  if err := value.Decode(&name); err == nil {
     ...
   }

   ...
-  if err := unmarshal((*rawUser)(u)); err != nil {
+  if err := value.Decode((*rawUser)(u)); err != nil {
     return err
   }

7. JSON

The same method can be adapted for flexible JSON.

The encoding/json package from the Go standard library includes a json.Unmarshaler interface.

type Unmarshaler interface {
  UnmarshalJSON([]byte) error
}

Although it does not supply a handy unmarshal function like yaml.v2, there is nothing stopping direct use of json.Unmarshal on the provided byte slice.

func (u *User) UnmarshalJSON(data []byte) error {
  var name string
  if err := json.Unmarshal(data, &name); err == nil {
    // {"users": ["carol"]}
    u.Name = name
    return nil
  }

  // {"users": [{"name": "alice", "role": "admin"}]}
  type rawUser User
  if err := json.Unmarshal(data, (*rawUser)(u)); err != nil {
    return err
  }

  return nil
}
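
Put together with the Role and Config types from earlier, the JSON variant can be tried end to end with only the standard library. This is a sketch that assumes json struct tags mirroring the yaml ones; the decode helper is a hypothetical name for the example.

```go
package main

import (
	"encoding/json"
	"fmt"
)

type Role int

const (
	RoleUser Role = iota
	RoleMod
	RoleAdmin
)

// UnmarshalText lets encoding/json parse a Role from a JSON string.
func (r *Role) UnmarshalText(bs []byte) error {
	switch string(bs) {
	case "user":
		*r = RoleUser
	case "mod":
		*r = RoleMod
	case "admin":
		*r = RoleAdmin
	default:
		return fmt.Errorf("unknown role %q", bs)
	}
	return nil
}

type User struct {
	Name string `json:"name"`
	Role Role   `json:"role"`
}

// UnmarshalJSON accepts either a plain string or a full object.
func (u *User) UnmarshalJSON(data []byte) error {
	var name string
	if err := json.Unmarshal(data, &name); err == nil {
		u.Name = name
		return nil
	}
	type rawUser User
	return json.Unmarshal(data, (*rawUser)(u))
}

type Config struct {
	Users []*User `json:"users"`
}

// decode is a hypothetical helper that parses a Config from a JSON string.
func decode(input string) (*Config, error) {
	var cfg Config
	if err := json.Unmarshal([]byte(input), &cfg); err != nil {
		return nil, err
	}
	return &cfg, nil
}

func main() {
	cfg, err := decode(`{"users": [{"name": "alice", "role": "admin"}, "bob"]}`)
	if err != nil {
		panic(err)
	}
	for _, u := range cfg.Users {
		fmt.Println(u.Name, int(u.Role))
	}
	// alice 2
	// bob 0
}
```

Both forms can appear in the same list, and a user given as a plain string keeps the zero-valued RoleUser.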

7.1. Share logic with YAML

(Added on 2022-01-24.)

If a type implements both UnmarshalYAML and UnmarshalJSON, you can share the decoding logic between them by taking advantage of the unmarshal argument of UnmarshalYAML.

func (*User) UnmarshalYAML(
  unmarshal func(interface{}) error,
) error

This argument typically comes from the YAML library, but you can also provide your own implementation.

Before we do this, note that you can restructure the UnmarshalJSON implementation above like so:

 func (u *User) UnmarshalJSON(data []byte) error {
+  unmarshal := func(target interface{}) error {
+    return json.Unmarshal(data, target)
+  }
+
   var name string
-  if err := json.Unmarshal(data, &name); err == nil {
+  if err := unmarshal(&name); err == nil {
     // {"users": ["carol"]}
     u.Name = name
     return nil
   }

   // {"users": [{"name": "alice", "role": "admin"}]}
   type rawUser User
-  if err := json.Unmarshal(data, (*rawUser)(u)); err != nil {
+  if err := unmarshal((*rawUser)(u)); err != nil {
     return err
   }

   return nil
 }

This defines an anonymous function unmarshal which decodes data into a target using json.Unmarshal. The rest of the UnmarshalJSON implementation uses that function everywhere it previously called json.Unmarshal.

The UnmarshalJSON implementation above should look familiar. Besides the anonymous function defined at the start, it’s exactly the same as UnmarshalYAML.

// UnmarshalJSON(data)                           | // UnmarshalYAML(unmarshal)
                                                 |
unmarshal := func(target interface{}) error {    |
  return json.Unmarshal(data, target)            |
}                                                |
                                                 |
var name string                                  | var name string
if err := unmarshal(&name); err == nil {         | if err := unmarshal(&name); err == nil {
  u.Name = name                                  |   u.Name = name
  return nil                                     |   return nil
}                                                | }
                                                 |
type rawUser User                                | type rawUser User
if err := unmarshal((*rawUser)(u)); err != nil { | if err := unmarshal((*rawUser)(u)); err != nil {
  return err                                     |   return err
}                                                | }
                                                 |
return nil                                       | return nil

And that’s the key insight to make this re-use possible: call UnmarshalYAML from UnmarshalJSON, passing in that anonymous function as the unmarshal argument.

func (u *User) UnmarshalJSON(data []byte) error {
  unmarshal := func(target interface{}) error {
    return json.Unmarshal(data, target)
  }
  return u.UnmarshalYAML(unmarshal)
}
💡 Tip

The above works but it merits a minor refactor to avoid questions like "Why does the JSON decoding method call the YAML decoding method?" in the future.

If you do this, move the core decoding logic into a separate unmarshalWith method that you call from UnmarshalYAML and UnmarshalJSON.

func (u *User) unmarshalWith(unmarshal func(interface{}) error) error {
  var name string
  if err := unmarshal(&name); err == nil {
    u.Name = name
    return nil
  }

  type rawUser User
  if err := unmarshal((*rawUser)(u)); err != nil {
    return err
  }

  return nil
}

func (u *User) UnmarshalJSON(data []byte) error {
  return u.unmarshalWith(func(target interface{}) error {
    return json.Unmarshal(data, target)
  })
}

func (u *User) UnmarshalYAML(unmarshal func(interface{}) error) error {
  return u.unmarshalWith(unmarshal)
}
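
A side benefit of this refactor is that the decoding logic can be driven by any function with the right signature, including a hand-written stub. The sketch below is a simplified, standard-library-only version: Role is reduced to a plain string, and stubString is a hypothetical stand-in for what a YAML library would pass when the input is a bare string.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// User is simplified for this sketch: Role is a plain string.
type User struct {
	Name string `json:"name" yaml:"name"`
	Role string `json:"role" yaml:"role"`
}

// unmarshalWith holds the shared logic: try a string first, then an object.
func (u *User) unmarshalWith(unmarshal func(interface{}) error) error {
	var name string
	if err := unmarshal(&name); err == nil {
		u.Name = name
		return nil
	}
	type rawUser User
	return unmarshal((*rawUser)(u))
}

func (u *User) UnmarshalJSON(data []byte) error {
	return u.unmarshalWith(func(target interface{}) error {
		return json.Unmarshal(data, target)
	})
}

func (u *User) UnmarshalYAML(unmarshal func(interface{}) error) error {
	return u.unmarshalWith(unmarshal)
}

// stubString returns a hypothetical unmarshal function that behaves as if
// the input document were the plain string s.
func stubString(s string) func(interface{}) error {
	return func(target interface{}) error {
		p, ok := target.(*string)
		if !ok {
			return errors.New("input is a string")
		}
		*p = s
		return nil
	}
}

func main() {
	var viaYAML User
	err := viaYAML.UnmarshalYAML(stubString("carol"))
	fmt.Println(viaYAML.Name, err) // carol <nil>

	var viaJSON User
	err = json.Unmarshal([]byte(`{"name": "dave", "role": "mod"}`), &viaJSON)
	fmt.Println(viaJSON.Name, viaJSON.Role, err) // dave mod <nil>
}
```

The same idea should carry over to yaml.v3: the method value value.Decode has the type func(interface{}) error, so a v3 UnmarshalYAML could plausibly be written as return u.unmarshalWith(value.Decode).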

8. Conclusion

Interfaces like yaml.Unmarshaler, its v3 variant, and json.Unmarshaler facilitate evolution of YAML and JSON shapes in a backwards-compatible way.

This pattern likely applies to libraries for other encoding formats too.

Edit(2022-01-24): Added Share logic with YAML.

Written on 2021-02-24. Last modified on 2022-01-24.