When writing tooling that interacts with Go code, the packages in the go/* tree of the standard library are both invaluable and necessary. All packages in this tree have a direct dependency on go/token.
Let’s look at a subset of the API exported by go/token.
type File
func (*File) AddLine(offset int)
func (*File) Offset(Pos) int
func (*File) Pos(offset int) Pos
func (*File) Position(Pos) Position
type FileSet
func NewFileSet() *FileSet
func (*FileSet) AddFile(name string, base, size int) *File
func (*FileSet) File(Pos) *File
func (*FileSet) Position(Pos) Position
type Pos int
type Position struct {
    Filename string
    Offset   int
    Line     int
    Column   int
}
Keeping the above subset in mind, this post discusses the purpose of these APIs, how they work, and how to use them.
# What and why
A FileSet manages state across zero or more Files. Files get added to the FileSet with the AddFile method. Each File knows its name, length, and the offsets within it where new lines start. A Pos is an integer unique across the FileSet that indexes to a specific offset in a specific File in that FileSet.
The key insight here is that Files know the offsets at which new lines start, and a Pos maps to an offset within a File. Given an offset and the offsets at which new lines start, calculating the line and column number of that offset is straightforward.
This makes Pos a cheap representation of positional information which would otherwise need a struct with 4 fields: file name, line number, column number, and offset in file. Pos being an integer also makes moving between offsets a matter of basic arithmetic.
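The line and column calculation mentioned above can be sketched as a binary search over the line-start offsets. This is an illustration of the idea, not the code go/token uses; the lineCol helper and the sample offsets are assumptions made for this example.

```go
package main

import (
	"fmt"
	"sort"
)

// lineCol returns the 1-based line and column for offset, given the sorted
// offsets at which each line starts. It assumes lineStarts[0] == 0 and
// offset >= 0. (Hypothetical helper, not part of go/token.)
func lineCol(lineStarts []int, offset int) (line, col int) {
	// Find the first line that starts after offset. The line containing
	// offset is the one just before it, so the index found is already the
	// 1-based line number.
	i := sort.Search(len(lineStarts), func(i int) bool {
		return lineStarts[i] > offset
	})
	line = i
	col = offset - lineStarts[i-1] + 1
	return line, col
}

func main() {
	// Line starts for the 12-byte content "ab\ncde\nfghi\n".
	lineStarts := []int{0, 3, 7}

	fmt.Println(lineCol(lineStarts, 0)) // 1 1
	fmt.Println(lineCol(lineStarts, 4)) // 2 2
	fmt.Println(lineCol(lineStarts, 7)) // 3 1
}
```

The lookup costs O(log n) in the number of lines, which is why storing only line-start offsets is enough.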
For example, imagine you have 4 files of lengths 10, 15, 5, and 16 bytes. The following visualizes the FileSet built from those files.
File #      1             2          3          4
       +----------+---------------+-----+----------------+
       | 10 bytes |   15 bytes    | 5 b |    16 bytes    |
       +----------+---------------+-----+----------------+
Pos    1          12              28    34               51
Integers in the range [1, 51) map to a specific file and an offset within that file.
File #      1             2          3          4
       +----------+---------------+-----+----------------+
       | 10 bytes |   15 bytes    | 5 b |    16 bytes    |
       +----------+---------------+-----+----------------+
Pos    1    ^     12              28    34               51
            |
            5
Given the Pos for offset 5, adding 3 to it moves to the Pos for offset 8 within the same file.
The following table presents other examples of Pos values given the above FileSet.
Pos | File | Offset in File |
---|---|---|
5 | 1 | 4 |
6 | 1 | 5 |
12 | 2 | 0 |
30 | 3 | 2 |
40 | 4 | 6 |
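The table can be checked against go/token directly. The sketch below builds the same four-file FileSet and asks it where each Pos lands; the file names and the buildFileSet and locate helpers are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/token"
)

// buildFileSet returns a FileSet holding four files of 10, 15, 5, and 16
// bytes, named file1 through file4. (Hypothetical helper.)
func buildFileSet() *token.FileSet {
	fset := token.NewFileSet()
	for i, size := range []int{10, 15, 5, 16} {
		// A negative base places the file right after the previous one.
		fset.AddFile(fmt.Sprintf("file%d", i+1), -1, size)
	}
	return fset
}

// locate reports which file pos falls in and the offset within that file.
// It assumes pos is valid for fset. (Hypothetical helper.)
func locate(fset *token.FileSet, pos token.Pos) (name string, offset int) {
	f := fset.File(pos)
	return f.Name(), f.Offset(pos)
}

func main() {
	fset := buildFileSet()
	for _, pos := range []token.Pos{5, 6, 12, 30, 40} {
		name, off := locate(fset, pos)
		fmt.Printf("Pos %d -> %s, offset %d\n", pos, name, off)
	}

	// Moving within a file is plain integer arithmetic on Pos.
	file1 := fset.File(1)
	p := file1.Pos(5)
	fmt.Println(file1.Offset(p + 3)) // 8
}
```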
# Usage
token.NewFileSet creates a new empty FileSet.
fset := token.NewFileSet()
## Adding files
go/parser handles adding files to the FileSet when parsing Go code, but it may sometimes be necessary to do it yourself. Add a file to the FileSet with AddFile(name, base, size), where:
- name is the name of the file. It’s not required to be unique across the FileSet.
- base is the position within the FileSet at which the range for this file starts. Set this to -1 to say that the range for the new file starts where the range for the previous file ends.
- size specifies the number of bytes in the file.
file1 := fset.AddFile("file1", -1, 10) // base == 1
file2 := fset.AddFile("file2", -1, 15) // base == 12
file3 := fset.AddFile("file3", -1, 5) // base == 28
file4 := fset.AddFile("file4", -1, 16) // base == 34
fmt.Println(fset.Base()) // base == 51
Inform Files where new lines begin using one of the following methods.
- Zero or more file.AddLine(offset) calls informing it of the offsets at which each new line begins. go/scanner uses this API as it encounters newlines while tokenizing a Go file.
- A single file.SetLines([]int) call which accepts a series of offsets of the first characters of each line.
- A single file.SetLinesForContent([]byte) call which accepts the contents of the file.
Note that the offsets accepted by AddLine and SetLines are not those of the newline character (\n). They’re offsets of the character following that: the first character of each new line.
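To make the distinction concrete, the sketch below registers the same content twice: once with SetLinesForContent, and once by calling AddLine with the offset of the byte after each newline. The sample content, the file names, and the fromContent and fromAddLine helpers are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/token"
)

// fromContent registers src in a fresh FileSet and lets go/token find the
// line starts itself. (Hypothetical helper.)
func fromContent(src []byte) *token.File {
	f := token.NewFileSet().AddFile("content.txt", -1, len(src))
	f.SetLinesForContent(src)
	return f
}

// fromAddLine registers src in a fresh FileSet and records the line starts
// by hand. (Hypothetical helper.)
func fromAddLine(src []byte) *token.File {
	f := token.NewFileSet().AddFile("manual.txt", -1, len(src))
	for i, b := range src {
		// A line starts at the byte after '\n', not at '\n' itself.
		// Skip a trailing newline: no line begins at the end of the file.
		if b == '\n' && i+1 < len(src) {
			f.AddLine(i + 1)
		}
	}
	return f
}

func main() {
	src := []byte("ab\ncde\nfghi\n")
	for _, f := range []*token.File{fromContent(src), fromAddLine(src)} {
		pos := f.Pos(7) // offset of 'f', the first byte of the third line
		fmt.Println(f.LineCount(), f.Position(pos))
	}
	// Output:
	// 3 content.txt:3:1
	// 3 manual.txt:3:1
}
```

Both files end up with the same line table. The first line needs no AddLine call because a new File already treats offset 0 as the start of line 1.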
## Accessing positional information
Retrieve the File for a Pos using the FileSet.File(Pos) method.
file := fset.File(pos)
Convert a Pos to an offset within its File or the reverse with the File.Offset(Pos) and File.Pos(int) methods.
off := file.Offset(pos) // offset for the Pos
pos = file.Pos(5) // Pos for offset 5
fmt.Println(file.Pos(file.Offset(pos)) == pos) // true
Extract positional information from a FileSet with the FileSet.Position(Pos) method. This returns a Position struct.
fmt.Println(fset.Position(pos)) // example.go:5:3
The same information is available from a File with the Position(Pos) method. This is more efficient if you already have the File for a Pos.
file := fset.File(pos)
fmt.Println(file.Position(pos)) // example.go:5:3
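In practice, a Pos usually comes from an AST node produced by go/parser rather than from manual arithmetic. The following sketch parses a small source string and resolves the position of each top-level declaration; the source text, the file name, and the declPositions helper are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// src is a made-up source file used only for this example.
const src = `package example

func Hello() {}
`

// declPositions parses src under the given file name and returns the
// resolved position of each top-level declaration. (Hypothetical helper.)
func declPositions(filename, src string) ([]token.Position, error) {
	fset := token.NewFileSet()
	// The parser adds the file to fset and records its line starts.
	f, err := parser.ParseFile(fset, filename, src, 0)
	if err != nil {
		return nil, err
	}
	var out []token.Position
	for _, d := range f.Decls {
		// d.Pos() is a compact token.Pos; fset expands it on demand.
		out = append(out, fset.Position(d.Pos()))
	}
	return out, nil
}

func main() {
	positions, err := declPositions("example.go", src)
	if err != nil {
		panic(err)
	}
	for _, p := range positions {
		fmt.Println(p) // example.go:3:1
	}
}
```

The AST stores only the integer Pos for each node. The file name, line, and column are computed when Position is called.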
## Correlating positions in generated code
For generated files, the recorded positional information alone is not very useful. Correlating it to the source files they were generated from is more useful. go/token supports this with the File.AddLineColumnInfo(offset, filename, line, column) method.
file.AddLineColumnInfo(5, "src.in", 15, 1)
This indicates that the contents of the file from offset 5 onwards correspond to line 15 and column 1 of src.in. The Positions for these offsets will be in src.in, relative to line 15 and column 1.
fmt.Println(file.Position(file.Pos(5))) // src.in:15:1
fmt.Println(file.Position(file.Pos(6))) // src.in:15:2
fmt.Println(file.Position(file.Pos(7))) // src.in:16:1
go/scanner uses this API to interpret //line directives.
Raw positional information is still accessible with the PositionFor(Pos, bool) method on File and FileSet. The second argument specifies whether to respect positional overrides in calculating the Position. Pass false to ignore them.
fmt.Println(file.PositionFor(file.Pos(5), false)) // example.go:3:2
fmt.Println(file.PositionFor(file.Pos(6), false)) // example.go:3:3
fmt.Println(file.PositionFor(file.Pos(7), false)) // example.go:4:1
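The outputs above depend on where the lines of example.go start. The sketch below reproduces them under an assumed line table: a 10-byte file whose lines begin at offsets 0, 2, 4, and 7. The file size, the line table, and the newGeneratedFile helper are assumptions made for this example.

```go
package main

import (
	"fmt"
	"go/token"
)

// newGeneratedFile builds a 10-byte example.go with an assumed line table
// and maps everything from offset 5 onwards to src.in:15:1.
// (Hypothetical helper.)
func newGeneratedFile() *token.File {
	fset := token.NewFileSet()
	file := fset.AddFile("example.go", -1, 10)
	// Lines 1 through 4 start at offsets 0, 2, 4, and 7.
	if !file.SetLines([]int{0, 2, 4, 7}) {
		panic("invalid line table")
	}
	file.AddLineColumnInfo(5, "src.in", 15, 1)
	return file
}

func main() {
	file := newGeneratedFile()
	for _, off := range []int{5, 6, 7} {
		pos := file.Pos(off)
		fmt.Println(file.Position(pos), file.PositionFor(pos, false))
	}
	// Output:
	// src.in:15:1 example.go:3:2
	// src.in:15:2 example.go:3:3
	// src.in:16:1 example.go:4:1
}
```

Offset 7 starts a new line in example.go, so its adjusted position moves to the next line of src.in and the column resets to 1.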
# Conclusion
token.Pos and related types use a clever technique to cheaply and efficiently track and represent positional information in a parser. You can reuse this technique in your own systems if you are writing a parser.