Paul Smith

"Writing an Interpreter in Go" in Zig, part 1

This post is part of the "Writing an Interpreter in Go" in Zig series.

I’m going to implement, in Zig, the interpreter for the Monkey language that is the subject of Thorsten Ball’s book. I highly encourage you to support Thorsten and go buy a copy of his book.

I’ve written some Zig before, including a template language library, but it’s been a while. I’m a little rusty. The language has also evolved since then.

Getting started: lexing

My approach is translate 1:1 from the Go in the book to Zig as closely as possible. Obviously there will be points of divergence, mostly related to memory management, but I would like the overall architecture of the interpreter to be the same at the end.

I’m letting Zig’s nature guide me on small organizational decisions, for example, using .zig source files as structs for things like the token and lexer types.

The main difference early on between the Go in the book and my Zig is the representation of token types (IDENT, ASSIGN, EOF, etc.). Zig has enums, which I’ll use for this, whereas the Go is a type derived from strings and a top-level set of consts for each token type.

I also know I’ll need to be mindful of lifetimes and ownership in my Zig code, as I would if I were writing this in C. The lifetime of the input source text is a question, and my instinct is to make a copy of it early on once it enters the boundary of the interpreter (just the lexer, to start). I can hand out slices to that memory as tokens to the parser.

Another consideration is the use of Zig’s type system to express some of the semantics of the lexer. I could return the error union for various operations. I could also return optional values in certain circumstances. I’ll feel this out a bit as I go and may change my mind.

The question I’m considering is where in the API of the lexer to introduce an allocator. This has implications for callers, of course, and introduces additional burdens on them, such ensuring to call methods like deinit().

I think for simplicity’s sake I’ll defer that as long as I can, until it really becomes something I can’t proceed without.

The main divergence I’ve made so far related to memory is, instead of newToken() taking a char/byte, as in the Go version so far in this chapter, it takes a string (in Zig, a slice of u8 integers). I’m just using string literals, but eventually I’ll need to pass slices of the input text when we get to the token types that have literal content at runtime, like identifiers.

BTW I’m ok returning stack-allocated structs, like from newToken() and Lexer.next(), because these are going to be copies that are owned by the caller (as assignments to their stack-allocated variables, in turn). I’m also passing the lexer via the address of operator to next(). This kind of stack discipline is something I like to try to get away with as long as I can. I will need to heap-allocate things eventually, but you can manage the lifetime of your stack objects yourself fine in a lot of cases.

The code for checking an identifier for its token type and whether it’s a keyword is almost the same amount of code in Zig as it is in Go. The Zig version is slightly more verbose, due to the fact that maps are part of the stdlib and not part of the language like they are in Go. But the static map of keywords gets to show off a bit of Zig’s comptime. And the fact that a language very like C has hash maps in its stdlib makes it feel productive.

I could write my own little ctype.h-like helper functions like isLetter() but I chose to use the stdlib’s ascii module.

This concludes up through and including section 1.3. in the book.


This series