Paul Smith

"Writing an Interpreter in Go" in Zig, part 5

This post is part of the "Writing an Interpreter in Go" in Zig series.

Introducing the AST, polymorphism in Zig, and starting the parser. This is chapter 2, sections 2.1 through 2.5.

AST and fat pointer polymorphism in Zig

Since Zig is barely more than C (in a good way), the language doesn’t have interfaces like Go, traits like Rust, or protocols like Swift: a mechanism for runtime dynamic dispatch as well as compile-time membership (i.e., I want to express a collection of items of possibly different underlying concrete types that all implement a common set of methods).

With Zig’s comptime and builtin functions, we can DIY whatever system we like. The current vogue in Zig is a “fat pointer”, where a struct represents the interface type, and it has two fields: a pointer to the object implementing the interface, and a pointer to the function implementation. This struct can just be copied around as a value, since all we care about are those pointers.

For an AST, we’re going to have a lot of struct types that represent the various syntactic components of the language. These types are generally just simple containers without much logic: a way to put names around pieces of source tokens and carry around the tree structure built up during parsing.

It’s handy to have this zoo of struct types each implement a common interface, so that we can say: these are the set of AST objects, and we can combine and compose them as needed, treating them as opaque nodes in the tree. For example, an if-statement will have a consequent block, which is a sequence of statement nodes. We don’t care how complex those statement nodes are; they may be arbitrarily deeply nested.

The core fat pointer pattern is something like:

const Node = struct {
    ptr: *anyopaque,
    fooFn: *const fn(*anyopaque) void,

    fn init(pointer: anytype) Node {
        // more to come ...
    }

    inline fn foo(node: Node) void {
        node.fooFn(node.ptr);
    }
};

const A = struct {
    fn foo(a: *A) void {
        std.debug.print("A\n", .{});
    }
};

const B = struct {
    fn foo(b: *B) void {
        std.debug.print("B\n", .{});
    }
};

test {
    var a: A = undefined;
    var b: B = undefined;
    const nodes = [_]Node{
        Node.init(&a),
        Node.init(&b),
    };
    for (nodes) |node| {
        node.foo();
    }
    // prints "A\n"
    // prints "B\n"
}

Types A and B both implement a foo function that has, other than the pointer receiver first argument, an identical signature.

The Node type wraps a pointer to one of our implementing types, via the init() function. From there, we can call foo() directly, or whatever the interface function is.

The trick to making this work is this bit of code in Node.init() that I elided:

// ...
    fn init(pointer: anytype) Node {
        const Ptr = @TypeOf(pointer);
        assert(@typeInfo(Ptr) == .Pointer);
        assert(@typeInfo(Ptr).Pointer.size == .One);
        assert(@typeInfo(@typeInfo(Ptr).Pointer.child) == .Struct);

        const gen = struct {
            fn foo(ptr: *anyopaque) void {
                const self: Ptr = @ptrCast(@alignCast(ptr));
                @call(.always_inline, @typeInfo(Ptr).Pointer.child.foo, .{self});
            }
        };

        return .{
            .ptr = pointer,
            .fooFn = gen.foo,
        };
    }
// ...

First, we use some comptime asserts to check that the passed-in object to be wrapped is actually a single-item pointer to a struct type.

Then we comptime-generate a wrapper function that casts the pointer to the original underlying concrete type and calls the actual implementation function, with the pointer as the receiver/first argument.

Finally, we return the Node struct with the wrapped pointer and wrapped function call in place.

Getting a rough working parser going

Starting to implement the parser, I knew right away that it was time to introduce an allocator. Parsing in general, and recursive descent parsing in particular, is a dynamic process with nested data structures (trees of AST nodes). It’s challenging, perhaps impossible (at least for my merely mortal self), to maintain a stack discipline with regard to lifetimes: we’re pushing and popping stack frames to parse nested constructs, accumulating the results into arrays, setting nodes as pointers to other parts of the tree, and so on. It’s almost inherently a heap-based memory environment.

Incidentally - I was thinking today that Zig’s std.ArrayList is sort of like Go’s chan: it’s a useful construct mainly for internal plumbing, and you generally don’t want them showing up in your public APIs.
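A minimal sketch of what I mean, with a hypothetical evens() helper (not from the parser): the ArrayList stays internal plumbing, and the public signature hands the caller a plain owned slice via toOwnedSlice().

```zig
const std = @import("std");

// Hypothetical helper: the ArrayList is internal plumbing; the public
// return type is a plain slice the caller owns and frees.
fn evens(allocator: std.mem.Allocator, limit: u32) ![]u32 {
    var list = std.ArrayList(u32).init(allocator);
    errdefer list.deinit();
    var i: u32 = 0;
    while (i < limit) : (i += 2) {
        try list.append(i);
    }
    return list.toOwnedSlice();
}

test "slice, not ArrayList, in the API" {
    const result = try evens(std.testing.allocator, 6);
    defer std.testing.allocator.free(result);
    try std.testing.expectEqualSlices(u32, &[_]u32{ 0, 2, 4 }, result);
}
```

This is the same shape parseProgram() below takes: accumulate into an ArrayList, then toOwnedSlice() at the boundary.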

At any rate, I decided to pass in an allocator as part of creating a new parser, and for it to allocate internal AST nodes as needed.

// src/Parser.zig
// ...
allocator: Allocator,
lexer: *Lexer,
cur_token: Token = undefined,
peek_token: Token = undefined,

const Self = @This();

pub fn init(allocator: Allocator, lexer: *Lexer) Self {
    var parser = Self{ .allocator = allocator, .lexer = lexer };
    (&parser).nextToken();
    (&parser).nextToken();
    return parser;
}
// ...

The main public API of the parser is parseProgram(), which returns an object representing the fully-parsed source text of the program provided. We’re going to put the burden on the caller to free that result. That means we need to supply a function the caller can use for that purpose, and have it walk the tree and clean up any internal allocations it made down deep in the AST.

I wanted to keep the AST node types themselves clear of any memory-related bookkeeping, so that means keeping track in the parser. Since allocations can be spread out over any recursive-descent function, but there is only one freeing function at the top, we’ll have to rely on std.testing.allocator’s leak detection, and on writing good tests that cover any path that allocates, to make sure we’re cleaning up after ourselves properly.
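For context on why that’s workable: std.testing.allocator fails the test if anything allocated during it wasn’t freed by the end, so a leak in a deep parse path surfaces as a test failure rather than going unnoticed. A minimal illustration:

```zig
const std = @import("std");

test "testing allocator catches leaks" {
    const allocator = std.testing.allocator;
    const node = try allocator.create(u32);
    node.* = 42;
    // Comment out this destroy and `zig test` reports a memory leak
    // when the test finishes.
    defer allocator.destroy(node);
}
```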

Here is the recursive-descent code for parsing a program AST node. We have a limited number of forms to detect and parse at this stage, early in development.

pub fn parseProgram(self: *Self) !*ast.Program {
    const program = try self.allocator.create(ast.Program);
    errdefer self.allocator.destroy(program);
    var statements = std.ArrayList(ast.Statement).init(self.allocator);
    errdefer statements.deinit();
    while (self.cur_token.token_type != .eof) {
        const stmt = try self.parseStatement();
        if (stmt) |s| {
            try statements.append(s);
        }
        self.nextToken();
    }
    program.statements = try statements.toOwnedSlice();
    return program;
}

fn parseStatement(self: *Self) !?ast.Statement {
    switch (self.cur_token.token_type) {
        .let => {
            return .{ .let = try self.parseLetStatement() orelse return null };
        },
        else => {
            return null;
        },
    }
}

fn parseLetStatement(self: *Self) !?*ast.LetStatement {
    var stmt = try self.allocator.create(ast.LetStatement);
    errdefer self.allocator.destroy(stmt);
    stmt.token = self.cur_token;

    if (!self.expectPeek(.ident)) {
        // errdefer only fires on error returns, so clean up by hand here
        self.allocator.destroy(stmt);
        return null;
    }

    stmt.name = try self.allocator.create(ast.Identifier);
    errdefer self.allocator.destroy(stmt.name);
    stmt.name.* = .{ .token = self.cur_token, .value = self.cur_token.literal };

    if (!self.expectPeek(.assign)) {
        self.allocator.destroy(stmt.name);
        self.allocator.destroy(stmt);
        return null;
    }

    while (!self.curTokenIs(.semicolon)) {
        self.nextToken();
    }

    return stmt;
}

Looks very similar to the Go code.

Now here’s the cleanup function, which frees all memory associated with parsing a program.

// ...
pub fn deinit(self: *Self, program: *ast.Program) void {
    for (program.statements) |stmt| {
        switch (stmt) {
            .let => |s| {
                self.allocator.destroy(s.name);
                self.allocator.destroy(s);
            },
            .program => unreachable,
        }
    }
    self.allocator.free(program.statements);
    self.allocator.destroy(program);
}
// ...

test {
    var lexer = Lexer.init( /* ... */);
    var parser = init(std.testing.allocator, &lexer);
    const program = try (&parser).parseProgram();
    defer parser.deinit(program);
}

The challenge will be keeping that function up to date as we extend the parser to parse more of the language. I should probably use an arena allocator instead and not bother with ensuring I free everywhere I need to.
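A sketch of that alternative: wrap the incoming allocator in a std.heap.ArenaAllocator, hand the arena’s allocator to the parser, and free the entire AST in one call instead of walking the tree. (The Parser.init usage is the hypothetical part, shown in comments.)

```zig
const std = @import("std");

test "arena-backed parsing sketch" {
    var arena = std.heap.ArenaAllocator.init(std.testing.allocator);
    // One call frees every node allocated from the arena -- no tree walk,
    // no per-node destroy/free bookkeeping.
    defer arena.deinit();
    const allocator = arena.allocator();

    // Hypothetical usage -- pass the arena-backed allocator to the parser;
    // the hand-written deinit() walk becomes unnecessary:
    // var parser = Parser.init(allocator, &lexer);
    // _ = try parser.parseProgram();
    _ = allocator;
}
```

The trade-off is that nothing can be freed early: the whole AST lives until arena.deinit(), which is usually fine for a parse-then-evaluate workflow.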
