Virtual Poster @ SIGCSE 2022. March 4, 2022. Providence, RI. U.S.A.
1. Abstract
For many students, code generation is the most demanding topic covered in a typical undergraduate Compiler Design course. In this “poster” the author presents how he has successfully used the fairly novel and promising WebAssembly technology in his course in order to make the said topic more relevant and engaging for learners. Once we have a compiler frontend that produces an abstract syntax tree (AST) and a symbol table for a given source program, the code generation phase involves traversing the AST and emitting the corresponding WebAssembly instructions in a generally straightforward fashion. The code generation phase is significantly simplified given that WebAssembly is actually an intermediate code for a stack-based virtual machine. The real machine code is produced later by the WebAssembly runtime’s just-in-time (JIT) compiler. During execution, the generated code may call functions written in a high-level language directly supported by the runtime. This allows having basic I/O capabilities, like printing to the screen or reading input from the keyboard. At the end of the semester, students have authored the frontend and backend of a fully working compiler that translates a simple C-like procedural language into platform-neutral executable WebAssembly code.
2. Introduction
WebAssembly [Mozilla] (Wasm for short) is a binary code format specification released in 2017. This technology can be implemented in web browsers or standalone applications in a secure, open, portable, and efficient fashion [Eberhardt]. Wasm was primarily designed as a compilation target, so using it for code generation in a Compiler Design course is quite suitable and makes compiler projects more manageable for students.
Code generation is a topic that has been extensively covered in compiler construction books in the past [Aho][Cooper], although most of these texts don’t emphasize on how to produce code for a new generation of stack-based virtual machine instruction sets such as those for the Java Virtual Machine (JVM) [Lindholm], the Common Language Infrastructure (CLI) [ECMA], the Yet Another RubyVM (YARV) [Sasada] or WebAssembly.
3. General Overview
In the first part of the course, students write the compiler frontend which creates the AST and a symbol table starting from an input source program. They do this by handcrafting in the C# programming language a recursive descent parser [Aho]. Afterwards, they apply the Visitor design pattern_ [Gamma] to write the compiler backend in order to traverse the AST and emit the corresponding Wasm text format (WAT) [Mozilla] instructions.
The generated textual code is translated into Wasm binary code and executed using the Wasmer Python package [Wasmer], a Wasm standalone runtime. Wasmer also allows us to write a simple runtime library using Python in order to provide basic I/O and memory management facilities. Although Wasm only has direct support for integers and floats, provision for strings and arrays can be achieved by using a resource handle mechanism that is capable of interacting with the runtime system.
Anecdotal evidence shows that this approach to code generation has been well received by students. At the end, their projects are able to compile to Wasm a variety of interesting programs, including: converting to binary numbers, computing factorials, checking for palindromes, and sorting arrays.
4. A Simple Example
4.1. Installing Wasmer Python
The Wasmer Python package brings the required API to execute WebAssembly modules from within a Python runtime system. In a nutshell, wasmer compiles the WebAssembly module into compiled code, and then executes it. Wasmer is designed to work in various environments and platforms: From nano single-board computers to large and powerful servers.
You’ll need Python version 3.7 or newer to use the wasmer
package. To check that you have the correct version of Python, at the terminal type:
python -V
NOTE: You might need to use the command python3
instead.
The output should be something like this:
Python 3.10.2
Go to the Python website if you don’t have Python installed or if it’s older than 3.7.
To install Wasmer, type at the terminal the following two commands:
pip install wasmer wasmer_compiler_cranelift
NOTE: You might need to use the command pip3
instead. Also, you might need to prepend sudo
to run the command with admin privileges.
4.2. The Code Generator
The following Python source code is a simple implementation of an AST. It constructs an arithmetic expression and then traverses the AST in order to produce the WAT code generation which is then converted into the corresponding Wasm binary code. This code is not using the Visitor pattern pattern as mentioned before. Instead, for the sake of simplicity, it uses the Composite design pattern [Gamma].
arithmetic_code_generator.py
from __future__ import annotations
from wasmer import wat2wasm # type: ignore
class Node: (1)
def __init__(self, *children: Node) -> None: (2)
self.children = children
def __getitem__(self, index: int) -> Node: (3)
return self.children[index]
def gen_code(self) -> str: (4)
raise NotImplementedError()
class Program(Node): (5)
def gen_code(self) -> str: (6)
return (
'(module\n'
+ ' (import "math" "pow"\n'
+ ' (func $pow (param i32 i32) (result i32)))\n' (7)
+ ' (func\n'
+ ' (export "start")\n' (8)
+ ' (result i32)\n'
+ self[0].gen_code() (9)
+ ' )\n'
+ ')\n')
class Plus(Node): (10)
def gen_code(self) -> str:
return (
self[0].gen_code()
+ self[1].gen_code()
+ ' i32.add\n')
class Times(Node): (10)
def gen_code(self) -> str:
return (
self[0].gen_code()
+ self[1].gen_code()
+ ' i32.mul\n')
class Pow(Node): (10)
def gen_code(self) -> str:
return (
self[0].gen_code()
+ self[1].gen_code()
+ ' call $pow\n')
class Number(Node):
def __init__(self, value: int) -> None: (11)
self.value = value
def gen_code(self) -> str:
return f' i32.const {self.value}\n'
def main() -> None:
# 2 ** 3 + 4 * 5
tree = ( (12)
Program(
Plus(
Pow(
Number(2),
Number(3)
),
Times(
Number(4),
Number(5)))))
wat_code = tree.gen_code() (13)
wasm_code = wat2wasm(wat_code) (14)
with open('example.wat', 'w') as wat_file:
wat_file.write(wat_code) (15)
with open('example.wasm', 'wb') as wasm_file:
wasm_file.write(wasm_code) (16)
main()
1 | The Node class is the base class for all the other kinds of nodes. |
2 | The __init__ method is used to initialize every node. It takes as input a variable number of arguments which represent the node’s children. All derived classes, except for Number , inherit this method. |
3 | The __getitem__ method allows accesing a specific child of a node by using the corresponding zero-based index inside square bracket. |
4 | The gen_code method should be overriden by all the derived classes. In these methods is where the WAT code gets emitted. |
5 | The Program class represents the root node for an AST representing an expression. Instances of this class should only have one child. |
6 | The code generation for the Program class provides the basic WAT code template for a complete Wasm module. Note that each WAT instruction should end with a new line (\n ) character. |
7 | The Wasm module imports the $pow function which is provided by the runtime host (the execute.py program below). |
8 | The exported function start is where the module’s execution begins. |
9 | The code generated by the other nodes is placed inside the body of the start function. |
10 | Instances of the Plus , Times , and Pow classes should have exactly two children. |
11 | The __init__ method of the Number class needs to be overriden because its instances don’t have any children, just an integer stored in the value instance variable. |
12 | Create the AST. |
13 | Obtain a string with the result of the WAT code generation starting from the root of the AST. |
14 | Convert in memory the WAT code into the binary Wasm code. |
15 | Create the file example.wat with the human readable Wasm text format code. |
16 | Create the file example.wasm with the Wasm binary code. |
It’s important to note that the
|
To run the program, at the terminal type:
python arithmetic_code_generator.py
After running the program, if you open the example.wat
file in a text editor you should see the code that corresponds to the expression: 23+4×5.
(module
(import "math" "pow"
(func $pow (param i32 i32) (result i32)))
(func
(export "start")
(result i32)
i32.const 2
i32.const 3
call $pow
i32.const 4
i32.const 5
i32.mul
i32.add
)
)
4.3. Running the Wasm Code
The next Python source file demonstrate how to execute the Wasm binary code produced by the arithmetic_code_generator.py
program.
execute.py
from wasmer import engine, Module, Store, Instance # type: ignore
from wasmer import ImportObject, Function # type: ignore
from wasmer_compiler_cranelift import Compiler # type: ignore
def make_import_object(store: Store) -> ImportObject:
def pow(base: int, expo: int) -> int: (1)
return base ** expo
import_object = ImportObject() (2)
import_object.register("math", {"pow": Function(store, pow)}) (3)
return import_object
def main() -> None:
with open('example.wasm', 'rb') as input:
wasm_code = input.read() (4)
store = Store(engine.Universal(Compiler)) (5)
module = Module(store, wasm_code) (6)
instance = Instance(module, make_import_object(store)) (7)
print(instance.exports.start()) (8)
main()
1 | This is the implementation of the pow function that was imported by the Wasm module. |
2 | Create an ImportObject . All entities that are imported by a module are registered using this object. |
3 | Register the pow function. |
4 | Read into memory the example.wasm file that contains the Wasm binary code. |
5 | Create a store object. This object represents the global state that can be manipulated by Wasm programs (functions, tables, memories, and globals). A store holds the engine that is used to compile the Wasm bytes into a valid module. |
6 | Compile the Wasm module by specifying its store and the Wasm bytes. A module contains stateless Wasm code that can be instantiated multiple times. |
7 | Create a Wasm instance (a stateful, executable instance of a Wasm module) and provide the import object that contains the registered functions. |
8 | Call the exported start function and print the value returned. |
To run this program, at the terminal type:
python execute.py
The expected output is:
28
5. Recommended Resources for a Compiler Design Course
5.1. External Resources
5.2. Books
-
Colin Eberhardt. What Is WebAssembly? O’Reilly Media. 2019.
-
Brian Sletten. WebAssembly: The Definitive Guide. O’Reilly Media. 2021.
5.3. Resources Developed by the Author
-
Complete C# program (lexer, recursve descent parser, semantic analysis, and Wasm code generation) for a simple expression language that supports addition, multiplication and exponentiation:
-
Buttercup: A simple little programming language.
-
Full implementation in C# (lexer, recursve descent parser, semantic analysis, and Wasm code generation) : buttercup-0.5.tgz
-
Falak: A language for a semester long compiler project.
-
Falak API Source Code: falaklib.py
References
-
[Aho] Alfred V. Aho, Monica S. Lam, Ravi Sethi, and Jeffrey D. Ullman. 2006. Compilers: Principles, Techniques, and Tools (2nd. ed.). Addison-Wesley, Boston, MA.
-
[Cooper] Keith D. Cooper and Linda Torczon. 2011. Engineering a Compiler (2nd. ed.). Morgan Kaufmann, San Francisco, CA.
-
[Eberhardt] Colin Eberhardt. 2019. What Is WebAssembly? O’Reilly Media, Sebastopol, CA.
-
[ECMA] ECMA International. 2012. ECMA-335 Common Language Infrastructure. Retrieved February 23, 2022 from https://www.ecma-international.org/publications-and-standards/standards/ecma-335/
-
[Gamma] Erich Gamma, Richard Helm, Ralph Johnson, and John M. Vlissides. 1995. Design Patterns: Elements of Reusable Object Oriented Software. Addison-Wesley, Boston, MA.
-
[Lindholm] Tim Lindholm, Frank Yellin, Gilad Bracha, Alex Buckley, and Daniel Smith. 2021. The Java Virtual Machine Specification Java SE (17th ed.). Retrieved February 23, 2022 from https://docs.oracle.com/javase/specs/jvms/se17/html/
-
[Mozilla] Mozilla and individual contributors. 2021. WebAssembly. Retrieved February 23, 2022 from https://developer.mozilla.org/en-US/docs/WebAssembly
-
[Sasada] Koichi Sasada. 2005. YARV: Yet Another RubyVM: Innovating the Ruby Interpreter. In Companion to the 20th annual ACM SIGPLAN Conference on Object-Oriented Programming, Systems, Languages, and Applications (OOPSLA ’05). ACM Press, New York, NY, 158–159. https://doi.org/10.1145/1094855.1094912
-
[Wasmer] Wasmer, Inc. 2021. Wasmer-Python GitHub Repository. Retrieved February 23, 2022 from https://github.com/wasmerio/wasmer-python