Tutorial @ PyCon USA 2025. May 15, 2025. Pittsburgh, PA, USA.
The source code repository for this tutorial is available on GitHub at: |
1. Introduction
In this tutorial, you will learn the fundamental principles of creating a language processor by hand, focusing specifically on recursive descent parsers. A recursive descent parser is a program used for analyzing the structure of some input language through a series of functions that represent its grammatical rules.
Knowing how to write a parser from scratch is incredibly useful for anyone interested in understanding how programming languages work under the hood. It also provides valuable insights into compiler design, interpreters, and any form of automatic text analysis. By mastering this skill, you can create your own mini-languages, customize existing ones, or simply satisfy your curiosity about the inner workings of language processors.
Throughout this tutorial, you will:
-
Gain a clear understanding of what a recursive descent parser is and how it operates.
-
Learn the step-by-step process of building in Python your own language processor from the ground up.
-
Improve your problem-solving skills by designing and implementing parsing algorithms.
-
Explore practical examples and hands-on exercises to solidify your understanding.
By the end of the tutorial, you will have the knowledge and confidence to craft your own language processor and appreciate the elegance and power of recursive descent parsing.
2. Core Concepts
2.1. Tokenizer
A tokenizer is a process that takes raw input (like text or code) and breaks it down into a stream of basic, meaningful units called tokens (e.g., keywords, identifiers, symbols). These tokens then serve as the input for a parser.
2.2. Parser
A parser is a computer program that takes a stream of tokens and analyzes its arrangement based on a defined set of rules known as a grammar. It checks if the input follows these rules and, if it does, often converts it into a more organized form, such as a tree, that illustrates the input’s structure and meaning. Essentially, a parser is a program that understands the rules of a language or data format and can analyze input to determine its structure and meaning.
2.3. Grammar
A grammar is a set of rules that defines how to combine basic symbols to create valid structures in a language. This could be how words form sentences in a natural language, or how keywords and symbols form code in a programming language.
2.4. Terminal Symbol
Terminal symbols are the fundamental, indivisible units of the language. These are what we previously referred to as tokens. They are the actual things you see or use, like individual words in a natural language (“the”, “cat”, “runs”) or specific characters and keywords in a programming language (if
, while
, x
, 10
). You can’t break them down further using the grammar’s rules.
2.5. Nonterminal Symbol
Nonterminal symbols are labels or names that represent larger structural concepts or categories within the grammar. In a natural language, these might be “sentence”, “noun-phrase”, or “verb-phrase”. In programming, they could be “statement”, “expression”, or “function-call”. Nonterminals are defined by the grammar’s rules in terms of other terminals and nonterminals.
2.6. Production Rule
A production rule is a single rule within a grammar. It specifies how a nonterminal symbol can be formed by other terminal or nonterminal symbols. For example, a rule in a simple natural language grammar might be “a noun-phrase can be an article followed by a noun”.
Unless explicitly specified otherwise, the initial production rule presented in a grammar functions as its entry point. |
2.7. EBNF
EBNF stands for Extended Backus-Naur form. It’s a language used to define languages. More specifically, it’s a notation that allows a specific way of writing down grammar rules, providing a clear and organized method to describe how terminal and nonterminal symbols can be combined. It uses special symbols to indicate things like repetition or optional parts, making the grammar rules of a formal language easier to read and understand.
The production “a noun-phrase can be an article followed by a noun” could be expressed in EBNF as:
noun-phrase = article , noun ;
Similarly, the production “a noun can be one of the words man, ball, woman, or table” could be expressed as:
noun = "man" | "ball" | "woman" | "table" ;
In these examples, the nonterminals are noun-phrase
, article
, and noun
, and the terminals are "man"
, "ball"
, "woman"
, and "table"
.
The specifics of EBNF will be discussed in the section on EBNF Syntax.
2.8. Exercise A ★
Consider the following EBNF grammar:
sentence = noun-phrase , verb-phrase , ? end of input ? ;
noun-phrase = article , noun ;
verb-phrase = verb , noun-phrase ;
article = "the" | "a" ;
noun = "man" | "ball" | "woman" | "table" ;
verb = "hit" | "took" | "saw" | "liked" ;
Answer the following questions:
-
What are all the terminal symbols?
-
What are all the nonterminal symbols?
-
How many productions are there in this grammar?
3. EBNF Syntax
3.1. Syntax Summary
The precise syntax for EBNF is formally specified by the international standard ISO/IEC 14977:1996. This tutorial utilizes a subset of this standard, summarized in the following table.
EBNF Element | Syntax | Description | Notes for Mapping to Python Code |
---|---|---|---|
Terminal |
"a" |
An indivisible unit of the language, a.k.a token. |
Call the |
Nonterminal |
a |
A name representing a category defined by a production rule. |
Call the parsing function that is specifically created to handle the production rule of the nonterminal. |
Definition |
= |
Separates the left-hand side (the nonterminal symbol being defined) from the right-hand side (the definition of that nonterminal in terms of terminals and other nonterminals) of a production rule. |
Define a parsing function whose body contains the logic to parse the sequence of terminals and nonterminals according to the rule’s structure. |
Termination |
; |
Ends a production rule. |
N/A |
Concatenation |
a , b |
Element |
Sequentially call the |
Alternative |
a | b |
Either element |
Provide conditional logic (using |
Grouping |
( a ) |
Groups elements to override precedence. |
Ensure the code’s logic is executed according to the desired order. |
Optional |
[ a ] |
Zero or one occurrence of |
Use an |
Repetition |
{ a } |
Zero or more repetitions of |
Use a |
Special Sequence |
? a ? |
User-defined construct that has an implementation-specific meaning. Some examples:
|
Handle these special tokens using the |
Comment |
(* ... *) |
Allows including human-readble remarks or notes to the grammar. |
N/A |
3.2. Operator Precedence
The EBNF operator precedence rules are as follows (strongest to weakest):
-
Optional (
[…]
), Repetition ({…}
) -
Concatenation (
,
) -
Alternative (
|
)
As mentioned before, you can use parentheses to override these precedence rules.
4. Steps to Build a Recursive Descent Parser
Recursive descent parsing is a top-down technique where each nonterminal in the grammar corresponds to a function designed to match the input according to that nonterminal’s production rules. The “recursive” aspect arises from these functions calling each other (or themselves) to traverse the grammar’s structure during the matching process.
To begin building our first recursive descent parser, we will follow the steps outlined in this section. We’ll start by creating an empty Python file named english_subset.py
, which will serve as the container for all the necessary Python code for our parser implementation.
4.1. Defining the Initial Grammar
The most important step is to have a well-defined grammar for the language the parser will recognize.
Before we can effectively implement a recursive descent parser, it’s crucial to prepare the grammar by addressing two potential issues:
For all the grammars used in this tutorial, explicit left recursion elimination or left factoring will not be necessary. |
For our main example, we will be using the grammar from Exercise A, which is reproduced below for your convenience.
sentence = noun-phrase , verb-phrase , ? end of input ? ;
noun-phrase = article , noun ;
verb-phrase = verb , noun-phrase ;
article = "the" | "a" ;
noun = "man" | "ball" | "woman" | "table" ;
verb = "hit" | "took" | "saw" | "liked" ;
For clarity and reference, it’s helpful to place a copy of this grammar as comments in the source file.
4.2. Create the ParserError
class
This class will be used to signal errors during the parsing process, indicating that the input does not conform to the expected grammar or contains some sort of logical inconsistencies.
class ParserError(Exception): (1)
def __init__(self, message): (2)
super().__init__(f'PARSER ERROR DETECTED:\n{message}') (3)
1 | Defines a custom exception class named ParserError that inherits from the base Exception class. |
2 | Defines the instance initializer method for the ParserError class, taking an error message as input. |
3 | Calls the initializer of the parent Exception class with a formatted error message. |
4.3. Create the Parser
class
This class will contain the methods for parsing each nonterminal in the grammar and will manage the parsing state (like the current position in the input).
class Parser:
EOI = '? end of input ?' (1)
1 | Defines a class constant EOI representing the end of input marker. |
4.4. Implement basic tokenization functionality
An instance of the Parser
class is in charge of tokenizing the input source into words, which it then uses to step through and verify the input against expected patterns, signaling errors when a mismatch occurs.
The tokenization process relies on the coordinated execution of the following three methods:
-
__init__
: Initializes the parser with the input source and gets the first token.# NOTE: This is part of the Parser class. def __init__(self, source): self._tokens = iter(source.split() + [self.EOI]) (1) self.advance() (2)
1 Splits the source
into a list of words, appends theEOI
marker at the end, and creates an iterator[1].2 Initializes the parser by advancing to the first token. -
advance
: Advances the parser to the next token from the input.# NOTE: This is part of the Parser class. def advance(self): try: (1) self._current = next(self._tokens) (2) except StopIteration: (3) self._current = None (4)
1 Begins a try
block to handle potential end of input.2 Retrieves the next token from the iterator and stores it in _current
.3 Catches the exception raised when the iterator is exhausted. 4 Sets _current
toNone
to indicate the end of input. -
expect
: Checks if the current token matches an expected value and advances.# NOTE: This is part of the Parser class. def expect(self, expected_tokens): (1) if self._current not in expected_tokens: (2) token_found_str = f'"{self._current}"' (3) expected_tokens_str = \ ", ".join([f'"{token}"' for token in expected_tokens]) (4) message = (f'Found the token: {token_found_str}\n' (5) f'But was expecting one of: {expected_tokens_str}') raise ParserError(message) (6) original_current = self._current (7) self.advance() (8) return original_current (9)
1 Takes a parameter which is a list containing the expected tokens. 2 Checks if the current token is not one of the expected tokens. 3 Creates a string representation of the found token, enclosed in double quotes. 4 Creates a single string containing the expected tokens separated by commas and enclosed in double quotes. 5 Constructs a multiline error message. 6 Raises a ParserError
exception with the constructed error message.7 Stores the current token before advancing. 8 Moves to the next token in the input stream. 9 Returns the matched (and consumed) token. While not used in the current example, this value can be useful in more advanced parsing tasks.
4.5. Create and implement parsing functions for each nonterminal
For each nonterminal in the grammar (sentence
, noun-phrase
, verb-phrase
, article
, noun
, and verb
), create a corresponding instance method with that same name in the Parser
class. Implement the logic to match the sequence of tokens according to the production rules of the corresponding nonterminal. This involves calling other parsing functions for nonterminals in the rule, or using the expect()
method for terminal values.
Example for the nonterminal sentence
:
# NOTE: This is part of the Parser class.
# sentence = noun-phrase , verb-phrase , ? end of input ? ;
def sentence(self):
self.noun_phrase()
self.verb_phrase()
self.expect([self.EOI])
Always call the expect() method with a list as argument, even if it contains only one token, such as the end of input marker (EOI) here.
|
Example for the nonterminal article
:
# NOTE: This is part of the Parser class.
# article = "the" | "a" ;
def article(self):
self.expect(["the", "a"])
4.6. Exercise B ★
Finish implementing the remaning parsing functions: noun-phrase
, verb-phrase
, noun
, and verb
.
4.7. Execute the Parser
Instantiate the Parser
class with the input string and initiate parsing by calling the grammar’s root nonterminal function (sentence()
in our example). Use a try
/except
block to catch ParserError
exceptions, which are raised by the expect()
method when the input doesn’t conform to the grammar.
All this can be done by adding the following code at the end of the english_subset.py
file:
if __name__ == '__main__':
parser_example = Parser('the man hit the ball')
try:
parser_example.sentence()
print('Syntax OK!')
except ParserError as e:
print(e)
4.8. Exercise C ★
-
Verify the parser’s correct operation by testing it with another valid string.
-
To understand its response to errors, test the parser with an invalid input.
4.9. Exercise D ★★
The following grammar is an extension of the grammar from Exercise A:
sentence = noun-phrase , verb-phrase , ? end of input ? ;
noun-phrase = article , { adjective } , noun ; (* UPDATED *)
verb-phrase = [ adverb ] , verb , [ noun_phrase ] ; (* UPDATED *)
article = "the" | "a" ;
noun = "man" | "ball" | "woman" | "table" ;
verb = "hit" | "took" | "saw" | "liked" ;
adjective = "digital" | "virtual" | "cyber" | "pixelated" ; (* NEW *)
adverb = "algorithmically" | "securely" | "wirelessly" | "recursively" ; (* NEW *)
One of the main differences from the original grammar is that this new version incorporates the optional ([…]
) and repetition ({…}
) operators. Modify the parser built so far in order to incorporate all these new extensions. Check the notes on the EBNF Syntax Summary table to see how to map these operators into Python code.
Ensure the parser correctly recognizes the following valid strings
-
a woman saw
-
the virtual digital man algorithmically hit a pixelated ball
However, the parser should reject these invalid strings:
-
the cyber table wirelessly
-
every woman recursively saw a digital ball
5. Writing an Expression Evaluator
Our goal in this section is to implement a simple numerical expression evaluator capable of evaluating expressions like:
(14 + 8) + x * 7
Assuming that variable x
equals 11, this expression evaluates to 99.
5.1. Operator Precedence
When evaluating mathematical and programming expressions, the concept of operator precedence is fundamental. Precedence defines the priority of different operators, determining which operations are performed before others. For instance, multiplication generally has higher precedence than addition. Without clear rules of precedence (or the use of parentheses to explicitly define order), the evaluation of expressions with multiple operators would be ambiguous and could lead to incorrect results. Understanding and correctly implementing operator precedence is therefore essential for building a functional expression evaluator.
5.2. The Initial Grammar
We will use the following grammar to define the expressions it can handle:
start = expression , ? end of input ? ;
expression = additive ;
additive = multiplicative , { "+" , multiplicative } ;
multiplicative = primary , { "*" , primary } ;
primary = ? integer ? | ? variable ? | "(" , expression , ")"
The terminal symbols ? integer ?
and ? variable ?
in this grammar are defined as follows:
-
? integer ?
: Represents a sequence of one or more digits (0 - 9). Examples include0
,42
, and1729
. -
? variable ?
: Represents a sequence of one or more letters from the English alphabet (a - z, A - Z). Examples includex
,spam
, andRamanujan
.
The above grammar establishes operator precedence through its hierarchical structure. Operations defined lower in the grammar have higher precedence because the parser will attempt to reduce those parts of the expression first. For example, primary
is the lowest level and contains the most basic units (integers, variables, and parenthesized expressions). Next, multiplicative
operates on primary
using the *
operator, giving multiplication the next level of precedence. Finally, additive
operates on multiplicative
using the +
operator, resulting in the lowest precedence among the defined operators. Parenthesized expressions within primary
can override this default precedence.
5.3. The Code
The following code, which largely incorporates what we’ve already written, now utilizes an environment — a dictionary holding defined variables and their values — to enable variable lookups during parsing. The changes from the original program are highlighted below.
# start = expression , ? end of input ? ;
# expression = additive ;
# additive = multiplicative , { "+" , multiplicative } ;
# multiplicative = primary , { "*" , primary } ;
# primary = ? integer ? | ? variable ? | "(" , expression , ")"
import re (1)
class ParserError(Exception):
def __init__(self, message):
super().__init__(f'PARSER ERROR DETECTED:\n{message}')
class ExpressionParser:
EOI = '? end of input ?'
def __init__(self, source, environment): (2)
self._environment = environment (3)
self._tokens = iter(re.findall(r'[a-z]+|\d+|\S', source, re.I) (4)
+ [self.EOI])
self.advance()
def advance(self):
try:
self._current = next(self._tokens)
except StopIteration:
self._current = None
def expect(self, expected_tokens):
if self._current not in expected_tokens:
token_found_str = f'"{self._current}"'
expected_tokens_str = \
", ".join([f'"{token}"' for token in expected_tokens])
message = (f'Found the token: {token_found_str}\n'
f'But was expecting one of: {expected_tokens_str}')
raise ParserError(message)
original_current = self._current
self.advance()
return original_current
# The code from this point forward is the implementation of the
# grammar we are defining.
def start(self): (5)
result = self.expression()
self.expect([self.EOI])
return result
def expression(self):
return self.additive()
def additive(self):
result = self.multiplicative()
while self._current == '+':
self.advance()
result += self.multiplicative()
return result
def multiplicative(self):
result = self.primary()
while self._current == '*':
self.advance()
result *= self.primary()
return result
def primary(self):
current = self._current
if current.isdigit(): (6)
result = int(self._current) (7)
self.advance()
return result (8)
if current.isalpha(): (9)
if current not in self._environment: (10)
raise ParserError( (11)
f'Variable "{current}" not found in '
f'environment {self._environment}')
self.advance()
return self._environment[current] (12)
if current == '(': (13)
self.advance()
result = self.expression()
self.expect([')'])
return result
token_found_str = f'"{current}"' (14)
message = (f'Found the token: {token_found_str}\n'
f'But was expecting an integer, variable or "("')
raise ParserError(message)
if __name__ == '__main__': (15)
expression = '(14 + 8) + x * 7'
environment = {'x': 11}
evaluator = ExpressionParser(expression, environment)
try:
result = evaluator.start()
print(f'{expression} = {result}')
except ParserError as e:
print(e)
1 | Imports the re module, which provides support for regular expressions[2]. |
||
2 | Defines the initializer for the ExpressionParser class taking the input source string and a dictionary containing variable names and their values. |
||
3 | Initializes the parser’s environment with the provided dictionary of variables. | ||
4 | Creates an iterator over the tokens found in the source string using a regular expression.
The
|
||
5 | Defines the grammar’s starting rule. | ||
6 | Checks if the current token is a sequence of digits, indicating an integer. | ||
7 | Converts the digit token to an integer. | ||
8 | Returns the resulting integer value. | ||
9 | Identifies if the current token is alphabetic, treating it as a variable name. | ||
10 | Determines if this variable name is not found in the parser’s environment, implying it’s undefined. | ||
11 | If the variable is undefined, raises a ParserError exception. |
||
12 | Returns the value associated with the current variable name from the environment. | ||
13 | If the current token is an opening parenthesis, the parser advances, recursively evaluates the inner expression, expects a closing parenthesis, and returns the result of the inner expression. | ||
14 | If the current token doesn’t match an integer, a variable, or an opening parenthesis, the code proceeds to format an error message and then raises a ParserError to indicate a syntax error. |
||
15 | Sets up an expression with a simple environment, creates an ExpressionParser , attempts to evaluate the expression and print the result, and catches any ParserError exceptions to print the error message. |
For the evaluator to work correctly, each production function must return an integer value representing the result of the parsed expression. |
5.4. Exercise E ★★
Modify the ExpressionParser
class to implement the following grammar, which includes support for subtraction (-
), division (/
), and remainder (%
) operations.
start = expression , ? end of input ? ;
expression = additive ;
additive = multiplicative , { ("+" | "-") , multiplicative } ; (* UPDATED *)
multiplicative = primary , { ("*" | "/" | "%" ) , primary } ; (* UPDATED *)
primary = ? integer ? | ? variable ? | "(" , expression , ")"
In you Python code, implement division using floor division (// ) operator in order to get integer results.
|
Verify the parser’s expected behavior by testing it with several valid and invalid expressions. For example, try it with this expression:
11 * 2 / (15 - 10) + 20 % 3
The result should be 6.
5.5. Exercise F ★
The following mathematical problem has been at the center of significant online controversy for several years:
Address the following questions:
-
What’s the correct solution for this problem?
-
How does our parser behave when processing it?
5.6. Exercise G ★★
Extend the ExpressionParser
class to support the following grammar, which incorporates the exponentiation (^
) operator.
start = expression , ? end of input ? ;
expression = additive ;
additive = multiplicative , { ("+" | "-") , multiplicative } ;
multiplicative = power , { ("*" | "/" | "%" ) , power } ; (* UPDATED *)
power = primary , [ "^" , power ] ; (* NEW *)
primary = ? integer ? | ? variable ? | "(" , expression , ")"
Perform exponentiation in your Python code using the power operator (** ) to obtain the desired results.
|
Ensure the parser behaves as expected by testing it with a variety of valid and invalid expressions. For example, try it with this expression:
11 * t + 2 ^ t ^ 2 - t + 1
If t
is 3, the result is 543.
5.7. Exercise H ★★
The following expression grammar has been updated to include the unary negation operator. All operators defined before this were binary operators, meaning they operate on two values. In contrast, the new unary negation operator works with just one value.
start = expression , ? end of input ? ;
expression = additive ;
additive = multiplicative , { ("+" | "-") , multiplicative } ;
multiplicative = negation , { ("*" | "/" | "%" ) , negation } ; (* UPDATED *)
negation = "-" , negation | power ; (* NEW *)
power = primary , [ "^" , power ] ;
primary = ? integer ? | ? variable ? | "(" , expression , ")" ;
Perform the following:
-
Update the
ExpressionParser
class to implement this new grammar. -
Determine the precedence of the unary negation operator in relation to the other defined operators.
-
Validate the parser’s correct behavior by testing it with various expressions. For example, if you try:
--(-p + -p) + - 2 ^ 4
where
p
is 2, the result is -20.
5.8. Operator Associativity
As previously discussed, operator precedence governs the order of evaluation, with multiplicative operators taking precedence over additive operators. When operators share the same precedence level, associativity defines the direction of evaluation:
-
Left-associative means operators are evaluated from left to right.
For instance,a - b - c
is interpreted as(a - b) - c
. The+
,-
,*
,/
, and%
operators in our grammar are left-associative. -
Right-associative means operators are evaluated from right to left.
For example,a ^ b ^ c
is interpreted asa ^ (b ^ c)
. In our grammar, the exponentiation and unary negation are right-associative operators.
Parentheses provide a mechanism to explicitly override both precedence and associativity, ensuring the desired order of operations.
In the expression grammars we’ve seen so far, left associativity for operators like +
, -
, *
, /
, and %
is achieved through the left-recursive-like structure created by the { … }
(repetition) in their respective rules. For example, the additive
production first matches a multiplicative
, and then iteratively matches subsequent +
or -
operators followed by another multiplicative
, naturally leading to left-to-right evaluation.
On the other hand, right associativity for the exponentiation (^
) and unary negation (-
) operators is achieved by the recursive structure on the right side in their corresponding rules. This structure forces the parser to group these operations from right to left.
5.9. Exercise I ★★★
This is a challenging exercise that will test your understanding of parsing techniques, operator precedence and associativity, logical expressions, truth table generation, output formatting, and function implementation in Python.
If you solve it correctly, I hereby grant you bragging rights during the whole PyCon US 2025 conference 😁 |
5.9.1. Title
Truth Table Generator Function for a Logical Expression Language
5.9.2. Objective
Write a Python function that evaluates logical expressions involving boolean variables and prints their truth tables.
5.9.3. Function Specification
You need to implement a Python function that adheres to the following specification:
def generate_truth_table(expression):
"""
Evaluates a logical expression string and prints its truth table.
Parameter:
expression: A string containing the logical expression to
evaluate. For example: "x AND (y OR NOT z) XOR x"
"""
# Your implementation here
pass
The function should support a logical expression language with the following features:
-
Boolean Variables: Variables represented by single lowercase letters (e.g.,
x
,y
,a
,b
). Each variable can take one of two boolean values:1
(true) or0
(false). -
Logical Operators: The following logical operators must be supported:
-
AND
: (Logical AND) Is true if and only if both of its operands are true. -
OR
: (Logical OR) Is true if and only if at least one of its operands is true. -
NOT
: (Logical NOT) Is true if its single operand is false, and false if its single operand is true. -
XOR
: (Logical Exclusive OR) Is true if and only if exactly one of its operands is true. -
EQV
: (Logical Equivalence) Is true if and only if both of its operands have the same boolean value. -
IMP
: (Logical Implication) Is false only when the first operand is true and the second operand is false; otherwise, it is true. -
NAND
: (Logical NOT AND) Is true if and only if at least one of its operands is false. -
NOR
: (Logical NOT OR) Is true if and only if both of its operands are false.
-
-
Operator Precedence: The
NOT
operator possesses a higher precedence than all other operators. All remaining operators share the same precedence level. -
Operator Associativity: All operators are left-associative, except for
NOT
, which is a unary prefix operator that evaluates from right to left. -
Parentheses: Parentheses
(
and)
can be used to override the default precedence and associativity.
5.9.4. Function Requirements
The generate_truth_table
function should:
-
Input: Accept a logical expression as a string argument (
expression
). -
Variable Identification: Identify all the unique boolean variables present in the input expression.
-
Truth Table Generation and Printing:
-
Generate all possible boolean assignments for the identified variables.
-
Evaluate the input logical expression for each of these truth assignments.
-
Print a truth table to the standard output in a clear and readable format. The table should include:
-
A header row listing all the variables in alphabetical order, followed by the original expression, followed by the string “Result”.
-
Subsequent rows representing each truth assignment, with the corresponding boolean values for each variable, the original expression with variable values substituted, and the resulting boolean value of the expression.
-
-
5.9.5. Example
For the input expression:
x AND (y OR NOT z) XOR x
Calling generate_truth_table('x AND (y OR NOT z) XOR x')
should print the following to the standard output:
x y z | x AND (y OR NOT z) XOR x | Result ------|--------------------------|------- 0 0 0 | 0 AND (0 OR NOT 0) XOR 0 | 0 0 0 1 | 0 AND (0 OR NOT 1) XOR 0 | 0 0 1 0 | 0 AND (1 OR NOT 0) XOR 0 | 0 0 1 1 | 0 AND (1 OR NOT 1) XOR 0 | 0 1 0 0 | 1 AND (0 OR NOT 0) XOR 1 | 0 1 0 1 | 1 AND (0 OR NOT 1) XOR 1 | 1 1 1 0 | 1 AND (1 OR NOT 0) XOR 1 | 0 1 1 1 | 1 AND (1 OR NOT 1) XOR 1 | 0
5.9.6. Implementation Notes
-
You will need to implement a parser to understand the structure of the input expression, taking into account operator precedence and associativity. Begin by defining the grammar for this logical expression language.
-
Upon encountering a parser error, such as a grammar mismatch or variables with more than one character, the function should print a descriptive error message to the standard output indicating the issue.
-
You will need a mechanism to generate all possible boolean assignments for the identified variables.
The itertools.product() function can come in handy for this purpose. -
You will need to implement the logic for each of the boolean operators.
Good luck!
— Always look on the bright side of life.
6. Additional Resources
6.1. Books
-
Writing a Simple Recursive Descent Parser
By David Beazley
O’Reilly Media, Inc., 2024 -
Writing a C Compiler
By Nora Sandler
No Starch Press, 2024
ISBN: 978-1718500426
Relevant Chapter: 1 -
Engineering a Compiler, 2nd Edition
By Keith D. Cooper, Linda Torczon
Morgan Kaufmann, 2011
ISBN: 978-9380931876
Relevant Chapter: 3 -
Domain Specific Languages
By Martin Fowler
Addison-Wesley Professional, 2010
ISBN: 978-0321712943
Relevant Chapter: 21 -
Language Implementation Patterns
By Terence Parr
Pragmatic Bookshelf, 2009
ISBN: 978-1934356456
Relevant Chapter: 2
6.2. Online Resources
-
Building Recursive Descent Parsers: The Definitive Guide
By Supriyo Biswas
Code samples in Python -
Recursive Descent Parser
By GeeksforGeeks
Code samples in C -
Writing a Simple Recursive Descent Parser
By Jamis Buck
Code samples in Ruby
7. License and Credits
-
Copyright © 2025 by Ariel Ortiz.
-
This work is licensed under a CC BY-NC-SA 4.0.
-
Free use of the source code presented here is granted under the terms of the GPL version 3 License.
-
The author gratefully acknowledges the helpful contributions of Gemini in the writing and review process of these notes.
-
This document was prepared using the Asciidoctor text processor.
-
Icons by www.flaticon.com.