1. Introduction
This document contains the complete technical specification of the Falak programming language.
Falak is a giant serpent from Arabian mythology. It appears in the “One Thousand and One Nights”. It resides below Bahamut, the giant fish, which carries along with a bull and an angel, the rest of the universe including six hells, the earths and the heavens. Falak itself resides in the seventh hell below everything else and it is said to be so powerful that only its fear of the greater power of Allah prevents it from swallowing all the creation above.
This language specification was developed for the 2021 autumn semester TC3048 Compiler Design course at the Tecnológico de Monterrey, Campus Estado de Mexico.
2. Lexicon
In the following sections, a letter is any character from the English alphabet from A
to Z
(both lowercase and uppercase). A digit is any character from 0
to 9
.
2.1. Tokens
There are five kinds of tokens: identifiers, keywords, literals, operators, and other separators. Spaces, tabulators, newlines, and comments (collectively, “white space”) are used as delimiters between tokens, but are otherwise ignored.
If the input stream has been separated into tokens up to a given character, the next token is the longest string of characters that could constitute a token.
2.2. Comments
Comments can be either single or multi-line. Single line comments start with a hash symbol (#
) and conclude at the end of the line. Multi-line comments start with a less than symbol followed by a hash symbol (<#
) and end with a hash symbol followed by a greater that symbol (#>
). Comments cannot be placed inside string literals. Multi-line comments cannot nest.
2.3. Identifiers
An identifier is composed of a letter and a sequence of zero or more letters, digits and the underscore character (_
). Uppercase and lowercase letters are considered different. Identifiers can be of any length.
An identifier token appears as an ‹id› terminal symbol in the language grammar. |
2.4. Keywords
The following twelve identifiers are reserved for use as keywords with special meanings and may not be used for any other purpose:
|
|
|
|
|
|
|
|
|
|
|
|
2.5. Literals
At a lexical level there are four kinds of literals: booleans, integers, characters, and strings.
2.5.1. Booleans
A boolean literal is either the keyword true
or the keyword false
. These are equal to 1 and 0, respectively.
A boolean literal token appears as a ‹lit-bool› terminal symbol in the language grammar. |
2.5.2. Integers
An integer literal is a sequence of one or more digits from 0 to 9. It can optionally start with a minus (−
) sign. Only decimal (base 10) numbers are supported. Valid range: −2,147,483,648 to 2,147,483,647 (\(-2^{31}\) to \(2^{31} - 1\)).
An integer literal token appears as a ‹lit-int› terminal symbol in the language grammar. |
2.5.3. Characters
A character literal is a Unicode character enclosed in single quotes, as in 'x'
. The compiler translates the specified character into its corresponding Unicode integer code point.
Character literals cannot contain the quote character ('
) or a newline character; in order to represent them, and certain other characters, the following escape sequences may be used:
Name | Escape Sequence | Code Point |
---|---|---|
Newline |
\(\backslash \texttt{n}\) |
10 |
Carriage Return |
\(\backslash \texttt{r}\) |
13 |
Tab |
\(\backslash \texttt{t}\) |
9 |
Backslash |
\(\backslash \backslash\) |
92 |
Single Quote |
\(\backslash \texttt{'}\) |
39 |
Double Quote |
\(\backslash \texttt{"}\) |
34 |
Unicode Character |
\(\backslash \texttt{u} hhhhhh\) |
\(hhhhhh\) |
The escape sequence \(\backslash \texttt{u} hhhhhh\) consists of the backslash, followed by the lower case letter “u”, followed by six hexadecimal digits (digits “0” to “9” and the upper or lower case letters “a” to “f”), which are taken to specify the code point in base 16 of the desired character.
A character literal token appears as a ‹lit-char› terminal symbol in the language grammar. |
2.5.4. Strings
A string literal is a sequence of zero or more Unicode characters delimited by double quotes, for example: "this is a string"
. String literals cannot contain newline or double-quote characters. These and other special characters can be represented using the same escape sequences available for character literals.
A string literal is stored in memory as an array (accessible through a 32-bit handle) containing zero or more int32 values. Each value is the code point of the character in the corresponding position of the given string. In other words, a string is stored using the UTF-32 encoding.
A string literal token appears as a ‹lit-str› terminal symbol in the language grammar. |
3. Syntax
The following BNF context free grammar defines the syntax of the Falak programming language. The red elements represent explicit terminal symbols (tokens).
‹program› |
\(\rightarrow\) |
‹def-list› |
‹def-list› |
\(\rightarrow\) |
‹def-list› ‹def› |
‹def-list› |
\(\rightarrow\) |
ε |
‹def› |
\(\rightarrow\) |
‹var-def› |
‹def› |
\(\rightarrow\) |
‹fun-def› |
‹var-def› |
\(\rightarrow\) |
var ‹var-list› ; |
‹var-list› |
\(\rightarrow\) |
‹id-list› |
‹id-list› |
\(\rightarrow\) |
‹id› ‹id-list-cont› |
‹id-list-cont› |
\(\rightarrow\) |
, ‹id› ‹id-list-cont› |
‹id-list-cont› |
\(\rightarrow\) |
ε |
‹fun-def› |
\(\rightarrow\) |
‹id› ( ‹param-list› ) { ‹var-def-list› ‹stmt-list› } |
‹param-list› |
\(\rightarrow\) |
‹id-list› |
‹param-list› |
\(\rightarrow\) |
ε |
‹var-def-list› |
\(\rightarrow\) |
‹var-def-list› ‹var-def› |
‹var-def-list› |
\(\rightarrow\) |
ε |
‹stmt-list› |
\(\rightarrow\) |
‹stmt-list› ‹stmt› |
‹stmt-list› |
\(\rightarrow\) |
ε |
‹stmt› |
\(\rightarrow\) |
‹stmt-assign› |
‹stmt› |
\(\rightarrow\) |
‹stmt-incr› |
‹stmt› |
\(\rightarrow\) |
‹stmt-decr› |
‹stmt› |
\(\rightarrow\) |
‹stmt-fun-call› |
‹stmt› |
\(\rightarrow\) |
‹stmt-if› |
‹stmt› |
\(\rightarrow\) |
‹stmt-while› |
‹stmt› |
\(\rightarrow\) |
‹stmt-do-while› |
‹stmt› |
\(\rightarrow\) |
‹stmt-break› |
‹stmt› |
\(\rightarrow\) |
‹stmt-return› |
‹stmt› |
\(\rightarrow\) |
‹stmt-empty› |
‹stmt-assign› |
\(\rightarrow\) |
‹id› = ‹expr› ; |
‹stmt-incr› |
\(\rightarrow\) |
inc ‹id› ; |
‹stmt-decr› |
\(\rightarrow\) |
dec ‹id› ; |
‹stmt-fun-call› |
\(\rightarrow\) |
‹fun-call› ; |
‹fun-call› |
\(\rightarrow\) |
‹id› ( ‹expr-list› ) |
‹expr-list› |
\(\rightarrow\) |
‹expr› ‹expr-list-cont› |
‹expr-list› |
\(\rightarrow\) |
ε |
‹expr-list-cont› |
\(\rightarrow\) |
, ‹expr› ‹expr-list-cont› |
‹expr-list-cont› |
\(\rightarrow\) |
ε |
‹stmt-if› |
\(\rightarrow\) |
if ( ‹expr› ) { ‹stmt-list› } ‹else-if-list› ‹else› |
‹else-if-list› |
\(\rightarrow\) |
‹else-if-list› elseif ( ‹expr› ) { ‹stmt-list› } |
‹else-if-list› |
\(\rightarrow\) |
ε |
‹else› |
\(\rightarrow\) |
else { ‹stmt-list› } |
‹else› |
\(\rightarrow\) |
ε |
‹stmt-while› |
\(\rightarrow\) |
while ( ‹expr› ) { ‹stmt-list› } |
‹stmt-do-while› |
\(\rightarrow\) |
do { ‹stmt-list› } while ( ‹expr› ) ; |
‹stmt-break› |
\(\rightarrow\) |
break ; |
‹stmt-return› |
\(\rightarrow\) |
return ‹expr› ; |
‹stmt-empty› |
\(\rightarrow\) |
; |
‹expr› |
\(\rightarrow\) |
‹expr-or› |
‹expr-or› |
\(\rightarrow\) |
‹expr-or› ‹op-or› ‹expr-and› |
‹op-or› |
\(\rightarrow\) |
|| |
‹op-or› |
\(\rightarrow\) |
^ |
‹expr-or› |
\(\rightarrow\) |
‹expr-and› |
‹expr-and› |
\(\rightarrow\) |
‹expr-and› && ‹expr-comp› |
‹expr-and› |
\(\rightarrow\) |
‹expr-comp› |
‹expr-comp› |
\(\rightarrow\) |
‹expr-comp› ‹op-comp› ‹expr-rel› |
‹expr-comp› |
\(\rightarrow\) |
‹expr-rel› |
‹op-comp› |
\(\rightarrow\) |
== |
‹op-comp› |
\(\rightarrow\) |
!= |
‹expr-rel› |
\(\rightarrow\) |
‹expr-rel› ‹op-rel› ‹expr-add› |
‹expr-rel› |
\(\rightarrow\) |
‹expr-add› |
‹op-rel› |
\(\rightarrow\) |
< |
‹op-rel› |
\(\rightarrow\) |
<= |
‹op-rel› |
\(\rightarrow\) |
> |
‹op-rel› |
\(\rightarrow\) |
>= |
‹expr-add› |
\(\rightarrow\) |
‹expr-add› ‹op-add› ‹expr-mul› |
‹expr-add› |
\(\rightarrow\) |
‹expr-mul› |
‹op-add› |
\(\rightarrow\) |
+ |
‹op-add› |
\(\rightarrow\) |
− |
‹expr-mul› |
\(\rightarrow\) |
‹expr-mul› ‹op-mul› ‹expr-unary› |
‹expr-mul› |
\(\rightarrow\) |
‹expr-unary› |
‹op-mul› |
\(\rightarrow\) |
* |
‹op-mul› |
\(\rightarrow\) |
/ |
‹op-mul› |
\(\rightarrow\) |
% |
‹expr-unary› |
\(\rightarrow\) |
‹op-unary› ‹expr-unary› |
‹expr-unary› |
\(\rightarrow\) |
‹expr-primary› |
‹op-unary› |
\(\rightarrow\) |
+ |
‹op-unary› |
\(\rightarrow\) |
− |
‹op-unary› |
\(\rightarrow\) |
! |
‹expr-primary› |
\(\rightarrow\) |
‹id› |
‹expr-primary› |
\(\rightarrow\) |
‹fun-call› |
‹expr-primary› |
\(\rightarrow\) |
‹array› |
‹expr-primary› |
\(\rightarrow\) |
‹lit› |
‹expr-primary› |
\(\rightarrow\) |
( ‹expr› ) |
‹array› |
\(\rightarrow\) |
[ ‹expr-list› ] |
‹lit› |
\(\rightarrow\) |
‹lit-bool› |
‹lit› |
\(\rightarrow\) |
‹lit-int› |
‹lit› |
\(\rightarrow\) |
‹lit-char› |
‹lit› |
\(\rightarrow\) |
‹lit-str› |
4. Semantics
4.1. Compile Time Validations
-
The language only supports a 32-bit signed two’s complement integer (int32) data type. This is the data type for every variable, parameter and function return value. This means that a Falak compiler doesn’t need to verify type consistency.
-
Every program starts its execution in a function called
main
. It is an error if the program does not contain a function with this name. -
Any variable defined outside a function is a global variable. The scope of a global variable is the body of all the functions in the program, even those defined before the variable itself.
-
Function names and global variables exist in different namespaces. This means that you can have a global variable with the same name as a function and vice versa.
-
It’s an error to define two global variables with the same name.
-
It’s an error to define two functions with the same name.
-
A function definition is visible from the body of all the functions in a program, even from itself. Thus, functions can call themselves recursively directly or indirectly.
-
In every function call the number of arguments must match the number of parameters contained in the corresponding function definition.
-
The following names are part of the initial namespace for functions and constitute Falak’s API (the number after the slash symbol (/) is the arity of the given function):
-
printi
/1 -
printc
/1 -
prints
/1 -
println
/0 -
readi
/0 -
reads
/0 -
new
/1 -
size
/1 -
add
/2 -
get
/2 -
set
/3
-
-
Each function has its own independent namespace for its local names. This means that parameter and local variable names have to be unique inside the body of each function. It’s valid to have a parameter or local variable name with the same name as a global variable. In that case the local name shadows the global variable.
-
It’s an error to refer to a variable, parameter or function not in scope in the current namespace.
-
The
break
statement can only be used inside the body of awhile
ordo
-while
statements. -
Values of integer literals should be between
-2147483648
and2147483647
(\(-2^{31}\) and \(2^{31} - 1\), respectively).
4.2. Runtime Behavior
-
A function returns zero by default, except if it executes an explicit
return
statement with some other value. -
The value returned by the
main
function must be the exit code returned by the program to the operating system. -
For the conditional and loop statements (
if
,while
, anddo
-while
) the number 0 is the only value considered false, everything else is considered true. -
The assignment, function call, conditional, loops, break, and return statements behave like their C# counterparts. The increment statement (
inc
) adds one to the provided variable, while the decrement statement (dec
) subtracts one from it. -
The Falak syntax supports string and array literals. Both of these are represented in memory as arrays accessible through API managed 32-bit handles.
-
The following are the supported operators. Precedence and associativity are established in the language grammar.
Table 1. Arithmetic operators Operator Syntax Description Unary minus
− x
Produces x negated (with its sign changed).
Unary plus
+ x
Produces x.
Multiplication
x * y
Produces x times y.
Division
x / y
Produces x divided by y truncating the result towards zero. An exception is thrown if y is zero.
Remainder
x % y
Produces the remainder of dividing x by y. An exception is thrown if y is zero.
Addition
x + y
Produces x plus y.
Subtraction
x − y
Produces x minus y.
Table 2. Logical operators Operator Syntax Description Not
! x
Evaluates x and produces 1 if its result is equal to 0. Otherwise produces 0.
And
x && y
Produces 1 if both x and y evaluate to non-zero values. Otherwise, produces 0. The operation is short circuited, this means that if x evaluates to 0 then y is not evaluated.
Or
x || y
Produces 1 if either x or y evaluate to a non-zero value. Otherwise, produces 0. The operation is short circuited, this means that if x evaluates to a non-zero value then y is not evaluated.
Xor
x ^ y
Produces 1 only if x evaluates to a zero value and y evaluates to a non-zero value, or vice versa. Otherwise, produces 0.
Table 3. Comparison and relational operators Operator Syntax Description Equal to
x == y
Produces 1 if x is equal to y, otherwise produces 0.
Not equal to
x != y
Produces 1 if x is not equal to y, otherwise produces 0.
Greater than
x > y
Produces 1 if x is greater than y, otherwise produces 0.
Less than
x < y
Produces 1 if x is less than y, otherwise produces 0.
Greater than or equal to
x >= y
Produces 1 if x is greater than or equal to y, otherwise produces 0.
Less than or equal to
x <= y
Produces 1 if x is less than or equal to y, otherwise produces 0.
Table 4. Function call Operator Syntax Description Function call
f (arg1, arg2, …, argn)
Invoke f with the given arguments and return its result. All arguments are fully evaluated before the call.
5. API
This section documents all the functions from the Falak application programming interface (API).
Signature | Description |
---|---|
|
Prints |
|
Prints a character to stdout, where |
|
Prints |
|
Prints a newline character to stdout. Returns 0. |
|
Reads from stdin an optionally signed decimal integer and returns its value. Does not return until a valid integer has been read. |
|
Reads from stdin a string (until the end of line) and returns a handle to a newly created array containing the Unicode code points of all the characters read. |
Signature | Description |
---|---|
|
Creates a new array object with |
|
Returns the size (number of elements) of the array referenced by handle |
|
Adds |
|
Returns the value at index |
|
Sets to |
6. Code Examples
Source File | Description |
---|---|
The classical "hello, world" program. |
|
Converts decimal numbers into binary. |
|
Determines if a string is a palindrome. |
|
Computes factorials using iteration and recursion. |
|
Implementation of typical array operations. |
|
Given the date of a certain day, determines the date of the day after. |
|
Verifies that the implementation of literal values meet the specified requirements. |
|
Example using global and local variables. |
|
Verifies that the implementation of all the operators meet the specified requirements. |
|
Verifies that the implementation of the |