Métodos computacionales

Context-Free Grammars

Introduction

A formal grammar is a set of rules that are used to generate and identify patterns of strings in a language.

Context-Free Grammars (CFGs) are used to describe context-free languages [2]. A CFG can describe all regular languages and more, but they cannot describe all possible languages. A regular language is a language that can be expressed with a regular expression or a DFA/NFA.

CFGs are studied in fields of theoretical computer science, compiler design, and linguistics. They are used to describe programming languages and parser programs in compilers can be generated automatically from CFGs.

Formally, a CFG \(G\) is defined by the 4-tuple \(G = (V,\Sigma ,R,S)\), where [3]:

Deriving a String from a CFG

Follow these steps [1]:

CFG Example

We want a CFG that generates the language \(\{ a^n b^n \; | \; n \ge 0 \}\). In other words, we want to accept strings that contain a sequence of symbols a followed by the exact same number of symbols b.

The formal definition of the CFG that solves this problem is as follows:

NOTE: The above production rules can also be written as: \(A \to aAb \; | \; \varepsilon\)

Let’s now verify the following derivation: \(S \stackrel{*}{\Rightarrow} aaabbb\)

$$ \begin{align*} S &\Rightarrow \underline{ A } & \\ &\Rightarrow a \underline{ A } b & (\textrm{result of applying rule 1}) \\ &\Rightarrow a a \underline{ A } b b & (\textrm{result of applying rule 1}) \\ &\Rightarrow a a a \underline{ A } b b b & (\textrm{result of applying rule 1}) \\ &\Rightarrow a a a \varepsilon b b b & (\textrm{result of applying rule 2}) \\ &\Rightarrow a a a b b b & \\ \end{align*} $$

The following derivation tree is an alternative way of generating the desired string:

Instaparse: A Parse Generator

In Clojure, we can use the Instaparse package to create a parser from a CFG.

First, we must import the required dependencies and define some auxilary functions:

(ns instaparse-example
  (:require [instaparse.core :refer [parser]])
  (:import (instaparse.gll Failure)))

(defn fails? [r] (instance? Failure r))
(defn succeeds? [r] (not (fails? r)))

Given a grammar, turning it into an executable parser is as simple as typing the grammar. Using the grammar from the previous example:

(def example (parser "A = 'a' A 'b' | epsilon"))

The parser object behaves like a function that returns the corresponding derivation tree when applied with an input string as argument:

(example "aaabbb")
 [:A "a" [:A "a" [:A "a" [:A] "b"] "b"] "b"]

If the provided string is not syntactically correct it returns an instaparse.gll.Failure object that shows the corresponding error message when displayed:

(example "aaabb")

Parse error at line 1, column 6:
aaabb
     ^
Expected:
"b" (followed by end-of-string)

References