Python Development: Writing Small Interpreters



Introduction

Writing parsers or interpreters or compilers isn't too common of a topic but it's an extremely useful design to be familiar with. For example, sometimes we want to do more than parse options for input or we might want a custom configuration file format. Another example would be using another language to do custom processing. What if we have a strict data store but want to have a little flexibility with how we access this data store? We could write an elaborate model library that provides nice easy access to the data store or we could write a structured query language (SQL ) parser that does this translation for us and is far easier to adapt to other needs.

Note

This article assumes some prior knowledge about context free grammer .

Creating a Grammar

Creating a parser is far easier once the grammar has been created and thought about. This is not only true because it's proper planning but also because most parser generators (including the one we're looking at) read a grammar as their input.

We'll be looking at the expression grammar from tiny basic as an example. This is a simple grammar, safe for LR or LL parsing (which is important to note when we talk about other languages like SQL).

Tiny Basic Expression Grammar

expression ::= (+|-|ε) term ((+|-) term)*
term ::= factor ((*|/) factor)*
factor ::= number | (expression)

The parser generator we'll utilize is pyparsing , an easy to use LL parser. The following is the pyparsing code to implement the above grammar:

expr = Forward()
factor = ( Word(nums) | Group(Suppress('(') + expr + Suppress(')') )
term = Group(factor + ZeroOrMore((Literal('*') | Literal('/')) + factor))
expr << Group(Optional(Literal('-') | Literal('+')) + term + ZeroOrMore((Literal('-') | Literal('+')) + term))

This pyparsing snippet turns sentences like the following:

System Message: ERROR/3 (<string>, line 56)

Cannot find pygments lexer for language "math"

.. code-block:: math

    5 + 5 * 6 / 3 - (47 + 56) * 34

Into easily consumable lists like the following:

[[['5'], '+', ['5', '*', '6', '/', '3'], '-', [[[['47'], '+', ['56']]], '*', '34']]]

We could improve this parser by having it autoreduce expressions and other expression handling code but it suffices for demonstration purposes.

We still need to invoke the parser in order to get the parsed set of tokens. The following snippet shows the relevant python line:

expr.parseString('5+5*6/3-(47+56)*34')

Parser Testing

Proper testing includes unit tests and other strategies but sometimes the easiest way to see what's going on with parsers is to write a small interpreter that loops over expressions so you can see the effects. Using python and readline we can create a convenient environment for live interaction of this sort:

import rlcompleter
import readline
import so

if not os.access(".history", os.F_OK):
    open(".history", "w").close()

readline.read_history_file(".history")

buffer = ""

while True
    try:
        line = raw_input(pycolorize.light_blue("BASIC$ "))
    except EOFError:
        readline.write_history_file(".history")
        print
        break

    if line.lower() == "exit" or line.lower() == "quit":
        readline.write_history_file(".history")
        break

    buffer += line
    result = ACTION_ON_BUFFER
    buffer = ""

Complete Reference Script

import rlcompleter
import readline
import os
import pprint
import pycolorize

from pyparsing import *

if not os.access(".history", os.F_OK):
    open(".history", "w").close()

readline.read_history_file(".history")

class ExpressionParser(object):
    def __init__(self):
        self._expr = Forward()
        factor = ( Word(nums) | Group(Suppress('(') + self._expr + Suppress(')')) )
        term = Group(factor + ZeroOrMore(Literal('*') | Literal('/')) + factor)
        self._expr << Group(Optional(Literal('-') | Literal('+')) + term + ZeroOrMore((Literal('-') | Literal('+')) + term))

    def _calculate(self, l):
        while any([ isinstance(x, list) for x in l]):
            for n, i in enumerate(l):
                if isinstance(i, list):
                    l[n] = self._calculate(i)
            return str(eval(" ".join(l)))

    def __call__(self, string):
        return self._calculate(self._expr.parseString(string.asList()))

print pycolorize.green("Enter your commands to tokenize:")
print pycolorize.green("Enter a blank line to exit.")

while True:
    try:
        line = raw_input(pycolorize.light_blue("BASIC$ "))
    except EOFError:
        readline.write_history_file(".history")
        print
        break

    if line.lower() == "exit" or line.lower() == "quit":
        readline.write_history_file(".history")
        break

    buffer += line
    result = None

    try:
        result = ExpressionParser()(buffer)
    except ParseBaseException, e:
        buffer = ""

        pycolorize.error(e.line)
        pycolorize.error(" "*(e.col - 1) + "^")
        pycolorize.error(str(e))

        continue

    pycolorize.status("Result: %s", result)

    buffer = ""

Conclusion

Writing parsers with pyparsing is quite simple but remember that any non-LL grammars will need to have the left-recursion factored out. Python isn't the only language with available parser generators: C and C++ have bison and lemon , and other languages are sure to have them as well.

Comments


Comments powered by Disqus