Python Development: Writing Small Interpreters
Introduction
Writing parsers or interpreters or compilers isn't too common of a topic but it's an extremely useful design to be familiar with. For example, sometimes we want to do more than parse options for input or we might want a custom configuration file format. Another example would be using another language to do custom processing. What if we have a strict data store but want to have a little flexibility with how we access this data store? We could write an elaborate model library that provides nice easy access to the data store or we could write a structured query language (SQL ) parser that does this translation for us and is far easier to adapt to other needs.
Note
This article assumes some prior knowledge about context free grammer .
Creating a Grammar
Creating a parser is far easier once the grammar has been created and thought about. This is not only true because it's proper planning but also because most parser generators (including the one we're looking at) read a grammar as their input.
We'll be looking at the expression grammar from tiny basic as an example. This is a simple grammar, safe for LR or LL parsing (which is important to note when we talk about other languages like SQL).
Tiny Basic Expression Grammar
expression ::= (+|-|ε) term ((+|-) term)* term ::= factor ((*|/) factor)* factor ::= number | (expression)
The parser generator we'll utilize is pyparsing , an easy to use LL parser. The following is the pyparsing code to implement the above grammar:
expr = Forward() factor = ( Word(nums) | Group(Suppress('(') + expr + Suppress(')') ) term = Group(factor + ZeroOrMore((Literal('*') | Literal('/')) + factor)) expr << Group(Optional(Literal('-') | Literal('+')) + term + ZeroOrMore((Literal('-') | Literal('+')) + term))
This pyparsing snippet turns sentences like the following:
Into easily consumable lists like the following:
[[['5'], '+', ['5', '*', '6', '/', '3'], '-', [[[['47'], '+', ['56']]], '*', '34']]]
We could improve this parser by having it autoreduce expressions and other expression handling code but it suffices for demonstration purposes.
We still need to invoke the parser in order to get the parsed set of tokens. The following snippet shows the relevant python line:
expr.parseString('5+5*6/3-(47+56)*34')
Parser Testing
Proper testing includes unit tests and other strategies but sometimes the easiest way to see what's going on with parsers is to write a small interpreter that loops over expressions so you can see the effects. Using python and readline we can create a convenient environment for live interaction of this sort:
import rlcompleter import readline import so if not os.access(".history", os.F_OK): open(".history", "w").close() readline.read_history_file(".history") buffer = "" while True try: line = raw_input(pycolorize.light_blue("BASIC$ ")) except EOFError: readline.write_history_file(".history") print break if line.lower() == "exit" or line.lower() == "quit": readline.write_history_file(".history") break buffer += line result = ACTION_ON_BUFFER buffer = ""
Complete Reference Script
import rlcompleter import readline import os import pprint import pycolorize from pyparsing import * if not os.access(".history", os.F_OK): open(".history", "w").close() readline.read_history_file(".history") class ExpressionParser(object): def __init__(self): self._expr = Forward() factor = ( Word(nums) | Group(Suppress('(') + self._expr + Suppress(')')) ) term = Group(factor + ZeroOrMore(Literal('*') | Literal('/')) + factor) self._expr << Group(Optional(Literal('-') | Literal('+')) + term + ZeroOrMore((Literal('-') | Literal('+')) + term)) def _calculate(self, l): while any([ isinstance(x, list) for x in l]): for n, i in enumerate(l): if isinstance(i, list): l[n] = self._calculate(i) return str(eval(" ".join(l))) def __call__(self, string): return self._calculate(self._expr.parseString(string.asList())) print pycolorize.green("Enter your commands to tokenize:") print pycolorize.green("Enter a blank line to exit.") while True: try: line = raw_input(pycolorize.light_blue("BASIC$ ")) except EOFError: readline.write_history_file(".history") print break if line.lower() == "exit" or line.lower() == "quit": readline.write_history_file(".history") break buffer += line result = None try: result = ExpressionParser()(buffer) except ParseBaseException, e: buffer = "" pycolorize.error(e.line) pycolorize.error(" "*(e.col - 1) + "^") pycolorize.error(str(e)) continue pycolorize.status("Result: %s", result) buffer = ""
Comments
Comments powered by Disqus