Pratt Parsing Python
29 December 2020

I have long been a fan of Pratt parsing or as it is commonly called Top down operator precedence parsing. In part because I find most of the theoretical discussion of parsers quite boring, so the simplicity appeals. I have also never really bothered to learn machine generate parserng tools such as Yacc/Bison/many others.

I see parsing as a pragmatic thing rather than an academic. Each to their own I guess.

While out for a jog I started to think about operator overloading. Not the usual + and - style but a more liberal one where you can create new operators. I might want to add :+: operator to do something. That is the language would say that any token starting with a +,-,:,*,/ and so on can be an operator. The language could provide the expected ones and you could provide the rest as you build a program.

Pratt parser seem to be ideally suited to this as they look up the symbols from a table that maps strings to symbols. It is easy to dynamically add to the table while you are parsing. A stack of tables allow symbols to be scoped should modules be required.

At this point it is probably worth working through this or some other tutorial.

The Raku programming language (previously known as Perl 6) is a language that loves to overload operators. While skimming though its documentation it became clear they had a very good handle on operators. It is a slight simplification to classify operators into the following categories but it is useful when thinking about them.

Operator Type Example
Prefix -10
Postfix x++
Infix 4+8
Infix_r 4^8
Circumfix [1,2,3]
PostCircumfix arr[100]

I would imagine the first three are fairly obvious if you have done a bit of programming. The infix_r is where you evaluation the expressions from right to left rather than left to right, important if you add a symbol to raise to the nth power.

Circumfix is a little less common but they are the idea of an operator containing both a prefix and post fix part. Our example will, in many languages, create an array. The [ and the ] are required parts of the operator. Braces would be another example for instance using them to control the evaluation order (6+4)*10. Not many languages allow you to create new ones of this type of operator with Raku being the only example I know off.

Post circumfix is where the first part of the operator is put after the initial symbol. In our case the [,] is being used as an array access operator. A brace is often used an invoke function operator, fn(a,b,c). Push it a little further and you can view if (condition) true-expr false-expr as one as well. Although braces after in an if expression have fallen out of favour. Many languages allow for overloading of these type of operators but usually for a limited set. I am not sure what Raku allows in this instance.

Circumfix and Postcircumfix have strong analogies with prefix and infix operators when viewed though the lens of the Pratt parser. They are also mutually exclusive (in my implementation) so perhaps calling them unary-circumfix and infix-circumfix may have been a better idea.

In summary, I wrote some very bad python code to experiment with these ideas and the Pratt parser handled them all very well. It is bad enough code that I don't want to release it, plus there is a slight feeling the the journey is important here to gain an understanding of Pratt parsers. There are many good Pratt parser tutorials out there.

Is operator overloading a good idea? The position you can take on this can vary and personally I shift around a bit in this space. One observation (with no data to back it up) is the more the language is trying to empower the individual over a team the more likely it is to allow operator overloading of some kind.

This was my first return to writing some proper Python code in a fair while. I found it fun but suspect a python programmer would be face palming if they saw the code. I really need to dig out my copy of Fluent Python and give that another read.