I have long been a fan of Pratt parsing or as it is commonly called Top down operator precedence parsing. In part because I find most of the theoretical discussion of parsers quite boring, so the simplicity appeals. I have also never really bothered to learn machine generate parserng tools such as Yacc/Bison/many others.
I see parsing as a pragmatic thing rather than an academic. Each to their own I guess.
While out for a jog I started to think about operator overloading.
Not the usual
- style but a more
liberal one where you can create new operators. I might want to add
:+: operator to do something. That is the language would
say that any token starting with a
+,-,:,*,/ and so on can
be an operator. The language could provide the expected ones and you
could provide the rest as you build a program.
Pratt parser seem to be ideally suited to this as they look up the symbols from a table that maps strings to symbols. It is easy to dynamically add to the table while you are parsing. A stack of tables allow symbols to be scoped should modules be required.
At this point it is probably worth working through this or some other tutorial.
The Raku programming language (previously known as Perl 6) is a language that loves to overload operators. While skimming though its documentation it became clear they had a very good handle on operators. It is a slight simplification to classify operators into the following categories but it is useful when thinking about them.
I would imagine the first three are fairly obvious if you have done a bit of programming. The infix_r is where you evaluation the expressions from right to left rather than left to right, important if you add a symbol to raise to the nth power.
Circumfix is a little less common but they are the idea of an
operator containing both a prefix and post fix part. Our example will,
in many languages, create an array. The
[ and the
] are required parts of the operator. Braces would be
another example for instance using them to control the evaluation order
(6+4)*10. Not many languages allow you to create new ones
of this type of operator with Raku being the only example I know
Post circumfix is where the first part of the operator is put after
the initial symbol. In our case the
being used as an array access operator. A brace is often used an invoke
fn(a,b,c). Push it a little further and
you can view
if (condition) true-expr false-expr as one as
well. Although braces after in an
if expression have
fallen out of favour. Many languages allow for overloading of these
type of operators but usually for a limited set. I am not sure what
Raku allows in this instance.
Circumfix and Postcircumfix have strong analogies with prefix and
infix operators when viewed though the lens of the Pratt parser. They
are also mutually exclusive (in my implementation) so perhaps calling
have been a better idea.
In summary, I wrote some very bad python code to experiment with these ideas and the Pratt parser handled them all very well. It is bad enough code that I don't want to release it, plus there is a slight feeling the the journey is important here to gain an understanding of Pratt parsers. There are many good Pratt parser tutorials out there.
Is operator overloading a good idea? The position you can take on this can vary and personally I shift around a bit in this space. One observation (with no data to back it up) is the more the language is trying to empower the individual over a team the more likely it is to allow operator overloading of some kind.
This was my first return to writing some proper Python code in a fair while. I found it fun but suspect a python programmer would be face palming if they saw the code. I really need to dig out my copy of Fluent Python and give that another read.