Weekend hacking
This weekend I decided to rewrite bitbake's parser to be token-based. Flex and bison were a natural choice for this, and I built on the work of Marc Singer, who wrote fantastic lexical analyzer rules (flex) and used lemon to describe the grammar.
I started with pybison because it promised to get me to the goal quickly. I reworked the lexer and the grammar to be compatible with bison and pybison, but the problems were too serious to keep using pybison:
- No distribution ships pybison
- The lexer is not guaranteed to be reentrant, which is a huge problem for bitbake, since most files inherit or include other files
- pybison produced syntax errors when exporting the verbatim sections of the lexer script
- bison2py cannot handle comments in bison files and produces syntactically wrong Python modules
- Errors are hard to debug
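Why reentrancy matters for include and inherit can be sketched in Python. This is a toy model, not bitbake code: `Scanner`, `parse_file` and the in-memory `files` dict are hypothetical, and the `classes/<name>.bbclass` lookup is simplified. The point is that every file gets its own scanner state (much like flex's `%option reentrant` gives each scanner its own `yyscan_t`), so a nested include cannot clobber the position of the file that included it.

```python
class Scanner:
    """Per-file scanner state (hypothetical); nothing is shared between instances."""

    def __init__(self, name, text):
        self.name = name
        self.lines = text.splitlines()
        self.lineno = 0  # 1-based number of the line most recently returned

    def next_line(self):
        if self.lineno >= len(self.lines):
            return None
        self.lineno += 1
        return self.lines[self.lineno - 1]


def parse_file(name, files, out, depth=0):
    """Collect (file, line number, statement) tuples, following include/inherit.

    `files` maps file names to contents and stands in for the filesystem.
    """
    if depth > 20:
        raise RecursionError("include nesting too deep at %s" % name)
    scanner = Scanner(name, files[name])  # fresh state for every file
    while True:
        line = scanner.next_line()
        if line is None:
            break
        line = line.strip()
        if line.startswith(("include ", "inherit ")):
            keyword, target = line.split(None, 1)
            if keyword == "inherit":
                target = "classes/%s.bbclass" % target  # simplified lookup
            # The outer scanner is only paused here; its lineno survives.
            parse_file(target, files, out, depth + 1)
        elif line:
            out.append((name, scanner.lineno, line))
    return out
```

With `files = {"foo.bb": 'A = "1"\ninclude foo.inc\ninherit base\nB = "2"', "foo.inc": 'C = "3"', "classes/base.bbclass": 'D = "4"'}`, calling `parse_file("foo.bb", files, [])` still reports `B = "2"` at line 4 of `foo.bb` after both nested files have been read. With a single global scanner, the nested parses would have overwritten that position.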
I went over the options again, looked at lemon more closely, and concluded that lemon simply rocks. I find it more natural for the scanner/lexer to feed the parser, and the generated parser is reentrant and thread-safe. As I had not used bison before, I cannot judge whether lemon’s syntax is less error-prone, but I’m confident I’ll never ever use bison again.
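The "scanner feeds the parser" style can be illustrated with a small Python sketch. Lemon's generated C interface works this way: the scanner calls `Parse()` once per token and signals end of input with token code 0. The code below only mimics that calling pattern; `PushParser`, `scan`, the token codes and the one-rule grammar are hypothetical, and a plain dict stands in for bb.data.

```python
import re

# Hypothetical token codes; a lemon grammar gets its own in the generated header.
EOF, NAME, OP, STRING = 0, 1, 2, 3

TOKEN_RE = re.compile(
    r'\s*(?:(?P<name>[A-Za-z_][A-Za-z0-9_]*)|(?P<op>\?=|=)|"(?P<string>[^"]*)")'
)


class PushParser:
    """Toy push parser for the single rule: statement := NAME OP STRING."""

    def __init__(self, store):
        self.store = store  # a plain dict standing in for bb.data
        self.pending = []   # values of the statement being assembled

    def feed(self, token_type, value=None):
        """Accept one token; token type EOF (0) marks the end of input."""
        if token_type == EOF:
            if self.pending:
                raise SyntaxError("unexpected end of input")
            return
        expected = (NAME, OP, STRING)[len(self.pending)]
        if token_type != expected:
            raise SyntaxError("unexpected token %r" % (value,))
        self.pending.append(value)
        if len(self.pending) == 3:  # statement complete: reduce it
            name, op, text = self.pending
            if op == "=" or name not in self.store:  # "?=" only sets unset variables
                self.store[name] = text
            self.pending = []


def scan(text, parser):
    """The scanner drives the parser, pushing each token as soon as it is matched."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError("cannot scan input at offset %d" % pos)
        if m.group("name") is not None:
            parser.feed(NAME, m.group("name"))
        elif m.group("op") is not None:
            parser.feed(OP, m.group("op"))
        else:
            parser.feed(STRING, m.group("string"))
        pos = m.end()
    parser.feed(EOF)
```

For example, `scan('A = "1"\nB ?= "2"\nA ?= "3"\n', PushParser(data))` leaves `data` as `{"A": "1", "B": "2"}`. The parser keeps all of its state in its own object and never pulls input itself, which is what makes this style easy to keep reentrant.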
So the new, optional parser is heavily based on flex and lemon, but it will use the bb.data module to store the data and will implement the bb.parse interface. For this to work, Python will need to call C++ code and the C++ code will need to call back into Python. Apart from never having implemented a Python module in C, my biggest obstacle is how to invoke flex and lemon from distutils when building the module. If all else fails, I will ship the generated files in the distribution and just compile them.
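One way to handle the build step, sketched with only the standard library: regenerate the C sources with flex and lemon when both tools are on the PATH, and otherwise fall back to pre-generated files shipped in the tarball, as in the fallback plan above. The file names `scanner.l` and `grammar.y` and the helper functions are hypothetical; `flex -o FILE` and lemon writing `grammar.c`/`grammar.h` next to its input are the tools' standard behaviour.

```python
import os
import shutil
import subprocess


def generator_commands(lexer_src, grammar_src):
    """Commands that regenerate the C sources from the flex and lemon inputs."""
    return [
        # flex -o FILE writes the scanner to FILE instead of lex.yy.c
        ["flex", "-o", os.path.splitext(lexer_src)[0] + ".c", lexer_src],
        # lemon writes grammar.c and grammar.h next to grammar.y
        ["lemon", grammar_src],
    ]


def generated_sources(lexer_src, grammar_src):
    """The C files the extension module would be compiled from."""
    return [os.path.splitext(src)[0] + ".c" for src in (lexer_src, grammar_src)]


def regenerate(lexer_src, grammar_src):
    """Run flex and lemon if both are installed; return False to signal fallback."""
    if not all(shutil.which(tool) for tool in ("flex", "lemon")):
        # Fallback: use the pre-generated .c files shipped in the tarball.
        return False
    for cmd in generator_commands(lexer_src, grammar_src):
        subprocess.check_call(cmd)
    return True
```

A setup.py could call `regenerate("scanner.l", "grammar.y")` before `setup()` and pass `generated_sources(...)` as the sources of its extension module. This keeps the extra step outside distutils entirely, at the cost of not re-running it on every `build_ext`.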
I hope the parser will be a lot faster than the regexp-based Python implementation we currently have.