Brain dump – what fascinates me
A small brain dump of topics that currently fascinate me. These are mostly pointers and maybe it is interesting to follow it.
Books/Reading:
My kobo ebook reader has the Site Reliability Engineering book and I am now mostly done. It is kind of a revelation and explains my interest to write code but also to operate infrastructure (like struggling with ruby, rmagick, nginx…). I am interested in backends since… well ever. The first time I noticed it when we talked about Kolab at LinuxTag and I was more interested in the backend than the KDE client. At sysmocom we built an IoT product and the backend was quite some fun, especially the scale of one instance and many devices/users, capacity planning and disk commissioning, lossless upgrades.
It can be seen in my non FOSS SS7 map work on traffic masquerading and transparent rewriting. It is also clear to see which part of engineering is needed for scale (instead of just installing and restarting servers).
Lang VM design
One technology that made Java fast (Hotspot) and has seen its way into JavaScript is dynamic optimization. Most Just in Time Compilers start with generating native code per method, either directly or after the first couple of calls when the methods size is significant enough. The VM records which call paths are hot, which types are used and then can generate optimized code (e.g. specialized for integers, remove type checks). A technique pioneered at Sun for the “Self” language (and then implemented for Strongtalk and then brought to Java) was “adaptive optimization and deoptimization” and was the Phd topic of Urs Hoelzle (Google’s VP of Engineering). One of the key aspects is inlining across method boundaries as this removes method look-up, call stack handling and opens the way for code optimization across method boundaries (at the cost of RAM usage).
In OpenJDK, V8 and JavaScriptCore this adaptive optimization is typically implemented in C++ and requires quite some code. The code is complicated as it needs to optimize but also need to return to a basic function (deoptimize, e.g. if a method changed or the types passed don’t match anymore), e.g. in the middle of a for loop with tons of inlined code (think of Array.map being inlined but then need to be de-inlined). A nice and long blog post of JSC can be found here describing the On Stack Replacement (OSR).
Long introduction and now to the new thing. In the OpensmalltalkVM a new approach called Sista has been picked and I find it is genius. Like with many problems the point of view and approach really matters. Instead of writing a lot of code in the VM the optimizer runs next to the application code. The key parts seem to be:
- Using branches taken/not-taken as indicator how hot a path is. The overhead of counting these seem to be better than counting method calls/instructions/loops.
- Using the Inline Caches for type information on call sites (is that mono-, poly- or megamorphic?)
- Optimize from one set of Bytecode to another set of Bytecode.
The revelation is the last part. By just optimizing from bytecode to bytecode the VM remains in charge of creating and managing machine code. The next part is that tooling in the higher language is better or at least the roundtrip is more quick (edit code and just compile the new method instead of running make, c++, ld). The productivity thanks to the abstraction and tooling is likely higher.
As last part the OSR is easier as well. In Smalltalk thisContext (the current stack frame, activation record) is an object as well. At the right point (when the JIT has either written back variables from register to the stack or at least knows where the value is) one can just manipulate thisContext, create and link news ones and then resume execution without all the magic in other VMs.
Go, Go and escape analysis
Ken Thompson and Robert Pike are well known persons and their Go programming language is a very interesting system programming language. Like with all new languages I try to get real world experience with the language, the tooling and which kind of problems can be solved with it. I have debugged and patched some bigger projects and written two small applications with it.
There is plenty I like. The escape analysis of the compiler is fun (especially now that I know it was translated from the Plan9 C compiler from C to Go), the concurrency model is good (though allowing shared state), the module system makes sense (but makes forking harder than necessary), being able to cross compile to any target from any system.
Knowing a bit of Erlang (and continuing to read the Phd Thesis of Joe Armstrong) and being a heavy Smalltalk user there are plenty of things missing. It starts with vague runtime error messages (e.g. panicslice not having parameters) and goes to runtime and post-runtime inspection. In Smalltalk thanks to the abstraction a lot of hard things are easy and I would have wished for some of them to be in Go. Serialize all unrecovered panics? Debugging someone else’s code seems like pre 1980…
So for many developers Go is a big improvement but for some people with a wider view it might look like a lost opportunity. But that can only be felt by developers that have experienced higher abstraction and productivity.
Unsupervised machine learning
but that is for another dump…