Like many others, I am enjoying being in Las Palmas at the Gran Canaria Desktop Summit. It is great to see new and old friends and to put faces to IRC nicknames… While sitting in talks I started to feel the performance itch. What code is moc actually generating, and how fast is it? Luckily Qt Software released their internal tests, and you will find some benchmarks in the tests/benchmark directory.
Looking at QMetaObject and generated code
When using QObject::connect, the following currently happens. For the sender, QMetaObject::indexOfSignal gets called, and for the receiver either QMetaObject::indexOfSlot or QMetaObject::indexOfSignal is invoked. The various QMetaObject::indexOf* functions (for properties, methods, signals, slots) all work the same way. You start with the current QMetaObject and go through the list of all its methods; if you don't find anything, you go to the parent QMetaObject and do the same. In other words, you are doing a linear search for a signature across the inheritance hierarchy. Once you have found the index of the method within a given QMetaObject, you add the number of methods of all parent QMetaObjects, and the result is the id used by QObject::qt_metacall. The first thing the generated code in ::qt_metacall does is call the parent's qt_metacall, which subtracts its own method count from the id.
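The lookup described above can be sketched in plain C++ without Qt. This is only a simplified model: the MetaObject struct, methodOffset and indexOfMethod below are hypothetical stand-ins, not Qt's real data layout or API.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical, simplified stand-in for QMetaObject (not Qt's real layout).
struct MetaObject {
    const MetaObject *parent;              // nullptr for the root class
    std::vector<const char *> signatures;  // methods declared by this class only

    // Number of methods contributed by all ancestors.
    int methodOffset() const {
        int offset = 0;
        for (const MetaObject *m = parent; m; m = m->parent)
            offset += static_cast<int>(m->signatures.size());
        return offset;
    }
};

// Linear search: the class's own methods first, then the parent's, and so on.
// Returns the absolute id (local index + offset of all parents), or -1.
int indexOfMethod(const MetaObject *start, const char *signature) {
    for (const MetaObject *m = start; m; m = m->parent) {
        for (std::size_t i = 0; i < m->signatures.size(); ++i) {
            if (std::strcmp(m->signatures[i], signature) == 0)
                return static_cast<int>(i) + m->methodOffset();
        }
    }
    return -1;
}
```

With a base class declaring two methods and a derived class declaring two more, the second method of the derived class gets id 3, while a method found in the base class keeps its local index because the base has no parent offset to add.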
What can be done to improve it
Hashing and the like come to mind, or a trie for the whole inheritance chain. With tools like gperf you could create a perfect hash for the inheritance chain. The problem with having metadata for the whole inheritance chain is that your base class may live in a different library, and if they add a new method, your hash might not be unique anymore… And obviously you will need more memory when you store the whole inheritance tree…
The easy solution is to sort the methods/signals/slots so you can do a binary search in the various indexOf* methods in QMetaObject. So far this is the only thing I have implemented, but I have some other ideas borrowed from Self and JavaScript on how to cheat a bit to make recurring actions like QMetaObject::invokeMethod a lot faster (there is no need to search for the “slot” again).
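A minimal sketch of the binary search idea, again in plain C++ rather than against Qt itself. SortedMetaObject, localIndexOf and indexOfSorted are hypothetical names, and the sketch assumes each class's signature list is sorted in strcmp order, which is what a modified moc would have to guarantee.

```cpp
#include <algorithm>
#include <cstring>
#include <vector>

// Hypothetical model: each class keeps its own signatures sorted (strcmp order).
struct SortedMetaObject {
    const SortedMetaObject *parent;
    std::vector<const char *> signatures;
};

// Binary search within a single class; returns the local index or -1.
int localIndexOf(const SortedMetaObject &m, const char *signature) {
    auto less = [](const char *a, const char *b) { return std::strcmp(a, b) < 0; };
    auto it = std::lower_bound(m.signatures.begin(), m.signatures.end(),
                               signature, less);
    if (it != m.signatures.end() && std::strcmp(*it, signature) == 0)
        return static_cast<int>(it - m.signatures.begin());
    return -1;
}

// The hierarchy is still walked class by class, but every step is O(log n).
int indexOfSorted(const SortedMetaObject *start, const char *signature) {
    for (const SortedMetaObject *m = start; m; m = m->parent) {
        int local = localIndexOf(*m, signature);
        if (local < 0)
            continue;
        int offset = 0;  // methods contributed by all ancestors of m
        for (const SortedMetaObject *p = m->parent; p; p = p->parent)
            offset += static_cast<int>(p->signatures.size());
        return local + offset;
    }
    return -1;
}
```

Note that sorting only speeds up the search inside one class. The walk up the parent chain stays, which is why the ids remain compatible with the offset scheme.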
Another idea: once you have found the index of the slot/signal/method, you could just save the QMetaObject and the local id instead of adding the offset. When emitting the signal you then avoid going through the hierarchy, because you already know which ::qt_metacall you want to call…
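Here is a rough sketch of what saving the QMetaObject and the local id could look like, in plain C++. Everything in it is hypothetical: CachedMeta, ResolvedMethod, resolve and invoke are illustrative names, and the call function pointer merely stands in for one class's portion of qt_metacall.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical model of a meta object with a per-class dispatch function.
struct CachedMeta {
    const CachedMeta *parent;
    std::vector<const char *> signatures;
    int (*call)(int localId);  // stands in for this class's part of qt_metacall
};

// Result of a lookup: which class declares the method, and its local index.
struct ResolvedMethod {
    const CachedMeta *owner;  // nullptr if the signature was not found
    int localId;
};

// Search once and remember the owner and local id; no offset is added.
ResolvedMethod resolve(const CachedMeta *start, const char *signature) {
    for (const CachedMeta *m = start; m; m = m->parent) {
        for (std::size_t i = 0; i < m->signatures.size(); ++i) {
            if (std::strcmp(m->signatures[i], signature) == 0)
                return ResolvedMethod{m, static_cast<int>(i)};
        }
    }
    return ResolvedMethod{nullptr, -1};
}

// Later calls go straight to the owning class, skipping the hierarchy walk.
// The caller must check that r.owner is not nullptr first.
int invoke(const ResolvedMethod &r) {
    return r.owner->call(r.localId);
}
```

The resolved pair can be stored in the connection, so each emit costs one indirect call instead of a chain of parent calls that each subtract an offset.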
Non-academic benchmark
The code can be seen in a branch on gitorious, and for some tests in the QObject::connect/QObject::disconnect benchmark the new code is 30% faster. This happens when you have to go down in the inheritance tree to find the signal… For some other cases there is no difference. What is still missing is code to deal with old generated code…