My work on QtWebKit performance came to a surprising end late last month. It might be interesting for others to see how QtWebKit compares to the various other WebKit ports, where we have some strong points, where we still have homework to do, and where to pick up from where I had to leave off.
Memory consumption
Before I started, our ImageDecoderQt was decoding every image as soon as the data was complete. The biggest problem with that is that the ImageSource we are embedded into does not tell the WebCore::Cache about the size of the images we have already decoded.
There is no need to decode the whole image as soon as the data comes in; instead we can wait for the ImageSource to request the image size and the image data. This makes a noticeable difference on memory benchmarks and allows the WebCore::Cache to control the lifetime of decoded image data.
We still have one case where we have more image data allocated than the WebCore::Cache thinks. This is the case for GIF images, as we are decoding every frame to figure out how many frames there are.
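The decode-on-request idea can be sketched roughly as follows. This is only an illustration in plain C++, not the actual ImageDecoderQt code: `LazyImageDecoder`, `Bitmap` and the two-byte "width, height" toy format are all made-up names and assumptions, and the "decoding" just allocates a pixel buffer.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical stand-in for a decoded bitmap (not a WebCore or Qt type).
struct Bitmap {
    int width;
    int height;
    std::vector<std::uint32_t> pixels; // one 32-bit value per pixel
};

// Hypothetical decoder that keeps the encoded bytes and decodes only on request.
class LazyImageDecoder {
public:
    LazyImageDecoder() : m_hasDecoded(false) {}

    // Store the encoded data. No pixel memory is allocated here.
    void setData(std::vector<std::uint8_t> encoded) {
        m_encoded = std::move(encoded);
        clear();
    }

    // Answer size queries from the header alone. The toy format assumed here
    // is simply: byte 0 = width, byte 1 = height.
    std::pair<int, int> size() const {
        assert(m_encoded.size() >= 2);
        return std::make_pair(static_cast<int>(m_encoded[0]),
                              static_cast<int>(m_encoded[1]));
    }

    // Decode on the first request and keep the result until clear() is called.
    const Bitmap& frame() {
        if (!m_hasDecoded) {
            const std::pair<int, int> s = size();
            m_decoded.width = s.first;
            m_decoded.height = s.second;
            m_decoded.pixels.assign(
                static_cast<std::size_t>(s.first) * static_cast<std::size_t>(s.second),
                0xFF000000u); // placeholder "decoded" pixels
            m_hasDecoded = true;
        }
        return m_decoded;
    }

    // Bytes of decoded pixel data currently held; a cache could be told this.
    std::size_t decodedSize() const {
        return m_hasDecoded ? m_decoded.pixels.size() * sizeof(std::uint32_t) : 0;
    }

    // Lets a cache drop the decoded data; the encoded bytes stay around.
    void clear() {
        m_decoded = Bitmap();
        m_hasDecoded = false;
    }

private:
    std::vector<std::uint8_t> m_encoded;
    Bitmap m_decoded;
    bool m_hasDecoded;
};
```

Asking for `size()` costs no pixel memory; only `frame()` allocates, and `clear()` gives the memory back, which is the kind of lifetime control the cache needs.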
To fix that we should patch the ImageSource to ask the ImageDecoder for “extra” allocated data, and we should fix/verify the GIF Image Reader so we can jump to a given GIF frame and decode it. This means we should remember where certain frames begin…
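Remembering where frames begin could look something like the sketch below. It deliberately does not parse real GIF data: the container is a toy format (a one-byte payload length followed by the payload) that I made up for illustration, and `FrameIndex` is a hypothetical name. The point is only that one cheap scan records the offsets, so counting frames needs no decoding and a single frame can be decoded by jumping to its offset.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy container, NOT the GIF format: each frame is a 1-byte payload length
// followed by that many payload bytes.
class FrameIndex {
public:
    explicit FrameIndex(const std::vector<std::uint8_t>& data) : m_data(data) {
        // Walk the frame boundaries once, recording where each frame begins.
        // Nothing is decoded during this scan.
        std::size_t pos = 0;
        while (pos < m_data.size()) {
            const std::size_t len = m_data[pos];
            if (pos + 1 + len > m_data.size())
                break; // truncated frame: stop indexing here
            m_offsets.push_back(pos);
            pos += 1 + len;
        }
    }

    // The frame count is known without decoding any frame.
    std::size_t frameCount() const { return m_offsets.size(); }

    // "Decode" exactly one frame by jumping to its remembered offset.
    // In this toy format decoding is just copying the payload.
    std::vector<std::uint8_t> decodeFrame(std::size_t index) const {
        assert(index < m_offsets.size());
        const std::size_t pos = m_offsets[index];
        const std::size_t len = m_data[pos];
        const std::ptrdiff_t first = static_cast<std::ptrdiff_t>(pos + 1);
        const std::ptrdiff_t last = static_cast<std::ptrdiff_t>(pos + 1 + len);
        return std::vector<std::uint8_t>(m_data.begin() + first, m_data.begin() + last);
    }

private:
    std::vector<std::uint8_t> m_data;
    std::vector<std::size_t> m_offsets;
};
```

With an index like this only the frame currently being shown has to be kept decoded, so the amount of "extra" allocated data the cache does not know about stays small.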
Performance
Networking
Markus Götz and Peter Hartmann are busy working on the QNetworkAccessManager stack. Their work includes improving the parsing speed of HTTP headers and making sure HTTP connections are started after the first iteration of the mainloop instead of the third.
In one of my tests wget is still twice as fast as the Qt stack at downloading the same set of files, even though wget uses one connection at a time with no pipelining while Qt attempts to have up to 6 connections in parallel. This means there is still some work to do in reducing latency and improving the scheduling of requests. I’m pretty confident that Markus and Peter will work on this!
Images
The biggest limitation of the Qt image decoders is that, in general, progressive loading is not possible. On the other hand, unless I have messed up my reduction, the Qt image decoders are faster than the ones we have in WebCore.
With some of my reductions I can make some operations twice as fast for the usage pattern QtWebKit has on QImageReader. Currently, when asking the QImageReader for the size, the GIF decoder will decode the full frame (size + image data). For the JPEG decoder we start the JPEG decompression separately for getting the size, the image and the image format.
A proof of concept patch for the JPEGReader to reuse the decompression handler showed that I can cut the runtime of the image_cycling reduction by 50%.
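The idea behind reusing the decompression setup can be sketched like this. It is not the actual patch and does not use libjpeg or any Qt API: `ReusingReader`, `Header` and the three-byte toy header (width, height, channels) are hypothetical, and `setupCount()` exists only to show that the expensive setup step runs once however many queries are made.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical header information that a decompression setup would produce.
struct Header {
    int width;
    int height;
    int channels;
};

// Hypothetical reader that performs the setup once and reuses it for the
// size, format and pixel queries.
class ReusingReader {
public:
    explicit ReusingReader(std::vector<std::uint8_t> data)
        : m_data(std::move(data)), m_header(), m_ready(false), m_setups(0) {}

    int width() { return header().width; }
    int height() { return header().height; }
    int channels() { return header().channels; }

    // Produce the pixel buffer using the header from the shared setup.
    // The pixels are placeholders; no real decompression happens here.
    std::vector<std::uint8_t> pixels() {
        const Header& h = header();
        const std::size_t count = static_cast<std::size_t>(h.width) *
                                  static_cast<std::size_t>(h.height) *
                                  static_cast<std::size_t>(h.channels);
        return std::vector<std::uint8_t>(count, 0);
    }

    // How many times the (assumed expensive) setup step actually ran.
    int setupCount() const { return m_setups; }

private:
    // Parse the toy header on first use: byte 0 = width, byte 1 = height,
    // byte 2 = channels. Later calls return the cached result.
    const Header& header() {
        if (!m_ready) {
            assert(m_data.size() >= 3);
            m_header.width = m_data[0];
            m_header.height = m_data[1];
            m_header.channels = m_data[2];
            ++m_setups;
            m_ready = true;
        }
        return m_header;
    }

    std::vector<std::uint8_t> m_data;
    Header m_header;
    bool m_ready;
    int m_setups;
};
```

Querying the width, height, format and pixels in any order leaves `setupCount()` at 1, whereas a reader that restarts decompression for every query would pay the setup cost each time.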
Misc
One misc. performance goal is to remove temporary allocations, e.g. remove QString::detach() calls from the paint path and avoid copying data when converting from QString or QByteArray to WebCore::String. Some of this involves not using WebCore::String::utf8(), but instead having a zero-cost conversion of WebCore::String to QString and using Qt’s utf8()…
Text
But the biggest problem of QtWebKit performance is text, and I started to work on this. For Qt we always have to go through the complex text path of WebCore, which means we end up at QTextLayout, which will ask harfbuzz to shape the text.
There are two things to consider here. For QtWebKit we are using Lars’s QTextBoundaryFinder instead of ICU. I’m not sure if we have ever compared how ICU and QTextBoundaryFinder split text. We might do more work than is necessary; at the least it would be good to know. Especially for Japanese and Korean we might split words too early, creating more work for our complex text layout path.
The second part is to look at our QTextLayout usage pattern and start to optimize for it… The quick solutions of asking QFont to not do kerning and to not do font merging (to not use the QFontEngineMulti) didn’t really make a noticeable difference… To get an idea of the size of the problem: when loading pages like the Wikipedia article on Maxwell's equations, we spend as much time in WebCore::Font::floatWidthForComplexText as other ports like WebKit/GTK+ take to load the entire page. This also seems to be the case for sites like Google News.
And this is exactly where I would have loved to continue to work on it, but that is now pushed back to my spare time where it needs to compete with the other hobby projects.