Painting on ARM
I’m currently working on making QtWebKit faster on ARM (and hopefully later on MIPS hardware), and in my current sprint I’m focusing on painting speed. Thanks to Samuel Rødal my work is easier than before: he added a new paint engine and graphics system that make it possible to trace the painting done with QPainter and replay it later. Some of you may be reminded of Carl Worth’s post, which did much the same for cairo.
How to make painting faster? The Setup
- Record a paint trace of your favorite app, e.g. with tst_cycler -graphicssystem trace: do the rendering, and the trace is written when the app exits
- Use qttracereplay to replay the trace on your hardware (I had some issues on my target hardware, though)
- Use OProfile to see where the time is spent and do something about it…
- Change the code and go back to qttracereplay…
What did I do so far?
Most samples land in the comp_func_SourceOver routine. After some digging through the MMX-optimized routines and talking to the rasterman, I’m doing the following to speed up the const_alpha=255 path. In qttracereplay I go from about 17.4 fps to around 26 fps on my BeagleBoard with Qt for Embedded Linux on the plain OMAP3 framebuffer, but I still need to inspect the output more carefully for visual errors.
- Special-case a source alpha of 0x00 by doing nothing (the destination stays as it is)
- Special-case a source alpha of 0xff by simply copying the source pixel to the destination
- Unroll the above block eight times, interleaved with preloads…
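A minimal sketch of the three changes above in portable C, not the actual patch. It assumes premultiplied ARGB32 pixels (the format Qt's raster engine blends in), uses the GCC/Clang builtin `__builtin_prefetch` as a stand-in for the preloads (it emits PLD on ARM), and the names `byte_mul`, `source_over_pixel` and `source_over_opaque_sketch` are hypothetical. With this `byte_mul`, `byte_mul(d, 255) == d` and `byte_mul(d, 0) == 0` hold exactly, so the two fast paths return the same pixels the general formula would; they only skip the multiplies.

```c
#include <assert.h>
#include <stdint.h>

/* Scale every 8-bit channel of x by roughly a/255, two channels per 32-bit
 * multiply (same idea as Qt's BYTE_MUL). a must be 0..255. */
static inline uint32_t byte_mul(uint32_t x, uint32_t a)
{
    assert(a <= 255);
    uint32_t t = (x & 0x00ff00ffu) * a;            /* channels 0 and 2 */
    t = (t + ((t >> 8) & 0x00ff00ffu) + 0x00800080u) >> 8;
    t &= 0x00ff00ffu;
    x = ((x >> 8) & 0x00ff00ffu) * a;               /* channels 1 and 3 */
    x = x + ((x >> 8) & 0x00ff00ffu) + 0x00800080u;
    x &= 0xff00ff00u;
    return x | t;
}

/* Premultiplied source-over: d' = s + d * (255 - alpha(s)) / 255. */
static inline uint32_t source_over_pixel(uint32_t d, uint32_t s)
{
    uint32_t alpha = s >> 24;
    if (alpha == 0xffu)          /* opaque source: just copy it */
        return s;
    if (alpha == 0x00u)          /* transparent source: keep dest (assumes valid
                                    premultiplied data, where alpha 0 means s == 0) */
        return d;
    return s + byte_mul(d, 255u - alpha);
}

/* const_alpha == 255 loop, unrolled eight times with prefetch hints.
 * Hypothetical name; not Qt's comp_func_SourceOver itself. */
void source_over_opaque_sketch(uint32_t *dest, const uint32_t *src, int length)
{
    assert(length >= 0);
    int i = 0;
    for (; i + 8 <= length; i += 8) {
        /* Hint the next 8 pixels into the cache. One iteration (32 bytes)
         * ahead is a placeholder distance that would need tuning. */
        __builtin_prefetch(src + i + 8);
        __builtin_prefetch(dest + i + 8, 1);
        dest[i + 0] = source_over_pixel(dest[i + 0], src[i + 0]);
        dest[i + 1] = source_over_pixel(dest[i + 1], src[i + 1]);
        dest[i + 2] = source_over_pixel(dest[i + 2], src[i + 2]);
        dest[i + 3] = source_over_pixel(dest[i + 3], src[i + 3]);
        dest[i + 4] = source_over_pixel(dest[i + 4], src[i + 4]);
        dest[i + 5] = source_over_pixel(dest[i + 5], src[i + 5]);
        dest[i + 6] = source_over_pixel(dest[i + 6], src[i + 6]);
        dest[i + 7] = source_over_pixel(dest[i + 7], src[i + 7]);
    }
    for (; i < length; ++i)      /* leftover pixels */
        dest[i] = source_over_pixel(dest[i], src[i]);
}
```

Whether the eight-way unroll and that prefetch distance actually help would have to be measured with qttracereplay and OProfile on the target, like everything else here.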
I still have to clean all of this up and merge it with the Symbian-optimized copies (some of which require ARMv6 or later)… Next I will probably look at BYTE_MUL and see whether I can make it faster without using ARMv6-or-later instructions… or, honestly, first understand how the current BYTE_MUL works…
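As far as I can tell from Qt 4's drawing helpers, the trick in BYTE_MUL is SIMD within a register. Masking a pixel with 0x00ff00ff spreads two 8-bit channels into two 16-bit lanes, so one 32-bit multiply scales two channels at once and a whole pixel needs two multiplies instead of four. A channel times a byte is at most 255 × 255 = 65025, and the per-lane sum t + (t >> 8) + 0x80 is at most 65407, so nothing carries into the neighbouring lane. Each lane approximates c·a/255 without a division. The sketch below is written from that description rather than copied from Qt; `byte_mul_lane` and `byte_mul` are hypothetical names. It shows the per-channel formula next to the packed version and uses only 32-bit multiplies, shifts and masks, so nothing in it needs ARMv6.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: what each 8-bit lane computes, roughly c * a / 255. */
static inline uint32_t byte_mul_lane(uint32_t c, uint32_t a)
{
    assert(c <= 255 && a <= 255);
    uint32_t t = c * a;                  /* at most 65025, fits in 16 bits */
    return (t + (t >> 8) + 0x80u) >> 8;  /* approximate divide by 255 */
}

/* Packed version: all four channels of an ARGB32 pixel with two multiplies. */
static inline uint32_t byte_mul(uint32_t x, uint32_t a)
{
    assert(a <= 255);
    /* Channels 0 and 2 into 16-bit lanes at bits 0..15 and 16..31. */
    uint32_t t = (x & 0x00ff00ffu) * a;
    /* Per lane: t + (t >> 8) + 0x80. The mask keeps the low byte of the
     * upper lane from leaking into the lower lane after the shift. */
    t = (t + ((t >> 8) & 0x00ff00ffu) + 0x00800080u) >> 8;
    t &= 0x00ff00ffu;                    /* results land in bytes 0 and 2 */
    /* Channels 1 and 3, same steps; here the high byte of each lane is
     * already sitting in byte 1 or byte 3, so no final shift is needed. */
    x = ((x >> 8) & 0x00ff00ffu) * a;
    x = x + ((x >> 8) & 0x00ff00ffu) + 0x00800080u;
    x &= 0xff00ff00u;
    return x | t;
}
```

Comparing the two functions byte by byte is a quick way to convince yourself that the lanes really are independent before trying to make the packed version faster.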