Hacking on OpenBSC
I was invited to visit the On-Waves (they have a shiny new website) office in Paris this week and I was quite busy hacking away on the OpenBSC codebase. On-Waves allows me to play a bit with their MSC and learn more about GSM and in exchange OpenBSC gains a more and more complete and stable GSM A-Interface.
When developing code for OpenBSC we are mostly sitting very close to the BTS, only have one active subscriber, test one thing, restart, test another thing, restart but with any piece of software I’m writing, I want OpenBSC to be rock solid, run unattended for years, have no memory leaks, deal with the nanoBTS going away and coming back, the MSC going away and coming, all this at any point in time. So far events like Hacking at Random and the Congress are the ideal testing ground as many different handsets, subscribers, etc are the ideal playground.
My testing was limited to a small set of handsets connected via USB and executing AT commands for call handling and sending SMS. I’m addressing subscribers on the same cell. That means whenever I do a call I have mobile originated and mobile terminated testing covered and this is done by funny chat scripts that work most of the time. The next thing is to simulate failure, for some stuff where a specific layer3 message would be send, we have to wait for a more complete OsmocomBB, so what I can easily do is to cut off TCP connections. I have done this with another piece of weird shell magic. I use the output of $RANDOM and treat it as seconds and then use a kill -SUGUSR2 `pidof bsc_msc_ip` to close the MSC connection at a random time. And then I let it running and wait for failures.
I have fixed a bug/issue in the way we do release a channel. There are multiple things involved. First of all is instructing the BTS that a given channel on a timeslot is open or closing it (RF Channel Release of RSL), the other part is that on the channel one can have logical applications running (SAPI), this can be call control (SAPI=0) and SMS (SAPI=3). When opening a connection to a Mobile Station (MS) the SAPI=0 is always established, when attempting to deliver a SMS we need to open SAPI=3 first. Now our issue with bringing this down was that whenever we got a SAPI release confirm (we asked for the release and it was released) or release indication (the MS closed it) and we used to respond with a RF Channel Release. Now when trying to bringdown a connection were we delivered a SMS we would issue a RF Channel Release twice and the nanoBTS ACKed it twice! To make matter worse, whenever we get a RF Channel Release ACK we mark it as free. We had this small window when we got the first RF Channel Release ACK, allocated the channel again, and then get the second RF Channel Release ACK. I have fixed this issue in multiple ways. The first is to use the T3111 timer to wait until we issue the RF Channel Release, the second is to handle (RF) failures by “blocking” the lchan for a short second to receive multiple errors and release acks and the last bit is to properly bring down the channel. When we have SAPI!=0 we bring that down first, then we send SACH deactivate, followed by SAPI=0 release and then finally we send the RF Channel Release. This makes things more reliable on our side but we need to fix some more things. There is a FIXME inside the gsm_04_08_utils.c that mentions the start of a T3109 timer. In any case when sending a SAPI release the BTS will answer with success or a timeout and we handle both.
Today I addressed losing the RSL or OML connection to the nanoBTS and making sure we are reconnecting and not leaking any memory. This took me most of the day to get stable and I have found a bug or such inside the osmocore/select.c when releasing a bsc_fd that is the last one of the list. The difficulty here is making sure we do not leak memory, close all file descriptors, close all channels that take place on the RSL connection and make sure that when the BTS is up again we can use the channels that were allocated during the failure. To help with testing I added two commands to our vty interface to drop the OML or the RSL connection on a given BTS. The other part that was helpful is to use Linux’s Netfilter and drop packets on a TCP connection and to wait for a failure. Now I can simulate most of the network failures easily and could build some trust.
And my final wishlist item would be to have like 16 GTA02 boards, use FS0 on each and run a simple script to dial, send SMS, pickup phonecalls this would allow me to heavily test the networking in an automated way. On top of that would be to have a OsmocoreBB enabled Calypso or C123 and then I could even send messages that are normally not send at all. And thanks to FreeSoftware development I’m sure we are going to reach that goal.