Kubernetes on Microsoft Azure


The recent Amazon S3 outage should make a strong argument that centralized services have severe issues, technically but from a business point of view as well (you don't own the destiny of your own product!), and I wholeheartedly agree with "there is no cloud, it's only someone else's computer".

Still, from time to time I like to see beyond my own nose (and I prefer the German version of that proverb!), and my current exploration involves ReactJS (which I like), Tensorflow (which I don't have enough time for) and generally looking at Docker/Mesos/Kubernetes to manage services and do zero-downtime rolling updates. I have browsed and read the documentation over the last year, I like the concepts (services, replication controllers, pods, agents, masters) and planned how to use it, but because it doesn't support SCTP I never looked into actually using it.

Microsoft Azure has the Azure Container Service, and since the end of February it is possible to create Kubernetes clusters with it. This can be done using v2 of the Azure CLI or through the portal. I finally decided to learn some new tricks.
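
For reference, a minimal CLI session could look like the following sketch (resource group, location and cluster name are made up):

[shell]
# names are examples; pick your own resource group and cluster name
az group create --name k8s-test --location westeurope
az acs create --orchestrator-type=kubernetes --resource-group k8s-test \
    --name my-k8s-cluster --generate-ssh-keys
az acs kubernetes get-credentials --resource-group=k8s-test --name=my-k8s-cluster
[/shell]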

Azure asks for a clientId and password, and I entered garbage and hoped the necessary accounts would be created. It turns out the portal neither creates them nor does a sanity check on these credentials, and as a consequence the master will not start properly when it boots. Microsoft support was very efficient and quick to point that out, though I wish the portal would do a sanity check itself. So make sure to create a service principal first and use its credentials. I ended up creating it on the CLI.
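
Creating it could look like this sketch (the principal name is made up and the subscription id is left as a placeholder); the appId/password it prints are what the portal asks for:

[shell]
# create a service principal and note the appId/password it prints
az ad sp create-for-rbac --name k8s-test-sp --role Contributor \
    --scopes /subscriptions/<subscription-id>
[/shell]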

I re-created the cluster and executed kubectl get nodes. It started to look better, but one agent was missing from the list of nodes. After logging in I noticed that kubelet was not running, and trying to start it by hand showed that docker.service was missing. Why it is missing is probably for Microsoft engineering to figure out, but Microsoft support gave me the following commands:

[shell]
sudo rm -rf /var/lib/cloud/instances
sudo cloud-init -d init
sudo cloud-init -d modules -m config
sudo cloud-init -d modules -m final
sudo systemctl restart kubelet
[/shell]

After these commands my system had a docker.service, kubelet would start and the machine was listed as a node. Commands like kubectl expose are well integrated and use a public IPv4 address that is different from the one used for ssh/management. So all in all it was quite easy to get a cluster up and I am sure some of the hiccups will be fixed…
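
To illustrate, exposing a service could look like this sketch (nginx is just a stand-in image, not part of the original set-up):

[shell]
# create a deployment and expose it; EXTERNAL-IP will be the public
# Azure address, separate from the ssh/management one
kubectl run nginx --image=nginx --port=80
kubectl expose deployment nginx --type=LoadBalancer --port=80
kubectl get service nginx
[/shell]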

State of structured text for technical documentation


A long time ago I wrote the OpenEmbedded User Manual and back then the obvious choice was to write it in docbook. In my community there were plenty of other examples that used docbook and that helped to get started. The great thing about docbook was that from one XML input one could generate output in many different formats like HTML, XHTML, ePub or PDF. It separated the content from the presentation and was tailored for technical documents and articles, with advanced features like generating a change history, appendices and many more. With XML entities it was possible to share chapters and parts between different manuals.

When creating sysmocom and starting to write our user manuals we continued to use docbook. After all, despite the many XML tags, it is a format that can be committed to git, allowing review, and publishing is just like a software build that can be triggered through git.

On the other hand, writing XML by hand and indenting paragraphs to match the tree structure of the document is painful. In hindsight writing docbook feels more like writing XML tags than writing content. I started to look for alternatives, heard about asciidoc, discarded it, then re-evaluated it and started to use it as the default. The ratio of content to formatting is really great. With a2x we continued to use docbook/dblatex to render the documents. With some trial and error we could even continue to use the docbook docinfo (-a docinfo and a file manual-docinfo.xml). And finally asciidoc can be used on github as well: add .adoc to the filename and it will be rendered nicely.
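
For reference, our render step looks roughly like this (the file name is an example):

[shell]
# render a PDF via docbook/dblatex, picking up manual-docinfo.xml
a2x -f pdf -a docinfo manual.adoc
[/shell]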

So with asciidoc, reStructuredText (rst), markdown (md) and many more (textile, pillar, …) we have great tools that make it easier to focus on the content and still get an okay look. The downside is that there are so many of them now (plus incompatible dialects). This leads to big differences between rendering tools, e.g. whether a docinfo can be used for PDF generation, whether raw PDF commands can be added, etc.

In an attempt to pick up users where they are, I am exploring readthedocs.org as an additional channel for our documents. The website can integrate with github to automatically rebuild the documentation. One issue is that it exclusively uses Python sphinx to render the documentation, which means the input needs to be rst or markdown (or both).

I can go down the xkcd way and create a meta-format to rule them all, try to use pandoc to convert these documents on the fly (but pandoc already had some issues with basic tables in rst), or switch formats. I looked at rst2pdf, which, while powerful, seems to lack docinfo support and markdown input. I am currently exploring staying with asciidoc and then converting asciidoc -> docbook -> markdown_github for readthedocs. Let's see how far this gets.
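
A sketch of that pipeline (file names are examples):

[shell]
# asciidoc -> docbook -> github-flavoured markdown for readthedocs
asciidoc -b docbook -o manual.xml manual.adoc
pandoc -f docbook -t markdown_github -o manual.md manual.xml
[/shell]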

Starting with a Diameter stack


Moving on from 2G/3G requires learning a new set of abbreviations. The network is referred to as the IP Multimedia Subsystem (IMS) and the HLR becomes the Home Subscriber Server (HSS). The ITU ASN1 used in 2G/3G to define the RPCs (request, response, potential errors), message structure and encoding is replaced with a set of IETF RFCs. From my point of view the names of messages and attributes change, but the basic broken trust model remains.

Having worked on probably the best ASN1/TCAP/MAP stack in Free Software, it is time to move to the future and apply the good parts and lessons learned to Diameter. The first RFC to look at is RFC 6733 – Diameter Base Protocol. It defines the basic encoding of messages, the requests, responses and errors, a BNF grammar to define these messages, when and how to connect to remote systems, etc.

The core of our ASN1/TCAP/MAP stack is that the 3GPP ASN1 files are parsed and, instead of just generating structs for the types (as done by asn1c and many other compilers), we have a model that contains the complete relationship between application-context, contract, package, argument, result and errors. From what I know this is quite unique (at least in the FOSS world) and it has allowed rapid development of an HLR, SMSC, SCF, security research and more.

So getting a complete model is the first step. It will allow us to generate encoders/decoders for languages like C/C++, be the base of a stack in Smalltalk, allow browsing the model graphically, generating fancy pictures, …. The RFC defines a grammar for how messages and grouped Attribute-Value-Pairs (AVPs) are formatted and then a list of base messages. The Erlang/OTP framework has extended this grammar to define modules and the relationships between them.

petitparser_diameter
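
To give a feel for the format, here is the command definition of the Capabilities-Exchange-Request from RFC 6733, abridged:

<CER> ::= < Diameter Header: 257, REQ >
          { Origin-Host }
          { Origin-Realm }
       1* { Host-IP-Address }
          { Vendor-Id }
          { Product-Name }
          [ Origin-State-Id ]
        * [ Supported-Vendor-Id ]
          ...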

I started by converting the BNF into a PetitParser grammar, which means each rule of the grammar becomes a method in the parser class; one can then create a unit test for this method and test the rule. To build the complete parser the rules are combined (and, or, min, max, star, plus, etc.) with each other. One nice tool to help with debugging and testing the parser is the PetitParser Browser. It is pictured above and it can visualize a rule, show how rules are combined with each other, generate an example based on the grammar, and partially parse a message and provide debug hints (e.g. '::=' was expected as the next token).

After having written the grammar I tried to parse the example from the RFC and it didn't work. The sad truth is that while the issue was already known for RFC 3588, it has not been fixed. I created another errata item and we will see when and if it is picked up in future revisions of the base protocol.

The next step is to convert the grammar into a module. I will progress as time permits and contributions are more than welcome.

Analyze cellular problems using Quectel modules


Introduction

Previously I have written about connectivity options for IoT devices, and today I assume that a cellular technology (names like GSM, 3G, UMTS, LTE, 4G) has been chosen. Unless you are a big vendor you will end up using a module (instead of a chipset), and either you are curious what the module is doing behind its AT command interface or you are trying to understand a real problem. The following is going to help you, or at least be entertaining.

The xgoldmon project was a first in providing air interface traces and logging to the general public, but it was limited to the Infineon baseband (and some Gemalto devices), needed special commands to enable, and didn't include all messages all the time.

In the last months I have worked intensively with modules of a vendor called Quectel. They use Qualcomm chipsets and have built the GSM/UMTS Quectel UC20 and the GSM/UMTS/LTE Quectel EC20 modules. They are available as a variant to solder, but to speed up development they are offered as miniPCI express as well. I ended up putting them into a PCengines APU2, soldered an additional holder for the second SIM card, placed U.FL to SMA connectors and put everything into one of their standard cases. While the UC20 and EC20 are pretty similar, the software is not the same and some basic features are missing from the EC20, e.g. the SIM Toolkit support. The easiest way to acquire these modules in Europe seems to be through the above links.

An extremely nice feature is that both modules export Qualcomm's bi-directional DIAG debug interface over USB (without having to activate it through an undocumented AT command). It is a framed protocol with a simple checksum at the end of each frame, and many general types of frames (e.g. logging and how regions are described) are known and used in projects like ModemManager to extract additional information. Some parts, including things like Tx power, are not well understood yet.

I have made a very simple utility available on github that will enable logging, convert the radio messages to the Osmocom GSMTAP protocol and either send them to a remote host using UDP or write them to a pcap file. The result can be analyzed using wireshark.

Set-up

You will need a new enough Linux kernel (e.g. >= Linux 4.4) for the modems to be recognized and initialized properly. This will create four ttyUSB serial devices, a /dev/cdc-wdmX and a wwanX interface. The latter two can be used to get data as a normal network interface instead of launching pppd. In short, these modules are super convenient for adding connectivity to a product.
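
A quick sanity check that the module answers on the QMI device can be done with qmicli from libqmi (the device path may differ on your system):

[shell]
# ask the module for its model via the QMI interface
qmicli -d /dev/cdc-wdm0 --dms-get-model
[/shell]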

PCengines APU2 with Quectel EC20 and Quectel UC20

Building

The repository includes a shell script to build the dependencies and the main utility. You will need to install autoconf, automake, libtool, pkg-config, libtalloc, make and gcc on your Linux distribution.

[shell]
git clone git://github.com/moiji-mobile/diag-parser
cd diag-parser
./build/build_local.sh
[/shell]

Running

Assuming that your modem has exposed the DIAG debug interface on /dev/ttyUSB0 and you have your wireshark running on a system with the internal IPv4 address of 10.23.42.7 you can run the following command.

[shell]
./diag-parser -g 10.23.42.7 -i /dev/ttyUSB0
[/shell]

Exploring

Analyzing UMTS with wireshark: the capture below was taken with the Quectel module. It allows you to see the radio messages used when registering to the network, sending an SMS and placing calls.

Wireshark dissecting UMTS

Using docker at the Osmocom CI

As part of the Osmocom.org software development we have a Jenkins set-up that executes unit and system tests. For OpenBSC we compile the software, then execute the unit tests and finally run a bunch of system tests. The system tests verify making configuration changes through the telnet interface and the machine control interface, might try to connect to other parts, etc.

In the past this was executed after a committer had pushed changes to the repository and the build time didn't matter. As part of the move to Gerrit code review we execute them before, and this means that people might need to wait for the result… (and waiting for a computer shouldn't be necessary these days).

sysmocom is renting a dedicated build machine to speed up compilation and I have looked at how to execute the system tests in parallel. The issue is that during a system test we bind to ports on localhost, and that means we cannot have two test runs at the same time.

I decided to use the Linux network namespace support and opted for docker to achieve it. There are some hiccups but in general it is a great step forward. Using a statement like the following we execute our CI script in a clean environment.

$ docker run --rm=true -e HOME=/build -w /build -i -u build -v $PWD:/build osmocom:amd64 /build/contrib/jenkins.sh
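
Since every container gets its own network namespace, two such runs no longer collide on the localhost ports; a sketch, assuming two separate checkouts of the tree:

[shell]
# two test runs in parallel; each container binds the same test
# ports on 127.0.0.1 inside its own network namespace
docker run --rm=true -e HOME=/build -w /build -u build -v $PWD/checkout-a:/build osmocom:amd64 /build/contrib/jenkins.sh &
docker run --rm=true -e HOME=/build -w /build -u build -v $PWD/checkout-b:/build osmocom:amd64 /build/contrib/jenkins.sh &
wait
[/shell]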

As part of the OpenBSC build we are re-building the dependencies, and thanks to building in the virtual /build directory we can look at archiving libosmocore/libosmo-sccp/libosmo-abis and not rebuilding them all the time.

Collecting network traffic, ØMQ and packetbeat

As part of running infrastructure it might make sense, or be required, to store logs of transactions. A good way can be to capture the raw, unmodified network traffic. For our GSM backend this is what we have to do, and I wrote a client that uses libpcap to capture data and sends it to a central server for storing the trace. The system is rather simple and in production at various customers. The benefits of having a central server are access to a lot of storage without granting too many systems and users access, central log rotation and compression, an easy way to grab all relevant traces, and many more.

Recently the topic of doing real-time processing of captured data came up. I wanted to add some kind of side channel that distributes data to interested clients before writing it to disk. E.g. one might analyze an RTP audio flow for packet loss and jitter without actually storing the personal conversation.

I didn't create a custom protocol but decided to try ØMQ (ZeroMQ). It has many built-in strategies (publish/subscribe, round-robin routing, pipeline, request/reply, proxying, …) for connecting distributed systems. The framework abstracts DNS resolving, connect and re-connect, and makes it very easy to build the standard message exchange patterns. I opted for the publish/subscribe pattern because the collector server (acting as publisher) does not care if anyone is consuming the events or data. The messages I send are quite simple as well. There are two kinds of multi-part messages, one for events and one for data, so a subscriber can easily filter for events or data and for a specific capture source.

The support for ZeroMQ was added in two commits. The first one adds basic zeromq context/socket support and configuration and the second adds sending out the events and data in a fire-and-forget manner. In a simple test set-up it seems to work just fine.

Since moving to Amsterdam I try to attend more meetups. Recently I went to the local Elasticsearch group and found out about packetbeat. It is a program written in Go that uses a PCAP library to capture network traffic, has protocol decoders written in Go to do the IP re-assembly and decoding, and uploads the extracted information to an instance of Elasticsearch. In principle it sits somewhere between my PCAP system and a distributed wireshark (without the same number of protocol decoders). In our network we wouldn't want the edge systems to talk directly to the Elasticsearch system, and I wouldn't want to run decoders as root (or at least with extended capabilities).

As an exercise to learn a bit more about the Go language I modified packetbeat to consume trace data from my new data interface. The result can be found here and I do understand (though I am still hooked on Smalltalk/Pharo) why a lot of people like Go. The built-in fetching of dependencies from github is very neat, and the module and interface/implementation approach is easy to comprehend and powerful.

The result of my work allows something like the picture below: first we centralize traffic capturing at the pcap collector and then have packetbeat pick up the data, decode it and forward it for analysis into Elasticsearch. Let's see if upstream merges my changes.

Connectivity options for mobile M2M/IoT/Connected devices


Many of us deal or will deal with (connected) M2M/IoT devices. This might be writing firmware for microcontrollers, using an RTOS like NuttX or a full-blown Unix(-like) operating system like FreeBSD or Yocto/Poky Linux, creating and building code to run on the device, processing data in the backend, or somewhere in between. Many of these devices will have sensors to collect data like GNSS position/time, temperature, light, acceleration, seeing airplanes, detecting lightning, etc. The backend problem is work but mostly “solved”: one can rely on something like Amazon IoT or create a powerful infrastructure using many of the FOSS options for message routing, data storage, indexing and retrieval in C++. In this post I want to focus on the little detail of how data can get from the device to the backend.

To make this thought experiment a bit more real, let's imagine we want to build a bicycle lock/tracker. Many of my colleagues ride their bicycles to work and bikes being stolen remains a big tragedy. So the primary focus of such an IoT device would be to prevent theft (make other bikes an easier target), to make selling a stolen bicycle more difficult (e.g. by making it easy to check if something has been stolen) and, in case it has been stolen, to make it easier to find the current location.

Architecture

Let's assume two different architectures. One possibility is to have the bicycle actively acquire its position and then try to push this information to a server ("active push"). Another approach is to have fixed scanning stations, or users, scan and report bicycles ("passive pull"). Both lead to very different designs.

Active Push

The system would need some sort of GNSS module, a microcontroller or a full-blown SoC to run Linux, an accelerometer and maybe more sensors. It should somehow fit into an average bicycle frame, have good antennas to work from inside the frame, last/work for the lifetime of a bicycle and, most importantly, have a way to bridge the air-gap from the bicycle to the server.

Push architecture

Passive Pull

The device would not know its position or whether it is being moved. It might be a simple barcode/QR code/NFC tag/iBeacon/etc. In the case of a barcode it could encode the serial number of the frame and some owner/registration information. In the case of NFC it should be a randomized serial number (if possible, to increase privacy). Users would scan the barcode/QR code and an application would annotate the found bicycle with the current location (cell towers, wifi networks, WGS 84 coordinate) and upload it to the server. For NFC a smartphone might be able to scan the tag, and one can try to put readers at busy locations.

The incentive for the app user is to feel good collecting points for scanning bicycles, and maybe some reward if a stolen bicycle is found. Buyers could easily check whether a bicycle has been reported as stolen (not considering the difficulty of how to establish ownership).
Pull architecture

Technology requirements

The technologies that come to my mind are barcode, QR code, playing some humanly inaudible noise and decoding it in an app, NFC, ZigBee, 6LoWPAN, Bluetooth, Bluetooth Smart, GSM, UMTS, LTE and NB-IOT. Next I will look at the main differentiators/constraints of these technologies, provide a small explanation and finish with how these constraints interact with each other.

Worldwide usable

A radio technology operates on a specific set of radio frequencies (bands). Each country may manage these frequencies separately and this can lead to having to use the same technology on different bands depending on the current country. This increases the complexity of the antenna design (or requires multiple antennas), makes the mechanical design more complex and complicates software and production testing. There might also be multiple users/technologies on the same band (e.g. wifi + bluetooth, or just too many wifis).

Power consumption

Each radio technology requires broadcasting and might require listening to or permanently monitoring the air for incoming messages ("paging"). With NFC the scanner might be able to power the device, but for the other technologies this is unlikely to be true. One will need to define the lifetime of the device and the size of the battery, or look into ways of replacing/recycling batteries or charging them.

Range

Different technologies were designed for sender and receiver being apart at different minimum/maximum distances (and speeds, but that is not relevant for the lock, nor is bandwidth for our application). E.g. with Near Field Communication (NFC) the workable range is centimeters, while with GSM it will be many kilometers, and with UMTS the cell size depends on how many phones are currently using it (the cell is breathing).

Pick two of three

Ideally we want something that works over long distances, requires no battery to send/receive and still pushes out position/acceleration/event reports to the servers. Sadly this is not how reality works and we will have to set priorities.

The more bands to support, the more complicated the antenna design, production, calibration and testing. It might be that one technology does not work in all countries, that it is not equally popular everywhere, or that the market situation differs, e.g. some cities have city-wide public hotspots, some don't.

Higher transmission power increases the range but increases the power consumption even more. More current is drawn during transmission, which requires a better hardware design to buffer the spikes, a bigger battery and ultimately a way to charge or efficiently replace batteries. Given these constraints it is time to explore some technologies. I will use the ones already mentioned at the beginning of this section.

Technologies

| Technology | Bands | Global coverage | Range | Battery needed | Scan device needed | Cost of device | Arch. | Comment |
|---|---|---|---|---|---|---|---|---|
| Barcode/QR-Code | Optical | Yes | Centimeters | No | App scanning barcode required | extremely low | Pull | Sticker needs to be hard to remove and visible, maybe embedded into the frame |
| Play audio | Non human hearable audio | Yes | Centimeters | Yes | App recording audio | moderate | Pull | Button to play audio? |
| NFC | 13.56 MHz | Yes | Centimeters | No | Yes | extremely low | Pull | Privacy issues |
| RFID | Many | Yes, but not on a single band | Centimeters to meters | Yes | Receiver required | low | Pull | Many bands, specific readers needed |
| Bluetooth LE | 2.4 GHz | Yes | Meters | Yes | Yes, but common | low | Pull/Push | Competes with wifi for spectrum |
| ZigBee | Multiple | Yes, but not on a single band | Meters | Yes | Yes | mid | Push | Not commonly deployed, software more involved |
| 6LoWPAN | Like ZigBee | Like ZigBee | Meters | Yes | Yes | low | Push | Uses the ZigBee physical layer with IPv6 on top; requires 6LoWPAN-to-Internet translation |
| GSM | 800/900, 1800/1900 | Almost global, besides South Korea, Japan, some islands | Kilometers | Yes | No | moderate | Push | Almost global coverage, direct communication with the backend possible |
| UMTS | Many | Less than GSM, but covers South Korea, Japan | Meters to kilometers, depends on usage | Yes | No | high | Push | Higher power usage than GSM, higher device cost |
| LTE | Many | Less than GSM | Designed for kilometers | Yes | No | high | Push | Expensive, higher power consumption |
| NB-IOT (LTE) | Many | Not deployed | Kilometers | Yes | No | high | Push | Not deployed, coming in the future; can be embedded into a GSM or LTE carrier equally well |

Conclusion

Both a push and a pull architecture seem feasible and create different challenges and possibilities. A pull architecture will require at least smartphone app support and maybe a custom receiver device. It will only work in regions with lots of users, and making privacy violations/tracking more difficult is something to solve.

For push technology, using GSM is a good approach. If coverage in South Korea or Japan is required, a mix of GSM/UMTS might be an option. NB-IOT seems nice but right now it is not deployed and it is not clear whether a module will require less power than a GSM module. NB-IOT might only be in the interest of basestation vendors (the future will tell). Using GSM/UMTS brings its own set of problems on the device side, but those are for other posts.

Leaving Berlin, saying hello to Amsterdam

Berlin continues to gain a lot of popularity; culturally and culinarily it is an awesome place, and despite increasing rents it still remains more affordable than other cities. In terms of economy Berlin attracts new companies and branches/offices as well. At the same time I felt the itch and it was time to leave my home town once again. In the end I settled on the bicycle-friendly (and sometimes sunny) city of Amsterdam.

My main interest remains building reliable systems with Smalltalk, C/C++ and Qt, learning new technology (Tensorflow? Rust? ElasticSearch, Mongo, UUCP), talking about GSM (SCCP, SIGTRAN, TCAP, ROS, MAP, Diameter, GTP) or getting re-exposed to WebKit/Blink.

If you are in Amsterdam, or if you know people or companies there, I am happy to meet and make new contacts.

C++, Qt and Treefrog to build user facing web applications

In the past I have written about my usage of Tufao and Qt to build REST services. This time I am writing about my experience using the TreeFrog framework to build a full web application.

You might wonder why one would want to build such a thing in a statically typed and compiled language instead of something more dynamic. There are a few reasons for it:

  • Performance: The application is intended to run on our sysmoBTS GSM basestation (TI Davinci DM644x). By modern standards it is a very low-end SoC (ARMv5te instruction set, single core, low amount of RAM, etc.) and at the same time still perfectly fine to run a GSM network.
  • Interface: For GSM we have various libraries with a C programming interface and they are easy to consume from C++.
  • Compilation/Distribution: By (cross-)building the application there is a “single” executable and we don’t have the dependency mess of Ruby.
The second decision was to not use Tufao and instead search for a framework that has user management and a template/rendering engine built-in. At the Chaos Communication Camp in 2007 I remember overhearing a conversation about a "Qt" for the Web (Wt, the C++ Web Toolkit) and this was the first framework I looked at. It seems like a fine project/product, but interfacing with Qt felt like an afterthought. I continued to look and ended up finding and trying the TreeFrog framework.

I am really surprised how long this project has existed without me having heard about it. It is built on top of Qt, uses QtSQL for the ORM mapping and QMetaObject for dispatching to controllers and for the template engine, and it resembles Ruby on Rails a lot. It has two template engines and routing of URLs to controllers/slots, and one can embed any C++ in the templates. The documentation is complete, and by using the search on the website I found everything I was looking for on my "advanced" topics. Because of my own stupidity I ended up single-stepping through the code, and a Qt coder should feel right at home there.
My favorite features:
  • tspawn model TableName will autogenerate (and later update) a C++ model based on the table in the database; see the sketch after this list.
  • The application builds a libmodel.so, libhelper.so (I removed that) and libcontroller.so. When using the -r option the application will respawn itself. At first I thought I would not like it, but it improves round-trip times.
  • C++ in the template: the ERB template is parsed and a C++ class is generated whose ::toString() method generates the HTML code. So in case something goes wrong, it is very easy to inspect.
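
A minimal sketch of that generator workflow (application and table names are made up):

[shell]
# create a new TreeFrog application skeleton, then generate a model
# class from an existing "bicycle" table in the configured database
tspawn new trackerapp
cd trackerapp
tspawn model bicycle
[/shell]
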
If you are currently using Ruby on Rails or Django but would like to do it with C++, have a look at TreeFrog. I really like it so far.

osmo-pcap capture on the edge and store in the center

Imagine you run a GSM network and have multiple systems at the edge of your network that communicate with other systems. For debugging reasons you might want to collect traffic and then look at it to explore an issue, or look at it systematically to improve your network, your roaming traffic, etc.

The first approach might be to run tcpdump on each of these systems, rotate the files in a round-robin manner, compress the old traffic and then have a script that downloads/uploads it once a day to a central place. The issue is that each node needs enough disk space, you might not feel happy keeping old files on the edge, or you just don't know when a good time to copy them is.
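
For illustration, such a local round-robin capture could look like this (interface and file name pattern are examples):

[shell]
# keep 24 one-hour files, compressing each finished one
tcpdump -i eth0 -G 3600 -W 24 -z gzip -w 'trace-%Y%m%d-%H%M%S.pcap'
[/shell]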

Another approach is to create an aggregation framework: a client uses libpcap to capture the traffic and redirects it to a central server, which stores it and might rotate the files based on size or age. Old files can then be compressed and removed.

I created the osmo-pcap tool many years ago and have recently fixed a 64-bit PCAP header issue (the timeval in the header is 32-bit), added collection of jumbo frames, updated the README.md of the project, created packages for Debian, Ubuntu, CentOS, OpenSUSE and SLES, and made sure that it can be compiled and used on FreeBSD 10 as well.

If you are using this software, or decided not to, I would be very happy to hear about it.