Mantz Tech: 2014

Friday, October 31, 2014

Performance optimizations of the demodulation code of RF Analyzer

When I started implementing AM and FM demodulation for the RF Analyzer, I first built a receiver in GNU Radio Companion and then tried to rebuild it in Java. The basic blocks in the receiver are pretty simple, and soon I had working code. But I quickly realized how poorly it performed on an Android device with limited CPU power. I was far from demodulating in real time, so I had to rebuild and optimize many parts.

Here's a little post about some of the things I did to optimize the demodulation process and get it running in real time. By the way: the version I am talking about is RF Analyzer 1.07. It is available on Google Play (https://play.google.com/store/apps/details?id=com.mantz_it.rfanalyzer) or, if you want to have a look at the source code, on GitHub (https://github.com/demantz/RFAnalyzer).

<UPDATE>
I got a hint from Michael Ossmann to use single precision floating point variables instead of doubles. This turned out to be a very significant optimization since it speeds up every operation performed on the signal samples. Version 1.08 contains these changes along with some other, rather minor optimizations.
</UPDATE>

Channel Selection

In order to shift the interesting signal down to baseband it has to be multiplied by a complex sinusoid (a cosine for the real part and a sine for the imaginary part). This has to be done before any sample rate decimation is possible, and therefore it will always run at the highest sample rate in our receive chain.

I was already using a lookup table to convert the signed interleaved IQ bytes from the HackRF to floating point values. So I decided to extend this table to also include the mixed values:

2-dimensional array as lookup table for samples multiplied by a cosine

By using this lookup table I don't have to do any multiplication to mix the signal down to baseband. The table has to be recreated as soon as the user changes the channel frequency:

cosineRealLookupTable = new float[bestLength][256];
cosineImagLookupTable = new float[bestLength][256];
float cosineAtT;
float sineAtT;
for (int t = 0; t < bestLength; t++) {
    cosineAtT = (float) Math.cos(2 * Math.PI * mixFrequency * t / (float) sampleRate);
    sineAtT = (float) Math.sin(2 * Math.PI * mixFrequency * t / (float) sampleRate);
    for (int i = 0; i < 256; i++) {
        cosineRealLookupTable[t][i] = (i-128)/128.0f * cosineAtT;
        cosineImagLookupTable[t][i] = (i-128)/128.0f * sineAtT;
    }
}

This lookup table strategy effectively gets rid of any multiplications needed for downmixing and speeds things up a lot.
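To make the idea more concrete, here is a minimal sketch of how such a table could be applied to a packet of interleaved signed bytes. It is not the code from RF Analyzer: the class name LookupMixer, the method mix() and the constructor parameters are made up for this illustration, and the table length is assumed to cover a whole number of periods of the mix frequency. The sign of mixFrequency decides the direction of the shift.

```java
/** Sketch: mixing interleaved signed 8-bit samples using only table lookups and additions. */
final class LookupMixer {
    private final float[][] cosineRealLookupTable;
    private final float[][] cosineImagLookupTable;

    /** tableLength is assumed to cover a whole number of periods of mixFrequency. */
    LookupMixer(int tableLength, long mixFrequency, int sampleRate) {
        cosineRealLookupTable = new float[tableLength][256];
        cosineImagLookupTable = new float[tableLength][256];
        for (int t = 0; t < tableLength; t++) {
            double phase = 2 * Math.PI * mixFrequency * t / sampleRate;
            float cosineAtT = (float) Math.cos(phase);
            float sineAtT = (float) Math.sin(phase);
            for (int i = 0; i < 256; i++) {
                cosineRealLookupTable[t][i] = (i - 128) / 128.0f * cosineAtT;
                cosineImagLookupTable[t][i] = (i - 128) / 128.0f * sineAtT;
            }
        }
    }

    /**
     * Treats each byte pair (a, b) as the complex sample a + jb and multiplies it
     * by cos + j*sin: real = a*cos - b*sin, imag = b*cos + a*sin.
     * Returns { real[], imag[] }.
     */
    float[][] mix(byte[] packet) {
        int n = packet.length / 2;
        float[] re = new float[n];
        float[] im = new float[n];
        int tableLength = cosineRealLookupTable.length;
        for (int k = 0; k < n; k++) {
            int t = k % tableLength;          // position inside the precomputed period
            int a = packet[2 * k] + 128;      // signed byte -> table index 0..255
            int b = packet[2 * k + 1] + 128;
            re[k] = cosineRealLookupTable[t][a] - cosineImagLookupTable[t][b];
            im[k] = cosineRealLookupTable[t][b] + cosineImagLookupTable[t][a];
        }
        return new float[][] { re, im };
    }
}
```

The inner loop contains only array lookups, one subtraction and one addition per sample; all multiplications happen once, when the table is built.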

Sample rate decimation

The next block is a decimating low pass filter used to slice out the interesting signal and decimate the sample rate. It enables the actual demodulation process to be performed in real-time at a much lower sample rate. However, the decimating low pass will still run at the high sample rate and is therefore our next target for optimization.

I tried many different things to speed up this low pass filter and ended up splitting it into a cascade of decimating low pass filters. Each filter decimates the sample rate by two, which means only the first filter runs at the highest sample rate. Decimation by two enables us to implement the low pass filter as a half-band filter. A half-band filter is a low pass filter with its cut-off frequency at fs/4 and the convenient property that, apart from the center tap, every second filter tap is equal to zero:

A half-band low pass filter. Graphic from Richard G. Lyons, "Understanding Digital Signal Processing"

Because every second filter tap is zero, this filter needs only about half the multiplications of a standard FIR filter. We also take advantage of the symmetry of the filter's taps, which reduces the number of multiplications by another factor of two!

Finally I implemented the filter in a very inflexible way by hard-coding the filter tap values into the filter method. This gets rid of some conditional structures and array lookups in the code and makes it very fast, at the cost of very ugly, inflexible code.

By cascading a variable number of these filters we can decimate the sample rate by every integer power of 2. Note that the last filter in the line should always be a regular FIR filter with cut-off frequency < fs/4 to avoid aliasing.
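As a rough illustration of these ideas, here is a sketch of a hard-coded half-band decimator and a cascade of such stages. The tap values (-1, 0, 9, 16, 9, 0, -1)/32 are a well-known short half-band filter chosen for this example only; they are not the taps used in RF Analyzer, and the class and method names are made up. For simplicity the sketch keeps no history between calls and leaves out the final regular FIR filter mentioned above.

```java
/** Sketch: 7-tap half-band filter with decimation by two, taps hard-coded. */
final class HalfBandDecimator {
    // Illustrative taps: (-1, 0, 9, 16, 9, 0, -1) / 32. Taps 1 and 5 are zero.
    private static final float H0 = -1f / 32f; // taps 0 and 6 (symmetric pair)
    private static final float H2 = 9f / 32f;  // taps 2 and 4 (symmetric pair)
    private static final float H3 = 0.5f;      // center tap
    private static final int TAPS = 7;

    /** Filters and decimates by two; only full input windows produce output. */
    static float[] decimateByTwo(float[] in) {
        if (in.length < TAPS) {
            return new float[0];
        }
        float[] out = new float[(in.length - TAPS) / 2 + 1];
        for (int m = 0; m < out.length; m++) {
            int n = 2 * m + TAPS - 1; // index of the newest sample in this window
            // 3 multiplications instead of 7: zero taps are skipped and
            // symmetric taps share one multiplication.
            out[m] = H0 * (in[n] + in[n - 6])
                   + H2 * (in[n - 2] + in[n - 4])
                   + H3 * in[n - 3];
        }
        return out;
    }

    /** Cascades the stage to decimate by 2^stages. */
    static float[] decimate(float[] in, int stages) {
        float[] signal = in;
        for (int s = 0; s < stages; s++) {
            signal = decimateByTwo(signal);
        }
        return signal;
    }
}
```

A constant input passes through unchanged (the taps sum to one), while an input alternating between +1 and -1, i.e. a tone at fs/2, is cancelled completely.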

Multithreading

It is obvious that we should take advantage of today's smartphones having multi-core CPUs. That's why we separate each block in our receiving chain into its own thread.

The biggest problem with multithreading is always synchronization. I chose to use ArrayBlockingQueues to connect the threads together (sorry for the overly wide image^^):

Every blue rectangle is a separate thread. The blocking queues help to synchronize them.

It is very important that we reuse the buffers by passing them back to the previous block once we have finished processing them. This avoids memory allocations at runtime and keeps the garbage collector from going crazy.
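The buffer recycling can be sketched with two queues per connection: one carrying filled buffers downstream and one carrying empty buffers back. The following is only a minimal model with a single producer and a single consumer; the class name, the method run() and its parameters are invented for this example and do not appear in RF Analyzer.

```java
import java.util.Arrays;
import java.util.concurrent.ArrayBlockingQueue;

/** Sketch: two threads exchanging a fixed pool of buffers through blocking queues. */
final class BufferRecyclingDemo {
    /** Sends 'packets' buffers through the pipeline and returns the sum of all consumed values. */
    static long run(final int packets, final int poolSize, final int bufferLength) {
        final ArrayBlockingQueue<float[]> filled = new ArrayBlockingQueue<>(poolSize);
        final ArrayBlockingQueue<float[]> free = new ArrayBlockingQueue<>(poolSize);
        for (int i = 0; i < poolSize; i++) {
            free.add(new float[bufferLength]); // the only allocations happen here
        }
        final long[] sum = { 0 };

        Thread producer = new Thread(() -> {
            try {
                for (int p = 0; p < packets; p++) {
                    float[] buffer = free.take(); // blocks until a buffer was returned
                    Arrays.fill(buffer, 1f);      // stands in for real samples
                    filled.put(buffer);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread consumer = new Thread(() -> {
            try {
                for (int p = 0; p < packets; p++) {
                    float[] buffer = filled.take();
                    for (float v : buffer) {
                        sum[0] += (long) v;       // stands in for real processing
                    }
                    free.put(buffer);             // hand the buffer back for reuse
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        producer.start();
        consumer.start();
        try {
            producer.join();
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (free.size() != poolSize) {
            throw new IllegalStateException("a buffer was not returned to the pool");
        }
        return sum[0];
    }
}
```

No matter how many packets flow through, only poolSize arrays are ever allocated, so the garbage collector has nothing to clean up while the chain is running.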

Conclusion

The above design and implementation choices helped me to get demodulation running in real time. But this is only true if you run it on a device with a decent quad-core CPU. I don't have an old phone to test it on a weak CPU but I'm pretty sure it won't work on a dual- or single-core device.

If you tested the application on your phone, please leave a comment and tell me the device type and your experience.

I will try to further optimize the demodulation chain in order to get it running on older devices too! And of course to save battery! Any tips or hints are very welcome!

Sunday, October 19, 2014

RF Analyzer - Explore the frequency spectrum with the HackRF on an Android device

Over the last week I've been working on a new project, trying to build a spectrum analyzer for Android that works with my hackrf_android library. Now I finally reached the point where it is stable enough to be useful, and I created the GitHub repository today:

https://github.com/demantz/RFAnalyzer

It is still very basic and I have a lot of ideas to extend its functionality, but I thought it's better to have testers involved as early as possible. Eventually it should evolve into something similar to GQRX, supporting different modes and devices. But that will take some time!

<UPDATE>

The new version of RF Analyzer (1.07) now supports AM/FM demodulation! It is now also available on Google Play:

https://play.google.com/store/apps/details?id=com.mantz_it.rfanalyzer

See the readme on GitHub for a more detailed description!

</UPDATE>

RF Analyzer running on a Nexus 5

In this post I'm going to show what you can do with the app, and at the end I explain how it works internally for those who like to play with the source code. I also tried to document the code as well as possible, but it is always easier if the basic flow of the program is explained before looking at it.

What you can do with it

Right now there aren't many fancy features. The app will present you with a simple UI showing the frequency spectrum including a waterfall plot. Here is a list of what you can do right now with version 1.00:

Browse the spectrum by scrolling horizontally
Zoom in and out, both horizontally and vertically
Adjust the sample rate and center frequency to match the current view of the screen by double tapping
Auto scale the vertical axis
Jump directly to a frequency
Adjust the gain settings of the HackRF
Select a pre-recorded file as source instead of a real HackRF
Change the FFT size
Set the frame rate either to a fixed value or to automatic control
Activate logging and show the log file

I'm planning to also support the rtl-sdr in the future and of course I want to include the actual demodulation for common analog modes like AM, FM, SSB, ... But so far you can only browse the spectrum. Here is how you get it to work:

Plug the HackRF into your Android device using an OTG (on-the-go) cable. You can get those cables for around $3, and they also come as a Y-version that allows powering the HackRF externally, for phones/tablets that don't deliver enough power. After you start RF Analyzer you can hit the start button in the action bar, and it should prompt you for permission to access the USB device. Once you have granted it, the FFT will start:

FFT at 20 Msps showing FLEX pagers at 931 MHz

Use common gestures to zoom and scroll both vertically and horizontally. Note that the vertical axis of the FFT plot also affects the colors of the waterfall plot:

Zoomed in (both vertical and horizontal) view

If you scroll outside the current range of the FFT or if you zoom so that the resolution of the FFT is too low you can simply double tap the screen. RF Analyzer will re-tune the HackRF to the frequency currently centered on the screen and also adjust the sample rate so that the FFT covers exactly the frequency range that is currently visible:

The resolution of the FFT is too low when zoomed in too closely. We also scrolled so far to the right that we can see the end of the FFT on the right side

After double tapping, the HackRF is tuned to 931.61 MHz (note the DC offset peak!) and the sample rate is now adjusted to about 2.5 Msps so that we see the full FFT resolution again

You can also use the autoscale button in the action bar to adjust the vertical scale so that it ranges from the minimum to the maximum of the currently visible values of the FFT:

If you want to jump to a certain frequency, use the 'set frequency' button and it will prompt you to enter a new frequency:

The gain settings of the HackRF (both VGA and LNA gain) can be accessed through the 'set gain' button in the overflow menu:

In the settings activity you can:

Select other source types (currently only HackRF or file source)
Set the FFT size
Set the screen orientation (auto / landscape / portrait)
Turn autostart on and off (so that you don't have to hit the start button every time)
Set the frame rate to auto or a fixed value (useful if you want a linear time axis in the waterfall plot)
Deactivate vertical zoom and scrolling (so that you don't accidentally alter the vertical scale while scrolling through frequencies)
Turn on logging and set the location of the log file.
Show the log file

Settings Activity of RF Analyzer on a Nexus 7

Implementing the file source was helpful for debugging the application. It is also a way to test the app if you don't have an OTG cable or your phone/tablet doesn't output enough power for the HackRF. Selecting the file source type will allow you to use RF Analyzer with recorded samples from hackrf_transfer or Test_HackRF. I've uploaded a short capture of some FLEX pager signals for testing: FLEX Pager at 931MHz (2Msps)

How it works

For those who want to play with the sources of RF Analyzer (GPLv2) I want to quickly explain the internal structure of the app:

(Incomplete) class diagram of RF Analyzer. Underlined classes are running in separate threads. Gray elements are external modules.

To support different devices I defined a common interface that is implemented by all classes which represent sources of IQ samples. The Scheduler continuously reads samples from the source to prevent the receive buffers of the device from filling up. It forwards samples in packets of the size of the FFT to the AnalyzerProcessingLoop by inserting them into a queue. If the queue is full, the samples are thrown away in order not to block the input device. The AnalyzerProcessingLoop also runs in a separate thread and reads the sample packets from the queue, processes them with the help of the FFT class and then calls draw() on the AnalyzerSurface. This method draws the given FFT samples on a SurfaceView and also draws a new line of the waterfall plot as well as the horizontal and vertical axes.
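The "throw away if the queue is full" behaviour maps naturally onto the non-blocking offer() method of ArrayBlockingQueue, which returns false instead of waiting. Here is a small sketch of that pattern; the class DroppingScheduler and its methods are hypothetical names for this example, not the actual Scheduler class from the app.

```java
import java.util.concurrent.ArrayBlockingQueue;

/** Sketch: forwarding packets to a bounded queue without ever blocking the source. */
final class DroppingScheduler {
    private final ArrayBlockingQueue<byte[]> queue;
    private int dropped = 0;

    DroppingScheduler(int capacity) {
        queue = new ArrayBlockingQueue<>(capacity);
    }

    /** Returns true if the packet was queued, false if it was thrown away. */
    boolean forward(byte[] packet) {
        boolean accepted = queue.offer(packet); // offer() never blocks
        if (!accepted) {
            dropped++;
        }
        return accepted;
    }

    /** Called by the processing side; returns null if nothing is waiting. */
    byte[] next() {
        return queue.poll();
    }

    int droppedCount() {
        return dropped;
    }
}
```

Dropping packets is acceptable here because the spectrum display only needs a recent snapshot of the signal, not every single sample.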

For a more detailed impression of how the app works, have a look into the sources on GitHub. I tried my best to add helpful comments to understand the flow of the program.

If you have any questions, comments or any other input, don't hesitate to leave a comment or contact me directly on Twitter: @dennismantz

Have fun testing it! ;)

Here is the video where I demonstrate the old version of RF Analyzer:

Tuesday, October 7, 2014

hackrf_android - Using the HackRF with an Android device

Since I received my HackRF last month I wanted to use it with an Android device. I own a Nexus 7 and a Nexus 5 and both are able to act as USB host. I also have an OTG cable to connect other USB devices to them. So I started to port Michael Ossmann's libhackrf library to Android and now I want to present the first alpha version. There is still a lot to do and to implement, but at least it's possible to get some samples out of the air and into the phone/tablet ;)

Receiving at 15 Msps on a Nexus 5

OTG cable (available for ~ 3$)

JNI vs. Pure Java

The hackrf_android library is written entirely in Java. I thought about using the Java Native Interface (JNI) to reuse the original code from hackrf.c without modifications, but decided against it. The advantage of a pure Java library is that it is very easy to use (no need to care about NDK and JNI stuff). However, the JNI approach would have advantages too: for example, the hackrf.c code is well tested and implements all features. So maybe I will give it a try in the future.

The hackrf_android library

You can find the library on GitHub: https://github.com/demantz/hackrf_android
The repository contains the library sources and an example application. Binaries of both are also in the repo if you don't want to build them on your own.

Basically the library consists only of the main class: Hackrf. In addition there is a HackrfUsbException class and a HackrfCallbackInterface, neither of which contains much code. The Hackrf class has a static method called initHackrf(). This method tries to enumerate and open the HackRF on the USB port. This requires the user to hit OK when asked for permission to do so. Therefore initHackrf() works asynchronously. It takes an instance of HackrfCallbackInterface as an argument and calls the onHackrfReady() method of this instance as soon as the device is ready to use. This call contains an instance of the Hackrf class, which is then used for all operations on the device.

There are methods to set and get parameters of the device. Right now it is possible to:

Read Board ID from HackRF
Read Version from HackRF
Read Part ID and Serial Number from HackRF
Set Sample Rate of HackRF
Set Frequency of HackRF
Set Baseband Filter Width of HackRF
Compute Baseband Filter Width for given Sample Rate
Set VGA Gain (Rx/Tx) of HackRF
Set LNA Gain of HackRF
Set Amplifier of HackRF
Set Antenna Port Power of HackRF
Set Transceiver Mode of HackRF

Receiving Samples

The Hackrf class puts the samples into an ArrayBlockingQueue as they arrive over the USB connection. A reference to this queue is returned to the application when it calls startRX(). The application can then grab the packets from the queue. Each queue element is a byte array containing the samples. The packets (arrays) all have a fixed length, which can be determined by calling getPacketSize(). These packets are the raw packets that are received through the USB connection. Make sure you pass packets you don't need anymore back to the buffer pool of the hackrf library. Otherwise the performance will drop because of the many memory allocations and garbage collection runs. Inside the packets the samples are stored as pairs of signed 8-bit values (first the quadrature sample, then the in-phase sample - took me some time to find that out^^). Right now I set the packet size to 16KB. This might not be ideal. If it is too low (I started with 512B) the samples don't get from the HackRF to the Android device fast enough. If it is too high (I tried 256KB - same as in hackrf.c) then most of the bytes in the packets will be zero. Feel free to experiment with this; all you have to do is change the packetSize attribute in Hackrf.java.
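Turning such a raw packet into floating point values can be done with a small lookup table, so that no division is needed per sample. The following sketch is not part of hackrf_android; the class SampleConverter and its method toFloats() are hypothetical. It scales each signed byte to the range [-1, 1) and leaves the interpretation of the byte pairs to the caller.

```java
/** Sketch: converting signed 8-bit samples to floats with a 256-entry lookup table. */
final class SampleConverter {
    private static final float[] LOOKUP = new float[256];

    static {
        // Index i corresponds to the signed byte value i - 128.
        for (int i = 0; i < 256; i++) {
            LOOKUP[i] = (i - 128) / 128.0f;
        }
    }

    /** Converts every byte of the packet; the pair structure of the samples is preserved. */
    static float[] toFloats(byte[] packet) {
        float[] out = new float[packet.length];
        for (int i = 0; i < packet.length; i++) {
            out[i] = LOOKUP[packet[i] + 128];
        }
        return out;
    }
}
```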

Note that the size of the queue (measured in bytes!) is defined by the application as an argument to initHackrf(). Make sure you choose a queue large enough to buffer the samples between the hackrf library and your application. If the application doesn't pull the samples out of the queue fast enough, it will run full pretty quickly and the HackRF will stop receiving. What I noticed on the Nexus 7: when writing the samples to a file (using a BufferedOutputStream), you can't set the sample rate to values higher than 2 Msps. Writing to the file is too slow to get the samples out of the queue fast enough. Can anyone confirm this? It seems strange to me, and I don't have this problem with my Nexus 5 (15 Msps to a file - works like a charm!).

How to use it - a quick example

The example application I want to show you is also in the git repository. I used Eclipse with the ADT plugin to write it. If you want to do so too, just create two new Android Projects from existing sources and choose the root directory of the repository for the library and the example directory for the example App. Make sure that the example App is either linked against the library project (this should be the default case) or against the hackrf_android.jar file.

The example App shows the user this screen:

After opening the HackRF the user can show some information about the device (this is equivalent to running the hackrf_info command from the hackrf tools) and start receiving. Right now the only parameters that can be adjusted through the GUI are sample rate and frequency. I will extend this as soon as I have time for it. The received samples will be written to the external memory (/storage/sdcard0/Test_HackRF/hackrf_receive.iq). You can view a nice FFT of this file by copying it to a Linux machine and running baudline with the following settings (don't forget to adjust the sample rate and base frequency):

cat hackrf_receive.io | baudline -stdin -quadrature -channels 2 -flipcomplex -format s8 -overlap 100 -memory 512 -fftsize 4096 -record -basefrequency 97000000 -samplerate 2000000

Baudline showing the fft of the recorded samples

Now I just want to point out the basic code snippets that show how to use the library in your app. Try to compare them to the example app in the repository, but note that I stripped most of the code that is not related to using hackrf_android. So here is a minimal working example of how to receive samples in an app:

import android.app.Activity;
import android.os.Environment;
import android.view.View;

import com.mantz_it.hackrf_android.Hackrf;
import com.mantz_it.hackrf_android.HackrfCallbackInterface;
import com.mantz_it.hackrf_android.HackrfUsbException;

import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.TimeUnit;

public class MainActivity extends Activity implements Runnable,
        HackrfCallbackInterface {

    private Hackrf hackrf = null;                    // set in onHackrfReady()
    private volatile boolean stopRequested = false;  // read by the RX thread

    // Called when the button "Open HackRF" is clicked
    public void openHackrf(View view) {
        // at 2 Msps we will buffer for 1 second
        int queueSize = 2000000 * 2; // each sample is 2 bytes

        if (!Hackrf.initHackrf(view.getContext(), this, queueSize))
        {
            System.out.println("No HackRF could be found!\n");
        }
    }

    // Called when the button "RX" is clicked
    public void rx(View view) {
        stopRequested = false;
        // Run RX in separate thread to keep GUI responsive:
        new Thread(this).start();
    }

    // Called by hackrf_android library when device was opened successfully
    public void onHackrfReady(Hackrf hackrf) {
        System.out.println("HackRF is ready!\n");
        this.hackrf = hackrf;
    }

    // Called by hackrf_android library when device could not be opened
    public void onHackrfError(String message) {
        System.out.println("Error while opening HackRF: "+message+"\n");
    }

    // Runs in a separate thread
    public void run() {
        try {
            System.out.print("Setting Sample Rate to 2000000 Sps...");
            hackrf.setSampleRate(2000000, 1);
            System.out.print("ok.\nSetting Frequency to 97000000 Hz...");
            hackrf.setFrequency(97000000);
            int bbFilter = Hackrf.computeBasebandFilterBandwidth(
                    (int)(0.75*2000000));
            System.out.print("ok.\nSet BB Filter to "+bbFilter+" Hz...");
            hackrf.setBasebandFilterBandwidth(bbFilter);
            System.out.print("ok.\nSetting RX VGA Gain to 20...");
            hackrf.setRxVGAGain(20);
            System.out.print("ok.\nSetting LNA Gain to 8...");
            hackrf.setRxLNAGain(8);
            System.out.print("ok.\nSetting Amplifier to 'off'...");
            hackrf.setAmp(false);
            System.out.print("ok.\nSetting Antenna Power to 'off'...");
            hackrf.setAntennaPower(false);
            System.out.print("ok.\n\n");

            File file = new File(Environment.
                    getExternalStorageDirectory(), "hackrf_receive.io");
            BufferedOutputStream bufferedOutputStream =
                    new BufferedOutputStream(new FileOutputStream(file));

            // This starts receiving:
            ArrayBlockingQueue<byte[]> queue = hackrf.startRX();

            while(!this.stopRequested)
            {
                // Grab a packet from the queue:
                byte[] packet = queue.poll(1000, TimeUnit.MILLISECONDS);
                if (packet == null)
                    continue; // timeout: no packet arrived within 1 second
                bufferedOutputStream.write(packet); // write it to file
                // Give the buffer back so the library can reuse it:
                hackrf.returnBufferToBufferPool(packet);
            }
            bufferedOutputStream.close();
            System.out.print( String.format("Finished! (Avg Rate: " +
                    "%4.1f MB/s)\n", hackrf.getAverageTransceiveRate()/1000000.0));
        } catch (HackrfUsbException e) {
            System.out.print("error (USB communication)!\n");
        } catch (IOException e) {
            System.out.print("error (File IO)!\n");
        } catch (InterruptedException e) {
            System.out.print("error (Queue Interrupt)!\n");
        }
    }
}

Porting the hackrf library was fun and easier than I thought. I also learned a lot about USB :) At this point I want to say thank you to Michael Ossmann for creating such a great open source SDR platform. I hope hackrf_android helps to make it even more useful / mobile than it already is!

Let me know if you have issues with the library or if you successfully used it in your own App! Doing DSP on Android devices won't be easy without something like GNU Radio, but I'm curious what you guys will implement ;)

UPDATE:
I made a short video presenting the example application:

Monday, September 29, 2014

Receiving FLEX Pager with the HackRF and GNU Radio 3.7

This weekend I was browsing through the RF spectrum with my HackRF and found some pretty strong FSK signals:

FLEX Pager in GQRX

They showed up frequently and on various channels. My first thought was: POCSAG pagers. It turned out that I was wrong: they are FLEX pagers, not POCSAG. After some more research it seems like TELUS is using FLEX pagers here on Vancouver Island (http://www.nettwerked.net/FLEX_Frequencies.txt).

I also found an example GNU Radio script from Parker Thompson (https://github.com/mothran/flex_hackrf). He is pointing out that it is a modification of the original script from Johnathan Corgan, who wrote the GNU Radio blocks for FLEX. Unfortunately it was incompatible with GNU Radio 3.7, so I had to change some pieces.

You can find my modified version here:
https://github.com/demantz/flex_hackrf

Except for some scrambled messages it works like a charm ;) See for yourself:

running the flex.py script to receive pager messages. Somebody spilled urine ^^

As you can see, there are also lots of bit errors... I might have to work on the tuning. Also the error correction mechanism isn't implemented in the GNU Radio FLEX blocks yet.

But nevertheless, I had some fun ;) Hope someone finds this useful. Feel free to leave a comment!

Saturday, September 20, 2014

Airprobe with GNU Radio 3.7

I'm very excited right now.. I ordered a HackRF and can't wait for it to be delivered to me now.

Ever since I heard of the HackRF project from Michael Ossmann (http://greatscottgadgets.com/hackrf/) I knew that someday I would buy one. I started my way into SDR last year by buying an RTL-SDR stick and also did a little project with a USRP1, which I borrowed from my university.

So now I'm trying to set up my GNU Radio environment again and prepare it for the HackRF. And by doing so I've stumbled across a little problem:

Airprobe (software to decode GSM) wouldn't compile with the new GNU Radio version 3.7+. The problem is that GNU Radio changed its API with version 3.7, thereby breaking compatibility with airprobe. Fortunately, I found out that somebody has already patched airprobe to compile and run (I didn't do extensive testing, since my HackRF has not arrived yet) with GNU Radio 3.7. Nevertheless, there were some difficulties, and therefore I wrote this post the next day. I hope I remembered every step I did. Please write me a comment if you find mistakes or if you have problems following the steps...

Installing GNU Radio 3.7

GNU Radio 3.7 comes with PyBombs (which is awesome). That means we don't need the build-gnuradio script anymore (you can still use it though). With PyBombs you do it like this:
(here is the detailed tutorial: http://gnuradio.org/redmine/projects/pybombs/wiki)

$ cd /opt
$ sudo mkdir pybombs target
$ sudo chown dennis:dennis pybombs target
$ git clone https://github.com/pybombs/pybombs.git
$ cd pybombs
$ ./pybombs install gnuradio
$ source /opt/target/setup_env.sh

Now it will ask you some questions (e.g. which install prefix to use; I use /opt/target) and then it will start installing all dependencies (first by looking for .deb packages; only if no packages were found does it use the sources) and finally download and compile GNU Radio 3.7.

Note that I don't have to run the installation as root, since the two directories '/opt/pybombs' and '/opt/target' belong to my user. GNU Radio will install in /opt/target and not under /usr/local.

That is also the reason for the setup_env.sh script. It sets the environment variables correctly. Because it only changes the environment of the shell it is loaded into, load it with 'source /opt/target/setup_env.sh' rather than executing it, and repeat that in every new shell (for example after a reboot) in which you want to use GNU Radio.

That was easy. On my system (Ubuntu 14.04) this worked without any problems (it took some hours though^^). But note that my system wasn't a 'fresh' Ubuntu, but one with all kinds of stuff already installed on it. So you might run into some errors I didn't have. Just write a comment if you get stuck at this point...

By the way:

PyBombs can be used to install all kinds of stuff, just run

/opt/pybombs$ ./app_store.py

to have a look what other modules can be installed. Some of them might not work though...

For example airprobe -.-

So we have to do that the old-fashioned way.

Installing libosmocore

Airprobe depends on libosmocore, so we have to install that first:

$ cd /opt/pybombs/src
$ git clone git://git.osmocom.org/libosmocore.git
$ cd libosmocore/
$ autoreconf -i
$ ./configure --prefix=/opt/target
$ make
$ make install
$ sudo ldconfig

Installing Airprobe

When I first tried to install airprobe, I did it via the app_store. All this does is download the sources from git://svn.berlin.ccc.de/airprobe to /opt/pybombs/src/, and that's it. Unfortunately I found out that the patch I found online doesn't match this version of airprobe. So if you also tried it this way, delete the airprobe directory in /opt/pybombs/src. We'll use another repository.

First we download the sources:

$ cd /opt/pybombs/src
$ git clone git://git.gnumonks.org/airprobe.git
$ cd airprobe

Now we download and apply the patch from zmiana. You can find the patch on GitHub at this link: https://github.com/scateu/airprobe-3.7-hackrf-patch. It is called zmiana.patch. A howto is also provided on that page, but you can also just read on here. By the way, a big thanks to zmiana for doing all the work for us!

/opt/pybombs/src/airprobe$ patch -p1 < zmiana.patch
/opt/pybombs/src/airprobe$ cd gsmdecode
/opt/pybombs/src/airprobe/gsmdecode$ ./bootstrap
/opt/pybombs/src/airprobe/gsmdecode$ ./configure --prefix=/opt/target
/opt/pybombs/src/airprobe/gsmdecode$ make
/opt/pybombs/src/airprobe/gsmdecode$ cd ../gsm-receiver
/opt/pybombs/src/airprobe/gsm-receiver$ ./bootstrap
/opt/pybombs/src/airprobe/gsm-receiver$ ./configure --prefix=/opt/target
/opt/pybombs/src/airprobe/gsm-receiver$ make

Now we should be able to do a quick test. Download this capture file: cfile

Also start an instance of wireshark and start listening on the loopback interface. Then we start decoding the cfile:

$ cd src/python

$ ./go.sh ~/Downloads/capture_941.8M_112.cfile

The result should be decoded packets flushing down the terminal and you should also be able to see them in your wireshark trace.

That's it. I didn't test anything else since I don't have my HackRF yet. However, note that on https://github.com/scateu/airprobe-3.7-hackrf-patch there is also a python program called gsm_receive_hackrf_3.7.py that hopefully enables GSM capturing with the HackRF. Can anybody out there confirm that?

Have fun and leave a comment!

Saturday, May 10, 2014

Suspend on low battery in Ubuntu GNOME 14.04

I have been running 14.04 since it came out in April 2014, and I love it. They improved the touch input, which is awesome on my Thinkpad x220 Tablet/Convertible. There was only one little thing that drove me nuts:

The suspend action on critical battery seemed to have stopped working in this Ubuntu version. Instead my machine just shut down every time the battery was critically low, without asking. That's horrible when you have documents open. Today I finally found the solution: pm-utils was missing.

But let me start from the beginning..

Besides the symptom that the laptop shuts down instead of suspending, the following things seemed a bit weird:

upower tells me that suspend is not possible:

$ upower -d
Device: /org/freedesktop/UPower/devices/line_power_AC
... < I stripped off the uninteresting output. Here is what matters: >
Daemon:
  daemon-version:  0.9.23
  can-suspend:     no
  can-hibernate:   no
  on-battery:      yes
  on-low-battery:  no
  lid-is-closed:   no
  lid-is-present:  yes
  is-docked:       no

Don't get me wrong: Suspending is working when I just click on the suspend button. So I tried it via a dbus command:

$ dbus-send --system --print-reply --dest="org.freedesktop.UPower" /org/freedesktop/UPower org.freedesktop.UPower.Suspend

it outputs:

Error org.freedesktop.UPower.GeneralError: No kernel support

Now that's really weird. Then I found this on the internet: https://bbs.archlinux.org/viewtopic.php?id=147272&p=7

There they guess that upower still uses 'pm-is-supported' from the pm-utils package, even though the upower package doesn't declare it as a dependency. So I installed it:

$ sudo apt-get install pm-utils

$ sudo reboot

And voilà: upower now says that I'm able to suspend, and it's working as the critical low battery action as well. I hope this will help somebody out there who has the same problem ;)

Monday, April 7, 2014

Measuring the Latency of a Linux Bridge

As you can see, I am currently working with a Linux bridge and trying to figure out how well it does its job ;) Therefore I had to measure the latency (or simply the time) a packet needs to go from one port of the bridge (IN) to the other (OUT). My test setup looks like this:

To measure the latency I just have to send a single packet from my test machine's first Ethernet interface and wait until it arrives at the other one. So I started two instances of tcpdump to listen on both interfaces and compared the timestamps at which they catch the packet. It's pretty straightforward:

First timestamp (eth0): 1396835791.679071s
Second timestamp (eth1): 1396835791.679389s
Latency is 1396835791.679389s - 1396835791.679071s = 318 microseconds

Note that you have to start tcpdump with the option "-tt" to get this timestamp format. If you just want the time the packet spends in the bridging device, repeat the measurement without the device (eth0 and eth1 connected directly). Then you can estimate the time spent in the device by subtracting the second measurement from the first.

To get meaningful results you definitely want to do this procedure more than once and calculate the average. And then you want to vary the size of the packet, the protocol and so on to see how it affects the result. So at this point I decided to automate the whole procedure with a shell script. This has the big advantage that you can repeat the measurement in exactly the same way after doing some optimization on the bridge.
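The arithmetic on the two timestamps can be sketched as follows. This is only an illustration: the class LatencyCalc and its methods are invented for this example, and the code assumes timestamps with exactly six fractional digits, as in the values shown above. Working with integer microseconds avoids floating point rounding on such large numbers.

```java
/** Sketch: latency between two "seconds.microseconds" timestamps, in microseconds. */
final class LatencyCalc {
    /** Converts a timestamp such as "1396835791.679071" to microseconds. */
    static long toMicros(String timestamp) {
        String[] parts = timestamp.split("\\.");
        long seconds = Long.parseLong(parts[0]);
        // Assumes six fractional digits; parseLong always parses base 10,
        // so leading zeros are harmless here.
        long micros = Long.parseLong(parts[1]);
        return seconds * 1_000_000L + micros;
    }

    /** Time between capture on the sending and on the receiving interface. */
    static long latencyMicros(String sent, String received) {
        return toMicros(received) - toMicros(sent);
    }

    /** Time spent in the device: measurement with the device minus the direct-cable baseline. */
    static long deviceDelayMicros(long withDevice, long withoutDevice) {
        return withDevice - withoutDevice;
    }
}
```

With the two timestamps from above, latencyMicros("1396835791.679071", "1396835791.679389") gives 318.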

As always I started with a simple script but after tweaking this and that it got bigger and way more complex xD Nevertheless I want to share it, so maybe someone will find it useful too. Tools needed are of course tcpdump to capture the packets, as well as hping3 and arping to generate the packets.

I split the script into two: One that does exactly one measurement and another that calls this script in a loop to get the average values.

measureLatency.sh:

#!/bin/bash
#
# Simple script to send an arbitrary packet via an interface
# and measure how long it takes until it arrives on another
# interface (so both interfaces have to be connected to the same
# subnet). Of course you want to put a bridge or something
# similar in between to test its performance!
#
# Dennis Mantz <dennis.mantz@googlemail.com> - 2014 Lantzville

# function to translate the timestamp into microseconds:
function tstampToMicro()
{
# Force base 10 with the 10# prefix, otherwise values with leading
# zeros are treated as octal ^^ (this also handles a fraction of 000000)
seconds=`echo $1 | cut -d "." -f 1`
microseconds=`echo $1 | cut -d "." -f 2`

echo $((10#$seconds * 1000000 + 10#$microseconds))
}

# Function to show the usage:
function show_usage()
{
echo "Usage: $cmdname -s <src iface> -d <dst iface> [-m <arp|ip|icmp|udp|tcp>] [-p <size>] [-v] [-h]"
exit -1
}

# Set default values:
cmdname=$0
ifs=
ifd=
size=0
mode="tcp"
verbose=
arpingVerbose="-q"
hping3Verbose="-q"

# Parse cmd line args:
while getopts "s:d:m:p:vhc:" opt; do
case "$opt" in
s)
ifs=$OPTARG
;;
d)
ifd=$OPTARG
;;
m)
mode=$OPTARG
;;
v)
verbose="-v"
arpingVerbose=
hping3Verbose="-V"
;;
h)
show_usage
;;
p)
size=$OPTARG
;;
c)
# This is just for compatibility with the averageLatency script. Do nothing with it...
;;
esac
done

# Check args:
if [ -z "$ifs" -o -z "$ifd" ]
then
show_usage
fi

# Check root
if [ `whoami` != "root" ]
then
echo "Must be root!"
exit -1
fi

# Parse mode
case $mode in
arp)
filter='arp'
;;
ip)
filter='ip'
;;
icmp)
filter='icmp'
;;
tcp)
filter='tcp port 5000'
;;
udp)
filter='udp port 5000'
;;
*)
echo "$mode is not allowed for mode"
show_usage
;;
esac

# Start tcpdump two times:
(tcpdump -tt -i $ifs -c 1 $filter 2> /tmp/measureLatencyTcpdump1.err > /tmp/measureLatencyCap1.tmp)&
pid1=$!
(tcpdump -tt -i $ifd -c 1 $filter 2> /tmp/measureLatencyTcpdump2.err > /tmp/measureLatencyCap2.tmp)&
pid2=$!

# wait for them to get ready:
sleep 3

# Send a packet on the first interface:
case $mode in
arp)
arping -I $ifs -c 1 111.111.111.111 -s 127.0.0.1 $arpingVerbose
;;
ip)
ifconfig $ifs 111.111.111.1 netmask 255.255.255.0 broadcast 111.111.111.254 # add a route
arp -i $ifs -s 111.111.111.111 00:01:11:11:11:11 $verbose # dummy arp entry..
hping3 --rawip 111.111.111.111 -H 99 -c 1 -d $size $hping3Verbose > /tmp/measureLatencyHPing3.out 2> /tmp/measureLatencyHPing3.err
;;
icmp)
ifconfig $ifs 111.111.111.1 netmask 255.255.255.0 broadcast 111.111.111.254 # add a route
arp -i $ifs -s 111.111.111.111 00:01:11:11:11:11 $verbose # dummy arp entry..
hping3 --icmp 111.111.111.111 -c 1 -d $size $hping3Verbose > /tmp/measureLatencyHPing3.out 2> /tmp/measureLatencyHPing3.err
;;
tcp)
ifconfig $ifs 111.111.111.1 netmask 255.255.255.0 broadcast 111.111.111.254 # add a route
arp -i $ifs -s 111.111.111.111 00:01:11:11:11:11 $verbose # dummy arp entry..
hping3 -s 5000 111.111.111.111 -c 1 -d $size $hping3Verbose > /tmp/measureLatencyHPing3.out 2> /tmp/measureLatencyHPing3.err
;;
udp)
ifconfig $ifs 111.111.111.1 netmask 255.255.255.0 broadcast 111.111.111.254 # add a route
arp -i $ifs -s 111.111.111.111 00:01:11:11:11:11 $verbose # dummy arp entry..
hping3 --udp -s 5000 111.111.111.111 -c 1 -d $size $hping3Verbose > /tmp/measureLatencyHPing3.out 2> /tmp/measureLatencyHPing3.err
;;
esac

# wait for both tcpdumps to finish:
(sleep 10; kill $pid1 2> /dev/null; kill $pid2 2> /dev/null; echo "Didn't receive the packet!")&
killerpid=$!
wait $pid1
wait $pid2
kill $killerpid 2> /dev/null
wait $killerpid 2> /dev/null

# get the timestamps:
time1=`cat /tmp/measureLatencyCap1.tmp | sed 's/\([0-9]*\.[0-9]*\)\(.*\)/\1/'`
time2=`cat /tmp/measureLatencyCap2.tmp | sed 's/\([0-9]*\.[0-9]*\)\(.*\)/\1/'`

if [ "a$verbose" = "a-v" ]
then
echo "time1 = $time1"
echo "time2 = $time2"
cat /tmp/measureLatency*
fi

if [ -z "$time1" -o -z "$time2" ]
then
echo "Timestamps aren't set. exiting..."
exit -1
fi

# Check if it's the same packet!
pack1=`cat /tmp/measureLatencyCap1.tmp | sed 's/\([0-9]*\.[0-9]*\)\(.*\)/\2/'`
pack2=`cat /tmp/measureLatencyCap2.tmp | sed 's/\([0-9]*\.[0-9]*\)\(.*\)/\2/'`
if [ "${pack1:0:60}" != "${pack2:0:60}" ]
then
echo "Seems like we've captured two different packets. Is the link free?"
exit -1
fi

# calculate the difference:
diff=$((`tstampToMicro $time2` - `tstampToMicro $time1`))
echo "latency is $diff microseconds"

# clean up:
rm /tmp/measureLatency*
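
The script above splits each captured line into the timestamp and the rest of the packet description with sed. As a sketch of the same idea, this can be tried on a single line with plain bash parameter expansion. The sample line is made up in the style of "tcpdump -tt" output and the variable names are only for illustration. Unlike the sed version, the packet part does not keep the leading space.

```shell
#!/bin/bash
# A made-up line in the style of "tcpdump -tt" output:
line="1396835791.679071 IP 111.111.111.1 > 111.111.111.111: ICMP echo request"

timestamp=${line%% *}   # everything before the first space
packet=${line#* }       # everything after the first space

echo "timestamp: $timestamp"
echo "packet:    $packet"
```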

avgLatency.sh:

#!/bin/bash
# Simple script to send an arbitrary packet via an interface
# and measure how long it takes in AVERAGE until it arrives on another
# interface (so both interfaces have to be connected to the same
# subnet). Of course you want to put a bridge or something
# similar in between to test its performance!
#
# Dennis Mantz <dennis.mantz@googlemail.com> - 2014 Lantzville

# Function to calculate the average:

function calc_avg() {
avg=$(($sum / $((i-1))))
range=$(($max - $min))
echo "Average Latency is $avg microseconds!"
echo "Min: $min Max: $max Range (max-min): $range"
}

# Signalhandler for SIGINT calls calc and exits
trap "echo 'signal causing interrupt..'; calc_avg; exit" SIGHUP SIGINT SIGTERM

# Set Vars:
sum=0
min=10000
max=0
cmdline=$*
n=5

# Parse cmd line args:
while getopts "s:d:m:p:vhc:" opt; do
case "$opt" in
c)
n=$OPTARG
;;
*)
# All other args are passed with the $cmdline to measureLatency. Do nothing with them..
;;
esac
done

# Run measureLatency
for i in `seq $n`
do
echo -n "Round $i: "
latency=`./measureLatency.sh $cmdline | grep "latency" | cut -d " " -f 3`
echo "$latency us"
sum=$(($sum + $latency))
if [ $latency -lt $min ]
then
min=$latency
fi
if [ $latency -gt $max ]
then
max=$latency
fi
done

i=$((i+1))
calc_avg

As I said, they are a bit more complex than I wanted them to be. After I finished my measurements I also thought about optimizing this whole procedure with a C program using raw sockets. Because as you will discover yourself: tcpdump is so damn slow when you tell it to just capture one packet and exit. A self-written program would be much faster and 'nicer' :) But unfortunately I haven't had time to start working on it yet, so it goes straight to my ToDo list for the future ;)

Friday, March 28, 2014

Profiling kernel modules with oprofile

== Description ==

oprofile is a system-wide profiler which can be used to profile any user-space application as well as the kernel itself and its modules. It has the capability to produce annotated source from binaries which are compiled with debug symbols (gcc -g).

In this post I want to describe how to set up oprofile to profile the Linux Ethernet bridge module.

== Used System ==

I used Ubuntu 12.04 LTS, but the steps should be the same on other systems. In order to set up a bridge you'll need two physical network interfaces. I used the laptop's built-in NIC and a USB-to-NIC adapter. The built-in NIC (eth0) is connected to the network and the USB-to-NIC adapter is connected to a second laptop, which connects to the network via the bridge.

== Preparations ==

In order to profile the bridge module we need to recompile the kernel with debug symbols and profiling support. We also need to compile and install oprofile itself if it's not already installed.

== Recompile the Kernel ==

Download the kernel sources for your kernel version (or a newer version if you like).
I used git (if git is not yet installed on your system use: $ sudo apt-get install git)

$ git clone git://kernel.ubuntu.com/ubuntu/ubuntu-precise.git
$ cd ubuntu-precise
$ git tag -l

With the last command we printed out a list of tags we can choose from. I chose the one which is closest to my running kernel to keep complications to a minimum:

$ git checkout -b mybranch Ubuntu-lts-3.8.0-34.49_precise1

To compile the kernel we need additional packages (I may have forgotten some)

$ sudo apt-get install build-essential binutils libncurses5-dev

Then we copy the config of our running kernel into the source root:

$ cp /boot/config-`uname -r` .config

Now we can activate the config options we need for profiling to work:

$ make menuconfig

-> Set the following options:
General setup -> Profiling support = y
General setup -> OProfile system profiling = y
Kernel hacking -> Strip assembler-generated symbols during link = n
Kernel hacking -> Compile the kernel with debug info = y
Networking support -> Networking options -> 802.1d Ethernet Bridging = m
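
Once the changes are saved, the resulting .config can be checked with a few lines of shell. This is only a sketch: the function name is made up, and the CONFIG_* symbols are the names I assume belong to the menu entries above, so double-check them against your kernel version.

```shell
#!/bin/bash
# check_profiling_config FILE
# Prints every expected option that is not set in FILE and returns
# non-zero if at least one check fails.
# The CONFIG_* names are assumed to match the menu entries listed above.
check_profiling_config() {
    local cfg=$1 status=0 opt
    for opt in CONFIG_PROFILING=y CONFIG_OPROFILE=y \
               CONFIG_DEBUG_INFO=y CONFIG_BRIDGE=m; do
        if ! grep -qxF "$opt" "$cfg"; then
            echo "not set as expected: $opt"
            status=1
        fi
    done
    # Stripping of assembler-generated symbols has to stay disabled
    if grep -qxF "CONFIG_STRIP_ASM_SYMS=y" "$cfg"; then
        echo "should be disabled: CONFIG_STRIP_ASM_SYMS"
        status=1
    fi
    return $status
}

# Usage, from the kernel source root:
#   check_profiling_config .config
```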

Exit and save changes. Then we build and install the new Kernel (this will take a while).
If you have a cpu with multiple cores, you can specify -j 1+<number of cores> to speed the build up. I have a quad-core cpu, so I'll build with -j 5:

$ make -j 5
$ sudo make deb-pkg
$ cd ..
$ sudo dpkg -i linux-image-3.8.13.12_3.8.13.12-2_amd64.deb
$ sudo dpkg -i linux-headers-3.8.13.12_3.8.13.12-2_amd64.deb
$ sudo dpkg -i linux-firmware-image_3.8.13.12-2_amd64.deb
$ sudo dpkg -i linux-libc-dev_3.8.13.12-2_amd64.deb

Finally we can reboot our system and start the new kernel!

== Install and setup the Bridge ==

First we need to install the bridge utilities:

$ sudo apt-get install bridge-utils

Then we add a new logical bridge interface (br0) and add the two physical interfaces (eth0 and eth1) to it. By the way: I strongly recommend deactivating the network manager before setting up the bridge!

$ sudo ip addr flush eth0
$ sudo ip addr flush eth1
$ sudo brctl addbr br0
$ sudo brctl addif br0 eth0 eth1
$ sudo ip link set dev br0 up

Now the bridge should be up and running. If you have a dhcp-server in your network, you can run dhclient on the bridge-interface to assign it an IP address (otherwise do a static IP configuration on the bridge):

$ sudo dhclient br0

== Install oprofile and start the profiling ==

To install oprofile we will download the sources of the newest version from the website (at the time this howto was written this was version 0.9.9) and compile them. Before we do so, we have to install some more dependencies:

$ sudo apt-get install libpopt-dev binutils-dev
$ wget http://prdownloads.sourceforge.net/oprofile/oprofile-0.9.9.tar.gz
$ tar -xvf oprofile-0.9.9.tar.gz
$ cd oprofile-0.9.9
$ ./configure
$ make
$ sudo make install

Now we're finally ready to start the profiling:

$ sudo opcontrol --init
$ sudo opcontrol --start --vmlinux=/home/user/ubuntu-precise/vmlinux

If the watchdog service is using the NMI on the machine, oprofile will exit with an error and tell you to deactivate watchdog.
If this happens do the following:

$ sudo opcontrol --deinit
$ su root
$ echo 0 > /proc/sys/kernel/nmi_watchdog
$ exit

Then do the init and start commands again. The profiling is now running in the background. In order to output the results you should run these two commands:

$ sudo opcontrol --dump # This dumps all collected profiling data to the hard drive.
$ opreport # This generates a profiling report of the whole system.

Running the opreport command with -l will generate a more detailed report with all symbols separated. To get only the information about the binary you care about, you have to specify the name of the binary:

$ opreport -l /usr/bin/firefox

or in case of a kernel module (you have to give the path to the kernel modules):

$ opreport -l --image-path=/lib/modules/`uname -r`/kernel /lib/modules/`uname -r`/kernel/net/bridge/bridge.ko

There is also the possibility to generate annotated source code with the opannotate command:

$ opannotate --image-path=/lib/modules/`uname -r`/kernel --output-dir=~/profiling-output

After this completes the profiling-output directory contains all source files with annotations.

== Examples and Tips for getting started ==

OProfile can be configured in many different ways to profile exactly the things you really want to see.
The next few lines generate some interesting and basic outputs.

Callgraph
To generate a callgraph, run opcontrol and opreport with the --callgraph option:

$ sudo opcontrol --start --callgraph --vmlinux /home/dxm02271/ubuntu-precise/vmlinux
$ opreport --callgraph /lib/modules/3.8.13.12/kernel/net/bridge/bridge.ko --merge all --image-path=/lib/modules/3.8.13.12/kernel/

This will give you a callgraph that looks like this:

---------------------------------------------------------------------
  90       100.000  bridge.ko  br_handle_frame
90        14.2631  bridge.ko  br_handle_frame
  3680     95.3121  vmlinux    nf_hook_slow
  90        2.3310  bridge.ko  br_handle_frame
  90        2.3310  bridge.ko  br_handle_frame [self]
  1         0.0259  vmlinux    nf_iterate
---------------------------------------------------------------------
....

Notice that there's one line that isn't indented. That's the function which is in focus. All lines above it are functions calling it and all lines beneath it are functions getting called by it.
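
If you only want to see the functions in focus, the non-indented lines can be filtered out of the report. This is just a sketch that assumes callers and callees always start with whitespace and that the separator lines start with a dash; the function name focus_lines is made up, and header lines of a real report would pass through as well.

```shell
#!/bin/bash
# focus_lines: read callgraph output on stdin and print only the lines
# that start neither with whitespace nor with a dash, and are not empty.
focus_lines() {
    grep -v -e '^[[:space:]]' -e '^-' -e '^$'
}

# Shortened excerpt in the layout described above, with the
# indentation written out explicitly:
focus_lines <<'EOF'
-------------------------------------------
  90  100.000  bridge.ko  br_handle_frame
90  14.2631  bridge.ko  br_handle_frame
  3680  95.3121  vmlinux  nf_hook_slow
-------------------------------------------
EOF
```

With a real report you would pipe the output of opreport --callgraph into focus_lines instead of using the here-document.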

I hope this was interesting for you. Please leave me a comment if ;)