Freitag, 31. Oktober 2014

Performance optimizations of the demodulation code of RF Analyzer

When I started implementing AM and FM demodulation for the RF Analyzer, I first built a receiver in GNU Radio Companion and then tried to rebuild it in Java. The basic blocks in the receiver are pretty simple and soon I had working code. But I had to recognize how poorly it performs on an Android device with limited CPU power. I was far from performing demodulation in real time and so I had to re-build many parts and optimize them.

Here's a little post about some of the things I did to optimize the demodulation process and get it running in real time. By the way: The version I am talking about is RF Analyzer 1.07. Available on Google Play (https://play.google.com/store/apps/details?id=com.mantz_it.rfanalyzer) or, if you want to have a look into the source code, on GitHub (https://github.com/demantz/RFAnalyzer).


<UPDATE>
I've got a hint from Michael Ossmann to use single precision floating point variables instead of doubles. This this turned out to be a very significant optimization since it speeds up every operation performed on the signal samples. Version 1.08 contains these changes along with some other, rather minor optimizations.
</UPDATE>

Channel Selection

In order to shift the interesting signal down to base band it has to be multiplied by a complex cosine. This has to be done before any sample rate decimation is possible and therefore this will always run at the highest sample rate in our receive chain.

I was already using a lookup table to convert the signed interleaved IQ bytes from the HackRF to floating point values. So I decided to extend this table to also include the mixed values:

2-dimensional array as lookup table for samples multiplied by a cosine

By using this lookup table I don't have to do any multiplication to mix the signal down to baseband. The table has to be recreated as soon as the user changes the channel frequency:

cosineRealLookupTable = new float[bestLength][256];
cosineImagLookupTable = new float[bestLength][256];
float cosineAtT;
float sineAtT;
for (int t = 0; t < bestLength; t++) {
  cosineAtT = (float) Math.cos(2 * Math.PI * mixFrequency * t / (float) sampleRate);
  sineAtT = (float) Math.sin(2 * Math.PI * mixFrequency * t / (float) sampleRate);
  for (int i = 0; i < 256; i++) {
  cosineRealLookupTable[t][i] = (i-128)/128.0f * cosineAtT;
  cosineImagLookupTable[t][i] = (i-128)/128.0f * sineAtT;
  }
}

This lookup table strategy effectively gets rid of any multiplications needed for downmixing and speeds things up a lot.


Sample rate decimation

The next block is a decimating low pass filter used to slice out the interesting signal and decimate the sample rate. It enables the actual demodulation process to be performed in real-time at a much lower sample rate. However, the decimating low pass will still run at the high sample rate and is therefore our next target for optimization.

I tried many different things to speed up this low pass filter and I ended up with splitting it into a cascade of decimating low pass filters. Each filter decimates the sample rate by two which means only the first filter will run at highest sample rate. Decimation by two enables us to implement the low pass filter as a half-band filter. A half-band filter is a low pass filter with cut-off frequency at fs/4 and the positive characteristic of having every second filter tap equal to zero:

A half-band low pass filter. Graphic from Richard G. Lions "Understanding Signal Processing"

Because every second filter tap is zero we can implement this filter to require only half the multiplications needed for a standard FIR filter. We will also take advantage of the symmetry of the filters taps, effectively reducing the number of multiplications again by factor two!

Finally I implemented the filter in a very un-flexible way by hard-coding the filter tap values into the filter method. This gets rid of some conditional structures and array lookups in the code and makes it very fast at the cost of very ugly, un-flexible code.

By cascading a variable number of these filters we can decimate the sample rate by every integer power of 2. Note that the last filter in the line should always be a regular FIR filter with cut-off frequency < fs/4 to avoid aliasing.


Multithreading

It is obvious that we should take advantage of today's smartphones having multi-core CPUs. That's why we separate each block in our receiving chain into its own thread.

Biggest problem for multi-threading is always synchronization. I chose to use ArrayBlockingQueues to connect the threads together (sorry for the over wide image^^):

Every blue rectangle is a separate thread. The blocking queues help to synchronize them.
It is very important that we reuse the buffers by passing them back to the previous block every time we finished processing them. This will avoid memory allocations at runtime and the garbage collector going crazy.


Conclusion

The above design and implementation choices helped me to get demodulation running in real time. But this is only true if you run it on a device with a decent quad-core CPU. I don't have an old phone to test it on a weak CPU but I'm pretty sure it won't work on a dual- or single-core device.

If you tested the application on your phone, please leave a comment and tell me the device type and your experience.

I will try to further optimize the demodulation chain in order to get it running on older devices too! And of course to save battery! Any tips or hints are very welcome!

3 Kommentare:

  1. Could you share a recorded sample from the HackRF transfer application? So more people would be able to test the demodulation on their phones while not (yet) having a HackRF or RTL-SDR.

    AntwortenLöschen
  2. I recorded a small snippet of the FM band and uploaded it on Google Drive:
    https://drive.google.com/file/d/0B3KPeCGDm7cCMHRGSk1iNlZ3TXM/view?usp=sharing

    I also have a recording at around 410 MHz where you can hear some taxi drivers (NFM at 410.285MHz and another at 410.51MHz):
    https://drive.google.com/file/d/0B3KPeCGDm7cCYUR5WkpLTEpWams/view?usp=sharing

    Cheers,
    Dennis

    AntwortenLöschen
  3. got a hint from Michael Ossmann to use single precision floating point variables instead of doubles. This this turned out to be a very significant optimization since it speeds up every operation performed on the signal samples. Version 1.08 contains these changes along with some other, rather minor optimizations.


    Channel Selection
    In order to shift the interesting signal down to base band it has to be multiplied by a complex cosine. This has to be done before any sample rate decimation is possible and therefore this will always run at the highest sample rate in our receive chain.

    I was already using a lookup table to convert the signed interleaved IQ bytes from the HackRF to floating point values. So I decided to extend this table to also include the mixed values:

    AntwortenLöschen