
Dynamic Logging Examples

Monday, June 27th, 2011

As I mentioned in an earlier post, the Butterfly Logger now has a dynamic logging feature. This post elaborates on the benefits and costs of the feature using some data gathered during testing. Some examples of the output produced while using the feature are shown below in figures 1 and 2. In processing the output from the logger I created a couple of scripts to analyse the data and create graphs; these are included below as listings 1 and 2. While the new feature allows us to log for longer without the risk of missing events, it also means we can no longer predict how long a logger can be deployed in the field.

A Graphical Overview

Figure 1: Dynamic logging example.

The two-part graph shown above in figure 1 demonstrates the logging frequency increasing and decreasing as the monitored signal changes more or less quickly. The signal being monitored in this example is the ambient temperature of my garage over a week in March 2011, measured using the AVR Butterfly's onboard thermistor.

The upper part of the graph shows the temperature plotted in the order it was sampled, i.e. as it is stored in the memory of the logger. The lower part plots the temperature against time, using the date and time as recorded by the logger and the following lines in gnuplot:

set xdata time
set timefmt "%Y/%m/%d %H:%M:%S"
plot 'dataFile.log' using 1:8 with impulses notitle

The temperature in the lower section is plotted as impulses simply to highlight the change in logging frequency.
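
For reference, reading the thermistor is a standard single-shot ADC conversion on the Butterfly's ATmega169. The sketch below is an illustrative example rather than the Butterfly Logger's actual source; it assumes the NTC sits on ADC channel 0, as on the stock board.

#include <avr/io.h>
#include <stdint.h>

/* Minimal single-shot read of the Butterfly's onboard NTC.
 * Illustrative sketch only; assumes the thermistor is on ADC channel 0. */
uint16_t read_thermistor(void)
{
    ADMUX  = (1 << REFS0);                  /* AVcc reference, channel 0 */
    ADCSRA = (1 << ADEN)                    /* enable the ADC */
           | (1 << ADPS2) | (1 << ADPS1);   /* clock prescaler of 64 */
    ADCSRA |= (1 << ADSC);                  /* start a conversion */
    while (ADCSRA & (1 << ADSC))            /* busy-wait for completion */
        ;
    return ADC;                             /* 10-bit result, 0 to 1023 */
}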

Figure 2: Comparison of data plotted against time or sample index.

The graph above in figure 2 shows a much longer run of samples from the same logging session. In this plot you can again see the difference between the samples as recorded in memory and the same samples plotted against time.

You will notice that consecutive recorded samples only ever differ by a fixed amount, which is evident in the constant slope of the upper section of the graph. This constant slope, either increasing or decreasing, is an artefact of the logging threshold value, which is set at compile time in the software.

The software also has a timeout parameter that establishes the maximum logging interval. If no timeout is set, the system will only log when the threshold has been exceeded; if it is set, the logger will record a sample after a fixed period of time even if the threshold has not been exceeded.

The minimum interval between samples is specified by the same sample period parameter used when logging in the standard mode. The timeout value is simply specified as a multiple of this interval.
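
To make the interaction between the threshold, the timeout and the sample period concrete, here is a minimal sketch of the decision the logger makes each time it wakes. This is not the actual firmware: LOG_THRESHOLD, TIMEOUT_MULTIPLE, read_sensor() and write_sample() are all hypothetical placeholder names.

#include <stdint.h>
#include <stdlib.h>

#define LOG_THRESHOLD    3   /* minimum ADC change before a sample is stored */
#define TIMEOUT_MULTIPLE 60  /* maximum interval as a multiple of the sample
                                period; 0 disables the timeout entirely */

uint16_t read_sensor(void);             /* placeholder: read the ADC */
void     write_sample(uint16_t sample); /* placeholder: store to flash */

static uint16_t last_logged;      /* last value written to flash */
static uint16_t periods_elapsed;  /* sample periods since the last write */

/* Called once per sample period (once per second in this session). */
void dynamic_log_tick(void)
{
    uint16_t sample = read_sensor();
    int16_t  delta  = (int16_t)(sample - last_logged);

    periods_elapsed++;

    /* Record a sample if the signal has moved by at least the threshold,
     * or (when a timeout is configured) if the timeout has expired. */
    if (abs(delta) >= LOG_THRESHOLD ||
        (TIMEOUT_MULTIPLE && periods_elapsed >= TIMEOUT_MULTIPLE)) {
        write_sample(sample);
        last_logged = sample;
        periods_elapsed = 0;
    }
}

The constant slope seen in figure 2 falls out of this directly: a new value is only ever stored once it differs from the previously stored value by at least the threshold.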

Figure 3: Dynamic resolution of the data expressed as number of samples per hour.

Figure 3 shows further analysis of the same data used in the previous two graphs. This plot shows the logging resolution over time, expressed as the number of samples recorded per hour. It was calculated by processing the data and counting the number of samples recorded in each hour; where no data was recorded for a given hour, a zero is recorded. A Perl script to produce this data from the logger output is shown below in listing 2.

This example gives us a good metric for measuring the effectiveness of dynamic logging. If you examine the graph you can see that, at its peak rate, the logger is recording at a resolution of 30 samples per hour. The logging interval for this session was set to once per second, giving a maximum possible rate of 3600 samples per hour. As the rate is dynamic, it also makes sense to look at the average rate over the whole period, which is just over 2 samples per hour. Comparing the average rate to the peak rate gives a gauge of the efficiency introduced by the dynamic logging feature.

TABLE 1: Numerical summary
Number of hours monitored: 168
Number of samples recorded: 350
Minimum sample rate: none (no timeout was set)
Maximum sample rate: 3600 samples per hour
Average sample rate: 2.1 samples per hour
Samples needed if using peak rate (30 per hour): 5040
Samples needed if using maximum rate: 604,800

Conclusions

The benefits of dynamic logging

By comparing the number of samples that would be needed at the peak sample rate with the number actually recorded, we can calculate the theoretical saving in redundant samples: 5040 / 350 ≈ 14.4, so capturing the same data with the standard fixed-rate method would have used over 14 times more samples. As our storage space is limited, with the standard technique we could only log for a much shorter time.

This reduction in space used can be taken advantage of in two ways. If the temperature characteristics of the garage remain unchanged, the logger could record for around 3 years before filling the flash, far longer than is possible with the standard technique. Alternatively, the extra space could be used to log more sensors over the same period than would be possible with standard logging.

Beyond the storage savings, the major benefit of the dynamic sample rate is that dramatic events are not missed. Had we wanted to conserve storage space using standard methods, we would have risked sudden changes in the data being missed by a much slower sample rate. With a dynamic rate we get the benefits of a high sample rate without the storage cost.

The costs of dynamic logging

There are a couple of disadvantages to this dynamic sample rate technique. One is a reduction in the resolution of our sensors, due to the threshold that must be exceeded before a sample is taken. The resolution of the ADC has effectively been reduced from 1024 levels to approximately 340 (a threshold of three ADC counts, for example, gives 1024 / 3 ≈ 341 levels). For our example this gives a temperature resolution of roughly 0.1°C per level in the temperature range we are looking at. (Due to the non-linear nature of the thermistor, the effective temperature resolution will change over the range of the sensor.)

The logger will also consume more power, as it is out of sleep mode more often: it must wake regularly to check whether it should record a sample to flash. Writing to flash is by far the most power-hungry activity of the logger, and compared to that, simply reading the sensors is cheap. Because of this I don't anticipate the extra power requirement being too great, although I have not actually measured the impact.

When using a fixed sample rate you can calculate, prior to deployment, exactly how long a logger will be able to log for. With the dynamic sample rate you no longer know this when deploying a logger. To mitigate the issue you could either make a prediction based on prior knowledge or develop a method to alert the user when the memory is approaching capacity.
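
As a sketch of the second option, a capacity alert only requires the firmware to track how many samples have been written. Everything below is hypothetical: FLASH_CAPACITY_SAMPLES, samples_written() and alert_user() are illustrative names, and the capacity figure is an assumption rather than the actual flash size.

#include <stdint.h>

#define FLASH_CAPACITY_SAMPLES 56000UL  /* assumed capacity, not the real figure */
#define WARN_AT_PERCENT        90       /* warn when 90% of the flash is used */

uint32_t samples_written(void);  /* placeholder: samples stored so far */
void     alert_user(void);       /* placeholder: e.g. flash the LCD or an LED */

/* Call periodically; raises an alert once the flash is nearly full. */
void check_capacity(void)
{
    uint32_t used_percent = (samples_written() * 100UL) / FLASH_CAPACITY_SAMPLES;

    if (used_percent >= WARN_AT_PERCENT)
        alert_user();
}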

Using data previously gathered in an environment, you can predict how long the logger is reasonably likely to log for. Using the example data above, I would calculate the expected logging capacity based on 700 samples a week (double the average rate seen in the data, simply as a contingency). Using this value would allow something like 80 weeks of logging, which is still around 7 times longer than standard logging would allow. These calculations could potentially be incorporated into the firmware to make dynamic logging more useful.

Scripts

As promised above, I have included the scripts used to produce the data and graphs. They might be useful to anyone getting started with Gnuplot, BASH and Perl for automating graphing and analysis.

#!/bin/sh

# script to plot logging results temperature against time.
# also plot temperature samples for comparison and analysis of
# the dynamic logging system.

#assume arg 1 is name of file with ^M's ^D's and non data lines already removed.

maxtemp=30

# look at number of points per hour
#cut -d\  -f2 ${1} | cut -d: -f1 | uniq -c > ${1}_res

# TODO: need to pad out hours with zero points
# DONE: use pad.pl to calculate number of samples per hour and pad any zeros
# 	Does not check for an entire day without samples though.
cat ${1} | ./pad.pl > ${1}_res

gnuplot << EOF
set terminal png size 1024,768 enhanced font "/Library/Fonts/Microsoft/Arial,12"
set output "$1_resolution.png"
set origin 0,0
set grid
set yrange [0:35]
set title 'Logging resolution over time'
set ylabel 'No. of samples per hour'
set xlabel 'Hour of sampling' 
plot '${1}_res' u 2 w i  not

set output "${1}_comparison.png"
set multiplot
set origin 0,0.5
set size 1,0.5
set xdata 
set yrange [0:${maxtemp}]
set title 'Ambient Temperature'
set xlabel 'Sample'
set ylabel 'Temperature (°C)'
plot '$1' u 8 w l not

set origin 0,0
set size 1.0,0.5
set xdata time
set timefmt "%Y/%m/%d %H:%M:%S"
set ylabel 'Temperature (°C)'
set yrange [0:${maxtemp}]
set xlabel 'Time'
set title ''
set format x "%d %b"
plot '$1' u 1:8 w l not
unset multiplot

set output "${1}_impules.png"
set multiplot
set origin 0,0.5
set size 1,0.5
set xdata 
set yrange [0:${maxtemp}]
set title '150 Samples of Ambient Temperature'
set xlabel 'Sample'
set ylabel 'Temperature (°C)'
plot '< head -150 $1' u 8 w l not

set origin 0,0
set size 1.0,0.5
set xdata time
set timefmt "%Y/%m/%d %H:%M:%S"
set ylabel 'Temperature (°C)'
set yrange [0:${maxtemp}]
set xlabel 'Time'
set title ''
set format x "%d %b"
plot '< head -150 $1' u 1:8 w i not
unset multiplot
EOF

open $1_resolution.png
open $1_comparison.png
open $1_impulses.png

Listing 1: BASH script to process the data into graphs.

#!/usr/bin/perl -w

use strict;
# simple script to process temperature logs and show number of samples per hour

# my old script did this...
## look at number of points per hour
##  cut -d\  -f2 ${1} | cut -d: -f1 | uniq -c > ${1}_res
# .. but that didn't account for hours with no samples at all.

# my variables...
my @lines;
my $line;
my $hour;
my $previous_hour;
my $first = 1;
my $count = 0;
my @fields;
my @time;

# read in all lines from stdin
chomp(@lines = <STDIN>); 

#process each line in turn
foreach $line (@lines) {
		
		# extract the hour value from the time, ignoring the date.
		@fields = split /\s+/,$line;
		@time = split /:/,$fields[1];
		$hour = $time[0];
		
		# set up the previous_hour for the first line to enable the count
		if ($first == 1){
			$previous_hour = $hour;
			$first = 0;
		}
		
		if ($hour == $previous_hour){
			#increment the count of samples for this hour 
			$count++;

		} else{
			#  print our total and move on to the next hour
			print "$previous_hour \t $count \n";
			$count = 0;
			$previous_hour++;
			$previous_hour %= 24;
			
			# check for non consecutive hours and pad out with zero totals
# assuming there hasn't been a total day without samples
			while ($previous_hour != $hour){
				print "$previous_hour \t $count \n";	
				$previous_hour++;
				$previous_hour %= 24;
			}
			
			# remember to count this first new sample in our totals 
			$count=1;
		}	
}

# print the count for the final (partial) hour
print "$previous_hour \t $count \n";

# done

Listing 2: PERL script to process the data logger output and calculate the number of samples per hour.