Friday, 28 August 2015

Huninn Mesh Part 3 - Organisation of Time Series Data

In my previous two posts about Hunnin Mesh, I discussed the nature of the Huninn Mesh network and the architecture, abstractions and constraints of the Huninn Mesh server.

In this post, I will go into more detail about the organisation time series data in the Hunnin Mesh server and will illustrate how the system is designed to achieve high throughput in terms of both measurement ingestion rates and also access to data by end users through a RESTful API.

The Huninn Mesh Network

Huninn Mesh manufactures wireless sensors that measure temperature, pressure, humidity and so on. The sensors are arranged into a mesh network. That is, a network where one sensor 'node' can pass information to another mesh node, and so on, until a gateway device is reached. The gateway then passes measurements to a server that resides in the cloud.

Huninn Mesh is a mesh network of sensor nodes (blue) which communicate using the Huninn Mesh protocol (orange lines) through a gateway (purple) to a server in the cloud (black line).
A Huninn Mesh network is:
  • software defined (routing of messages is under the control of a server in the cloud; 
  • able to efficiently handle large number of hops between network nodes; 
  • resilient (it is able to recover from the failure of a node); and,
  • quiescent for most of the time (nodes are quite literally off line, relying on schedule specified by the server to perform allocated tasks).
A Huninn Mesh sensor is:
  • compact (10cm by 10cm by 1.5cm);
  • typically measures two to three parameters - temperature, pressure and humidity being the most common;
  • battery powered; and,
  • designed to provide a realistic battery lifetime when deployed in the field of five years and is sampling at 15 second intervals.
A Huninn Mesh server is responsible for:
  • managing the mesh (creating and updating the software defined network);
  • managing sensors;
  • ingesting data from sensors;
  • storing data in a time series database; and
  • exposing an API through which the data can be queried.
The characteristics of battery powered sensor nodes coupled with a software defined mesh network topography mean that large numbers of Huninn Mesh sensors can be deployed by non technical people in a very short time and, therefore, at a low cost.  The server architecture leverages the power of cloud based computing and mass storage, again, at a low cost.

Using the Data Stream

To date, Huninn Mesh networks have been installed in office buildings in order to provide the measurement base on which building managers can optimise building comfort and energy usage.

Using a real world example of a fifty floor building in New York, over two hundred sensors were deployed, each of which samples temperature, pressure and humidity at roughly fifteen second intervals. Thus, the server receives approximately 2,400 measurements per minute from this network (this measurement ingestion rate of 40 transactions per second is roughly 1/5,000th the capacity of a single server - we regularly test at 5,000 tx/sec).

Different users have different requirements of this data set. Interviews with building managers tell us that they require:
  • proactive hot/cold/comfort alerts;
  • reactive hot/cold reports;
  • suite/floor/building comfort state visualisation;
  • seasonal heat/cool crossover planning; 
  • plant equipment response monitoring;
  • comparative visualisation of different buildings; 
  • ad-hoc data query for offline analysis (in Excel);
  • and so on.
From the above, it should be clear that:
  • a building manager requires aggregate data sets that represent multiple sensors; and,
  • in most cases, measurements made at fifteen second intervals provide far more information than the requirement demands (a manager does not require 5,760 data points in order to ascertain the trend in ambient air temperature of a building's lobby over a 24 hour period).
At Hunnin Mesh, the question of how to make our data set fit for these many purposes occupied us for some time. The solution was decimation and the creation of Computational Devices and that's what the remainder of this post is about.

Time Series Data Set Decimation

Decimation refers to the (extreme) reduction of the number of samples in a data set (as opposed to one-tenth as implied by deci). Decimation in signal processing is well understood. I'll explain how we use the technique in the Huninn Mesh system.

A Huninn Mesh sensor samples at a set rate and this rate must be a power of two times a base period defined for the entire network.  In this illustration, the sample rate is fifteen seconds.

Samples are obtained at a regular interval.
Our problem is that the number of samples is (often) too large and, therefore, the data set needs to be down-sampled. The Huninn Mesh server does this via decimation - it takes a pair of measurements and calculates a new value from them.

A pair of samples taken at 15 second intervals (black) is used to create a new sample (blue) which contains the maximum, minimum and average of the two raw measurements and has a sample rate of half that of the original pair and with a timestamp half way between the timestamps of the two measurements.
The calculation of the aggregate value depends upon the type of 'thing' being down-sampled. In the case of a simple scalar value like a temperature measurement, the maximum, minimum and average of the two samples is calculated. This aggregate sample is stored in a Cassandra database in addition to the raw data series.

Decimation is also applied to values created by the decimation process. With four samples, the system is able to decimate a pair of decimated values (using a slightly different algorithm).

Decimation (purple) of a pair of decimated values (blue) which are calculated from the raw time series samples (black).
And with eight samples, a third decimation level can be created and so on:

Three 'levels' of decimation (orange) require 2^3 = 8 raw time series samples and each individual derived value and is representative of the entire two minute period.
The calculation of decimated values proceeds as shown in the following animation.
The calculation of aggregate values representing different decimation levels occurs when the full sample set for the aggregation is available.
A Huninn Mesh server is typically configured decimate over 24 levels from the base period of the network with the first level aggregating over 2 samples, the second 4, the third 8, to the 24th which aggregates 16,777,216 base period samples. In fact, for any given decimation level, only the pair of values from the previous level are required to perform the decimation meaning that the process is computationally efficient regardless of the decimation level and requires a small working set in memory.

The raw time series measurements and all decimated data products for every sensor are stored in a Cassandra database. The API to query this information will be discussed in a follow up post.

Recall from an earlier post that every Huninn Mesh device in a network has a sample rate that is a power of two of a base period that is constant for the entire network. Therefore, decimation intervals exactly match sample intervals allowed for sensors.

A fixed base period and the rule that sensors must sample at a rate that is a power of two times the base period provides a guarantee higher order decimations will be available. Samples are all aligned in time. In fact, neither samples nor the decimation time window are aligned. Instead, the actual time that a measurement is made depends upon the optimum network configuration. The system only guarantees that measurements will be made at the required interval.

If the base period were set at 400ms, the decimations would proceed as follows:

Decimation
Level
Samples
Aggregated
Interval
(ms)
Time Span
(D H:M:S.ms)
1 2 800 00:00:00.800
2 4 1,600 00:00:01.600
3 8 3,200 00:00:03.200
4 16 6,400 00:00:06.400
5 32 12,800 00:00:12.800
6 64 25,600 00:00:25.600
7 128 51,200 00:00:51.200
8 256 102,400 00:01:42.400
9 512 204,800 00:03:24.800
10 1,024 409,600 00:06:49.600
11 2,048 819,200 00:13:39.200
12 4,096 1,638,400 00:27:18.400
13 8,192 3,276,800 00:54:36.800
14 16,384 6,553,600 01:49:13.600
15 32,768 13,107,200 03:38:27.200
16 65,536 26,214,400 07:16:54.400
17 131,072 52,428,800 14:33:48.800
18 262,144 104,857,600 1 05:07:37.600
19 524,288 209,715,200 2 10:15:15.200
20 1,048,576 419,430,400 4 20:30:30.400
21 2,097,152 838,860,800 9 17:01:00.800
22 4,194,304 1,677,721,600 19 10:02:01.600
23 8,388,608 3,355,443,200 38 20:04:03.200
24 16,777,216 6,710,886,400 77 16:08:06.400

The Time Span column gives the number of days, hours, minutes, seconds and milliseconds represented by a single decimated value.

Note that earlier, I wrote that the sample period was 'roughly' fifteen seconds. In fact, the network had a base period of 400ms and so the sample period was 12.8 seconds.

The next section provides more detail about the properties of of time periods including alignment with human recognisable intervals.

Properties of the Time Series

At the risk of stating the obvious, the first property of note is that for any given sensor at any moment, the length of time until the next measurement is available is always known irrespective of whether the measurement is raw data or an aggregate decimated value (this property is very useful and its use will be discussed in my next post about the Huninn Mesh RESTful API for sensor data).

The second property arises as a result of the network having a constant base sample period: device sample rates and decimated sample rates align across a network. Individual devices are constrained such that they must only sample at a rate that is a multiple of a power of two of the base rate. For example, on a network with a 400ms base period, one device may sample every 102,400ms and another 12,800ms and another at 800ms: through decimation, data for all of these devices can be accessed at sample rates that are known a-priori.

The third property is that time spans for decimation levels do not map exactly to any human recognisable span: no decimation level that maps exactly to an minute, hour, a day, etc.

This last property means that there will always be an error in when estimating values for one of these 'human' periods. For example, what if the daily average for a temperature sensor is required by a building manager? Since there is no decimation that maps exactly to a day, the trick to using these data products is to select a decimation level that provides the smallest number of samples (with an acceptable error for the period under investigation) and then to calculate the mean from this small set.

Where a single sample is required to summarise a day, the following decimations might be considered:
  • Decimation level 18 provides a single sample that spans one day, five hours, seven minutes, 37 seconds and 600 milliseconds. This is unlikely to be of use because it spans more than a day and individual samples do not align on date boundaries. 
  • Decimation level 6 samples at 25,600ms which divides exactly into 86,400,000 (the number of milliseconds in a day). At this decimation level, there are 3,375 samples per day which would have to be averaged in order to calculate the required data product. This represents 1/64 of the raw data volume but the maximum error of 25.6 seconds may be overkill for the requirement.
  • Decimation level 10 creates a sample every 409,600ms (about every six minutes and fifty seconds), each sample aggregates over 1,024 raw measurements, returns 210 measurements per day that must be averaged to provide the end result and has a sample error of 6 minutes 24 seconds.
  • Decimation level 14 creates a sample every 6,553,600ms (about every one hour and 39 minutes), each sample aggregates over 16,384 raw measurement, returns 13 measurements per day that must be averaged to provide the end result and has a sample error of 20 minutes and 3 seconds. 
In the web based API provided by Huninn Mesh for building managers, we chose decimation level 10 which returns 210 samples that then have to be averaged to provide the mean for the day.  This decimation level was chosen because the sample time error was small (a maximum of six minutes and ten seconds) and this number of samples also accurately catches trends in temperature / pressure / humidity in a building and so the data set is also useful for graphing.

In this discussion, the sample error refers to that part of a day that is under sampled when using the given decimation level (a user could also choose to over sample in order to fully bracket the required time span).

In summary, the properties of the raw time series and decimated data products mean that the system does not provide answers to questions like "what is the average temperature for the day for sensor X". Instead, the system creates and stores multiple data products that can be rapidly queried and post processed by the user to arrive at an answer.

Computational Devices

Up until this point, discussion has been concerned with measurements from physical sensor devices - little boxes that measure temperature, pressure, humidity and so on. One of the use cases presented for the instrumented building discussed earlier was what is the average, maximum and minimum temperature for the entire building, that is, a measurement that represents a number of other measurements.

Computational devices are a special class of virtual device, implemented in software, that sample measurements from other devices (both real and computational) and emit a sample of their own.

For example, an Aggregating Computational Device may be created that takes as its input all of the devices that produce an aggregate temperate value for every floor in a given building. The devices for the floor are themselves computational devices that take as their input all of the raw temperature sensors on the given floor.

Nine 'real' devices (blue) measure temperature on three different floors and are used as inputs to three Computational Devices (tc[1]... tc[3]), each of which aggregates over the temperature for an entire floor and are used as inputs to a single computational device (tc[b]) that aggregates over the temperature for the entire building (right). 
In the diagram, above, the arrows between devices show the logical flow of data, arrows should be interpreted as meaning 'provides an input to'.

Just like every other device, Computational Devices sample - they are scheduled at a set rate. They do not wait for events from dependent devices and, for this reason, circularities can be accommodated - a Computational Device can take as input its own state. Sampling theory says that a Computational Device should sample at twice the frequency of the underlying signal: twice as often as the smallest rate of any of the 'real' sensors.

To date, two types of Computational Device have been created:
  1. Aggregating Computational Devices sample either real devices or other Aggregating Computational Devices and emit the average, maximum, minimum, average-maximum and average minimum; and,
  2. Alerting Computational Devices sample either real devices or other aggregating computational devices and emit either an enumerated value (e.g. hot, cold, normal).
We have a range of other devices in the works, for example, ASHRAE comfort devices for summer. Ultimately, Huninn Mesh intends to release a public API whereby users can plug their own computational devices into the system.

The output from a Computational Device is subject to decimation, just like any other device. Time series data from computational devices is stored in a Cassandra database, just like any other data. A computational device is, however, able to specify its own format for storage in the database and its representation through the Huninn Mesh RESTful API.

Programatically, Computational Devices are described by a simple programming interface in Java. It is a straightforward matter for a competent software developer to implement new types of device - for example, a simple extension to the Aggregating Computational Device is one which weights its inputs.

Time Series Data a Device is Immutable

A final rule worth touching upon is that the raw time series and the associated decimated data set for a device is immutable: once created, it cannot be altered. Data is written once to Cassandra and read many times.

What happens then if there is an error? for example, a device is deployed for some time before it comes to light that the calibration is incorrect? (As it happens, this was our most common error during development.)

The system is architected so that sensors do not perform conversion or calibration in the network. Instead, the raw bitfield is always passed to the server which:
  1. stores the raw bitfield in Cassandra against the ID of the Real Device; and,
  2. converts, calibrates and decimates measurements and stores these against a Virtual Device ID in Cassandra.
Calibration and conversion is a property of a Virtual Device's specification. If the calibration is changed retrospectively, a new Virtual Device is created for the affected Real Device and historical data replayed to create a new time series.

Put another way, fixups are not allowed - time series data really is immutable. This is not to say that corrections cannot be made, they can. A new Virtual Device is created and the raw data stream is replayed, recreating a new, correct data set. The same is then done for all downstream Computational Devices including their descendant Computational Devices.

This begs one question: if Virtual Devices are subject to change, how does an end user discover what device represents what property - which device can tell me the average temperature for building A?

The answer to this question is in a device discovery API and this will be discussed in my next post about the Huninn Mesh RESTful API.

It's About Scale

Fixing the network's base sample period, the power of two times base period rule, decimation, computational devices, device virtualisation, and immutability are all features that have been honed with one need in mind: performance at scale.

Consider that:
  • For any decimation level, the arrival time of the next sample is known in advance, or, put another way, the expiry time of the current measurement is precisely known.
  • Reduction of the time series through decimation has a low computational overhead and can dramatically reduce the amount of data required to answer any given query. 
  • Immutability of any measurement point at any level of decimation for a virtual device means that the sample never expires: it can be safely cached outside of the system forever.
  • Indirection through device virtualisation introduces the need for a discovery API, the results of which are the only part of the system subject to change.
In my next post, I will discuss how all of this comes together in the Huninn Mesh RESTful API to deliver a system with very low query latency times across very large numbers of sensors and multiple orders of magnitude of time ranges.

Thursday, 13 August 2015

Did I Click That? or What's up with the Browser User Experience on Slow Network Connections

My Network is Slow

For some reason, right now, there is an awful lot of time between my clicking on a link in a web page and any content displaying in my browser. Things are wrong, things are going wrong: it seems that my ADSL network is at a crawl, that DNS queries are incredibly slow, and overall round trip latency even to a server just across the city is awful.

How wrong? Well, I'm writing from Sydney and browsing the Sydney Morning Herald home page (251 HTTP requests, 6,106KB) and it is taking 8 seconds to reach a page load event. Following a link from the home page (Commonwealth Bank customers say double charges not refunded), I see (an incredible) 340 requests and a further 10,467KB downloaded - with a 6 second delay until I see content.

The purpose of this missive is not to complain about the absurd weight of the two pages cited, but to discuss the user experience during the (seeming interminable) wait for content. Since I use Firefox as my default browser, I'll (mostly) confine my comments to that browser, although my comments may be applicable to other browsers - mobile and desktop.

Did I Click That?

As noted, what's happening right now when I click on a link is, well, nothing much. I can't do much about my network, but the issue of nothing much happening at all might be improved upon.

Let me explain by walking through my actions and Firefox's response when viewing articles on the SMH website:
  1. I move my mouse over an article link;
  2. my mouse cursor changes, the anchor tag for the link is underlined as I hover over it and the title specified in the anchor is shown (all from HTML/CSS/JS);
  3. the URL associated with the link is shown in a small, animated, floating status bar overlay in the bottom left of the browser window;
  4. I click my mouse to follow the link - nothing changes between the mouse down and mouse up (again, HTML/CSS - i.e., no CSS :active selector on the SMH website);
  5. as I release the mouse button the link turned grey (CSS :visited selector at a guess);
  6. the tab in my browser changes from showing the page title to displaying an animated 'throbber' alongside the word "Connecting" and the status bar changes according to page load state - for example, "Waiting for smh.com.au", "Read smh.com.au" and so on;
  7. a few seconds later, the tab changes, showing the title of the new page as the browser receives and parses the HTML head element from the server (like I said, things are wrong...); and finally,
  8. some time afterwards, the page renders and the 'throbber' icon is replaced by a favicon for the site.
Chrome does something different, but you have to be observant to note that it shows the domain name (www.smh.com.au) when it is in the 'connecting' state and it also shows various 'processing' messages in its status bar during page load.   On closer inspection, Firefox does almost exactly the same thing.

Here's a screenshot of Firefox on a Mac showing a page load in progress, with the tab-bar 'throbber' and status bar both highlighted.
Firefox's loading 'throbber' and status bar during a page load.

Given my slow connection, my experience of clicking a link is poor - it's difficult to understand what's happening.


Why so? My eye and therefore my focus is directed to the link that I clicked, not the 'throbber' in the tab bar and not on the status bar.

Given a slow connection, in order to verify that I have actually clicked on a link, my eye has to scan up to the top of the page and then across in order to locate the tab containing the throbber. Alternatively, my eye can scan to the bottom left of the page. Either way, on my slow connection it is not immediately apparent what the browser is up to, leaving me in a state of cognitive dissonance - I clicked that link, didn't I?

Experimenting on Children

As I write this, I have been careful to control my own behaviour. In particular, I have made sure that I click on links once and only once. That's because I know that, just like a pedestrian crossing where pressing the button multiple times does not make the lights change faster, the page won't load any faster if I click the link lots of times. I tell my kids this, but they press the button at the crossing fifty times and a simple experiment this morning reveals that they do exactly the same thing when faced with links that do not result in near instantaneous page loads.

Unlike pedestrian crossings, clicking the link on the web page multiple times does change the outcome. A quick look in the developer tools (Chrome) shows that when the littlies repeatedly click on the same link, outstanding network requests are cancelled.

In an HTTP/1.1 world, this is bad news. I mean really, really bad news for performance. This is because a cancelled request results in a torn down connection (TCP close and all that). Only to be re-established and the same HTTP request re-sent. (In the interests of balance I should add that, networks being what they are, this strategy sometimes works.) I should also add that in an HTTP/2 world, the cost is nothing like as great but it still exists, particularly in the case of network that's just plain slow.

A Modest Proposal

At the risk of stating the obvious, sites should do what they can with HTML/CSS/JS available to them to provide appropriate cues to the user regarding followed links and progress. That's not what this proposal addresses, however. I am making a suggestion about what the browser might do during a page load and is thus outside of the realm of normal HTML/CSS/JS.

From a product perspective, I believe that changing the experience of page loads can improve the user's understanding of what just happened and, by so doing, reduce the number of false reloads which, in turn, might actually improve performance on slow networks.

As noted earlier, the state of the art is a throbber in the tab bar and a part-time status bar. The key change I suggest is to (drastically) change the way that page loading state is conveyed. If we accept that the user expects something to happen immediately on clicking a link, then it is reasonable to use the entire browser window's drawing area to convey feedback about the page load operation. Users are accustomed to modal progress spinners in their mobile apps and I am proposing something similar, but writ large:

It's clear that a link was clicked, a page is loading, and the most common options are clearly accessible to the user.

Looking past the fact that I just threw this together using Gimp and Powerpoint, there are a number of product requirements here:
  1. The act of clicking on a link that navigates away from the current page should result in immediate unequivocal feedback irrespective of page design, the speed of the network, remote server, etc. (More on 'immediate' in moment.)
  2. Information from the anchor URL and title in the followed link should be used to provide feedback about what page is being loaded.
  3. Navigation away from a page is immediate and so all other links on the page should be disabled.
  4. The user should be provided with a simple means to cancel the page load operation and go back to the prior page.
  5. Because sometimes networks will be networks, a means to cancel a slow request and restart it should be provided. 
Arguably the browser already covers off on points 1, 2, 4 and 5.  Therefore, I am only arguing for a change in how this information is presented.

In practice, this might work as follows:
  • The user clicks a link to navigate away from the current page.
  • An interstitial loading page is inserted into the tab comprised of:
    • a background image created from a screenshot of the page being unloaded, with a Gaussian filter applied, is used to indicate that links cannot be followed during page-load. (I chose a Gaussian filter because, hey, why not?);
    • an animated throbber echoes that in the tab bar;
    • the title of the page being loaded (from the anchor element) allows the user to intervene early in the case where the wrong link is clicked;
    • the domain name of the server being contacted is prominent and may assist the user in an early bail-out;
    • a button that provides the user with a means to go back to the prior page; and,
    • a button that allows the user to reload the page.
  • Immediately on receipt of the head element for the new page, the link title is updated in the interstitial and tab bar (so you know you're being rickrolled, just before you really are).
  • The interstitial page is removed and replaced with the new page contents as soon as enough content has been retrieved from the network.  They key requirement here is to avoid replacing the interstitial with a blank page during slow loads.  I'm aware of how hard this may be to implement, what with CSS + JS but see no harm in asking.
In order to cater for fast page loads, the creation of the interstitial might be staggered. On navigation to a new page:
  • Links on the existing page are immediately disabled.
  • The interstitial is immediately created containing a blurred copy of the prior page but with the other interstitial UI elements hidden.
  • Some time later - say 500ms - the throbber, text and button elements are made visible in the interstitial.
Whatever the new UI does, it should not degrade the user experience of fast loads, that is.

At this point, my imagination is galloping off - timing of the creation of the interstitial might be varied depending on whether a connection to the host is already open, or whether the link is to the same domain as the page being unloaded. I use a range of apps such as JIRA where one page looks very much like another. In this case, the bluring and redrawing of near identical pages during navigation might be distracting. Again, timing is key. Animation may be useful. I should stop - my point is that immediate might mean in the blink of an eye.  In summary, I think that this idea will be useless if the timing of the presentation of the interstitial is wrong.

HTTP GET requests require no further discussion. Since POST requests are not idempotent, they must work as they currently do, that is, if the user retries a request, they must be warned that information will be submitted twice.

For the avoidance of doubt, I believe this might work for normal links but not for JavaScript.

Summary

In the world of mobile apps, it is common to provide a modal 'spinner' that is displayed in response to a user action to signify that the app is busy, for example whilst waiting for an operation like a network request to complete.  This idea expands upon this.

Utilising the whole of the browser window in order to provide feedback during slow page loads may provide an improved user experience and possibly even lead to modest improvements in load time outcomes.  I reason that the idea is equally applicable to mobile as it is for desktop browsers.