In this post, I will explore how we designed and built a (very?) high performance API to the Huninn Mesh server.
Beginnings
The CEO had given the requirement to "just graph it". By "it", he meant a sensor reading - temperature, pressure or humidity - and he meant a day's data. This was the point at which I walked in the door at Huninn Mesh and took over product management.
The team had already decided to build an API on the server that provided a JSON response and to use a JavaScript library on the browser to generate the graph. The API was structured as a classic HTTP GET with query parameters, something like this:
  GET /sensor?id=ae43dfc8&from=1441029600000&to=1441116000000 
The result of the first sprint was a system which defaulted to plotting the prior 24 hours. That is, it used:
- to = Date.now() and 
- from = to - 86400000 (24 hours in milliseconds)
The JSON response contained an array of approximately 172,800 arrays of raw measurements from sensors. (The system was sampling twice a second and an array of arrays was the format required for a Flot line chart.)
This approach - the query design and JSON format - had the benefit of being very quick and easy to implement. Thusly, the CEO got his graph and he duly asked for two more so that temperature, pressure and humidity could be seen in stacked graphs.
The graphs were s l o w to render: running at six plus seconds. A variety of reasons conspired to make the system so slow: the number of data samples (172,800 data points / day), the fact that a request involved a MongoDB query and then a Cassandra query, and so on. In sprint two, therefore, three features were scheduled:
- deploy a GZIP filter in order to compress JSON response size;
- use on-the-fly data averaging to reduce the number of data points to a manageable number (500 or so); and,
- show envelopes - maximum, average and minimum - of aggregate data in the graphs.
These changes resulted in little improvement in the "time to draw graph" metric. This was because:
- the simplistic way the query was created meant that the resulting JSON was essentially never cacheable (most calls to Date.now() return a different result every millisecond and thus a different query); and,
- the need for on-the-fly reduction in the number of samples increased the processing time for every single query on the server by a significant amount. 
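To make the caching problem concrete, here is a minimal sketch of how the original query URL was put together. The helper name legacyQueryUrl is my own invention for illustration; only the URL shape and the 24 hour window come from the description above.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000; // 86,400,000 ms

// Hypothetical helper: builds the original query for the prior 24 hours.
function legacyQueryUrl(id, now) {
  const to = now;
  const from = to - DAY_MS;
  return `/sensor?id=${id}&from=${from}&to=${to}`;
}

const first = legacyQueryUrl("ae43dfc8", 1441116000000);
const second = legacyQueryUrl("ae43dfc8", 1441116000001); // one millisecond later

console.log(first);
console.log(second);
console.log(first === second); // false: every URL is a distinct cache key
```

Because the URL is the cache key, two page loads a millisecond apart can never share a cached response.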
This then was the point from which a new, high performance RESTful API emerged. The decision was made to sit back, gather requirements, plan and implement a new high performance system.
It should be noted that the design work for the API was undertaken in parallel with the design for time series representation and decimation of sensor measurements - design decisions for time series representation and decimation informed the design for a RESTful API and vice versa.
Talking to Users
Whereas our first customer was our CEO, our first real users were building managers.
Huninn Mesh's first customers were the building managers of commercial real estate who loved the idea of a dense wireless network that was quickly and easily deployed, feeding a cloud based back end and providing access to a web based interface through any device.
Our first, tame, building manager asked for a tree view of sensors and a graph. In other words, something very similar to what they were used to from devices like the Trane SC.
Digging further, we found that simple graphs showing a temperature envelope were useful, but not really what they were after.
The Properties of Sampled Data
In a prior post I discussed how time-series data is organised and decimated in the Huninn Mesh server.
In summary:
- the entire network has a fixed base sample period;
- a device is constrained such that it must sample at a power of two times the base period;
- data from devices is decimated by halving and stored in a Cassandra database alongside the raw measurement sample;
- 'virtual' devices are used to provide indirection between raw samples and the outside world; and,
- time series data from a virtual device is fixed, if an error is detected (say a calibration error), then a new virtual device is created from the raw data. 
[Figure: Sampling is at a set rate which can be any power of two times the base period for the entire network.]
From a raw time series, a virtual device is created and decimations calculated via a process of halving:
Halving means that decimated sample rates obey the power of two times the base period rule, just like raw measurement time series.
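A minimal sketch of decimation by halving follows. It assumes that each decimated sample summarises two adjacent samples from the level below as a minimum / average / maximum envelope; the aggregation actually used by the server is covered in the prior post and may differ. The function names are hypothetical.

```javascript
// Each sample carries an envelope; for a raw sample min, avg and max are equal.
function fromRaw(value) {
  return { min: value, avg: value, max: value };
}

// Halve a series by combining adjacent pairs into one sample each.
// A trailing unpaired sample is dropped in this sketch.
function halve(series) {
  const out = [];
  for (let i = 0; i + 1 < series.length; i += 2) {
    const a = series[i];
    const b = series[i + 1];
    out.push({
      min: Math.min(a.min, b.min),
      avg: (a.avg + b.avg) / 2,
      max: Math.max(a.max, b.max),
    });
  }
  return out;
}

const raw = [20, 22, 21, 25].map(fromRaw);
const level1 = halve(raw);    // 2 samples at twice the raw sample period
const level2 = halve(level1); // 1 sample at four times the raw sample period
console.log(level2);
```

Each call to halve doubles the sample period, which is why every decimation level still obeys the power of two rule.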
The trick to achieving a (very) high performance RESTful API to access these data is to design around the properties of the time series and to reflect these in a RESTful API in such a way that the workload on the server and associated databases is reduced as far as possible.
A RESTful Time Series API 
The time series API was designed to exploit the properties of predictability, immutability and decimation. The API takes the form of an HTTP GET method in one of the following variants:
- /sensor/id/
- /sensor/id/timezone/tz/count/sc/year/yyyy/
- /sensor/id/timezone/tz/count/sc/year/yyyy/month/mm/
- /sensor/id/timezone/tz/count/sc/year/yyyy/month/mm/day/dd/
- /sensor/id/timezone/tz/count/sc/year/yyyy/month/mm/day/dd/hour/hh/
- /sensor/id/timezone/tz/count/sc/year/yyyy/month/mm/day/dd/hour/hh/min/mm/
 
Where:
- id is the virtual device id for the sensor; 
- tz is either universal or local, where local is the local time of the sensor, not the client;
- sc is the sample count required, of which, more in a moment;
- yyyy is the four digit year; 
- and so on.
 
The first of these API forms returns the latest measurement for the sensor. The rest should, I hope, be self explanatory.
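As an illustration, the variants above can be assembled with a small helper. This is a sketch only: seriesPath and its parameter names are hypothetical, and the tz value is passed through unchanged.

```javascript
// Hypothetical helper: builds a path from the most significant date part
// down, stopping at the first part that is not supplied.
function seriesPath({ id, tz, count, year, month, day, hour, min }) {
  const pad = (n) => String(n).padStart(2, "0");
  let path = `/sensor/${id}/`;
  if (year === undefined) return path; // latest measurement form
  path += `timezone/${tz}/count/${count}/year/${year}/`;
  const parts = [["month", month], ["day", day], ["hour", hour], ["min", min]];
  for (const [name, value] of parts) {
    if (value === undefined) break;
    path += `${name}/${pad(value)}/`;
  }
  return path;
}

const latest = seriesPath({ id: "ae43dfc8" });
const oneDay = seriesPath({
  id: "ae43dfc8", tz: "utc", count: 200, year: 2015, month: 8, day: 14,
});
console.log(latest);
console.log(oneDay);
```

Building paths in one place like this also helps keep requests consistent, which matters for cache hits later on.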
The next sections outline how this API exploits the properties discussed earlier in order to achieve fast response times and scale.
Exploiting the Predictability of Sample Timing in HTTP
For any measurement, be it raw or decimated, the sample interval is known as is the timestamp of the last sample. Therefore, the expiry time of the resource can be calculated more or less exactly. (More or less because it is known when the sample should arrive, but this is subject to uncertainties like network latency, server load, etc.) 
Consider the following request for the latest measurement from a sensor:
  /sensor/ae43dfc8/
Assuming the sensor has a fifteen second sample rate and the last sample was seen five seconds ago, then a Cache-Control header is set thus:
  Cache-Control: max-age=10
Both the client and server can cache these responses. The workings of the server cache are discussed later. For now, it suffices to say that the server and server cache guarantee that the max-age value accurately reflects the time until the next sample is due.
The server also adds a strong ETag header to provide for the case where a measurement has expired but no new measurement has been processed by the server.
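The max-age arithmetic can be sketched as follows. The function name and the choice to round down to whole seconds are assumptions of mine, not a description of the server code.

```javascript
// Hypothetical helper: whole seconds until the next sample is due.
function maxAgeSeconds(sampleIntervalMs, lastSampleMs, nowMs) {
  const nextDueMs = lastSampleMs + sampleIntervalMs;
  // Clamp at zero when the next sample is already overdue.
  return Math.max(0, Math.floor((nextDueMs - nowMs) / 1000));
}

const now = 1441116000000;
// Fifteen second sample rate, last sample seen five seconds ago.
const header = `Cache-Control: max-age=${maxAgeSeconds(15000, now - 5000, now)}`;
console.log(header); // Cache-Control: max-age=10
```

When the result is zero, the client revalidates and the ETag lets the server answer cheaply if nothing new has arrived.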
How useful is a ten second expiry time? The answer depends on whether a client or server side cache is considered. On the client, the utility is probably very small. The value becomes apparent when looking at immutability and decimation.
 
 
Exploiting the Immutability of Samples in HTTP
 
Every measurement, be it raw or decimated, is immutable. Put another way, a measurement never expires and the property of immutability can be exposed via HTTP's caching headers. So:
- /sensor/ae43dfc8/timezone/utc/count/210/year/2014/ has a response that becomes immutable as soon as the year reaches 2015.
- /sensor/ae43dfc8/timezone/utc/count/210/year/2015/month/08/ has a response that becomes immutable as soon as the date advances to 1 September 2015.
- /sensor/ae43dfc8/timezone/utc/count/210/year/2015/month/10/day/14/ has a response that becomes immutable as soon as date advances to 15 October 2015.
The result of the GET request is modified thus:
  Cache-Control: max-age=31536000
Note that 31,536,000 is the number of seconds in a year and was chosen because RFC 2616 states:
To mark a response as "never expires," an origin server sends an
  Expires date approximately one year from the time the response is
  sent. HTTP/1.1 servers SHOULD NOT send Expires dates more than one
  year in the future.
When considering immutability of responses, the server takes into account timezone and also decimation dependencies - in other words, the expiration is correctly applied (decimation makes this more complex than one would imagine at a first glance).
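A simplified sketch of the immutability test for UTC requests is shown below. It deliberately ignores the local timezone case and the decimation dependencies mentioned above, so treat it as an illustration of the idea rather than the server's logic; all names are hypothetical.

```javascript
const ONE_YEAR_S = 365 * 24 * 60 * 60; // 31,536,000

// Exclusive end of the requested period in UTC. month is 1-12.
// Date.UTC normalises overflowing months and days into the next period.
function periodEndUtcMs({ year, month, day }) {
  if (month === undefined) return Date.UTC(year + 1, 0, 1);
  if (day === undefined) return Date.UTC(year, month, 1); // 1-based month is the next 0-based month
  return Date.UTC(year, month - 1, day + 1);
}

// A period is immutable once the clock has passed its end.
function isImmutable(period, nowMs) {
  return nowMs >= periodEndUtcMs(period);
}

const startOfSeptember = Date.UTC(2015, 8, 1);
console.log(isImmutable({ year: 2015, month: 8 }, startOfSeptember));     // true
console.log(isImmutable({ year: 2015, month: 8 }, startOfSeptember - 1)); // false
console.log(`Cache-Control: max-age=${ONE_YEAR_S}`);
```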
What happens when a request spans a time period is best answered by discussing decimation.
Exploiting Decimation of Time-Series Samples in HTTP
Decimation is the means by which the number of samples is reduced. The sample count parameter is the means through which the user requests decimated values.
Consider a request for 200 data samples for 14th August, 2015 where the start and end of the day are defined in UTC:
  /sensor/ae43dfc8/timezone/utc/count/200/year/2015/month/08/day/14/
The server uses the sample count to select a decimated time series that provides at least as many samples as specified in the sample count for the requested period. In addition to predictability and immutability, decimation provides two major benefits:
- decimated values are pre-computed; and,
- many requests map onto one response.
Because decimations are created at a power of two times the network base period, decimation levels may not provide the exact number of samples requested, particularly at higher decimation levels.
The API implementation first checks if a decimation level maps exactly to the number of samples requested for the given period and, if so, returns these samples. If not, the server provides an HTTP 301 (Moved Permanently) response with a URL pointing at the sample count appropriate for the decimation level.
  HTTP/1.1 301 Moved Permanently
  Location: /sensor/ae43dfc8/timezone/utc/count/210/year/2015/month/08/day/14/
  Cache-Control: max-age=31536000
Although this incurs some additional overhead, it should be noted that HTTP 301 redirects are followed transparently by XHR requests and are cacheable by the browser - for this request, the cost of a round trip to the server will be incurred once only. Of course, users of the API who know the network base period can avoid a redirect by looking up the appropriate decimations themselves.
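The selection of a decimation level can be sketched like this. It assumes a 400ms base period and counts only whole sample intervals within the period; the real count also depends on how samples align with the period boundaries, so this is an approximation. The function name and the maxLevel cap are hypothetical.

```javascript
// Hypothetical: pick the highest decimation level that still yields at
// least `requested` samples over the period.
function selectDecimation(periodMs, basePeriodMs, requested, maxLevel = 32) {
  let best = null;
  for (let level = 0; level <= maxLevel; level++) {
    const intervalMs = basePeriodMs * 2 ** level;
    const count = Math.floor(periodMs / intervalMs);
    if (count < requested) break;
    best = { level, count };
  }
  return best; // null when even the base period yields too few samples
}

const ONE_DAY_MS = 86400000;
const requested = 200;
const choice = selectDecimation(ONE_DAY_MS, 400, requested);
const needsRedirect = choice.count !== requested;
console.log(choice, needsRedirect); // level 10 with 210 samples, so redirect
```

Under these assumptions a request for 200 samples over a day lands on the 210 sample series used in the redirect above.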
As noted, decimated samples follow the rules of predictability and immutability, thus, as the decimation level increases, the time between samples increases and the expiry time increases. Consider two requests for a year's worth of data for a sensor where the sample count is varied:
 
 
- /sensor/ae43dfc8/timezone/utc/count/3375/year/2015/ 
- /sensor/ae43dfc8/timezone/utc/count/106/year/2015/
 
Given a 400ms base period for the network:
- the first request maps to decimation level 6, where samples are updated every 00:00:25.6; and,
- the second request maps to decimation level 11, where samples are updated every 00:13:39.2.
In our testing, we found that for a building management system, 200 or so samples per day proved to be more than adequate for a graph to capture the trends in temperature, pressure or humidity over a 24 hour period. For a user tracking a building, this results in a maximum of one HTTP request per sensor every 6 minutes 49.6s which represents a minuscule load on the server.
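The update intervals quoted above follow from the base period multiplied by two to the power of the decimation level. A short sketch of the arithmetic (function names are hypothetical):

```javascript
// Sample period at a decimation level: base period times 2^level.
function decimationPeriodMs(basePeriodMs, level) {
  return basePeriodMs * 2 ** level;
}

// Format milliseconds as hh:mm:ss.t (tenths of a second).
function formatPeriod(ms) {
  const pad = (n) => String(n).padStart(2, "0");
  const totalTenths = Math.round(ms / 100);
  const tenths = totalTenths % 10;
  const totalSeconds = Math.floor(totalTenths / 10);
  const s = totalSeconds % 60;
  const m = Math.floor(totalSeconds / 60) % 60;
  const h = Math.floor(totalSeconds / 3600);
  return `${pad(h)}:${pad(m)}:${pad(s)}.${tenths}`;
}

for (const level of [6, 10, 11]) {
  console.log(level, formatPeriod(decimationPeriodMs(400, level)));
}
// 6  00:00:25.6
// 10 00:06:49.6
// 11 00:13:39.2
```

Level 10 is the one that gives roughly 200 samples per day, hence the 6 minutes 49.6s figure.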
Choosing an appropriate sample count for a time series query has a dramatic effect on the likelihood of a cache hit and also server request loads. 
A Note on HTTP/2
Our API was designed with an eye on SPDY which became HTTP/2 during development.  By 'designed with an eye on SPDY' I mean that the design actively sought to leverage SPDY features:
- Most particularly, the overhead of multiple HTTP requests is negligible in SPDY/HTTP 2.
- SPDY push enables HTTP 301 (redirect) responses and even cache invalidations to be inserted into the browser cache.
Our server runs on Java and we chose Jetty as our servlet container as it had early support for SPDY. As it happened, this caused us some considerable pain during development as Jetty's support was immature. As I write, we run on Jetty 9.3 (with HTTP/2) whose SPDY growing pains seem to be long behind it.
The switch from HTTP 1.1 to HTTP/2 gave us instant and fairly dramatic performance improvements with respect to throughput. I'll talk more about these in another post, when I discuss the client side JavaScript API that sits atop the RESTful API discussed here.
Strengths and Weaknesses of the Design
The obvious weakness in the API is that the end user has to make multiple requests to the server in order to assemble a time-series. This problem can rapidly become quite complex:
- A user wishes to plot a time series spanning three entire days, all of which fall in the past, and so three separate HTTP requests must be made and the results spliced together. 
- It is 1pm and a user wishes to plot a time series for the prior 24 hour period. The user must choose whether to make two requests (as above) or 14 requests and splice them together.
- A user wishes to plot the last two weeks of data which span a month boundary. The user can choose to make two requests (for months), or one request for the prior month and multiple requests for days for the current month, or make multiple requests for all of the days. 
The general problem of "what is the optimum sequence of requests to fulfil an end user's data requirement?" gets harder if the hour / minute API is used.
More subtly, a (power) user has to be not only smart but also consistent about which requests to make if they wish to maximise the chance of a cache hit and therefore deliver the best interactive performance to the end user.
Finally - and perhaps most importantly - splicing the results from multiple requests that might span multiple time spans and perhaps also have missing data is just painful.
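To give a feel for the simplest case, here is a sketch that lists the per-day requests needed to cover a range of whole UTC days. The dayPaths helper and its argument shape are hypothetical; splicing the responses and handling missing data are left out.

```javascript
const MS_PER_DAY = 86400000;

// Hypothetical: one per-day path for each whole UTC day from first to last inclusive.
function dayPaths(id, count, first, last) {
  const pad = (n) => String(n).padStart(2, "0");
  const start = Date.UTC(first.year, first.month - 1, first.day);
  const end = Date.UTC(last.year, last.month - 1, last.day);
  const paths = [];
  for (let t = start; t <= end; t += MS_PER_DAY) {
    const d = new Date(t);
    paths.push(
      `/sensor/${id}/timezone/utc/count/${count}` +
      `/year/${d.getUTCFullYear()}/month/${pad(d.getUTCMonth() + 1)}` +
      `/day/${pad(d.getUTCDate())}/`
    );
  }
  return paths;
}

// Three entire days spanning a month boundary.
const paths = dayPaths("ae43dfc8", 210,
  { year: 2015, month: 8, day: 30 },
  { year: 2015, month: 9, day: 1 });
paths.forEach((p) => console.log(p));
```

Always generating the same paths for the same range is what keeps the cache hit rate high.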
The API is RESTful, individual requests are fulfilled very quickly and it is a great match for HTTP/2, but that does not mean it is easier to use than the old-school API we started with: it isn't.
These are real issues and, at Huninn Mesh, we immediately ran up against them when dogfooding our UI for building managers. Since I designed the API, I knew from the outset that a JavaScript wrapper would be needed that exposes a higher level API - one that supports date range requests. That API went through two major versions - the second of which was a major undertaking - and I intend to write about it in my next post.
Summary
This architectural approach can be summarised as follows:
- Immutable time series means cacheable HTTP responses which, in turn, means that Cassandra queries stopped being a bottleneck. 
- HTTP responses are cached at the server and may be cached at the client.
- Responses can be pre-computed and, therefore, the server cache can be primed.
- Deployment on an HTTP/2 server (Jetty) meant that the network overhead of many HTTP requests is immaterial to performance. 
- The job of the server then is typically reduced to authentication and serving of static responses from a cache.
We started with a traditional query based API which delivered page load for three graphs (three requests) in approximately six seconds. 
A measure of our success is that with an empty browser cache, in six seconds, we were able to increase the number of objects graphed from three to six hundred. 
Moreover, irrespective of whether the browser was displaying a year's data, a month's, a day's or an hour's, the time to render was nearly constant.
I'll write another post musing about the services provided by this JavaScript API when I get the chance...