Monday, 9 May 2016

Sanden "Eco®" Hot Water Heat Pump + Solar Panels Revisited

My Sanden "Eco®" Hot Water Heat Pump wasn't providing me with hot water 100% of the time because of the "blockout" settings I had programmed. The system is an engineering marvel, but, as I wrote about earlier, its software and control system seem pretty weak.

In that post, I discussed how I tried and failed to use my 315L hot water tank as storage for excess energy generated from my solar panels. My problem was that I had a narrow "blockout" time window programmed (starting at 3pm and ending at 11am). During the time window, I have "free" power from my solar panels and, for that reason, I want the system to recharge the tank, thus storing energy that I would have otherwise sold to the grid.

Back to the present. Over the weekend I looked at why my timings did not work. I came across a somewhat dated (2011) Service Technical Manual for the same heat pump model which includes the following description of the logic used to control water heating:
Conditions to start and end heating water are as stated below: 
  1. Tank TH A ≤ 45°C (Lack in residual hot water in tank)
  2. More than 24hrs passed since the last start of operation. (for anti-Legionella purpose)
  3. Memory of last operation is lost. (First run after delivery, ROM writing error, etc.)
  4. Electrical shutdown is occurred while HP unit is operating.
  5. The current time is 10.00 (24hr clock basis).
  6. HP unit was not operated by the condition (5) due to a limitation of HPU operation. HPU operation starts once the limitation is cancelled. The term limitation defines a condition where HPU cannot be operated due to the electricity cutoff or the blockout time setting.
HP unit starts when either of (1), (2), (3), (4), (5) or (6) above is satisfied. 
Condition to end, GC Inlet TH > 50°C.
If this logic still holds (the document is dated 2011), then I can't explain why we occasionally went without hot water: conditions 2 and 6 should have ensured tank recharge even if condition 1 was not met.
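The quoted rules can be written down as a small decision function, which makes it easier to reason about which condition might fire at a given time. This is only a sketch of one reading of the manual: the field names are hypothetical, and the assumption that a "limitation" (blockout or power cut) suppresses every start condition is mine, not something the manual states outright.

```javascript
// Sketch of the start/stop rules quoted from the 2011 service manual.
// All field names on `s` are hypothetical; only the thresholds
// (45°C, 50°C, 24 hours, 10:00) come from the manual text.
function startReasons(s) {
  const reasons = [];
  if (s.tankThermistorC <= 45) reasons.push(1);        // low residual hot water
  if (s.hoursSinceLastStart > 24) reasons.push(2);      // anti-Legionella
  if (s.lastOperationMemoryLost) reasons.push(3);       // first run, ROM error
  if (s.powerLostWhileRunning) reasons.push(4);         // shutdown mid-cycle
  if (s.hour === 10 && s.minute === 0) reasons.push(5); // daily 10:00 start
  if (s.missedScheduledStart) reasons.push(6);          // 10:00 start was blocked
  return reasons;
}

// ASSUMPTION: while a "limitation" applies, nothing starts the unit.
function shouldStart(s) {
  if (s.inBlockout || !s.hasPower) return false;
  return startReasons(s).length > 0;
}

// End condition from the manual: GC Inlet TH > 50°C.
function shouldStop(s) {
  return s.gcInletThermistorC > 50;
}
```

Read this way, a window that opens at 10am would see condition 5 fire at 10:00, while a window that opens at 11am would have to rely on condition 6 (or 1 or 2).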

Nevertheless, I went up the ladder again, unscrewed the four screws, took off the top panel and fiddled and re-enabled blockout, widening the time range by two hours so that the heat pump can run between 10am and 4pm.

For the last two days, our heat pump has come on at exactly 10am.

I have no idea which of the six conditions listed above is being satisfied.  But I'll be keeping an eye on the system. With my previous blockout settings, I would expect point 6 to be satisfied and the system to come on at 11am, only it did not seem to.

Not content that I do not understand what's going on, I'm going to perform an experiment and change the system back to 11am through 4pm (best for my solar panels) and check for a regular 11am start.

In summary, the additional information provided in the technical service manual has left me more confused. However, for the present, all appears to be working on a blockout timer and that was my goal.

Tuesday, 3 May 2016

Usability and Browser Reflow - Making the Invidious Less So...

In a previous post, I made the point that browser reflow is the major issue driving me to adopt an ad-blocker for my iOS 9 devices. I can live with ads in my content - they are the price I pay for all of that content - but I cannot live with a page whose content scrolls down out of view after page load.

The (hated) scrolling of content after page load, is, in many cases, the direct result of asynchronous advertisements being inserted into the page. Technologies like ad-blockers and 'Reading Mode' help to alleviate this problem but cause the unfortunate effect of pulling the rug out from under the content provider by blocking their only source of revenue.

In this post I make a suggestion for how a browser might mitigate the problem of reflow.

Reflow

A web browser downloads a document, usually written in HTML. The job of the browser is to translate the HTML, creating a Document Object Model (DOM) describing its content and, from this, a visual representation on a device's screen. In order to draw each letter and each image onto the screen, the browser has to calculate the position of each element in the DOM. This computational step is called reflow.

As Paul Irish shows, reflow occurs in response to a bewildering number of reasons, but to keep it simple, I'll illustrate the problem using asynchronous advertisements that alter the DOM.

Asynchronous Advertising = Reflow++

The DOM of a web page can be modified after it has been loaded; asynchronously, that is. Many web pages include advertisements and these are inserted into the page in different positions, asynchronously and unpredictably.

Asynchronous ad delivery was a big thing when it arrived on the scene in 2010. Fast forward to 2016 and everyone's doing it and doing more of it: both the number and the weight of ads have grown enormously. According to websiteOptimization.com, there has been an exponential increase in both the number and combined size of the objects that are used to display a web page.

The growth of an average web page showed a dramatic rise from 2010 onwards 
A direct consequence of the increased number of ads is a decline in the usability of the content containing the ad.

Reflow is Worse on Mobile

Reflow on mobile creates an awful user experience. Take a look at this video of an iPhone 5 sized screen rendering a page from Forbes with bandwidth throttled to 3G speeds:


I loaded the page and then scrolled down as I read the article. The video begins after I scrolled down and highlights the problems with reflow that occur at about the six and eight second mark: although I did nothing, the paragraph I was reading gets shunted down.

What's going on behind the scenes is that scripts execute after the page load, creating space in the page for an ad (modifying the DOM) and then making an asynchronous request for the ad content itself. The space allocated for the ad appears white. The text on the page shunts down and then up as a result of multiple reflow events (as I said earlier, there are many, many reasons for reflow).

Here's a before and after view of the page content with blue shading indicating content that is not visible to the user.


In this second image, a script has executed, creating a space for an ad that has not yet arrived over the network. The content I was reading is pushed down and is no longer visible in the viewport.



This scenario is real, is incredibly annoying, is more likely on slower (or rather unpredictable) networks, and is more likely if the display is tall and thin. Reflow is worse on mobile.

Can't We Change the Browser?

Having established that the user experience of reflow is awful, we need to look and see whether there might be a cure. The first place to look is the browser itself. A user story is very simple indeed:
If I scroll the page to view content, then the browser should make all possible efforts to ensure that that content remains visible.
This requirement puts the user experience first and it does not mention reflow - it's just plain common sense. As I write this post, I am tormented by the fact that someone must have thought of this. Why doesn't a browser behave this way already? However, browsers do not handle this situation well.

To implement this, we have to define an anchor point; an element visible in the viewport which is destined to remain visible after reflow. Intuitively, I'd select the element that is at the top-left of the page that has the highest Z-order and is scrollable (assuming a left-to-right language). The anchor element would be updated every time content is scrolled. (R-t-L and vertical writing modes would also have to be accommodated.)

Here's an illustration of my 'anchor' element:


The anchor (red box) is chosen because it is in the top-left of the viewport after it has been scrolled, does not have a fixed position, is scrollable and has been identified as a 'content element' much as a browser reading mode identifies content elements. Assuming an ad is injected above this anchor element, then the browser would recalculate the height of the content and then align the viewport so that it is in the same position relative to the anchor.
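The arithmetic behind the anchor idea is small enough to sketch. The code below works on plain records rather than real DOM nodes, and the record shape (`top`, `height`, `fixed`, `isContent`) is a hypothetical simplification; it also picks the topmost visible content element and ignores Z-order and writing mode, so treat it as an illustration rather than a browser algorithm.

```javascript
// Hypothetical element records: { id, top, height, fixed, isContent },
// where `top` is in document coordinates (pixels from the top of the page).
function pickAnchor(elements, scrollTop, viewportHeight) {
  const visible = elements.filter(
    (e) =>
      !e.fixed &&
      e.isContent &&
      e.top + e.height > scrollTop &&       // bottom edge below viewport top
      e.top < scrollTop + viewportHeight    // top edge above viewport bottom
  );
  // Simplification: the topmost visible content element becomes the anchor.
  visible.sort((a, b) => a.top - b.top);
  return visible.length > 0 ? visible[0] : null;
}

// After reflow, shift the viewport by however far the anchor moved so that
// the anchor keeps the same position relative to the viewport.
function anchoredScrollTop(scrollTop, anchorTopBefore, anchorTopAfter) {
  return scrollTop + (anchorTopAfter - anchorTopBefore);
}
```

For example, if the anchor sits at 300px with the page scrolled to 350px and a 250px ad slot is injected above it, the anchor moves to 550px and the scroll position would be adjusted to 600px, leaving the reader looking at the same paragraph.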


Using a strategy such as this would eliminate a whole class of usability problems caused by content injection and reflow.

I know it Can't be this Simple But...

The strategy currently employed by web browsers is to retain the position of the top of the page relative to the viewport; thus, when content is injected into the DOM, elements below the new content shunt downwards.

My suggested strategy is to identify an anchor element that is visible in the viewport before DOM alteration in order to ensure that the element remains visible after DOM alteration. There are many cases where this strategy cannot work. What I'm after is a strategy that improves usability in most cases.

As I said earlier, my sneaking suspicion is that this suggestion is naive. I know that multiple elements on the page can be made scrollable.
I find it hard to believe that this has not been considered by better minds who are closer to the problem than me. 

Because "56.1% of all [ad] impressions are not seen", some ads are injected into the visible viewport. (The best placement for an ad is at the bottom of the page or, just "above the fold", in industry speak.) My strategy would not help much in this scenario.

"Content that holds a user's attention has the highest viewability" - a statement that is about ads but is equally applicable to the content containing the ad. It seems to me that, on mobile in particular, asynchronous ad injection circa 2016 degrades "viewability" of the content containing the ad because, at worst, it makes reading some pages unendurable and, at best, makes the reading experience merely annoying. Perhaps a change in the browser could alleviate some of the usability problems and make the invidious less so?

Wednesday, 13 April 2016

Sanden "Eco®" Hot Water Heat Pump System, Solar Panels and an Internet of Things

Renovating with a Green Tinge

We have recently renovated our 1915 vintage house in Neutral Bay, Sydney. Both my wife and I have PhDs in Atmospheric Sciences and both felt strongly that the new build should be as energy efficient as possible. The design is thermally efficient, including passive cooling features. We recycled huge numbers of bricks from the demolition in order to limit the carbon-footprint of the build itself. Finally, we installed a heat-pump hot water system and 2.2kW of solar panels so as to limit the ongoing carbon footprint of our family of five. (We considered batteries, but found we could not afford them, so we're still very much connected to the grid for our power needs at night and on cloudy days like today.)

We moved into our renovated home six months ago and this post provides some thoughts on what we have learned - just what shade of green are we?

Our Installation

We have:
  • a 315L Sanden "Eco®" Hot Water Heat Pump System that we purchased through the Eco Living Centre (who come highly recommended);
  • 2.2kW of solar panels installed on a roof that faces slightly north of west;
  • micro-inverters that report panel generation via ethernet over powerline; and,
  • a "MyEnlighten" monitoring solution that reads power generation data and provides it to a cloud based monitoring solution from Enphase Energy.
This serves a family of five.

Like many others with solar panels, we generate surplus energy during the day that we sell to the grid for 8c per kWh. At night, we draw from the grid, purchasing energy at 40c per kWh. As I said, we do not have a battery system as we could not afford one.

My primary motivation for choosing a Sanden system was as a means of storing excess energy from our solar panels - it is my battery proxy.

Hot Water from Sunshine?

The Sanden heat pump draws 1kW. According to the manual, a tank recharge takes between one and five hours depending on the ambient air temperature, humidity, inlet water temperature, etc.

Since November, we have observed that the compressor runs for an hour each day or even less in January, which should come as no surprise given that we're in Sydney and are discussing the summer months. (The compressor is wall mounted and is so quiet that it's actually difficult to know when it is on: the noise of the drip from the condensation tray is easier to register than the low hum from the compressor itself.)

In terms of power generation, 17 February, 2016 has proven to be our best day, with 9.7 kilowatt-hours produced. On our west facing roof, generation exceeded 1kW between 11am and 4:30pm (AEDT, so about two hours either side of solar noon). At a first blush then, we generate more than enough energy on a sunny day to provide for all of our hot-water needs and much more besides.

Our challenge has been to align water heating with the availability of this "free" electricity.

Controlling the Heat Pump

The Sanden heat pump has a blockout time mode (this is something I looked into before purchasing the unit). This is as simple as it sounds: it sets a range of times where the heat pump is allowed to operate.

So, I set the blockout time to be between 4pm in the afternoon and 11am in the morning in the expectation of hot water production when power was likely to be "free" and, coincidentally, the ambient air temperature was likely warmest (thus ensuring optimum conditions for heat pump operation).

And mostly, this worked. Mostly. In practice, what happened was that we sometimes ended up with no hot water at all.

The reason is simple: the blockout time window is too short and/or does not align with our water usage patterns.

The user and installation guides for the system state that hot water generation starts when either of two conditions is fulfilled:
  1. "The water heating cycle operation starts automatically when the residual hot water in the tank unit becomes less than 150 litres"; or,
  2. "The system will run once the power becomes available and the temperature in the tank drops below the set point of the tank thermistor".
So, for example, if there are 151 litres of water in the tank and this water is 1°C above the thermistor set point, the heating cycle will not start. Consider a tank in this state at 4pm, with the compressor blockout timer set from 4pm through 11am. The 151 litres is not enough to last through to 11am and, as a result, the kids do not have enough water for a hot bath and the morning shower is properly cold. Of course, the next morning, at 11am, the tank is fully recharged and so the system never remains in this state for more than one day.

In summary, setting the blockout mode for the heat pump to align with free power generation works most of the time, but not always, and that's a problem when we're talking basics like hot water. For this reason, we have disabled blockout mode as it does not provide us with a reliable hot water supply. Since we disabled blockout, the compressor is typically switching on at about 8pm each night meaning we are paying for hot water generation at 40c per kWh on days when we know we are selling 4 kWh of power to the grid at 8c per kWh. This stings!

Improvements to the Sanden Control System?

The firmware in the Sanden unit is pretty basic. Blockout is just that: the system will not generate hot water outside of the allowed hours. At a guess, I'd say that the firmware is implemented with nothing more than a timer. There are, I think, a couple of approaches that might easily address the problems.

Force a Tank Recharge at a Set Time

On 61 of the 79 days from 20 January through 13 April (today), our solar panels generated in excess of 1kW between 1pm and 2pm. That's enough to drive our heat pump.

From this observation comes a simple requirement: I should be able to tell my system to just switch on at 1pm every day and fully recharge the tank irrespective of the normal rules outlined earlier.

Conceptually, this is simple to implement and, for someone like myself (who monitors our power generation), would be ideal because it would:
  • have provided me with "free" hot water for 77% of the days since January 20;
  • eliminate cyclic 'cold shower' days because the tank is recharged every day;
  • run the compressor once a day, just as it usually does at present; and,
  • mean I would not have to go up a ladder and fiddle with blockout mode.
I am not suggesting that the "start at time" should be the only time the compressor is allowed to run - instead, on heavy usage days, the compressor will also run at other times to deal with demand.

It seems to me that this would be a very simple firmware change and it would have shaved an additional $20 off our Q1 electricity bill.
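One way to arrive at a figure of this order is to cost each shifted kilowatt-hour at the difference between the import and export tariffs. The sketch below uses the 40c and 8c tariffs and the 61 sunny days mentioned in this post; the 1 kWh per recharge is an assumption (roughly 1kW for about an hour a day), so the result is a rough check rather than a bill calculation.

```javascript
// Work in whole cents to avoid floating-point rounding.
const IMPORT_CENTS_PER_KWH = 40; // grid purchase price, from the post
const EXPORT_CENTS_PER_KWH = 8;  // feed-in price, from the post
const SUNNY_DAYS = 61;           // days with >1kW between 1pm and 2pm, from the post
const KWH_PER_RECHARGE = 1;      // ASSUMPTION: ~1kW for about an hour a day

// Heating from solar avoids a 40c purchase but forgoes an 8c sale.
function savingCents(days, kwhPerDay) {
  const netCentsPerKwh = IMPORT_CENTS_PER_KWH - EXPORT_CENTS_PER_KWH;
  return days * kwhPerDay * netCentsPerKwh;
}

const totalCents = savingCents(SUNNY_DAYS, KWH_PER_RECHARGE);
console.log(`$${(totalCents / 100).toFixed(2)}`); // prints "$19.52"
```

At two kilowatt-hours per recharge the same arithmetic gives roughly double that amount.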

Implement 'Look-ahead'

In our example, the blockout window - from 4pm through to 11am - is rather long. The longer the blockout window, the greater the chance of a cold shower.

A look-ahead function would check: the current time against the start of the next blockout time; the length of that blockout period; and, the tank state. The idea being that the system could make an educated guess that the hot water will be exhausted and initiate a recharge before the blockout period starts.

The devil here is in the detail with the need for considerably more complex firmware. In our case, this would be neither as simple nor as effective as the simple "switch on at" time.
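To make the look-ahead idea concrete, here is a minimal sketch of the check. Every parameter is hypothetical: the Sanden firmware exposes nothing like this, and a flat litres-per-hour demand figure is a crude stand-in for a real usage pattern.

```javascript
// Decide whether to start a recharge now because the tank is unlikely to
// last through the coming blockout period. All inputs are hypothetical.
function shouldPreHeat(state) {
  const litresNeeded = state.blockoutHours * state.litresPerHour;
  const willRunOut = state.litresHot < litresNeeded;
  // Start once the remaining allowed time is no more than a recharge needs.
  const lastChance =
    state.minutesToBlockout > 0 &&
    state.minutesToBlockout <= state.rechargeMinutes;
  return willRunOut && lastChance;
}
```

With 151 litres left, a 19 hour blockout and an assumed demand of 10 litres per hour, the check would trigger a recharge in the final hour before blockout; with a full 315 litre tank it would not.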

Make Blockout a Weak Signal

Cold showers could be eliminated by making "blockout" a weak signal: a preference rather than a hard and fast rule. That is, the heat pump would be allowed to operate during the blockout period in order to provide continuity of hot water supply.

A weak blockout mode would have to maintain a balance between the need for hot water, the strong preference to run the system during normal hours and the need to limit as far as possible the number of times the heat pump is started/stopped (so as to maximise the lifetime of the compressor).

So, as a first stab, I suggest that in blockout hours, the compressor be allowed to run only when the hot water supply is critically low and even then, for a limited time only in order to give a partial tank recharge. This should ensure that at the end of the blockout period, a full tank recharge happens.

Once again, the devil is in the detail here: we're talking about firmware with considerably more complexity than the current system. Once again, for us, this would not be as effective as the simple "switch on at" time discussed above.
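As a first stab, the weak blockout rule described above might look something like the following. The thresholds and field names are hypothetical, and the sketch ignores the start/stop limiting that a real compressor controller would also need.

```javascript
// Weak blockout: during blockout hours the compressor may run only when hot
// water is critically low, and then only for a limited top-up.
function mayRun(state) {
  if (!state.inBlockout) return true;                        // normal hours: usual rules apply
  if (state.litresHot >= state.criticalLitres) return false; // honour the preference
  return state.minutesRunThisBlockout < state.maxBlockoutRunMinutes; // partial recharge only
}
```

Capping the run time during blockout is what leaves room for a full recharge once the allowed window opens again.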

Wishful Thinking

These suggestions are just musings about how the Sanden unit could be improved. They will not make any difference to us, unfortunately. 

My Water Heater, My Solar Panels and the Internet of Things?

Thus far, I have limited my suggestions to changes that might be made to the Sanden Heat Pump's firmware. In this section, I will briefly discuss the implications of an Internet of Things as it relates to our solar panels, hot water system, and other domestic appliances.

The Sanden unit has no remote control. Neither does the heat pump nor our solar panels, nor our fridge for that matter.  These devices are 'dumb' in the sense that they are not connected to any network and therefore they cannot be remotely monitored or controlled. Although connected to the Internet, the MyEnlighten energy monitoring system tells me what I generate, but not what I am using. In short, there's a huge gap here that will, over time, be filled by the Internet of Things (IoT).

The vision for the IoT in the home is that every device will be connected, usually to a 'hub' that provides means of access and automation. A common example being smart lights that are controllable from a smart phone.

As Nest has proven, the IoT has a massive role to play in home energy management. Sadly, nascent "hub" products such as Nest's Revolv have not proven to be a good investment. Looking past this and other similar examples, the benefits of home automation mean that this will happen.  But not in our household yet.

In our home, we manually program our washing machine to come on in the early afternoon and try to do the same with our dishwasher. We do this in order to maximise the use of our own power, just as we tried with the Sanden Heat Pump.

In future, these devices will tell the home automation system that they need to be switched on at some time and for how long. The system will know how much power each device draws (it will have learned through experience). It will also know how much power is available for free and, again through experience, a knowledge of the weather forecast and data shared by devices in the same region, how much power is likely to be available. It will then decide on the optimum order for device activation and, when the time comes, tell each device to start.

Bring it on, I say.

Which Shade of Green?

In my opening, I mentioned shades of green. Well, since installing solar and the Sanden system, our electricity usage for a family of five has fallen to less than the average for a single person. That's pretty good, yes?

No. It's not good enough: we are drawing between one and two kilowatt hours per day for the Sanden unit when I can show that on 77% of the days since January 20th, this is unnecessary.

The Sanden "Eco®" Hot Water Heat Pump System is excellent: it is amazingly quiet and very efficient. In other words, the engineering is superb.

It is the "programmability" of the Sanden unit that turns out to be a bit of a disappointment.


Postscript

Having written this, I decided to do the obvious thing and re-enable the blockout mode using a wider time window.  I'm going to try 10am through 4:30pm and see how that goes - if we endure cold showers again, I'll make it wider still.

BTW, in April, at 4:30pm our panels get shaded - our generation falls from about a kilowatt to nil in about a minute so that's why I have chosen this time.  

Second Postscript
Tried that, did not work.  It is only possible to set a start and end time for blockout. So, I had to widen the operational time range. Not real happy with that.


Third Postscript
More in this post.


Thursday, 17 March 2016

Huninn Mesh Part 5 - A JavaScript API for Sensor Data

Huninn Mesh is a vendor that sells battery powered wireless sensors that act in a dense mesh network to deliver large numbers of environmental measurements from mid to large sized buildings to a cloud based server and thence to a manager whose job it is to optimise environmental conditions for comfort and cost.

In this post I will look at the high-performance JavaScript library that I built that sits atop the Huninn Mesh RESTful API for sensor data. It was explicitly designed for use in JavaScript applications (AKA single page apps).

The Need for a Client Side API

Our RESTful API, quite deliberately, limits the form of data that it serves: a user can request an hour's data, a day's data, a month's data or a year's data for a given sensor. The reasons for this are set out elsewhere in detail (but the answer is performance). Where a user wants to view data that falls outside of one of these intervals, then client side manipulation is usually required. Specifically, the cutting and splicing together of responses from one or more web requests in order to assemble the time-series data set the user does need.

To illustrate, the API does not support the following 'traditional' parameter driven query:

  /getSensorData?from=1451649600&to=1451736000&samples=20

This query represents a request for 20 samples spanning the period noon on January 1, 2016 through noon January 2, 2016 (UTC). Visually:

Huninn Mesh does not support a traditional from-to style API. Note that Raw, d(1), d(2) and d(3) represent the raw sample data and its decimations, as discussed in another post.
As noted, the API supports discrete query intervals (hour, day, month, year). So, the user might either make a single request for the entire month of January's data, or 24 requests (hours of data), or two requests as follows:

There are multiple combinations of API requests that can be made to fulfil a request that spans a day, in this case, two requests are made.
This illustrates the need for a higher level JavaScript API: convenience for the end user.

In its simplest form, our API moves the query parameters into a JavaScript object that's passed to an API with an appropriate callback.

 hmapi.requestTimeSeries(
  { sensor: 'ae43dfc8', from: 1451649600, to: 1451736000, samples: 20 },
  myCallback
 );
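Underneath a call like this, the library has to work out which fixed-interval requests cover the range and then cut and splice the responses. The sketch below shows that step for day intervals only. The function names and the `{ t, v }` sample shape are hypothetical, since the real response format is not shown in this post.

```javascript
const DAY = 86400; // seconds in a UTC day

// UTC day starts (epoch seconds) for every day that overlaps [from, to).
function daysCovering(from, to) {
  const days = [];
  for (let d = Math.floor(from / DAY) * DAY; d < to; d += DAY) {
    days.push(d);
  }
  return days;
}

// Splice per-day chunks (arrays of { t, v }) together and cut the result
// down to the requested range, inclusive of both ends.
function cutAndSplice(chunks, from, to) {
  return chunks
    .flat()
    .filter((s) => s.t >= from && s.t <= to)
    .sort((a, b) => a.t - b.t);
}

// Noon 1 Jan 2016 to noon 2 Jan 2016 (UTC) needs two day requests.
console.log(daysCovering(1451649600, 1451736000)); // [ 1451606400, 1451692800 ]
```

Reducing the spliced series to the requested number of samples (20 in the example) is a separate step that is not shown here.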



Library Implementation: Our First Attempt

Our first attempt to implement this API took a simple and straightforward approach to the problem. The library:
  • always chose intervals of days when downloading sensor data, regardless of the actual period of interest to the user;
  • ran on the main browser thread; 
  • made XHR requests to the server;
  • rebuilt time series objects whenever an XHR response was received; and,
  • made callbacks whenever data in a time series was updated. 
The library was implemented over the course of a couple of sprints and enabled us to gain a good understanding of how well caching on the client and server worked in practice.

It worked well - page load times were reduced from the order of six seconds to two seconds with a clear browser cache and much less with a primed cache. We were able to ingest HTTP responses at the rate of approximately 60 per second.

More importantly, the library exposed a range of issues on the browser-side.

Weakness Exposed

From the outset, our RESTful API was designed to be deployed on a SPDY capable web server. We chose Jetty 8.x as our servlet container with this in mind. However, at the time we wrote the system, neither Internet Explorer nor Safari supported SPDY. Not surprisingly, we saw worse performance on these browsers than on Chrome or Firefox which had SPDY support.

Regardless of whether SPDY was used or not, with an empty browser cache, we found that performance (the elapsed time to load a complete time series) rapidly degraded as the length of the time series was increased. This was due to the decision to choose intervals of a day for all server requests. Not surprisingly, when the time range was extended to a year for graphs of temperature, pressure and humidity, performance degraded because the library increased the number of HTTP requests to 1,095 or so.

Furthermore, as I noted earlier, we limit graphs to 200 data points (that is, 200 data points were graphed irrespective of whether the time range was a year or a day). The library varied the number of samples requested according to the time range and, as a consequence the browser cache hit rate declined.

At a small time range, for example an hour, the API might choose the raw time series whereas for a long range of, for example a year, the d(4) decimation might be chosen.
In operation, the library loaded data from the server even when the local browser cache had data for the required day, albeit with the wrong number of samples (at a different decimation). This is something I'll circle back to later.

Additional (substantial) issues were identified including the following, non-exhaustive, list:
  • for each completed XHR request, a graph was redrawn - with a large number of requests, this led to a 'spinning beachball' as the browser's main thread was fully occupied drawing graphs (this was particularly bad on Firefox and Chrome which used SPDY and, therefore, received responses more quickly);
  • the six connections per host limit for the HTTP 1.1 based Safari and IE imposed a significant throttle on throughput leading to longer load times before a graph was fully drawn (but mitigating the spinning beachball problem mentioned earlier);
  • users tended to rapidly navigate back-and-forward between views over different datasets often before graphs were fully drawn leading to very large backlogs of XHR requests building up;
  • probably as a result of the prior point, we managed to crash various browsers (particularly mobile) during testing due to memory issues that, we think, were related to very large XHR backlogs which were exacerbated by a server bug (aw snap).
In summary, we prototyped our first JavaScript API, but found that it was not viable.

Requirements for a Real JavaScript Library & API

Learning from our prototype, we arrived at the following requirements for the version 1 library:
  1. Off main thread dispatch and processing of time series requests.
  2. In-memory cache to enable substitution of time-series as well as serving of stale time-series.
  3. Prioritised download of time series absent from the cache.
  4. Active cache management for memory.
  5. Smart(ish) API usage.
  6. Rate limiting of XHR requests (stop SPDY floods).
  7. Debounced callbacks.
  8. Detection of missing data.
The following sections provide more information on how these requirements were implemented in software.

The implementation was driven by one additional overriding requirement: that of cross browser compatibility including on Mobile where iOS 8 and Android 4.x were our baseline.

One final note, I also opted to go vanilla JavaScript with zero external library dependencies. Thus, I was limited to APIs supported across the latest versions of IE, Chrome, Firefox and Safari - desktop and mobile - circa March 2014. That is, no Promises, no Fetch, no ServiceWorkers and no third party shims or polyfills...

How it Works

The final implementation is best explained via an interaction diagram: 

An interaction diagram illustrating the major components of the library. Asynchronous events are shown being initiated by a blue event loop. Areas shaded orange represent the Huninn Mesh library and those in blue represent core browser features. 
This diagram, though simplified, gives a good sense of how data moves through the system. The blue loops indicate entry points for asynchronous events. The system flow, in the case of data that needs to be downloaded, is roughly as follows:
  • The user requests data for a sensor over a time range.
  • The Time Series object checks with a cache object which has no data for the given range.
  • The Time Series object delegates download, making a request via a Dispatcher object.
  • The Dispatcher object, having made sure that there is no outstanding request for the same data, posts a message to a Web Worker with details of the resource requested.
  • The system invokes a callback in a Handler object in a worker thread.
  • The Handler creates an asynchronous XmlHttpRequest and sends it.
  • When the XmlHttpRequest completes, a callback in the Handler object is invoked.
  • The Handler then performs various functions including JSON decoding, error checking, parsing of HTTP headers to determine expiry, gap detection in the time series data and finally, generation of statistics about the time series.
  • Having done this, the Handler posts a message back to the main thread, containing the resulting data.
  • The Dispatcher receives a callback containing the result from the Handler and invokes a callback on the Time Series object.
  • The Time Series object passes the result to the local cache.
  • The Time Series then checks to see whether it has all of the data chunks required and splices them together.
  • The Time Series sets a timer.
  • When the timer fires, the Time Series checks its freshness and passes the result to the user function that graphs the result.
This, I admit, looks like an awful lot of work in order to improve performance. As I circle back to the earlier stated requirements, I'll explain what this was all about.
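The timer in the last two steps is what keeps a burst of responses from triggering a redraw per response. A minimal sketch of that debouncing idea follows. It is not the library's code: the function name is hypothetical, and the injectable `timers` argument exists only so the sketch can be exercised without waiting on a real clock.

```javascript
// Collapse a burst of updates into one callback carrying the latest value.
// `timers` defaults to the platform timers; it is injectable for testing.
function debounce(callback, delayMs, timers) {
  const t = timers || {
    set: (fn, ms) => setTimeout(fn, ms),
    clear: (handle) => clearTimeout(handle),
  };
  let handle = null;
  let latest;
  return function update(value) {
    latest = value;
    if (handle !== null) t.clear(handle); // restart the quiet period
    handle = t.set(() => {
      handle = null;
      callback(latest);                   // one call per burst
    }, delayMs);
  };
}
```

With this in place, a time series that receives a dozen chunks in quick succession would hand the graphing function a single, most recent result once the responses stop arriving.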

Off Main Thread Processing

All XmlHttpRequest processing was moved to Web Workers.

From the outset, I was concerned that this might be overkill: I reasoned that the XmlHttpRequest is limited by IO and a worker would add enough overhead to cancel out any benefit. In particular, the JSON response from the server has to be parsed in the Web Worker, manipulated, then cloned as it is passed back to the main thread.

In practice, we found that we got a significant, but inconsistent, performance improvement by moving to workers.

In particular, Chrome (Mac OSX) was idiosyncratic to say the least: timing was highly variable and, indeed, browser tabs sometimes failed under load with an "aw snap" error. I tracked this down to postMessage from the worker to the main thread: sometimes, each successive call to this API would take exponentially longer than the one before, ultimately crashing the browser tab after the last successful call took about eight seconds. (This problem got fixed somewhere around Chrome 41.)

Putting this issue with Chrome aside as a temporary aberration, the addition of a single Web Worker doubled the XmlHttpRequest response ingestion rate to about 120 responses / second with a clean cache.

As noted above, we perform additional tasks in the worker on the time series beyond parsing the JSON response: we also parse date/time headers, identify and fill gaps in the time series (missing measurements) and also generate statistics about the series (frequency distributions of values, etc). None of this work was done in the prototype library and so I can't comment on how slow it would have been if this were handled on the main thread...
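
To make the gap filling and statistics steps a little more concrete, here is a rough sketch of what such post-processing might look like. It is an illustration under stated assumptions rather than the code we shipped: it assumes samples are sorted by time and nominally spaced a fixed interval apart, marks missing measurements with null, and uses fixed-width bins for the frequency distribution. All names are hypothetical.

```typescript
// Hypothetical sketch of gap filling and a frequency distribution.
// Assumes samples are sorted by time and nominally `intervalMs` apart.
interface SeriesSample {
  t: number;        // timestamp in milliseconds
  v: number | null; // measured value, or null for a filled gap
}

// Inserts a null-valued sample for every missing slot between neighbours.
function fillGaps(samples: SeriesSample[], intervalMs: number): SeriesSample[] {
  if (intervalMs <= 0) throw new Error("intervalMs must be positive");
  const out: SeriesSample[] = [];
  samples.forEach((s, i) => {
    if (i > 0) {
      const prev = samples[i - 1];
      // Step forward one interval at a time until the next real sample.
      for (let t = prev.t + intervalMs; t < s.t; t += intervalMs) {
        out.push({ t: t, v: null });
      }
    }
    out.push(s);
  });
  return out;
}

// Counts non-null values into `binCount` fixed-width bins starting at `min`.
// Values outside the binned range are ignored.
function frequencyDistribution(
  samples: SeriesSample[],
  min: number,
  binWidth: number,
  binCount: number
): number[] {
  const bins: number[] = [];
  for (let i = 0; i < binCount; i++) bins.push(0);
  samples.forEach((s) => {
    if (s.v === null) return; // filled gaps carry no value
    const idx = Math.floor((s.v - min) / binWidth);
    if (idx >= 0 && idx < binCount) bins[idx] += 1;
  });
  return bins;
}
```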

Prioritised XHR Dispatch & Rate Limiting over SPDY / HTTP 2

Switching on SPDY on Jetty 8 led to immediate problems at the server: Jetty (version 8) would stop responding with all of its threads waiting on a lock - we spent a long time trying to work out why and we never got to the bottom of the problem. The clever Jetty folks seem to have fixed this in version 9, however.

We hit the server with a very large number of requests in a very short time from a single browser session. It was not unusual for us to dispatch 1,800 XmlHttpRequests in half a second (to retrieve temperature, pressure and humidity time series for a year for each of 50 floors in a building).

As noted earlier, in testing, we encountered browser instability when we created a large number of XmlHttpRequest objects. Therefore, we had to reduce the number of outstanding requests and manage dispatch ourselves.

The Dispatcher object manages tasks performed on one or more web worker threads. Running on the main thread, it maintains two queues, one for high-priority and one for low-priority requests, and also tracks the number of messages (tasks) posted to each of the worker threads it manages. Tasks are posted, round-robin, to each thread in the worker pool in an attempt to balance load. The per-worker task count acts as a proxy for the maximum number of outstanding XmlHttpRequests and thus is the basis of rate limiting. Only when there are no high-priority tasks pending do low-priority tasks get posted.

Thus, the Dispatcher performs rate-limiting, attempts to keep each worker equally busy and manages download priorities. Profiling showed that the overhead of the Dispatcher was tiny.
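
The Dispatcher logic described above can be sketched roughly as follows. This is a simplified illustration, not the real implementation: the WorkerLike interface stands in for a Web Worker so that the sketch runs outside a browser, tasks are plain strings, and the class and method names are assumptions.

```typescript
// Hypothetical sketch of a prioritised, rate-limited dispatcher.
// `WorkerLike` stands in for a real Web Worker; names are assumptions.
interface WorkerLike {
  postMessage(message: string): void;
}

type DispatchPriority = "high" | "low";

class SketchDispatcher {
  private high: string[] = [];
  private low: string[] = [];
  private outstanding: number[]; // tasks currently posted to each worker
  private next = 0;              // round-robin cursor

  constructor(private workers: WorkerLike[], private maxPerWorker: number) {
    this.outstanding = workers.map(() => 0);
  }

  enqueue(task: string, priority: DispatchPriority): void {
    (priority === "high" ? this.high : this.low).push(task);
    this.pump();
  }

  // Call when a worker posts its result back to the main thread.
  taskCompleted(workerIndex: number): void {
    this.outstanding[workerIndex] -= 1;
    this.pump();
  }

  // Posts queued tasks until every worker is at its limit or the queues are empty.
  private pump(): void {
    while (this.high.length + this.low.length > 0) {
      const index = this.pickWorker();
      if (index < 0) return; // all workers are at capacity
      // Low-priority tasks only go out when no high-priority task is pending.
      const task = (this.high.length > 0 ? this.high : this.low).shift() as string;
      this.outstanding[index] += 1;
      this.workers[index].postMessage(task);
    }
  }

  // Round-robin over the pool, skipping workers that are already at their limit.
  private pickWorker(): number {
    for (let i = 0; i < this.workers.length; i++) {
      const index = (this.next + i) % this.workers.length;
      if (this.outstanding[index] < this.maxPerWorker) {
        this.next = (index + 1) % this.workers.length;
        return index;
      }
    }
    return -1;
  }
}
```

With a pool of four workers and maxPerWorker set to 25, a dispatcher of this shape would never have more than 100 tasks outstanding at once.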

For the record, after some experimentation, I found that four worker threads each handling a maximum of 25 tasks gave the best performance and worked reliably across iOS mobile Safari, through Chrome, Firefox, IE and even Mobile IE. Using these settings, a maximum of 100 XmlHttpRequests can be active at any moment.

In-Memory Cache

The value of download prioritisation becomes apparent when discussing the in-memory cache. The cache contains API responses returned from the worker. That is, these are not the complete Time Series objects but the objects that are spliced together to create the Time Series requested by the user.

The biggest 'win' from the cache was that it allowed response substitution and this, in turn, allowed download prioritisation.

Response substitution is simple:

  • if the cache has a response with the wrong number of samples, then this response can be substituted;
  • if the cache has a response that has expired, then this response can be substituted.

Substitution means that the cached response is spliced into the Time Series response and pushed to the user. This implies that some time series objects have varying sample rates and/or missing data. Users can, of course, opt out and disallow substitution but, in our own apps, nobody ever did - graphs with the wrong number of samples are better than no graphs at all.
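
As a rough illustration of the substitution rules above, a cache lookup might classify each request as an exact hit, a substitute or a miss. The field names, the linear scan and the lookupChunk signature are assumptions made for this sketch; it also assumes that a fresh response with the requested number of samples counts as an exact hit.

```typescript
// Hypothetical sketch of response substitution in the in-memory cache.
// Field names and the lookup signature are assumptions for illustration.
interface ChunkQuery {
  sensorId: string;
  rangeStart: number;  // start of the requested time range (ms)
  rangeEnd: number;    // end of the requested time range (ms)
  sampleCount: number; // number of samples wanted for that range
}

interface CachedChunk extends ChunkQuery {
  expiresAt: number;   // expiry time derived from the HTTP headers (ms)
}

type LookupResult =
  | { kind: "hit"; chunk: CachedChunk }
  | { kind: "substitute"; chunk: CachedChunk }
  | { kind: "miss" };

function lookupChunk(
  cache: CachedChunk[],
  query: ChunkQuery,
  now: number,
  allowSubstitution: boolean
): LookupResult {
  let substitute: CachedChunk | null = null;
  for (let i = 0; i < cache.length; i++) {
    const c = cache[i];
    if (
      c.sensorId !== query.sensorId ||
      c.rangeStart !== query.rangeStart ||
      c.rangeEnd !== query.rangeEnd
    ) {
      continue; // different sensor or time range
    }
    const fresh = c.expiresAt > now;
    if (fresh && c.sampleCount === query.sampleCount) {
      return { kind: "hit", chunk: c };
    }
    // Wrong sample count or expired: usable only as a stand-in.
    if (substitute === null) substitute = c;
  }
  if (substitute !== null && allowSubstitution) {
    return { kind: "substitute", chunk: substitute };
  }
  return { kind: "miss" };
}
```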

From the end-user's perspective, the effects of the in-memory cache were dramatic to say the least with the entire app appearing to become far faster.

Requests for which there is no substitutable item in the cache are immediately requested from the Dispatcher at a high-priority. Requests that have a substitute item are requested at a low-priority. I should note that the cache was managed so that data for an individual sensor was dumped from memory ninety seconds after the last watcher of that sensor un-registered itself.
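
The ninety-second eviction rule could be implemented along the following lines. This is a sketch under assumptions, not the library's code: watchers are reference-counted per sensor, time is passed in explicitly so the logic can be exercised without real timers, and the class and method names are hypothetical.

```typescript
// Hypothetical sketch: drop a sensor's cached data 90 s after its last
// watcher un-registers. Names and structure are assumptions.
const EVICT_AFTER_MS = 90 * 1000;

class SensorCacheSketch {
  private watchers: { [sensorId: string]: number } = Object.create(null);
  private idleSince: { [sensorId: string]: number } = Object.create(null);
  private data: { [sensorId: string]: string[] } = Object.create(null);

  watch(sensorId: string): void {
    this.watchers[sensorId] = (this.watchers[sensorId] || 0) + 1;
    delete this.idleSince[sensorId]; // a new watcher cancels pending eviction
  }

  unwatch(sensorId: string, now: number): void {
    const count = this.watchers[sensorId] || 0;
    if (count === 0) return; // nothing registered for this sensor
    if (count > 1) {
      this.watchers[sensorId] = count - 1;
      return;
    }
    delete this.watchers[sensorId];
    this.idleSince[sensorId] = now; // last watcher gone: start the clock
  }

  store(sensorId: string, chunk: string): void {
    (this.data[sensorId] = this.data[sensorId] || []).push(chunk);
  }

  has(sensorId: string): boolean {
    return this.data[sensorId] !== undefined;
  }

  // Intended to be called periodically, for example from a timer.
  sweep(now: number): void {
    Object.keys(this.idleSince).forEach((id) => {
      if (now - this.idleSince[id] >= EVICT_AFTER_MS) {
        delete this.data[id];
        delete this.idleSince[id];
      }
    });
  }
}
```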

Caching and download prioritisation proved so effective that we experimented with cache pre-population (at a small sample size) - ultimately, the system was so fast that pre-population added little to an already fast user experience.


Debounced Callbacks 

Time Series objects were assembled progressively from multiple chunks of data. On availability of a new data 'chunk', this data was spliced into the Time Series and a callback triggered. Callbacks in user-land typically used a requestAnimationFrame and performed DOM manipulations (graph redraws, etc).

The sheer number of Time Series objects in use meant that we ran into an issue where we flooded the UI thread with work - we measured 1,400 callbacks/second at times. For this reason, we debounced callbacks to 500ms per Time Series object. Again, although somewhat counter-intuitive, this proved highly effective at keeping the UI responsive.
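
A minimal sketch of this kind of callback coalescing is shown below. It assumes the timer is set once when the first chunk of a burst arrives and is not restarted by later chunks, so each Time Series fires at most one callback per 500ms window (a strict debounce would instead restart the timer on every chunk). The ScheduleFn type abstracts the timer so the sketch runs anywhere; in a browser it could simply wrap setTimeout. All names are hypothetical.

```typescript
// Hypothetical sketch: coalesce bursts of chunk arrivals into one callback
// per window. `ScheduleFn` abstracts the timer (e.g. a setTimeout wrapper).
type ScheduleFn = (fn: () => void, delayMs: number) => void;

class DebouncedNotifier {
  private pending = false;

  constructor(
    private callback: () => void,
    private schedule: ScheduleFn,
    private delayMs: number = 500
  ) {}

  // Call whenever a new chunk has been spliced into the Time Series.
  notify(): void {
    if (this.pending) return; // a callback is already scheduled
    this.pending = true;
    this.schedule(() => {
      this.pending = false;
      this.callback();
    }, this.delayMs);
  }
}
```

However many chunks arrive while a timer is pending, the user callback (and hence the graph redraw) runs once when the timer fires.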

Retrospective

The Huninn Mesh JavaScript library weighed in at ~9,000 LOC (commented). It is by far the largest chunk of JavaScript I have ever written. If I had my time again, I would:
  • choose TypeScript rather than vanilla JavaScript classes;
  • do a better job of unit-testing;
  • do a better job of estimating the project.
I found JavaScript really very nice as a language. Web Worker support in particular was very good, and I could deliver what was IMHO an incredibly fast app into the browser. However, coming from a Java background, I really missed static typing and, in the larger sense, the convenience offered by TypeScript.

I did a poor job of Unit Testing. My day job is as a manager, and I found myself omitting tests due to deadlines. I'm not proud of myself - this is a pitfall I encourage others to avoid. I also under-estimated the effort required to author the library. It took me four months in total - about twice what I expected (although I had something working from about the six-week mark - all else was new features, optimisation and re-factoring as I learned the ropes).

The Results

The results quoted here are for four Web Workers and a throttle of 100 outstanding XMLHttpRequests.

  • My JavaScript library clocked a sustained data ingestion rate of 220 time series chunks/second from a clean browser cache (amazingly, I reached 100/second on an iPhone 4s). 
  • Results for a warm browser cache were surprisingly variable. In some cases, I saw well over 400 tx/sec and in others, no better than an empty cache.
  • Each additional Web Worker delivered incremental performance improvements but these were small after the second worker thread (Amdahl's law at work?)
  • HTTP/2 did exactly what it claimed: running tests on the server with a warm server cache and then switching HTTP/2 on/off led to a sustained improvement in ingestion rates of over 40%. I should note that I expected better! (Of which, more in a moment.)
  • It worked on all modern browsers (which, if my memory serves me well, were IE 11, Chrome 39, Firefox 36 and Safari 7). Surprisingly, Chrome proved the least stable - particularly with developer tools open, I saw a lot of crashes. In fairness, I did most of my debugging in Chrome and so there's some serious bias in this observation.
  • Safari held the speed crown - from a clean start of the Safari process and hitting the browser cache, I saw 800 tx/sec in one test. More generally, however, Chrome beat all contenders, beating Firefox by about 10% and IE by about 40%.
The surprise I alluded to is that, even under sustained load during torture tests, my CPU cores were not running at 100%. My i7 laptop has eight cores. With four Web Workers, none of these cores ran at 100% for a sustained period, instead sitting at 70% or so (message passing?)

By way of a wrap up, the RESTful API design and client and server caching of HTTP results, coupled with a multi-threaded client-side library written in JavaScript, delivered massive performance improvements to our app, taking us from a so-so multi-second refresh to a sub-second, blink-and-you'll-miss-it one.

This done, the JavaScript API allowed us to think more broadly about visualisation. Building managers are not interested in the average temperature for a building: they want to see every aspect of its performance over time scales varying from hours to years over hundreds of sensors. With our new API, it became easy to answer yes to challenges like this.