Replication Update: Smoothing the Rough Edges (pt. 4)

Since my last post about Diffusion’s support for high availability (HA), we have made substantial changes to how we replicate data in our latest release, Diffusion 5.6.

I’ve already written a little about session replication, topic replication and how we use Hazelcast, so I will focus on the latest changes. We’ve done the usual work of upgrading Hazelcast and releasing bug fixes. We’ve also added support for Unified API session failover and improved the handling of update sources for replicated topics. Unified API session failover moves the Unified API closer to parity with the Classic API and makes it more viable for end-user development. The improved handling of update sources makes it easier to reason about how replicated topics are updated and how they behave on failover.

Unified API session replication

In 5.6, session replication for Unified API sessions has been completed. When session replication was first introduced, Unified API sessions were unable to recover. This was because the recovery process we planned needed an existing session to identify itself without reauthenticating, and that required a protocol change. Now, in 5.6, session roles and properties are replicated through Hazelcast, and sessions can recover to different servers without having to reauthenticate.
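To make the recovery step concrete, here is a minimal sketch in plain Java. It is not Diffusion code: a `ConcurrentHashMap` stands in for the Hazelcast-backed store, and `ReplicatedSessionStore`, `SessionState`, `replicate` and `recover` are hypothetical names. Server A replicates a session's roles and properties after authenticating it; server B later restores them from the session ID alone, without calling any authenticator.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, not Diffusion code. A ConcurrentHashMap stands in for a
// Hazelcast map that every server in the cluster can read and write.
class SessionRecoverySketch {

    // The replicated state of one session: its roles and its properties.
    record SessionState(Set<String> roles, Map<String, String> properties) { }

    static final class ReplicatedSessionStore {
        private final Map<String, SessionState> sessions = new ConcurrentHashMap<>();

        // Called by the server that authenticated the session.
        void replicate(String sessionId, SessionState state) {
            sessions.put(sessionId, state);
        }

        // Called by whichever server the session reconnects to. The session
        // only presents its ID; no credentials are checked a second time.
        Optional<SessionState> recover(String sessionId) {
            return Optional.ofNullable(sessions.get(sessionId));
        }
    }

    public static void main(String[] args) {
        ReplicatedSessionStore store = new ReplicatedSessionStore();

        // Server A authenticates the session and replicates its state.
        store.replicate("session-1",
            new SessionState(Set.of("TRADER"), Map.of("tier", "gold")));

        // Server A fails; the session reconnects to server B using its ID alone.
        SessionState restored = store.recover("session-1").orElseThrow();
        System.out.println(restored.roles() + " " + restored.properties());
    }
}
```

How a session identifies itself to a new server is the part that needed the protocol change, and the sketch deliberately leaves it out.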

The changes to session properties also replace the SessionDetailsListener with the new SessionPropertiesListener. The properties listener provides distinct notifications for newly opened, reconnected and failed-over sessions, something the details listener had only limited support for. This lets back-end systems handle failover better.
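Why distinct notifications help can be shown with a small, self-contained sketch of a back-end handler. The `SessionEvents` interface, `EventKind` enum and `SessionStateHandler` class are hypothetical and deliberately do not mirror the real SessionPropertiesListener signatures. The sketch also assumes that "reconnected" means the session returned to the same server and "failed over" means it moved to another one.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: these names are illustrative, not the Diffusion
// SessionPropertiesListener API.
class PropertiesListenerSketch {

    enum EventKind { OPENED, RECONNECTED, FAILED_OVER, CLOSED }

    interface SessionEvents {
        void onEvent(String sessionId, EventKind kind, Map<String, String> properties);
    }

    // Records the action a back end might take for each kind of event.
    static final class SessionStateHandler implements SessionEvents {
        final List<String> actions = new ArrayList<>();

        @Override
        public void onEvent(String sessionId, EventKind kind, Map<String, String> properties) {
            switch (kind) {
                // A genuinely new session: build its back-end state from scratch.
                case OPENED -> actions.add("create " + sessionId);
                // Assumed same server, same session: existing state is still valid.
                case RECONNECTED -> actions.add("keep " + sessionId);
                // Assumed same session on another server: reuse state, update routing.
                case FAILED_OVER -> actions.add("reroute " + sessionId);
                // The session is gone for good: release its state.
                case CLOSED -> actions.add("discard " + sessionId);
            }
        }
    }

    public static void main(String[] args) {
        SessionStateHandler handler = new SessionStateHandler();
        handler.onEvent("s1", EventKind.OPENED, Map.of("tier", "gold"));
        handler.onEvent("s1", EventKind.FAILED_OVER, Map.of("tier", "gold"));
        System.out.println(handler.actions);
    }
}
```

With only a single "details changed" style callback, the handler could not tell a failed-over session from a brand new one and would have to rebuild state it already holds.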

Unified API sessions are frequently used to implement control clients. These clients may need shorter reconnection timeouts than end-user clients, because while a session might still recover, many resources remain reserved for it. These can include authentication handler control points. If a session has registered an authentication handler and then disconnects, authentication requests may be queued awaiting the session’s recovery, which can delay the authentication and connection of new sessions. The reconnection timeout is set when the session is opened (the value is in milliseconds):

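import com.pushtechnology.diffusion.client.Diffusion;
import com.pushtechnology.diffusion.client.session.Session;

// Stop trying to reconnect after 60 seconds (60000 ms)
final Session session = Diffusion
    .sessions()
    .reconnectionTimeout(60000)
    .open("wss://something.us.reappt.io");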
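The delay described above can be modelled in a few lines. This is a hypothetical sketch in plain Java, not Diffusion code: `HandlerRegistration` stands for an authentication handler registered by a control session, and what really happens to queued requests once the timeout expires is simplified to "resolved at the deadline". It measures how long a new session waits when the control session disconnects.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical model, not Diffusion code: an authentication handler registered
// by a control session. While that session is disconnected but still
// recoverable, new authentication requests wait in a queue.
class QueuedAuthenticationSketch {

    enum State { CONNECTED, DISCONNECTED, CLOSED }

    record Request(String principal, long arrivedAt) { }

    static final class HandlerRegistration {
        private final long reconnectionTimeoutMs;
        private final Deque<Request> queue = new ArrayDeque<>();
        private State state = State.CONNECTED;
        private long disconnectedAt;
        // How long each principal waited before its request was resolved.
        final Map<String, Long> waitMs = new LinkedHashMap<>();

        HandlerRegistration(long reconnectionTimeoutMs) {
            this.reconnectionTimeoutMs = reconnectionTimeoutMs;
        }

        void disconnect(long now) {
            state = State.DISCONNECTED;
            disconnectedAt = now;
        }

        // Once the timeout passes, the registration is released and queued
        // requests are resolved elsewhere (modelled as "at the deadline").
        void advanceTo(long now) {
            long deadline = disconnectedAt + reconnectionTimeoutMs;
            if (state == State.DISCONNECTED && now >= deadline) {
                state = State.CLOSED;
                drain(deadline);
            }
        }

        void authenticate(String principal, long now) {
            advanceTo(now);
            if (state == State.DISCONNECTED) {
                queue.add(new Request(principal, now)); // waits for recovery or timeout
            } else {
                waitMs.put(principal, 0L);              // resolved straight away
            }
        }

        void reconnect(long now) {
            advanceTo(now);
            if (state == State.DISCONNECTED) {         // still within the timeout
                state = State.CONNECTED;
                drain(now);
            }
        }

        private void drain(long resolvedAt) {
            while (!queue.isEmpty()) {
                Request r = queue.poll();
                waitMs.put(r.principal(), resolvedAt - r.arrivedAt());
            }
        }
    }

    public static void main(String[] args) {
        HandlerRegistration registration = new HandlerRegistration(60_000);
        registration.disconnect(0);
        registration.authenticate("alice", 1_000);
        registration.advanceTo(120_000);         // control session never comes back
        System.out.println(registration.waitMs); // {alice=59000}
    }
}
```

With a 60 second timeout the request waits 59 seconds; with a 5 second timeout the same request would wait 4 seconds, which is the argument for giving control clients a shorter reconnection timeout.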

Improved topic replication

Topic replication has been heavily reworked. Update sources are now far more flexible than they used to be and behave the way you would expect.

Now, when topic replication is enabled, a distributed update source registry is created. The previous approach, which bound topic paths to specific servers and managed their update sources locally on those servers, has been removed. Servers now share all information about update sources and make cluster-wide decisions. This improves on how earlier versions of replication coordinated update sources by giving every server a single view. Remember that this applies only to replicated topic paths; update sources registered for non-replicated paths will not be visible to other servers.
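The split between shared and local registration can be sketched as follows. This is a hypothetical model, not Diffusion's implementation: a `ConcurrentHashMap` stands in for a Hazelcast map shared by the cluster, replication is assumed to be configured by topic path prefix, and `Cluster`, `Server`, `register` and `sourcesFor` are illustrative names.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CopyOnWriteArrayList;

// Hypothetical sketch: one registry shared by every server (a ConcurrentHashMap
// stands in for a Hazelcast map) plus a private registry on each server.
class UpdateSourceRegistrySketch {

    record UpdateSource(String server, String sessionId) { }

    static final class Cluster {
        final Set<String> replicatedPrefixes; // assumed replication configuration
        final Map<String, List<UpdateSource>> shared = new ConcurrentHashMap<>();

        Cluster(Set<String> replicatedPrefixes) {
            this.replicatedPrefixes = replicatedPrefixes;
        }
    }

    static final class Server {
        final String name;
        final Cluster cluster;
        final Map<String, List<UpdateSource>> local = new ConcurrentHashMap<>();

        Server(String name, Cluster cluster) {
            this.name = name;
            this.cluster = cluster;
        }

        // Replicated paths go to the shared registry; others stay on this server.
        void register(String path, String sessionId) {
            registryFor(path)
                .computeIfAbsent(path, p -> new CopyOnWriteArrayList<>())
                .add(new UpdateSource(name, sessionId));
        }

        // The update sources this server can see for a path.
        List<UpdateSource> sourcesFor(String path) {
            return registryFor(path).getOrDefault(path, List.of());
        }

        private Map<String, List<UpdateSource>> registryFor(String path) {
            boolean replicated = cluster.replicatedPrefixes.stream().anyMatch(path::startsWith);
            return replicated ? cluster.shared : local;
        }
    }

    public static void main(String[] args) {
        Cluster cluster = new Cluster(Set.of("prices/"));
        Server a = new Server("A", cluster);
        Server b = new Server("B", cluster);
        a.register("prices/fx", "session-1");
        a.register("local/status", "session-2");
        System.out.println(b.sourcesFor("prices/fx"));    // registered on A, visible on B
        System.out.println(b.sourcesFor("local/status")); // empty: path not replicated
    }
}
```

Keeping non-replicated paths in a per-server map reproduces the caveat in the paragraph above: only server A knows about the `local/status` source.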

Update sources now fail over between servers when the update source, or the session that registered it, is closed. If an update source closes because of a client request, a session failure or a server failure, the distributed update source registry can activate another update source on any server.
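The failover rule can be sketched for a single topic path. This is a hypothetical model in plain Java: `PathRegistration`, `close` and `serverFailed` are illustrative names, and the selection policy (the earliest registered remaining source becomes active) is an assumption, since the post does not describe how the registry chooses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Hypothetical sketch of cluster-wide update source failover for one topic path.
// The first remaining source is active; the rest are standbys on any server.
class UpdateSourceFailoverSketch {

    record UpdateSource(String server, String sessionId) { }

    static final class PathRegistration {
        private final List<UpdateSource> candidates = new ArrayList<>();

        void register(UpdateSource source) {
            candidates.add(source);
        }

        Optional<UpdateSource> active() {
            return candidates.isEmpty() ? Optional.empty() : Optional.of(candidates.get(0));
        }

        // A client request or a session failure closes one source; the next
        // candidate, on whichever server it lives, becomes active.
        void close(UpdateSource source) {
            candidates.remove(source);
        }

        // A failed server takes all of its update sources with it.
        void serverFailed(String server) {
            candidates.removeIf(s -> s.server().equals(server));
        }
    }

    public static void main(String[] args) {
        PathRegistration prices = new PathRegistration();
        UpdateSource a1 = new UpdateSource("A", "session-1");
        UpdateSource a2 = new UpdateSource("A", "session-2");
        UpdateSource b1 = new UpdateSource("B", "session-3");
        prices.register(a1);
        prices.register(a2);
        prices.register(b1);

        prices.close(a1);                    // client request or session failure
        System.out.println(prices.active()); // standby on A takes over
        prices.serverFailed("A");
        System.out.println(prices.active()); // standby on B takes over
    }
}
```

All three close reasons funnel into the same outcome, which is what makes the behaviour on failover easy to reason about.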

A hint at what is next

We are now looking at providing even better support for clustering Diffusion servers. This may include changes such as replicating the Security and System Authentication stores, so that dynamic changes to them are shared by all members of a cluster.

Thanks to

  • Peter Hughes for implementing Unified API reconnection
  • Paddy Walsh for creating the session properties listener
  • Ivan Valeriani for replicating session properties
  • Phil Aston for replicating authentication information