Creating a WebSocket Server for PubSub

A pub-sub (publish-subscribe) server is a messaging system that facilitates communication between different system components in a decoupled manner.

In the pub-sub model, message producers (publishers) send messages to specific topics or channels without knowing who will receive them. Message consumers (subscribers) express interest in one or more topics and receive messages sent to those topics.

Publishers and subscribers do not interact directly with each other. Instead, they interact with the pub-sub server, which handles the distribution of messages.

The pub-sub server receives messages from publishers and ensures they are delivered to all relevant subscribers, via socket or WebSockets.

Pub-sub servers are often used for real-time communication systems where multiple clients need to receive updates simultaneously, such as chat applications, live data feeds, or notification systems.

A pub-sub server allows for scalable, efficient, and flexible communication in systems requiring real-time data dissemination.

Rather than buying a proprietary pub-sub server, many businesses may choose to write their own. This article considers how you might go about doing this, and discusses some of the challenges you may face.

Which Language Should I Use?

There are of course many languages that can be used to develop a server, but for the sake of this article we will assume that Java is chosen as it is widely used for enterprise-grade server implementation, it is portable, and it has access to an extensive range of open source libraries to aid development. With current JIT compilers, there is no need to be concerned about any performance issues you may feel there are with Java.

The Data Model

Pub-sub models typically publish to channels or topics. For simplicity, we will use the term ‘topics’. But what is a topic?

In the simplest sense, a topic is simply a named pipe to which a publisher sends messages, and subscribers to a topic will receive those messages as they are published.

But you may wish to consider the following:

Messages or Values?

A message in the pub-sub sense is just some data and can take any form. But do you want that data to be more strongly typed rather than the consumer always having to figure out what is in the message?

It may be more useful to think of what is in the topic as a ‘value’, albeit possibly a complex structured value (JSON, XML etc). So you may want to strongly type your topics so that consumers know what type of value they contain and can handle them accordingly.

Value Caching

If you want new subscribers to receive the latest value, you will need to cache the value in the server – this may be very important for less volatile topics as otherwise subscribers may receive no value for a topic for some time and will not be able to see the latest value. If values are cached, you may also want to allow clients to simply ‘fetch’ a topic value on an ad hoc basis rather than subscribing to all updates.

Pre-emptive Subscription

You may want topics to be dynamically created by publishers as and when required, rather than having a fixed set of them. In this case, you have to think about what happens if a subscriber requests a topic that is not yet there. You may want to save subscriptions in the server if they cannot be satisfied so that if the topic is created the subscribed client will start to get the updates.

Subscription Models

A subscribing client could simply subscribe to the topics it wants one by one. However, if there are a large number of topics this can be difficult.

By choosing to organise your topics in a structured manner, for example as a ‘tree’ of topics, subscription can be greatly simplified. For example, you can then allow a subscriber to subscribe to a branch of the tree rather than every single topic in the branch, individually.

You may also want to allow a client to subscribe to multiple topics by some simplified means, for example using ‘regular expressions’ which match against the topic names. The server would then need to match topics against such expressions on subscription and also match any newly created topics against existing expressions supplied by subscribers.

Organising topics in a tree also allows topics to be organised in a more granular manner, so that instead of a client subscribing to a single topic with a large structured value, the topics can be broken down into smaller value topics, leading to only the individual values being changed being sent to the client when updated.

Topic Lifecycles

Topics will consume memory in the server, so to make it more memory efficient you may want to consider managing the topic lifecycles. For example, you may want to only create a topic (notifying the feed to start publishing) when somebody subscribes to it. You may also want to consider automatically removing the topic after some time or when nobody has subscribed to, or updated it for some time.

Client-Server Communication

Again, there are several choices here, for example, HTTP, REST, graphQL, gRPC, MQTT, AMQP etc. Some are ‘pull’ based in that the consumer has to periodically request the values from the server but this is very inefficient. Ideally, you need a ‘push’ based paradigm where when the value of any topic changes ta the server, the new value is pushed to the subscribed clients.

Traditionally, web-based clients have used HTTP-based communication, but in general, it is far better to use the WebSocket protocol for the following reasons:-

  • Full-Duplex Communication: Allows simultaneous, bidirectional data exchange between client and server.
  • Reduced Latency: Enables low-latency communication compared to traditional HTTP requests.
  • Efficiency: Reduces overhead by maintaining a single, long-lived connection.
  • Scalability: Supports a large number of concurrent connections efficiently.
  • Simplified State Management: Maintains connection state easily without re-establishing context.
  • Cross-Platform Support: Compatible with all major browsers and multiple server-side technologies.
  • Versatility: Suitable for diverse applications such as online gaming, real-time updates, and collaborative tools.
  • Cost-Effective: Lowers bandwidth and server resource costs due to reduced data transfer and connection overhead.

Creating a very basic WebSockets-based pub-sub server in Java is relatively straightforward, but achieving high performance and scalability introduces many complexities. Below, we highlight the primary challenges developers might face and provide insights into addressing them.

Handling High Numbers of Concurrent Connections

WebSockets maintains persistent connections, which can lead to thousands or even millions of concurrent connections. Managing such a high number of connections efficiently is critical for performance and scalability.

The best way to overcome this is to use Java’s non-blocking IO (NIO) to handle many connections efficiently. This way you do not need to maintain a separate thread for each connection.

Using WebSockets with NIO directly can be quite complex due to the low-level nature of NIO. Directly implementing WebSockets with Java NIO involves managing the WebSocket protocol’s handshake, framing, and state transitions manually.

Writing an efficient multi-threaded server application is also not without its challenges. You need to avoid contention between threads and have efficient locking models. The use of lock-free/wait-free algorithms can be used to minimise such contention, and well-designed internal eventing mechanisms will significantly improve performance.

Efficient Message Broadcasting

In a pub-sub model, messages need to be broadcast to many subscribers simultaneously. Inefficient broadcasting can cause delays and reduce overall throughput.

Some solutions to this challenge are:

Asynchronous Processing

Use asynchronous message handling to avoid blocking the main event loop.

Message Queuing

Implement a message queue to handle bursts of messages efficiently.

You should consider that you will be sending the same messages to many consumers, but the queue of messages for each consumer will be different according to the topics they are subscribed to. For the sake of memory efficiency, it is therefore necessary to devise a mechanism for sharing the same messages across client queues without causing contention.

If a client is not consuming messages as fast as the server is producing them, individual client queues can build up and you will need mechanisms to prevent this causing unacceptable memory overheads in the server, and stale data at the client.

Message Batching

Sending many small messages to a client is very inefficient so where possible as many messages as possible should be batched into a single WebSocket message when sending to a client. This may involve some complex message buffer management.


The size of many data values can be significantly reduced using standard compression algorithms and these should be considered where appropriate. This particularly applies to structured data that has embedded tags that lend themselves to dictionary-based compression algorithms. JSON and XML data compresses well.


In some applications, a client is only interested in the latest value for a topic so you could consider ‘conflating’ the client queues such that only the latest value for any single topic is in the queue. This can involve some complex queue management.


Another way of dealing with slow-consuming clients is to apply ‘throttling’ to them. This means that updates for a client are only queued periodically rather than every time the value is updated on the server. This reduces the size of client queues.


If bandwidth efficiency is important you could organise your topics in a granular form as suggested above. But, if there are large topic values it is very inefficient to send the whole value every time a very small part of it has changed. To overcome this you may want to implement a ‘delta’ mechanism where only the changes between the previous value and the new value are transmitted to the client. Of course, this will need some mechanism at the client end to apply the delta to the previous known value. You need to consider the processing cost of comparing two values to produce a delta and then reconstruct a value at the other end.

Memory Management

WebSocket connections consume memory, and each message needs to be stored and processed. Poor memory management can lead to out-of-memory errors and application crashes.

Because there will be message queues for every connection you need to ensure that identical messages being sent to different clients are not duplicated. Ideally, the value being sent to the client(s) should be the same in memory as that cached for the topic and should only be pinned in memory until the topic has been updated and/or the value has been transmitted to each client that wants it.

Fault Tolerance and Reliability

Ensuring that the server remains operational despite hardware failures, network issues, or software bugs is crucial for reliability.

You could implement clustering to distribute load and provide redundancy. To achieve this you will have to devise a reliable mechanism for distributing state across nodes.

Use data replication techniques to ensure messages are not lost in case of server failures. Clients connecting to different servers in a cluster should see the same topic data for any particular topic.

Implement health checks and auto-recovery mechanisms to detect failures and recover automatically. Kubernetes can manage containerized applications for automated scaling and recovery.


Securing connections to prevent unauthorized access, data breaches, and other security threats is vital.

You will need to implement robust authentication and authorization mechanisms to control access to the server.

If you want fine-grained access control to your topics you will need to develop a robust RBAC (Role Based Access Control) mechanism. You should also consider whether you want these controls to be dynamic, in that they can be changed and applied while the server is running. This would involve dynamically subscribing consumers who now have access and unsubscribing those who no longer have access.

You should use TLS to encrypt WebSocket communications and protect data in transit.

You should also develop mechanisms to prevent your server from being brought down by a DDoS attack. For example, a rogue publisher sends huge amounts of data to cause out-of-memory errors.


Scaling the server to handle increasing loads without degrading performance is essential.

“Horizontal Scaling” is achieved by adding more server instances. You will need to use distributed state management to support horizontal scaling.

There will always be a limit to the number of connections a single server can support. Clustering with data replicated to each server can address this to some degree. Another mechanism is using tiers of servers, where a single primary server (or cluster) feeds updates to many secondary (or edge) servers supporting the client connections.

Other Considerations

The above sections cover what you will need to consider as a minimum when writing your pub-sub server. However, there may be other requirements you would like your server to satisfy, some of which are discussed below.

Delayed Updates

Some applications do not want the data from their publisher feeds to be immediately available to all clients. To achieve this you may want to consider being able to delay updates by some time interval before they are delivered to the clients. This may only be applied to certain clients (see Personalisation below).


You may want different clients to see the data differently according to who they are (or some property of the client – like are they connecting from a mobile device, or what is their authenticated role).

For example:

  • You may want to deliver all messages to most clients but throttle the messages delivered to mobile clients.
  • You may want some clients subscribing to a topic to see the data immediately, but others to receive a delayed feed.
  • You may want some clients subscribing to a topic to see completely different data from other clients.

Different Views of the Data

You will develop the data model held in your server based on the various publisher feeds. However, due to business changes or new applications, you may want to present the data differently without having to change all the feeds to produce a new data model.

To support this you may want to develop a mechanism for presenting the basic data model in different ways, for example, extracting only parts of a larger topic value into a different topic which is not part of the base model.

Historic Data

So far we have only discussed a topic having a single current value (albeit perhaps a structured one), but you may want to keep some history of the updates to a topic. To do this you could support topics that keep a number of updates as history and allow clients to query that history. You should consider the memory impact of such topics, especially if they have large values. You may want to offload older values to some other storage in some way.

Persistence and Recovery

What has been discussed so far is a memory-based pub-sub server, which means that the data within it is maintained only as long as the server is running. If you restart the server for any reason then the publishers will need to rebuild the data model from scratch.

If you want to be able to recover the state of the server when it was closed without having to rebuild the data model you will have to develop some persistence mechanism which writes all updates to disk in a way that the topic state can be restored when the server restarts. You could use a database for this, but that could have a performance impact. A faster technique is to write updates to an append-only log, but such a log would grow indefinitely so you would have to also develop a mechanism to periodically compact such logs, such that only the latest updates for each topic remain.


Depending upon the size of your business you may want to consider the federation of servers, so that independent server instances can communicate with each other, even in different geographic locations.


You will probably want to see how your server is performing and what loads it is dealing with. To do this you will probably want to build in some metrics that can be consumed by a monitoring system like Prometheus.

On-Prem or Cloud

Will you want to deploy your server on your hardware or will you want to deploy it in the Cloud? Perhaps you would want to deploy in multiple clouds or even in a hybrid mode where some servers are on-prem and others in the cloud. There are many different considerations for each possible deployment.


Building a highly performant and scalable pub-sub WebSockets server in Java involves addressing multiple challenges, from handling high concurrent connections to ensuring fault tolerance and security.

Keep in mind that continuous monitoring, profiling, and optimization are essential practices in maintaining the performance and scalability of your WebSocket server.

So, if you feel up to the challenge, hopefully, this article has given you some guidance regarding the things that you may wish to consider before embarking on writing a WebSocket-based pub-sub server.

If you need a Websocket pub sub cache server, take a look at Diffusiontm which is available on premise and in the cloud.