Market Data in the Cloud

By Riaz Mohammed, CTO DiffusionData


Overview and basic concepts


Market data ecosystems are typically hosted on premise.

  • Market data types: pre-trade and post-trade market data.
  • It contains bid and ask prices for various instruments, which are essential for trading.

Market Data is used for real-time trading, risk analysis, analytics and more.

  • Trading uses real-time market data.
  • Real-time market data is delivered to consumers:
    • Brokerage firms, investors, security dealers, etc.

The challenge:

  • Velocity of data.
  • Data volatility (due to the uncertain economic and political state of the world).
  • The need to keep costs down.

This means maintaining best-fit on-prem infrastructure, which is costly because of the constant upgrades needed to meet increasing traffic and delivery requirements.



Need for Market Data in the Cloud


Market data distribution network – The participants

  • Exchanges and trading venues, who produce market data based on trading activity.
  • Market data service providers and vendors, who aggregate and normalize market data from multiple sources, and provide value added services like analytics.
  • Brokerage firms, security dealers and investors who consume the data.

All participants are required to maintain dedicated infrastructure for Market data management.

This is costly with constant upgrades to meet the increasing traffic and delivery requirements.

The end consumers of market data, such as brokerage firms, must process market data in phases, which increases the complexity and TCO of market data platforms for all participants, including the end consumers.



Market Data in the Cloud Challenges

Delivering market data successfully in the cloud poses several challenges. Some of the key ones are:

  • IP multicast: Traditional on-premise market data networks rely on IP multicast for optimized delivery to multiple consumers, but IP multicast is not natively supported in the cloud.
  • Scaling: The common practice when using cloud infrastructure is to scale horizontally. However, this creates complexity if ordering of market data is to be maintained.
  • Native messaging: Cloud-native messaging products are not fast enough, as they are primarily built for guaranteed delivery. Market data requires at-most-once delivery.
  • Cost: If not architected well, the cost can spiral out of control. Data distribution / bandwidth cost, which is negligible in on-premise implementations, can be a major cost in the Cloud, especially when delivering data to multiple consumers in different locations. Resiliency requires that the solution is available across multiple cloud regions, which in turn can also drive costs up.
  • Multi-cloud: Most major enterprises want a cloud-agnostic solution so that they are not tied to a single provider, and this is also mandated by regulation. Carefully selecting a technology stack that is portable across different cloud providers adds to the complexity, given that most cloud vendor solutions do not support multi-cloud.
  • On-prem cloud integration: In most cases availability of data needs to be consistent across on-prem and cloud. This requires efficient site-to-site replication across cloud and on-premise environments.
  • Entitlements: An extensive entitlement system is required, which can monitor and control access to market data depending on the type of data and user-level permission including the consuming application or device.
  • Filtering & personalization: Not all recipients require all data, nor do they all need data in real time. Different types of customization can therefore be applied, which simplifies consumption of market data for consumer applications and reduces the amount of data transmitted over the network. This is also vital for providing a zero-footprint solution to consumers.
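
The scaling challenge above can be made concrete with a small sketch. One common way to scale horizontally while keeping updates for each instrument in order is to route every update for a given symbol to the same shard. The function names, the tuple layout and the sample symbols below are illustrative assumptions, not part of any product API.

```python
import zlib


def shard_for(symbol: str, num_shards: int) -> int:
    """Map an instrument symbol to a shard using a stable hash."""
    # zlib.crc32 gives the same value in every process, unlike the built-in hash().
    return zlib.crc32(symbol.encode("utf-8")) % num_shards


def route(updates, num_shards):
    """Distribute (symbol, sequence, price) updates to shards in arrival order."""
    shards = [[] for _ in range(num_shards)]
    for update in updates:
        symbol = update[0]
        shards[shard_for(symbol, num_shards)].append(update)
    return shards


# Hypothetical sample feed: two instruments, two updates each.
updates = [
    ("VOD.L", 1, 72.10),
    ("BP.L", 1, 480.50),
    ("VOD.L", 2, 72.12),
    ("BP.L", 2, 480.40),
]
shards = route(updates, num_shards=4)
```

Because a symbol always maps to the same shard, ordering is preserved per instrument, although no ordering is guaranteed across different instruments. Changing the number of shards remaps symbols, which is part of the complexity the bullet describes.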


Introducing the Diffusion Intelligent Data Platform


Diffusion is an advanced, cloud-ready pub-sub platform for internet-scale, hyper-personalized messaging, with additional features such as:

  • Low code / no code inflight data wrangling
  • In-memory last value and historic value cache
  • Request-response API
  • Data Streaming
  • Real-time remote data view
  • Delta compression
  • Monitoring and administration
  • Security and entitlements

Diffusion Key Capabilities

  • High-Throughput, Low-Latency Pub/Sub
  • Hyper-personalized Data Distribution
  • Data Acceleration w/Last Value & History Cache
  • Highly Scalable clustered Req or API processing
  • Cloud-Native, Multi-Cloud
  • In-Memory Performance

Diffusion Benefits

  • Infrastructure availability as IaaS, PaaS or SaaS.
  • Elasticity, allowing dynamic scale up and scale down of infrastructure depending on load.
  • The possibility of delivering market data with zero infrastructure requirement for end consumers.
  • Above all, lower costs.


Overcoming Challenges with Diffusion



  • Diffusion maintains a TCP connection with client applications via WebSockets, ensuring full visibility and access control of market data.
  • Market data is structured data in which price details are the main change between messages. This allows Diffusion’s delta compression to reduce bandwidth usage by 70% or more, because it sends only the binary difference between messages for the same instrument.
  • This helps to deliver market data to tens of thousands of consumers without incurring huge bandwidth costs.
  • Diffusion provides solutions for all participants of the market data network.
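
To illustrate the idea behind delta compression, the sketch below computes a very simple binary delta between two consecutive messages for the same instrument: it keeps the shared prefix and suffix and transmits only the bytes in between. This is an assumed, simplified scheme for illustration only. It is not Diffusion's actual algorithm, and the message layout and function names are hypothetical.

```python
def make_delta(old: bytes, new: bytes):
    """Return (prefix_len, suffix_len, middle) describing how new differs from old."""
    limit = min(len(old), len(new))
    prefix = 0
    while prefix < limit and old[prefix] == new[prefix]:
        prefix += 1
    # The suffix must not overlap the prefix already matched.
    max_suffix = limit - prefix
    suffix = 0
    while suffix < max_suffix and old[-1 - suffix] == new[-1 - suffix]:
        suffix += 1
    return prefix, suffix, new[prefix:len(new) - suffix]


def apply_delta(old: bytes, delta) -> bytes:
    """Rebuild the new message from the previous one and a delta."""
    prefix, suffix, middle = delta
    return old[:prefix] + middle + old[len(old) - suffix:]


# Hypothetical consecutive quotes for one instrument: only the bid changes.
previous = b'{"symbol":"VOD.L","bid":72.10,"ask":72.12}'
current = b'{"symbol":"VOD.L","bid":72.11,"ask":72.12}'

delta = make_delta(previous, current)
rebuilt = apply_delta(previous, delta)
```

The receiver must hold the previous value for the instrument in order to apply the delta, which is why this approach pairs naturally with a last-value cache. Actual savings depend on the message format and on how much changes between updates.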



Diffusion Zero-footprint consumption of Market Data

There is no need to host infrastructure to process market data feeds. Instead, Diffusion services running in the cloud filter and adapt market data for direct consumption by the end application or user.
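
As a rough illustration of what "filter and adapt" can mean on the server side, the sketch below keeps a per-consumer view that drops instruments the consumer has not selected and throttles updates, retaining only the latest held-back value per instrument. The class, its parameters and the timing model are assumptions made for this example. They do not represent the Diffusion API.

```python
class ConsumerView:
    """Per-consumer filter and throttle (illustrative only)."""

    def __init__(self, symbols, min_interval):
        self.symbols = set(symbols)       # instruments this consumer wants
        self.min_interval = min_interval  # minimum seconds between deliveries
        self._last_sent = {}              # symbol -> time of last delivery
        self._pending = {}                # symbol -> latest held-back quote

    def on_update(self, symbol, quote, now):
        """Return the quote if it should be delivered now, otherwise None."""
        if symbol not in self.symbols:
            return None  # filtered out for this consumer
        last = self._last_sent.get(symbol)
        if last is not None and now - last < self.min_interval:
            self._pending[symbol] = quote  # keep only the most recent value
            return None
        self._last_sent[symbol] = now
        self._pending.pop(symbol, None)
        return quote

    def flush(self, now):
        """Deliver held-back quotes whose throttle interval has elapsed."""
        due = {}
        for symbol, quote in list(self._pending.items()):
            if now - self._last_sent[symbol] >= self.min_interval:
                due[symbol] = quote
                self._last_sent[symbol] = now
                del self._pending[symbol]
        return due
```

A consumer created with `ConsumerView({"VOD.L"}, min_interval=1.0)` would receive at most one VOD.L update per second and nothing for other instruments, so the filtering and throttling work happens before data leaves the cloud.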


Diffusion Architecture


This is a Diffusion architecture designed to meet the demands of efficient and rapid scalability of servers, where data volumes may be both large and unpredictable.

Automatic scale up and down 

  • Diffusion servers are automatically brought into active service when required and released when no longer needed.
  • Scaling decisions are based on your own metrics (message rates, CPU load, number of connections).
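
A metric-driven scaling decision of this kind might be sketched as follows. The data classes, thresholds and the `desired_servers` function are hypothetical and only show how message rate, CPU load and connection count could be combined. In practice the decision would be made by whatever autoscaling mechanism the deployment uses.

```python
import math
from dataclasses import dataclass


@dataclass
class ClusterMetrics:
    messages_per_sec: float  # total across the cluster
    cpu_load: float          # average across active servers, 0.0 to 1.0
    connections: int         # total client connections


@dataclass
class ServerCapacity:
    messages_per_sec: float  # comfortable rate for one server
    target_cpu_load: float   # CPU level to aim for per server
    connections: int         # comfortable connection count for one server


def desired_servers(metrics, capacity, active, min_servers=1, max_servers=20):
    """Return the server count that satisfies the most demanding metric."""
    by_messages = math.ceil(metrics.messages_per_sec / capacity.messages_per_sec)
    by_connections = math.ceil(metrics.connections / capacity.connections)
    # Scale the current server count by how far CPU is from its target.
    by_cpu = math.ceil(active * metrics.cpu_load / capacity.target_cpu_load)
    wanted = max(by_messages, by_connections, by_cpu)
    return max(min_servers, min(max_servers, wanted))


# Assumed per-server capacity figures, chosen only for the example.
capacity = ServerCapacity(messages_per_sec=100_000, target_cpu_load=0.6, connections=20_000)
busy = ClusterMetrics(messages_per_sec=250_000, cpu_load=0.9, connections=30_000)
quiet = ClusterMetrics(messages_per_sec=10_000, cpu_load=0.1, connections=500)
```

Taking the maximum across metrics means the cluster scales up as soon as any one resource is under pressure, and scales down only when all of them are comfortably within capacity.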

Rapid response to demand 

  • Active servers can be brought online in seconds so there is always capacity to manage unexpected loads.
  • Add servers indefinitely to handle large volumes of data ingress.