๐‚๐ก๐š๐ฅ๐ฅ๐ž๐ง๐ ๐ž๐ฌ ๐จ๐Ÿ ๐ƒ๐ข๐ฌ๐ญ๐ซ๐ข๐›๐ฎ๐ญ๐žd ๐ƒ๐š๐ญ๐š ๐’๐ญ๐จ๐ซ๐ž๐ฌ

ยท

3 min read

The need for distributed data storage, I.e. replicated data across multiple machines arises to address problems in performance scalability and reliability:

๐’๐œ๐š๐ฅ๐š๐›๐ข๐ฅ๐ข๐ญ๐ฒ: In certain cases, scaling one layer of architecture may not do much more than pass the bottleneck down to a lower level making the database the ultimate load limiting factor.

๐๐ž๐ซ๐Ÿ๐จ๐ซ๐ฆ๐š๐ง๐œ๐ž: Geographical distance between users becomes a latency factor unless services are in proximity to their users.

๐‘๐ž๐ฅ๐ข๐š๐›๐ข๐ฅ๐ข๐ญ๐ฒ: Availability is often achieved by means of replication to ensure that an entity continues to be available even if an instance of it has failed.

Horizontal scaling of databases may allow for increased load handling and address some of the problems above, however, it will create problems of its own, primarily around data consistency as multiple data stores must be kept in sync to maintain data integrity and consistency.

Thankfully, varying forms of caching can aid in addressing the problems of data store replication:

๐ˆ๐ง-๐Œ๐ž๐ฆ๐จ๐ซ๐ฒ ๐๐š๐ญ๐š ๐ ๐ซ๐ข๐๐ฌ remove the database bottleneck as node (a compute instance such as VM) within a system can have its own local copy of data on the same node as the application and existing data replication engines such as Hazelcast and Oracle Coherence ensures that data remains consistent in between nodes, eventually, data will be written to a database in an asynchronous fashion.

This can be especially useful when multiple nodes rely on the same data set to perform intensive operations as those operations can be performed in isolation of each other and concurrently which is not possible when directly communicating with a database.

Data conflict is potential when dealing with data replication however its impact can be mitigated by utilising conflict resolution through the implementation of consensus algorithms such as Raft and Paxos, one of the key features of Hazelcast is the application of Raft.

A diagram produced by Hazelcast captures in-memory data grids at a high level. Here each node hosting an application will have its own copy of cached data

๐ƒ๐ข๐ฌ๐ญ๐ซ๐ข๐›๐ฎ๐ญ๐ž๐ ๐œ๐š๐œ๐ก๐ข๐ง๐ : in-memory data grids are suitable when dealing with small amounts of memory (<100 MB) therefore when more memory is required then distributed caches are an alternative, in distributed caching the cache can be stored on a dedicated server which makes data consistency to be less of an issue however it does reduce performance as data access calls are to be performed remotely.

A diagram produced by Hazelcast capturing distributed caching at a high level. Here the cache can be shared by multiple applications and servers

๐๐ž๐š๐ซ-๐œ๐š๐œ๐ก๐ž๐ฌ: A hybrid caching model that utilises both in-memory data grids and distributed caches to address different scenarios for example in-memory data grids are used for frequently accessed data while a distributed cache is used to ensure availability.

Space-based architecture style leverages in-memory data grids at its core as highlighted by Mark Richards and Neal Ford in their book "Foundations of Software Architecture"

ย