Modularization of Applications: An Architectural Overview

Published July 17, 2018

What is your stack like? Does it need an overhaul, minor changes, or a higher degree of modularization? This is an overview of the options available to you. Recent state-of-the-art designs tend to focus on parallelism and concurrency to maximize throughput. There are several factors you should consider in relation to this, so let’s get started.

Net Parallelism

The easiest way to achieve higher throughput is to cluster your application, placing nodes of the same type side by side. But the network requires an access address, and only a single resource can explicitly “listen” on that address. The Point-to-Point (PtP) connection lets you use computational resources outside your own computer.

Using a single external resource is not good enough. Instead of simply using one machine, we use a proxy to allow the use of more than one resource. This tree branching topology allows horizontal scalability, replication and blue/green deployments, and it avoids a single point of failure among the in-branch nodes.

With a Reverse Proxy for Load Balancing, the proxy splits the load evenly across all available resources using round-robin or other content-agnostic algorithms. This is cheap and fast enough to make it the most common method nowadays.
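As a minimal sketch of the content-agnostic idea, the selection logic of a round-robin balancer fits in a few lines of Python. The class name and the backend addresses are hypothetical and only illustrate the rotation; a real proxy also handles health checks and connection handling.

```python
from itertools import cycle


class RoundRobinBalancer:
    """Content-agnostic balancer: hands out backends in a fixed rotation."""

    def __init__(self, backends):
        if not backends:
            raise ValueError("at least one backend is required")
        self._rotation = cycle(list(backends))

    def pick(self):
        # The request itself is never inspected; only the rotation matters.
        return next(self._rotation)


# Hypothetical backend addresses, for illustration only.
balancer = RoundRobinBalancer(["10.0.0.1:8080", "10.0.0.2:8080", "10.0.0.3:8080"])
picks = [balancer.pick() for _ in range(6)]
```

Six consecutive picks walk through the three backends twice, in the same order, regardless of what each request contains.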

You can pinpoint the resource to be accessed by using the information available at layer 7 of the OSI model, such as the headers, the content type, the URL or a cookie. This App balancing routing function can also be used for capping or throttling purposes. The additional request information allows more complex branching structures, in which content is distributed heterogeneously across the nodes.
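The content-aware variant can be sketched as an ordered list of rules evaluated against the request. This is an illustration under assumptions: the pool names, URL prefix and cookie flag are invented for the example and do not come from any particular proxy configuration.

```python
def route(path, headers, rules, default_pool):
    """Return the pool name for a request based on layer-7 information."""
    for predicate, pool in rules:
        if predicate(path, headers):
            return pool
    return default_pool


# Hypothetical rules: pool names, prefix and cookie flag are illustrative.
RULES = [
    (lambda path, headers: path.startswith("/media/"), "media-pool"),
    (lambda path, headers: headers.get("Content-Type") == "application/json", "api-pool"),
    (lambda path, headers: "beta=1" in headers.get("Cookie", ""), "beta-pool"),
]
```

The first matching rule wins, so the order of the rules is part of the routing policy. Requests that match nothing fall through to the default pool.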

Both load balancing and App balancing can be achieved with software proxies like HAProxy or Nginx, to name a few. The PtP connection doesn’t require anything fancy because it is the default connection.

Summary of Net Factors

N:1 Point-to-point: direct connection of resources.
N:2 Load Balancing: tree branching topology, content-agnostic.
N:3 App Balancing: tree branching topology, content-aware.

States and Transactions

One of the key factors in designing application architectures is how the application will affect users or data, that is, how it deals with transactions and states. Examples include handling user input or writing to the database.

States are pieces of partial information that must be stored so that an action can be continued or completed later on. Paging, Authorization and Authentication are some examples, and each requires a design decision. Traditionally, stateful applications were the main trend, since they require less information to be exchanged. Modern connections are much more flexible, so stateless applications are now the de facto design decision. They are needed to parallelize on modern cloud infrastructures, since they allow resources to be swapped, updated or deleted easily without losing partial information.
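Paging is a convenient way to see the difference. In the sketch below, the position in the result set travels with the request as an opaque cursor instead of being remembered by the node, so any replica can serve the next page. The function names, the cursor format and the in-memory list are assumptions made for the example.

```python
import base64
import json

ITEMS = [f"item-{i}" for i in range(1, 8)]  # stand-in for stored data


def encode_cursor(offset):
    raw = json.dumps({"offset": offset}).encode("utf-8")
    return base64.urlsafe_b64encode(raw).decode("ascii")


def decode_cursor(cursor):
    if cursor is None:
        return 0  # no cursor means "start from the beginning"
    raw = base64.urlsafe_b64decode(cursor.encode("ascii"))
    return json.loads(raw)["offset"]


def get_page(cursor=None, size=3):
    """Any node can serve this call: the position travels with the request."""
    offset = decode_cursor(cursor)
    page = ITEMS[offset:offset + size]
    next_offset = offset + size
    next_cursor = encode_cursor(next_offset) if next_offset < len(ITEMS) else None
    return page, next_cursor
```

The price is the one mentioned above: each request carries a little more information. A production cursor would normally be signed or encrypted so that clients cannot tamper with it.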

A purely stateless node without side effects is quite a rara avis, since everything today generates logs. Besides that, you can think of transformational resources as resources that don’t change the stored data. Examples include stream filters or monitors.

Today’s examples of stateful technologies include Postgres, RabbitMQ brokers or TensorFlow computational units. Stateless examples are Nginx or Logstash.

Summary of State Factors

S:1 Stateful: node with an internal state.
S:2 Stateless: node without an internal state.

Transactional examples are Postgres, Redis, Kafka and all the nodes that store and process data.

Examples of Transformational resources are Amazon Kinesis Firehose, Fluentd or Metabase.

Summary of Transactional Factors

T:1 Transactional: node that adds/modifies/deletes stored information.
T:2 Transformational: node that doesn’t interact with stored information.

Data, Data, Data

There are not many options when picking RAM, CPU, GPU or hard drives: for all transactional nodes, the CPU processes the information and stores it on the drive. As with the tree branching topology, we would like to include more than one resource if possible. But since writing to the hard drive (the right thing, at the right moment) has proven to be hard, we can’t just have two transactional nodes behind a proxy write to the same files. Therefore, the single source of truth is usually the baseline in small to medium-size designs.

Read Replicas are a simple solution for multiplexing reads, since writing is still performed by a single node. All the remaining nodes specialize in read-only access.
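A rough sketch of the routing decision this implies: statements that modify data go to the single writer, everything else rotates over the replicas. The class, the node names and the verb list are hypothetical, and the sketch deliberately ignores replication lag and multi-statement transactions.

```python
from itertools import cycle


class ReplicaRouter:
    """Route writes to the single primary and spread reads over replicas."""

    WRITE_VERBS = ("INSERT", "UPDATE", "DELETE")

    def __init__(self, primary, replicas):
        self.primary = primary
        # With no replicas configured, reads fall back to the primary.
        self._readers = cycle(list(replicas) or [primary])

    def node_for(self, statement):
        words = statement.split()
        verb = words[0].upper() if words else ""
        if verb in self.WRITE_VERBS:
            return self.primary
        return next(self._readers)


# Hypothetical node names, for illustration only.
router = ReplicaRouter("db-primary", ["db-replica-1", "db-replica-2"])
```

Calling `router.node_for("INSERT INTO users VALUES (1)")` returns the primary, while successive `SELECT` statements alternate between the two replicas.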

When read replicas aren’t enough, we use multi-master techniques, if possible. Each database vendor provides their own way of doing it, and the process might not be available in the cloud unless a custom installation is used.

Using Unstructured storage alleviates the problem, since it stores the schema alongside the data. This allows faster storage, since there is no need to migrate data or handle schema incompatibilities. This is why these stores are often associated with the webscale or NoSQL hashtags.

Sharding is the last of the main techniques discussed here. It splits the data across nodes according to logical, reproducible factors. Some database nodes implement this type of sharding function internally, while others require you to create the logic yourself. It can be as simple as having all user data in a single node and all media files split by type of content.
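When you have to write the logic yourself, the key property is reproducibility: every node must map the same key to the same shard. A minimal sketch, assuming hash-based sharding and hypothetical shard names, could look like this. It uses `hashlib` rather than Python's built-in `hash()`, because the latter is randomized per process for strings.

```python
import hashlib

SHARDS = ["shard-0", "shard-1", "shard-2"]  # hypothetical node names


def shard_for(key, shards=SHARDS):
    """Map a key to a shard with a stable hash, so every node agrees."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]
```

Note that a plain modulo scheme moves most keys when the number of shards changes. Techniques such as consistent hashing reduce that movement, at the cost of more logic.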

Summary of Data Factors

D:1 Single source of truth
D:2 Read Replicas
D:3 Multi-master
D:4 Unstructured storage
D:5 Sharding

Lifecycle Control

The last factor in this overview describes how the code that creates the system is stored and how it interacts.

Applications are complex systems that require maintenance. Therefore, the way in which the maintenance technicians interact with the system is an important factor that must also be taken into account when designing the architecture.

Version Control Systems allow you to store snapshot information about when and why a change has been made in the system. The more cohesive and centralized the repository, the bigger the picture. The Single Repository usually allows a faster pace of development, since synchronizing repositories is often tedious.

Splitting the application between repositories is a good option to ensure a hard wall between layers. In some cases, a common core facility can be shared between sibling nodes. However, nested repositories require deep maintenance, since they involve an explicit partition of elements that directly depend on each other.

The last technique involves splitting the code into Multiple repositories with no direct dependency between each other. This hard wall forces you to create an interaction contract: the modules need GraphQL, a RESTful API or gRPC connections to interact with each other.

Summary of Version Control Factors

C:1 Single repository
C:2 Nested
C:3 Multiple

Application Modularization

Using the previous factors as guidelines, we can conclude that there is no single, simple solution that fits every application’s requirements.

For example, one might think by default that Monolith applications have fallen into disuse, and for a good reason, since big tangled code is difficult to maintain. Code ages quickly, so we have to be able to modify or discard pieces as fast as possible. There is still one big “if” in this statement: computationally intensive workloads are better suited to monoliths. As a rule of thumb, both Database and Machine Learning components are deployed as monoliths.

Since the cloud computing boom, parallelizing resources over the network layer has become fast and simple, so horizontal scalability is a must today. However, dealing with aged and tangled code might not be as easy. Therefore, Fake micro-services are used: the entire code is published on every node, which meets the need to share resources without untangling it. This can also be a faster way to change orchestration requirements, if needed, when application usage flaps constantly and the boot time exceeds requirements.

Splitting the responsibilities of the application into additional modules produces what is called a Multilith, which resolves both interdependency requirements and network concurrency. It is used as a trade-off between development liberty and horizontal scalability.

Using a single repository for a micro-service architecture is a Fake Monolith. It is also a trade-off technique, used to build the product faster while relying on developers to make sure the modules do not become entangled. Since there is no hard wall between services, you can create tests to ensure that no module directly accesses another module.
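One possible shape for such a test, sketched here under assumptions, is a static check on imports using Python's standard `ast` module. The module names are hypothetical, and the sketch assumes each top-level package is one module of the application and that anything outside that set (for example a shared contracts package) is allowed.

```python
import ast

# Hypothetical layout: each top-level package is one application module.
MODULES = {"billing", "catalog", "users"}


def forbidden_imports(source, owner):
    """Return the other application modules that `source` imports directly."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]
        else:
            # Relative imports stay inside the owner package, so skip them.
            continue
        for name in names:
            top = name.split(".")[0]
            if top in MODULES and top != owner:
                found.add(top)
    return sorted(found)
```

A test suite could read every file under `billing/` and assert that `forbidden_imports(text, "billing")` is empty. This only catches static imports, so dynamic imports or shared database tables would need their own checks.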

Splitting the modules into separate repositories gives what is known as a micro-service application: each service has a concrete domain that is small enough to be changed easily, if necessary. The hard wall between the different repositories also guarantees that the modules aren’t entangled. The drawback is that the general architecture development pace is slower.

Finally, since boot times keep getting faster, using Functions as a Service in a Serverless architecture allows you to rely on ephemeral machines that are created on demand. For each use. Every time it is called. This allows impressive horizontal scalability and it is the current trend in web applications. Another advantage is that you don’t need to force the app to split into several repositories.

Summary of App Factors

A:1 Monolith
A:2 Fake micro-service
A:3 Multilith
A:4 Fake monolith
A:5 Micro-service
A:6 Serverless

Final Notes and Conclusions

Can you describe the state and transaction of each component in your stack? Does splitting into domain resources solve your scalability issues? Do you have an issue? These are the first questions you should ask before switching to a different architecture. All discussions around micro-services are based on migrating aged monoliths to micro-services, but when you set that framing aside, you can understand the differences associated with each option more easily and select the best option for each case.

Each time you choose to split a service, review its associated states and transactions. There are a ton of cautionary tales about splitting transactions between services and, as a result, creating an incomplete state of madness, since the entire stack becomes a database. One service prepares the data, another one writes to the broker and another service finally sends the information to the database. Any problem arising from this misuse will lead to losing the information about the transaction.

Several other techniques have not been covered here, such as vertical scaling or MPI grid computing. These techniques don’t affect adjacent network topologies.

In addition, Quality of Service, SPOF prevention, reliability and resilience have not been discussed here. These are related techniques that will allow you to think about domain areas and the more advanced techniques associated with each factor in relation to the architecture.
