Encapsulation is an Object Oriented Programming concept that binds together the data and functions that manipulate the data, and that keeps both safe from outside interference and misuse. Data encapsulation led to the important OOP concept of data hiding. — Data Encapsulation in C++

We all know the benefits of encapsulation and the risks of "leaky abstractions". Your grandfather used lexical scope to effect encapsulation in ALGOL back in the '60s, and your father was marking C++ class members private with access specifiers 20 years later.

In its most basic form, encapsulation is about data hiding... but the principle extends well beyond data stored in memory.
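
In code, that basic form is just the private-field pattern your father would recognize. Here's a minimal TypeScript sketch (the class and its operations are purely illustrative):

```typescript
// A bank account that hides its balance behind a narrow interface.
class Account {
  private balance = 0; // hidden: callers cannot read or write this directly

  deposit(amount: number): void {
    if (amount <= 0) throw new Error("deposit must be positive");
    this.balance += amount;
  }

  withdraw(amount: number): void {
    if (amount > this.balance) throw new Error("insufficient funds");
    this.balance -= amount;
  }
}

const account = new Account();
account.deposit(100);
// account.balance = -50; // compile error: 'balance' is private
```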

Encapsulating Distributed Systems

A fairly typical modern architecture might include an OLTP database, a key-value store, maybe an OLAP database, a couple of APIs, a public web interface and a back-end administrative interface... deployed on multiple servers and executing across multiple networks.

Although we don't typically have "private access modifiers" when declaring the interfaces between distributed systems, we still need to ensure proper encapsulation in our design.

This is best explained by way of example. So, enter our fictitious company...

Widgets Inc

Widgets Inc. is a retailer that sells... Widgets. They have a retail store on Main Street with a POS terminal, an online store and a back-office admin tool where they can manage their stock and analyze reports from their nightly ETL process. Their system looks something like this:

[Figure: Widgets Inc. system context diagram]

There are more ways than one to skin a cat — The Money Diggers.

We could take a couple of approaches to cobbling together this solution.

Integration Database

The approach that will require the fewest neurons initially is to maintain stock data in a central, shared ERP database. The other applications in the system would then open database connections directly against this database to read and write data.

This results in a system that might look something like the following:

[Figure: Widgets Inc. integration database architecture]
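
To make the coupling concrete, here's a rough sketch of what two of the applications might do against the shared database. I'm assuming a Node.js pg client and a sku table with a stock_on_hand column; all of the names are illustrative:

```typescript
import { Client } from "pg";

// In the POS application: decrement stock after a sale.
async function recordSale(skuId: string, qty: number): Promise<void> {
  const db = new Client({ connectionString: process.env.ERP_DB_URL });
  await db.connect();
  await db.query(
    "UPDATE sku SET stock_on_hand = stock_on_hand - $1 WHERE sku_id = $2",
    [qty, skuId]
  );
  await db.end();
}

// In the Online Store: read stock to decide whether to show "In Stock".
async function isInStock(skuId: string): Promise<boolean> {
  const db = new Client({ connectionString: process.env.ERP_DB_URL });
  await db.connect();
  const res = await db.query(
    "SELECT stock_on_hand FROM sku WHERE sku_id = $1",
    [skuId]
  );
  await db.end();
  return (res.rows[0]?.stock_on_hand ?? 0) > 0;
}
```

Both applications now depend directly on the physical shape of the sku table, and neither has any way of knowing when the other (or the schema) changes.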

On the whole integration databases lead to serious problems because the database becomes a point of coupling between the applications that access it. — Martin Fowler

As you might surmise from Martin Fowler's quote above, this approach has serious drawbacks whenever we need to change the system.

For example, assume that Widgets Inc. has seen great success in their brick-and-mortar outlet and they want to open another one. What used to be a simple "Sku" table in the database will now have to be remodelled as three tables.

[Figure: the single Sku table remodelled into three tables]

Since we've just moved the stock-on-hand count from the original Sku table to a new Stock table, implementing this change will require modifications to every component in our system.
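
Expressed as data shapes rather than a diagram, the change looks something like the following sketch (field names are assumptions):

```typescript
// Before: stock is a column on the Sku record itself.
interface SkuBefore {
  skuId: string;
  description: string;
  stockOnHand: number;
}

// After: stock is held per outlet, so the single table becomes three.
interface Sku {
  skuId: string;
  description: string;
}

interface Outlet {
  outletId: string;
  name: string;
}

interface Stock {
  skuId: string;    // references Sku
  outletId: string; // references Outlet
  stockOnHand: number;
}
```

Every component that used to read stockOnHand straight off a Sku row now has to go through Stock instead, which is why the change ripples across the whole system.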

Worse, imagine if this were a large system maintained by multiple teams. Assume we have a separate BI department, separate teams managing the POS application and the Back Office application, and we outsourced building the online store to a design and development agency. Such a change would require coordination with every development team and the contractors! Their sprints would need to be aligned, and the rollout carefully planned and coordinated, to minimize downtime, application failures and data inconsistencies.

Integration APIs

An alternative approach to this problem would be to wrap the ERP database in multiple interfaces tailored to the specific consumers (in this case our other applications). So our alternative architecture would look more like the following:

[Figure: Widgets Inc. integration APIs architecture]

Using this approach, we would encapsulate the ERP database in one or more APIs that provide access to the data via operations tailored to the various client applications.
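
As a sketch of what "tailored" might mean here, the ERP service could expose small, consumer-specific contracts rather than its tables. The operation names below are assumptions, not a prescription:

```typescript
// Operations tailored to the POS terminal.
interface PosApi {
  lookupPrice(skuId: string): Promise<number>;
  recordSale(skuId: string, qty: number): Promise<void>;
}

// Operations tailored to the Online Store.
interface StoreApi {
  getCatalog(): Promise<{ skuId: string; description: string; inStock: boolean }[]>;
  placeOrder(skuId: string, qty: number): Promise<string>; // returns an order id
}

// Operations tailored to the Back Office.
interface BackOfficeApi {
  adjustStock(skuId: string, delta: number): Promise<void>;
  receiveShipment(skuId: string, qty: number): Promise<void>;
}
```

Each client programs against its own narrow contract; how the ERP service stores the data behind those operations is nobody else's business.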

With this architecture, changing the system to support multiple outlets would be much simpler.

  • Back Office - we'll need to add some UI/APIs to manage stock by outlet, and we'll need to add an outletId parameter to the stock management operations (see the sketch after this list).
  • POS - needs outlet-specific data. Potentially that just means configuring each POS installation to use an outlet-specific endpoint (meaning minimal changes to the POS application itself).
  • Online Store - might not need any modification. We can dispatch online orders from any outlet.
  • Analytics Service - could initially continue to provide reports based on aggregate sales and stock levels.
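
To make the Back Office and POS items above concrete, here's a hedged before/after sketch. The interface and parameter names carry over from the earlier sketches and are assumptions:

```typescript
// Back Office, before: the contract implicitly assumes a single outlet.
interface BackOfficeApiV1 {
  adjustStock(skuId: string, delta: number): Promise<void>;
}

// Back Office, after: operations take an explicit outletId.
interface BackOfficeApiV2 {
  listOutlets(): Promise<{ outletId: string; name: string }[]>;
  adjustStock(outletId: string, skuId: string, delta: number): Promise<void>;
}

// POS, after: each installation is simply configured with an
// outlet-specific endpoint, so the application code barely changes.
// e.g. ERP_ENDPOINT=https://erp.widgets.example/outlets/main-street
const erpEndpoint = process.env.ERP_ENDPOINT;
```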

For an MVP then, we probably wouldn't need to touch the Online Store or the Analytics Service. We could focus almost all of our effort on the internals of the ERP Service and the interface to the Back Office application... then point the POS applications to outlet-specific endpoints on the ERP Service.

Over time, maybe we could gradually roll out changes to the other components so that they were "multi-outlet aware" as well. For example, we might show variable shipping times in the Online Store depending on the outlet that stock would be dispatched from. However, such changes would be optional and could be developed and deployed independently (requiring minimal planning and coordination between teams).

This new architecture is way more adaptable...

Why are some systems still so leaky?

If interfaces like APIs are so great, why do databases continue to be used as an integration point in so many system architectures?

One reason is that a significant amount of effort is required to scaffold a new API (wiring up dependency injection, unit and integration tests, build and deployment scripts, monitoring, logging etc.). It's way easier just to hand out a DB connection string... especially early in development, when all of the components are developed by a single team (or even a single developer).

Over time though, as the project grows in complexity and the team expands, daily stand-ups start to feel like communist five-year planning sessions and everyone says, "Enough! Let's split into two 'Agile' teams!" This seems logical enough - divide and conquer, right?

The easiest way to do this is simply to walk into separate rooms and each work on separate components of the existing system. However, this doesn't work.

Up until that point, one team managed all parts of the system. Although changes might regularly have impacted every component (because they all shared the same data structures), coordinating the herculean effort required to implement such changes was possible because all of the developers were on the same team and talked regularly (same sprint planning, same stand-ups etc.).


However, once the team splits, unless the system is partitioned into properly decoupled components, what you're doing is not "divide and conquer" at all. You're still all working on one big system - the only difference is that now you have poor communication.

This will reduce your ability to make changes to the system - not enhance it.

How to partition responsibilities across multiple teams

So how should we partition systems so that multiple teams can work on them?

The basic problem is that you have multiple components (developed by multiple teams) coordinating with one another by storing and retrieving data from the same data store - a bizarre form of asynchronous communication by convention, with no method signatures or interface contracts.

Firstly then, you have to refactor the data and functionality into a single component that doesn't rely on database integration for convention-based messaging. For that, you have two basic options:

  1. Select an existing component that will encapsulate the data and provide APIs for all operations on that data.
  2. Refactor the data and operations out into an entirely new component, which provides APIs to the original components.
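
Taking option 2 as the example, the extracted component might look something like the following sketch: a single service that owns the stock data and exposes explicit operations (all names are assumed):

```typescript
// The formal contract that replaces shared-table access: every other
// component goes through these operations instead of the database.
interface StockService {
  getStockLevel(outletId: string, skuId: string): Promise<number>;
  adjustStock(outletId: string, skuId: string, delta: number): Promise<void>;
}

// An in-memory stand-in for the real implementation; only this component
// knows (or cares) how stock is actually stored.
class InMemoryStockService implements StockService {
  private levels = new Map<string, number>(); // key: `${outletId}:${skuId}`

  async getStockLevel(outletId: string, skuId: string): Promise<number> {
    return this.levels.get(`${outletId}:${skuId}`) ?? 0;
  }

  async adjustStock(outletId: string, skuId: string, delta: number): Promise<void> {
    const key = `${outletId}:${skuId}`;
    this.levels.set(key, (this.levels.get(key) ?? 0) + delta);
  }
}
```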

Once you have properly encapsulated services, deciding which team should develop each service is no longer a problem. Whichever team you pick, its dependencies on other teams have been minimized and formalized. Normally it would only need to talk to other teams when making breaking changes to the APIs it provides (formal and explicit contracts).

Since breaking changes to the interfaces are uncommon (most changes involve extending the interface or can be handled simply by versioned messages/endpoints), such services can typically be changed, tested and deployed independently of one another without requiring complex planning and coordination with other teams.
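
For example, both strategies might look something like this sketch (the signatures and paths are illustrative):

```typescript
// Extending rather than breaking: the new outletId parameter is optional,
// so existing callers keep compiling while new callers can pass it.
interface StockQueries {
  getStockLevel(skuId: string, outletId?: string): Promise<number>;
}

// Alternatively, version the endpoints and run both during the transition:
//   GET /v1/stock/{skuId}                      -> aggregate across outlets
//   GET /v2/outlets/{outletId}/stock/{skuId}   -> per-outlet stock
```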

For fans of SOLID, this should sound familiar... you shouldn't need to modify more than a few files in your application for most code changes - if you do, then you likely have some structural issues to address. The same should be true of properly designed distributed systems - most changes should only affect one or two components in the system.

Caveats and remaining challenges

The only challenges to proper encapsulation that I've mentioned are a bit of upfront planning and some elbow grease to scaffold new APIs.

Your real-world applications will no doubt be more complex than the example I've given here, and you may run into various challenges that I haven't mentioned. You may need to investigate approaches such as DDD and Event Sourcing, and think about various patterns from microservice architectures to deal with latency or dynamic scaling, if you want to split your monolithic databases into separate services. I wouldn't even attempt to lecture on those topics (much less in a blog). Luckily though, better men than me have written books, recorded videos and offer seminars on those topics...

Summary

I described a trivial ERP system and two potential architectures that could be used to build it. One approach uses Database Integration as a way for the components to coordinate. The second uses APIs, which ultimately yields a more adaptable architecture.

Mainly what I wanted to highlight though is that your application architecture needs to grow with your team. It's usually not sufficient to simply split your team up and have them work on different components of the existing architecture.