Composition for teams
A tension in software engineering that has always fascinated me is knowing when to break one into many. There always seems to be an inflexion point, usually triggered by the perception of duplication, by noticing a conflation of responsibilities, or by realising that coupling has become a problem.
Similarly, organisations benefit from this pattern. The signs it might be needed are the same: duplication in systems, a conflation of scope, and coupling of goals between teams. A common org design solution is establishing a "platform team", responsible for generalising the existing use-cases into one platform, and "application teams", which focus on the specifics of each use-case.
Supporting this type of change can be disruptive for teams, and frustration sometimes comes from defining new boundaries and decoupled purposes. Here are some ideas on how to draw boundaries and help both platform and application teams succeed.
§What stays in application teams
Defining clear responsibilities for application teams helps define what a platform could be. A good way to understand what needs to be a platform is to understand what shouldn't be one, as this is usually more obvious than the other way around.
Platforms are less likely to solve subjective or nuanced problems well. Given that a platform's role is serving multiple use-cases, platforms are usually excellent at solving objective problems. However, for problems that lean towards the subjective side, application teams shine much more. For a payments processing platform, there are many tough product challenges best solved by an application team: when and how to authorise a card, when to retry a payment, when and how often to settle funds, the best day of the month to recur a payment, to name a few.
Context-related optimisation should live close to the context. Platforms are great at solving the bulk of shared problems, but some of the most challenging optimisation work, especially in large-scale systems, happens at the context. For a recommendations platform, the context in which a recommendation is made is hugely relevant to its effectiveness, e.g. a recommendation on a product listing page, with low purchase intent, versus a pre-payment or upsell recommendation. Whilst platforms can provide the table stakes for optimisation, the tuning, calibration and definition of success tend to live closer to the context. In general, application teams are in a much better position to find high-quality, context-aware signal for optimisation.
Platformisation may affect your user. Teams can be extremely successful by working very closely with user experience researchers to understand their users' specific needs. Sometimes insight generated through research can be contradictory across surfaces or contexts, and understanding this can determine how platformisation affects the user. Outcomes from research can range from application teams influencing the platform's direction, if the insight is valid across the board, to revealing that the proposed boundary harms the use-case and that having a platform is premature.
§A new platform emerges
As you understand what stays, what should be generalised gets much clearer. More than a lift and shift of systems, platform teams demand a massive change in mindset. For those unfamiliar with them, platform teams tend to share some characteristics, a notable one being working with multiple customers or stakeholders. As a result, a lot of the tension for platform teams is finding the balance between usefulness and staying edge-agnostic – platforms are designed to solve a set of problems well, but not all of them. Some characteristics I've observed in successful platform teams include:
Think of infra as part of the product – or the product itself. When a team can't control all the downstream consequences of different failure modes, it needs much stricter SLAs, incident management processes, and business continuity practices. For example, a 99% availability requirement might be acceptable for a payments platform with an average ticket of $10 @ 500k transactions per day – roughly $3.5k of volume at risk per minute of downtime. Nevertheless, if the average ticket is in the thousands, that could mean very different availability requirements. Being in a platform team demands a shift into thinking of infrastructure as a much more relevant part of the product offering.
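To make the numbers concrete, here's a back-of-the-envelope sketch of the arithmetic above. The figures ($10 average ticket, 500k transactions/day, 99% availability) come from the example; the function names are mine, and "at risk" here means processed volume, not the platform's actual margin:

```python
# Back-of-the-envelope downtime cost for the payments example above.

def volume_per_minute(avg_ticket: float, tx_per_day: int) -> float:
    """Dollar volume flowing through the platform each minute."""
    return avg_ticket * tx_per_day / (24 * 60)

def monthly_downtime_minutes(availability: float, days: int = 30) -> float:
    """Minutes of downtime a given availability target permits per month."""
    return (1 - availability) * days * 24 * 60

per_minute = volume_per_minute(10.0, 500_000)   # ≈ $3,472 per minute
budget = monthly_downtime_minutes(0.99)         # ≈ 432 minutes per month

print(f"~${per_minute:,.0f} of volume at risk per minute of downtime")
print(f"99% availability permits ~{budget:.0f} minutes of downtime/month")
```

At a $1,000 average ticket, the same maths gives ~$347k of volume at risk per minute, which is why the availability conversation changes so sharply with ticket size.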
Be cautious about assuming equivalences in data. Especially for data- or machine-learning-heavy platforms, finding ways to compare meaning, standardise, and share protocols for labelling or generating insight across the board can be very hard. It can even be misleading if you don't fully understand the use-cases before sharing data for analysis or training models. A good way to prevent scenarios like this is to be conservative about where to draw the platform's domain boundary and start with the smallest scope that delivers value. For instance, a product recommendations platform can start with an API that returns probability scores for multiple recommendation types – buy again, also bought, etc. – rather than dealing with product ranking, which is a much more complex and nuanced problem.
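As an illustration, that narrow starting API could look something like this sketch. All names and signatures are hypothetical, and the scoring is a placeholder for whatever model a real platform would call – the point is that the platform stops at per-type probability scores and leaves ranking to the application teams:

```python
# Hypothetical sketch of a deliberately narrow recommendations API:
# the platform scores candidates per recommendation type and does
# NOT rank them; ranking stays with the application teams.
from dataclasses import dataclass

@dataclass
class ScoredProduct:
    product_id: str
    score: float  # probability-like score in [0, 1]

def recommend(customer_id: str, rec_type: str,
              candidates: list[str]) -> list[ScoredProduct]:
    """Score candidate products for one recommendation type,
    e.g. "buy_again" or "also_bought", without ordering them."""
    # Placeholder scoring; a real platform would call a model here.
    return [ScoredProduct(p, 0.5) for p in candidates]
```

Keeping ranking out of scope means the platform never has to encode each surface's definition of a "good" ordering, which is exactly the context-heavy work that belongs to application teams.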
Think of the platform as capabilities rather than a one-stop-shop. A common way platforms fail is by trying to package all capabilities together rather than providing application teams with a set of interoperable capabilities they can use as needed. Even though it can be quite convenient to abstract all complexity behind a single API, when application teams don't have access to independent capabilities, they are incentivised to "go around the platform". Instead, breaking logical domains into building blocks gives tenants the ability to extend functionality. Kislay Verma's post about preventing "go-around" with platform thinking explains the problem and solution very well.
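To illustrate the contrast, here's a minimal sketch of capabilities as interoperable building blocks. The names are hypothetical; the point is that tenants can compose, swap, or extend any step instead of going around one opaque endpoint:

```python
# One-stop-shop: a single call hides every step, so a tenant who needs
# a different candidate source or scorer has to go around the platform.
#     def recommendations(customer_id: str) -> list[str]: ...
#
# Capabilities: each step is an independent, swappable building block.
from typing import Protocol

class CandidateSource(Protocol):
    def candidates(self, customer_id: str) -> list[str]: ...

class Scorer(Protocol):
    def score(self, customer_id: str, product_id: str) -> float: ...

def top_n(source: CandidateSource, scorer: Scorer,
          customer_id: str, n: int) -> list[str]:
    """Compose any source with any scorer; tenants pick the blocks."""
    ranked = sorted(source.candidates(customer_id),
                    key=lambda p: scorer.score(customer_id, p),
                    reverse=True)
    return ranked[:n]
```

A tenant with an unusual use-case can now supply its own `Scorer` while still reusing the platform's candidate generation, instead of rebuilding the whole pipeline outside the platform.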
As much as change can be positive when you can clearly define each constituent's responsibility, introducing it prematurely can be very harmful. Considering how disruptive a wrong decision can be, and how slow the feedback loop is for judging whether a reorg has been successful, approaching change through experimentation can help mitigate risks and make the experience less frustrating for everyone.