The engineering teams that handled the shift to fully distributed work best had already been designing their team interactions the way they designed their systems: with explicit interfaces, tolerance for partial failure, and no assumption that every node would be available at every moment. Tooling budgets and progressive remote policies had little to do with it.

The Accidental Experiment

Before 2020, most of Mastercard’s engineering organization operated in a mode I’d call “distributed when convenient.” We had teams across time zones and offices, but the implicit assumption was that people could always get in a room if something important came up. Architectural decisions, escalations, and knowledge transfer all leaned on colocation as a backstop, even when we didn’t admit it. The pandemic stripped that backstop away overnight, and what it revealed was an architecture problem masquerading as a remote work problem.

The teams that kept shipping looked, structurally, like well-designed microservices: clear boundaries, published interfaces for how to interact with them, and enough internal state to keep functioning when they couldn’t reach another team for hours. The ones that stalled looked like monoliths, tightly coupled and dependent on synchronous communication, unable to make progress without real-time access to people in other parts of the org.

That observation sent me down a path I’ve been thinking about ever since, mapping distributed systems principles directly onto organizational design. The analogies are concrete enough to generate useful design questions, and Conway’s Law already tells us that organizations ship systems that mirror their own communication structures. I want to push that further: deliberately designing org structures using the same principles we’d apply to the systems themselves.

API Contracts as Team Charters

In a distributed system, services communicate through well-defined APIs. Each service publishes a contract declaring what it accepts, what it returns, and how it handles errors. The contract is the interface, and everything behind it is an implementation detail the caller shouldn’t need to know about.

Team charters work the same way. A well-written team charter tells every other team in the org what it owns, how to request work from it, what SLA to expect, and which decisions it makes autonomously versus the ones requiring cross-team alignment. When this contract is explicit, teams can interact with each other without needing to understand each other’s internal processes, sprint cadences, or technical debt.

A team charter is an API contract for teams: it defines the interface, hides the implementation, and makes the cost of interaction predictable.

Most organizations skip this step entirely. Teams are defined by the code they own, occasionally by a vague mission statement, but the actual interface between teams is left implicit. In a colocated environment, you can get away with that because people fill the gaps with hallway conversations and overheard context. In a distributed organization, implicit interfaces quietly become invisible, and from there the path to coordination failure is short.
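To make the contract framing concrete, here is what a team charter looks like if you treat it literally as a typed interface. This is an illustrative Python sketch; the field names, team, and values are hypothetical, not a real charter.

```python
from dataclasses import dataclass

# Hypothetical sketch: a team charter as an explicit, machine-readable
# contract. The fields are illustrative, not a standard schema.
@dataclass(frozen=True)
class TeamCharter:
    team: str
    owns: tuple[str, ...]                  # systems and domains this team owns
    request_channel: str                   # the published way to request work
    response_sla_hours: int                # expected turnaround on requests
    autonomous_decisions: tuple[str, ...]  # made without cross-team alignment
    escalation_contact: str                # who unblocks stalled interactions

# An example "published contract" another team could read and rely on,
# without knowing anything about this team's internal processes.
payments_charter = TeamCharter(
    team="payments-platform",
    owns=("ledger-service", "settlement-batch"),
    request_channel="#payments-intake",
    response_sla_hours=48,
    autonomous_decisions=("schema changes behind the API", "internal tooling"),
    escalation_contact="payments-eng-lead",
)
```

Everything not listed in the charter is an implementation detail; callers interact with the interface, not the team's sprint board.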

Eventual Consistency Over Strong Consistency

Distributed databases give you a choice: strong consistency, where every read reflects the latest write but at the cost of latency and availability, or eventual consistency, where nodes can temporarily diverge but converge over time. Many modern distributed systems choose eventual consistency for workloads where the business logic permits temporary divergence, though strong consistency remains the right choice for critical paths like financial transactions.
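The convergence mechanics are worth seeing in miniature. Here is a toy last-write-wins register: two replicas accept writes independently, diverge, and agree once they sync. This is a deliberately minimal sketch; real systems use vector clocks or CRDTs rather than bare timestamps.

```python
# Minimal sketch of eventual consistency via a last-write-wins register.
# Timestamps stand in for a real versioning scheme.
class LWWRegister:
    def __init__(self):
        self.value, self.timestamp = None, 0

    def write(self, value, timestamp):
        if timestamp > self.timestamp:  # the newer write wins
            self.value, self.timestamp = value, timestamp

    def merge(self, other):
        # Convergence rule: after a sync, both replicas hold the
        # newest write either of them has seen.
        self.write(other.value, other.timestamp)
        other.write(self.value, self.timestamp)

a, b = LWWRegister(), LWWRegister()
a.write("draft decision", timestamp=1)  # replica A diverges...
b.write("final decision", timestamp=2)  # ...replica B diverges further
a.merge(b)                              # sync: both converge on the newest write
```

Between the writes and the merge, the two replicas disagree, and that's fine: the system tolerates divergence because it has a defined rule for resolving it.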

Organizations face the same tradeoff. Synchronous communication is the organizational equivalent of strong consistency. All those meetings, instant messages expecting immediate replies, and real-time standups across eight time zones amount to a coordination tax that buys alignment at the cost of availability. Everyone has the same information at the same time, but the coordination cost is brutal. I watched teams burn entire mornings in overlapping syncs to maintain the illusion that everyone was on the same page at every moment.

Async-first communication is eventual consistency for organizations. Teams publish decisions, share context in written form, and accept that other teams will absorb that information on their own schedule. The information converges with a delay, and many decisions that feel synchronous-urgent turn out to work fine with a few hours of propagation time. That said, some interactions benefit from real-time engagement: complex design critiques, relationship-building across new teams, and the kind of creative problem-solving where ideas build on each other in rapid succession. The discipline is distinguishing these from the meetings that exist out of habit.

We moved to a model where teams defaulted to async and had to justify synchronous meetings rather than the reverse. Written decision records, recorded architectural walkthroughs, and detailed RFCs replaced a significant share of our recurring meetings. The teams that resisted this shift the hardest were, predictably, the ones that had the most trouble operating across time zones.

Circuit Breakers and Escalation Paths

In distributed systems, a circuit breaker keeps failures in one service from cascading to its callers. When a downstream service stops responding, the circuit breaker trips: the caller stops retrying and falls back to a degraded mode rather than hammering a dead endpoint until everything collapses.
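In code, the pattern is small. This is a deliberately minimal sketch; production breakers add half-open probing, timeouts, and per-endpoint state.

```python
# Minimal circuit-breaker sketch: after max_failures consecutive failures,
# stop calling the downstream entirely and serve the fallback instead.
class CircuitBreaker:
    def __init__(self, max_failures=3, fallback=lambda: "degraded response"):
        self.max_failures = max_failures
        self.failures = 0
        self.fallback = fallback

    def call(self, fn):
        if self.failures >= self.max_failures:  # circuit open: stop retrying
            return self.fallback()
        try:
            result = fn()
            self.failures = 0                   # a success resets the count
            return result
        except ConnectionError:
            self.failures += 1                  # record the failure
            return self.fallback()

def dead_endpoint():
    raise ConnectionError("downstream not responding")

breaker = CircuitBreaker(max_failures=3)
responses = [breaker.call(dead_endpoint) for _ in range(5)]
# After three failed attempts the breaker trips; calls four and five
# never touch the dead endpoint at all.
```

The caller degrades gracefully instead of amplifying the failure, which is exactly the behavior we want from a blocked team.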

Escalation paths are the organizational equivalent. When a cross-team dependency stalls, a technical disagreement blocks progress, or a critical decision sits in someone’s inbox too long, the team needs a defined circuit breaker: a point at which they stop retrying the normal channel and escalate to a path that can unblock them. The same principle that governs incident response escalation applies here, and it extends well beyond incidents into everyday cross-team coordination.

The failure mode I see most often is teams that retry indefinitely, cycling through follow-up messages, additional meetings, and reposted questions in different channels. Each retry is individually reasonable, but collectively they represent a team stuck in an infinite retry loop with no backoff strategy. The fix is the same as in systems design: define the threshold (how long or how many attempts before escalation), define the fallback (who gets escalated to and what authority they have), and make the whole thing explicit so teams don’t feel like escalating is a political act.

Partition Tolerance and Autonomous Teams

The CAP theorem leaves you choosing between consistency and availability once partitions enter the picture, and partitions always enter the picture.

Organizations experience partitions constantly: a team is heads-down in a release and unreachable for a week, a key person is on vacation, a dependency team is in a different time zone and their working hours barely overlap with yours. Partition-tolerant teams can keep making progress during these gaps because they carry enough local context and decision-making authority to function independently. Teams that require constant access to other teams to make any decision are the organizational equivalent of a system that sacrifices availability for consistency: theoretically correct, but practically useless when the network hiccups.

Building partition tolerance into teams means pushing decision rights down, ensuring teams own enough context to operate independently for meaningful stretches, and accepting that some local decisions will diverge from what a centralized decision-maker would have chosen. That divergence is the cost of availability, and in my experience, teams that made locally suboptimal calls still shipped faster than teams waiting on centralized approval, albeit with a learning curve as teams calibrate how much autonomy they actually need.
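The availability-over-consistency choice can be sketched as a node that answers from local context when its coordinator is unreachable, rather than blocking. The names here are illustrative, a toy model of the organizational behavior rather than any real system.

```python
# Sketch of choosing availability under a partition: fall back to local
# context instead of stalling on an unreachable coordinator.
class PartitionTolerantNode:
    def __init__(self, local_context):
        # Decisions this node has enough context to make alone.
        self.local_context = dict(local_context)

    def decide(self, question, ask_coordinator):
        try:
            answer = ask_coordinator(question)     # strongly consistent path
            self.local_context[question] = answer  # refresh local context
            return answer
        except ConnectionError:
            # Partition: answer from local context, accepting possible
            # divergence from what the coordinator would have said.
            return self.local_context.get(question, "decide locally and log it")

def unreachable(question):
    raise ConnectionError("coordinator is across a partition")

node = PartitionTolerantNode({"rollout window": "weekdays only"})
decision = node.decide("rollout window", unreachable)  # served locally
```

The node keeps making progress during the partition; the price is that its answer may be stale, which is precisely the divergence the section above argues is worth the cost.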

Service Discovery: Knowing Who Owns What

Service discovery in microservices lets services find and talk to each other without hardcoded endpoints, keeping the system’s routing current as things scale and shift.

The organizational equivalent maps cleanly: can someone in your org, given a system, a domain, or a class of problem, quickly find out which team owns it and how to engage them? In small organizations this knowledge is ambient; everyone just knows. Past a certain size, usually around the point where new hires can’t meet every team in their first month, ambient knowledge breaks down and you need explicit discovery mechanisms.

We invested in a lightweight ownership registry that mapped systems, services, and domains to teams, with contact information and engagement protocols for each. The volume of “does anyone know who owns X?” messages dropped noticeably within the first month. Service discovery for teams is infrastructure: invisible when it works, devastating when it doesn’t.
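A registry like this can start embarrassingly simple. The sketch below is illustrative, not our actual implementation, and the entries are made up; the point is that the mapping is explicit and queryable rather than ambient.

```python
# Minimal ownership registry: service discovery for teams.
# Entries map systems to owning teams and their engagement channel.
OWNERSHIP = {
    "ledger-service":  {"team": "payments-platform", "channel": "#payments-intake"},
    "fraud-scoring":   {"team": "risk-engineering",  "channel": "#risk-requests"},
    "mobile-checkout": {"team": "consumer-apps",     "channel": "#consumer-apps"},
}

def who_owns(system: str) -> str:
    entry = OWNERSHIP.get(system)
    if entry is None:
        # An unmapped system is itself a signal: a gap in the registry.
        return f"no registered owner for {system!r}; file a registry gap"
    return f"{entry['team']} via {entry['channel']}"
```

The lookup replaces the “does anyone know who owns X?” broadcast, and a miss surfaces an ownership gap instead of letting it stay invisible.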

Past a certain org size, “everyone just knows who owns what” stops being true. You need service discovery for teams the same way you need it for microservices.

The Framework, Applied

The metaphor has limits worth acknowledging. You can restart a service, but a demoralized team doesn’t recover with a deploy cycle. And organizational partitions are far messier than network partitions, shaped by trust, morale, and politics that have no distributed systems equivalent. Teams build creative chemistry through shared experience in ways that no API contract can capture. The framework works best as a design lens for structural decisions, generating useful questions about interfaces, autonomy, and failure modes, rather than as a literal blueprint where every organizational interaction maps to a systems primitive.

With that caveat, these five principles reinforce each other. Team charters (API contracts) define the interfaces that make async communication (eventual consistency) possible. Escalation paths (circuit breakers) prevent blocked dependencies from cascading. Partition-tolerant teams can function because they own enough context locally, and service discovery ensures teams can re-establish connections when partitions heal. Strip any one out and the others degrade; the framework only works as a system.

The COVID shift forced organizations to confront something that was always true: if your org design treats colocation as load-bearing, you won’t see the cracks until it’s gone. At Mastercard, the teams that adapted fastest had already internalized distributed systems thinking in how they structured their work, intentionally or not. The ones that struggled were running a monolith and calling it microservices.

A year into distributed-by-default, the pattern has held. Engineering leaders who think about their orgs as systems, with all the design discipline that implies, build teams that are resilient to disruption from pandemics, reorgs, and the simple reality of operating across a dozen time zones. The office had always been the deployment environment, not the architecture.