
Building Platforms That Accelerate: The Philosophy of Useful Abstraction

Teddy Aryono

In the previous post, we explored platform architecture - how platforms are structured in layers with distinct capabilities and building blocks. But understanding the structure is only half the battle. The harder question is: how do you build platforms that genuinely accelerate organizations rather than becoming bottlenecks?

This is the central tension in platform engineering: abstraction versus capability. Abstract too much and you create a beautiful, simple interface that can’t do what people actually need. Abstract too little and you’re just wrapping existing tools without adding value. The platforms that succeed find a way to be opinionated without being limiting.

The Abstraction Paradox

Every platform faces this fundamental challenge: you’re building a layer between developers and infrastructure. That layer needs to hide complexity (otherwise why use it?) but maintain capability (otherwise it’s useless).

Consider a common scenario: Your platform provides a “deploy a web service” capability. AWS releases a new feature - perhaps a better load balancer, a new database engine, or improved auto-scaling. Now you face a dilemma:

Option 1: Don’t expose it immediately. Developers who need the new feature are blocked. They either wait for the platform team (bottleneck) or route around the platform entirely (erosion of adoption).

Option 2: Expose it quickly. But this might mean breaking your abstraction, adding complexity to your interface, or creating inconsistency with existing patterns.

This is the abstraction paradox: the very thing that makes platforms valuable (simplification) can make them limiting (inability to access underlying capabilities).

The platforms that thrive don’t try to eliminate this tension - they design for it from the beginning.

Design Philosophy: Opinionated, Not Limiting

The key insight is understanding what platforms should and shouldn’t abstract.

Platforms should abstract: the repetitive plumbing that every service needs - provisioning, networking, monitoring, backups, secret management, security defaults, deployment mechanics.

Platforms should NOT abstract: access to the underlying capabilities themselves. When a team's needs fall outside the abstraction, they should still be able to reach the tools and features beneath it.

The goal isn’t to wrap everything in a neat abstraction. The goal is to make the common case effortless while maintaining escape hatches for the uncommon case.

The Escape Hatch Principle

The first key to building non-limiting platforms: every layer should have escape hatches.

Interface Layer Escape Hatches

Your platform might have a beautiful CLI with commands like platform deploy my-service. But it should also allow developers to drop down when needed:

The high-level interface is the default, but developers who understand the implementation can bypass it when necessary.
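A minimal sketch of what that can look like, assuming a hypothetical service spec where a raw override block is merged into whatever manifests the platform generates:

service:
  name: my-service
  runtime: python
  replicas: 3

  # Escape hatch (hypothetical field): raw Kubernetes settings merged into the
  # generated manifests, for the cases the high-level schema never anticipated
  kubernetes_overrides:
    spec:
      template:
        spec:
          tolerations:
            - key: dedicated
              operator: Equal
              value: gpu
              effect: NoSchedule

The first few fields are the paved road; the override block is the trap door for everything else.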

Capability Layer Escape Hatches

Standard capabilities should handle 80% of use cases beautifully. For the other 20%, provide “bring your own” options.

Example: Your platform offers managed PostgreSQL with automatic backups, monitoring, and security configuration. This works great for most teams. But one team needs a PostgreSQL extension that’s not in your standard offering, or specific replication configuration.

Instead of blocking them, provide a path: “Here’s how to provision your own PostgreSQL instance while still integrating with the platform’s networking, secrets management, and monitoring.” They lose some convenience but gain the flexibility they need.
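A hedged sketch of that path, assuming a hypothetical external_database block that registers a team-managed instance with the platform's networking, secret management, and monitoring:

# The team provisions and operates PostgreSQL itself, but registers it with
# the platform so the shared integrations still apply
external_database:
  name: analytics-db
  engine: postgres
  connection_secret: analytics-db-credentials   # stored via platform secret management
  integrations:
    networking: private    # attached to the platform-managed private network
    monitoring: true       # standard dashboards and alerts
    backups: false         # the team owns backups for this instance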

Implementation Layer Escape Hatches

The platform uses specific tools - Terraform for infrastructure, Kubernetes for orchestration, ArgoCD for deployment. Teams should be able to use these tools directly when the platform’s abstractions don’t fit.

The platform is additive, not restrictive. It provides value through integration, standards, and automation, but it doesn’t prevent teams from working at lower levels when necessary.

The Psychological Benefit

Escape hatches aren’t just technically important - they’re psychologically crucial. Developers are much more willing to use a platform when they know they’re not trapped. “I can always drop down if I need to” creates comfort that enables adoption.

Conversely, platforms that feel like cages generate resistance. Even if 95% of teams never need to escape, knowing the option exists builds trust.

The 80/20 Design Philosophy

Platforms should optimize for the common case, not try to handle every edge case.

Design your abstractions around what 80% of teams need 80% of the time. Make that path incredibly smooth - great documentation, working examples, automated scaffolding, built-in best practices, full support from the platform team.

For the other 20%, provide mechanisms to work around or extend the platform.

Practical Example: Deployment Capability

Your deployment capability might provide first-class support for the patterns most of your teams actually run - say, stateless web services and background workers on standard instance types and networking.

Someone needs a service that’s web + worker in one container, or has exotic networking requirements, or needs to run on GPU instances? That’s fine - they can write their own Kubernetes manifests. The platform still provides value through GitOps automation, monitoring integration, secret management, and so on.

You’re not trying to predict every possible configuration. You’re making the common patterns effortless and the uncommon patterns possible.
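A hedged sketch of what that might look like, assuming a hypothetical "raw manifests" mode where the service spec simply points at team-authored Kubernetes resources and the platform keeps handling GitOps, monitoring, and secrets:

service:
  name: gpu-trainer
  # Hypothetical mode: the team writes its own Kubernetes manifests instead of
  # using the ones the platform would generate
  deployment_mode: raw-manifests
  manifests_path: deploy/k8s/
  integrations:
    monitoring: true    # standard dashboards and alerts still attach
    secrets: true       # secret management is still injected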

The Anti-Pattern: Feature Parity

Trying to achieve feature parity with the underlying systems (AWS, Kubernetes, etc.) is a trap. You’ll never catch up, and you’ll build a complex interface that provides no real value over using the underlying tools directly.

Instead, focus on the specific patterns your organization actually uses. If 90% of your services are stateless web APIs, make deploying stateless web APIs trivial. Don’t try to also handle every possible stateful workload configuration.

Layered Abstraction Levels

Offer multiple levels of abstraction for the same capability. Let teams choose their level based on their needs and expertise.

Level 1: Highest Abstraction

“I want a web service with a database”

Example: platform create service my-api --runtime python --database postgres

The platform makes all the decisions: instance sizes, scaling configuration, monitoring setup, deployment strategy. Fast and simple.

Level 2: Medium Abstraction

“I want a web service with specific resource limits, custom autoscaling, and a database with particular configuration”

Example: A YAML configuration that exposes important parameters while still providing structure and validation.
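For illustration, a Level 2 spec might look something like this (the field names are assumptions, not a real schema):

service:
  name: my-api
  runtime: python
  resources:
    cpu: 500m
    memory: 1Gi
  autoscaling:
    min_replicas: 2
    max_replicas: 20
    target_cpu_utilization: 70
  database:
    engine: postgres
    instance_class: db.r6g.large
    storage: 200

The platform still validates the spec and fills in everything not specified; the team just gets more knobs than at Level 1.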

Level 3: Low Abstraction

“Here’s my complete Kubernetes manifest / Terraform configuration”
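At this level, a team might hand the platform a complete manifest - for example, a minimal Kubernetes Deployment like the one below - and the platform's role shrinks to applying it and attaching the shared integrations:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-exotic-service
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-exotic-service
  template:
    metadata:
      labels:
        app: my-exotic-service
    spec:
      containers:
        - name: app
          image: registry.example.com/my-exotic-service:1.4.2
          ports:
            - containerPort: 8080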

Teams naturally flow between these levels. They might start at Level 1 to validate an idea quickly, move to Level 2 as requirements become clearer, then drop to Level 3 for one exotic service while keeping everything else at Level 2.

The key is that lower levels don’t mean leaving the platform entirely. The platform still provides value at every level, just with different trade-offs between convenience and control.

Handling Underlying Platform Evolution

Back to the original problem: AWS (or Kubernetes, or any underlying system) releases a new feature. How do platforms handle this without becoming bottlenecks?

Strategy 1: You’re Not Wrapping Everything

Fundamental mindset shift: you’re not building an abstraction over all of AWS. You’re building opinions about how your organization uses infrastructure.

When AWS releases a new feature, ask whether it touches the patterns your organization has standardized on. If it doesn't, the platform doesn't need to do anything at all.

Teams that need specialized features can use them directly. The platform provides value through integration and standards, not comprehensive coverage.

Strategy 2: Thin Wrappers with Pass-Through

For infrastructure capabilities, use thin wrappers that pass through most configuration rather than trying to abstract everything.

Instead of creating a highly abstracted “database” concept that maps to specific rigid configurations:

database:
  engine: postgres
  instance_class: db.t3.medium
  storage: 100
  # Platform ensures this gets monitoring, backups, secrets, networking

  # Any other RDS parameters can be passed through
  aws_parameters:
    enable_performance_insights: true
    custom_parameter_group: my-params
    backup_retention_days: 14

Your platform adds value through consistency (all databases get backups, monitoring, proper networking, secret management) while allowing flexibility through pass-through (teams can use AWS features directly).

When AWS adds a new parameter, teams can use it immediately via aws_parameters. No platform update required. Later, if a parameter is commonly used, you might promote it to a first-class field with validation and documentation.
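Continuing the sketch above, that promotion might look like this (still an assumed schema, not a prescribed one):

database:
  engine: postgres
  instance_class: db.t3.medium
  storage: 100
  # Promoted to a first-class field once it proved common: it now gets
  # validation, documentation, and a sensible default
  backup_retention_days: 14

  # Less common RDS parameters still pass straight through
  aws_parameters:
    enable_performance_insights: true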

Strategy 3: Extensibility as First-Class

Design your platform to be extended by teams, not just by the platform team.

Plugin Architectures: Teams can add new capabilities to the platform. If someone needs a new AWS service, they can write a platform module for it following your patterns and conventions. If it proves useful, it can be contributed back to the core platform or shared across teams.

Custom Resources: In Kubernetes environments, this means CRDs (Custom Resource Definitions) and operators. Teams can define their own custom resources that integrate with platform systems. Your platform provides the infrastructure (clusters, networking, monitoring) and patterns (how to write operators, how to integrate with secrets management), but teams can extend it.
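For instance, a team might define its own custom resource and a small operator to reconcile it, while the platform supplies the cluster, networking, and monitoring around it. The resource below is hypothetical:

apiVersion: platform.example.com/v1alpha1    # hypothetical API group
kind: CacheCluster                           # team-defined custom resource
metadata:
  name: search-cache
spec:
  engine: redis
  nodes: 3
  memory_per_node: 4Gi
  monitoring: enabled    # integrates with platform monitoring via the usual conventions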

Shared Module Libraries: Instead of the platform team being the sole provider of infrastructure modules, maintain a library of blessed modules that teams can contribute to. The platform provides the framework, governance, and core modules; the community provides breadth and specialization.

This shifts the platform team from “build everything” to “enable others to build” - a much more scalable model.

Strategy 4: Feature Flags and Opt-In

When you do add new capabilities or update implementations, use feature flags and opt-in adoption.

AWS releases a better way to do something. Your platform can:

  1. Implement the new approach
  2. Put it behind a feature flag
  3. Let teams opt in when ready
  4. Eventually make it the default for new services
  5. Eventually migrate existing services (with communication and support)

This prevents the platform from being a blocker. Teams that need the new capability can get it immediately by opting in. Teams that don’t aren’t forced to deal with migration churn until they’re ready.
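As a sketch, opting in might be as simple as a flag in the service spec (the field below is hypothetical), with the platform later flipping the default for new services:

service:
  name: my-api
  runtime: python
  features:
    # Hypothetical opt-in flag for a newer load balancer implementation;
    # existing services keep the old behavior until they enable it
    next_gen_load_balancer: true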

Building for Evolution, Not Perfection

The platforms that last are designed for change from day one:

Versioned APIs

Platform capabilities should be versioned. Version 1 of your deployment API can coexist with version 2. Teams migrate at their own pace. This requires discipline but pays dividends in platform longevity.

Breaking changes are expensive and erode trust. When you must introduce them, versioning allows gradual migration rather than forcing everyone to change simultaneously.
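In practice this can be as simple as an explicit version on every spec, so v1 and v2 services coexist in the same platform (a sketch, not a real schema):

# An older service, still on the v1 deployment API
apiVersion: platform/v1
kind: Service
metadata:
  name: legacy-billing
spec:
  runtime: java
---
# A newer service on v2, using whatever the revised schema adds
apiVersion: platform/v2
kind: Service
metadata:
  name: checkout
spec:
  runtime: python
  rollout:
    strategy: canary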

Backward Compatibility as a Value

Invest heavily in maintaining compatibility. When you must break compatibility, provide automated migration tools and clear migration paths. Make it as easy as possible for teams to upgrade.

The platform that frequently breaks backward compatibility trains teams to avoid depending on it. The platform that maintains compatibility builds trust and deep integration.

Observable Internals

Make it easy to see what the platform is doing under the hood. When developers understand the implementation, they can work with it rather than fighting against it.

This might mean exposing the Kubernetes manifests and Terraform plans the platform generates, surfacing logs of what it actually did during a deployment, or linking each high-level spec to the code that implements it.

Transparency builds trust and enables teams to work effectively with the platform.
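One concrete form this can take is stamping everything the platform generates with its provenance, so a developer can trace a running resource back to the spec and module that produced it (the annotation keys and values here are hypothetical):

# A generated Kubernetes resource, annotated with where it came from
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-api
  annotations:
    platform.example.com/generated-from: services/my-api/service.yaml
    platform.example.com/module: web-service@2.3.1
    platform.example.com/docs: https://platform.example.com/docs/web-service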

Documentation of Trade-offs

Be explicit about what your abstractions optimize for and what they don’t handle well.

“This deployment capability is great for stateless web services with standard scaling needs. For stateful applications with complex persistence requirements, you might want to use custom Kubernetes StatefulSets.”

“Our database provisioning handles standard PostgreSQL well. If you need specialized extensions or exotic configuration, here’s how to provision your own while maintaining integration with platform monitoring and networking.”

Honest documentation about limitations prevents frustration and helps teams make informed decisions.

The Governance Balance

Platforms need to guide teams through enablement, not rigid enforcement.

Enablement: “Here’s our standard way to deploy services. It handles security, monitoring, and scaling automatically, and it’s the easiest path.”

Enforcement: “You must use our deployment system. No other approach is allowed, regardless of your requirements.”

The difference is subtle but crucial. Enablement creates value; enforcement creates resentment.

That said, some constraints are necessary - security baselines, compliance requirements, and cost controls genuinely do need to hold everywhere.

The key is separating necessary constraints from unnecessary ones. Enforce what truly matters for security, compliance, or cost control. Enable everything else.

If you enforce that all deployments must go through your platform, but your platform is slow or limited, teams will route around it (and break your enforcement anyway). If you enforce that all deployments must be auditable and your platform is the easiest way to achieve that, teams will use it willingly.
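A sketch of how that separation can show up in configuration, assuming a hypothetical policy format: a small set of enforced rules for security, compliance, and cost, with everything else expressed as overridable defaults:

# Hypothetical platform policy: enforce the few things that truly matter,
# default everything else and let teams override
policy:
  enforced:
    encryption_at_rest: required
    audit_logging: required
    allowed_regions: [us-east-1, eu-west-1]
    cost_tags: [team, service, environment]
  defaults:    # overridable per service
    instance_class: db.t3.medium
    autoscaling_max: 20
    log_retention_days: 30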

The Paved Road Pattern

Spotify (with its “golden path”) and Netflix (with its “paved road”) popularized this idea, and it’s worth understanding deeply: paved roads, not cages.

The Paved Road

The paved road is the easy, supported, well-lit path: great documentation, working examples, automated scaffolding, built-in best practices, and full support from the platform team.

Most teams will stay on the paved road because it’s the path of least resistance. It’s genuinely easier than alternatives.

Off the Paved Road

Going off the paved road is allowed, but it comes with trade-offs: less support from the platform team, more operational responsibility, and the same security and compliance requirements still apply.

Most teams stay on the paved road not because they’re forced to, but because it’s better. The few teams that go off-road have good reasons and accept the trade-offs.

Evolving the Road

Here’s the key: when teams repeatedly go off-road for the same reason, that’s a signal the paved road needs improvement.

If multiple teams need GPU instances and are all writing custom Kubernetes configurations for them, maybe GPU instances should become a first-class capability. If teams keep needing more control over database configuration, maybe your database abstraction needs more flexibility.
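When that signal appears, the off-road workaround becomes the specification for a new first-class field - something like this hypothetical addition:

service:
  name: model-trainer
  runtime: python
  # Previously required hand-written Kubernetes manifests; promoted to a
  # first-class capability once several teams needed it
  gpu:
    type: nvidia-a10g
    count: 1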

The paved road evolves based on actual usage patterns, not abstract predictions.

The Anti-Pattern: The Ivory Tower

What kills platforms is building in isolation from actual needs:

Ivory Tower Platform: Platform team decides what developers need based on technical elegance or what seems like best practices. Builds beautiful abstractions. Wonders why adoption is low.

Reality-Grounded Platform: Platform team embeds with product teams, observes actual workflows, identifies friction, measures pain points. Builds capabilities that solve real problems in ways that fit actual usage patterns.

Critical Questions

Before building a capability, ask: who actually needs this, what friction does it remove, how are teams solving the problem today, and how will we know whether it worked?

The difference between successful and failed platforms often comes down to starting with user needs rather than technical elegance.

Platform as Product, Revisited

This ties back to the product mindset from the first post, but with a deeper understanding. Products evolve based on user feedback. Products have roadmaps informed by user research. Products balance feature breadth with quality and maintainability.

Your platform doesn’t need to do everything on day one. It needs to:

  1. Solve high-pain problems really well - Focus on the biggest sources of friction
  2. Make the common case easy - Optimize for the 80% use case
  3. Provide escape hatches for the uncommon case - Don’t block edge cases
  4. Evolve based on actual usage patterns - Watch what teams do, not what they say

When AWS releases a feature, evaluate it through this lens: does it address a real source of friction, does it affect the common case, and does it need to be wrapped at all, or can teams reach it through an escape hatch?

Not every new capability needs to be abstracted immediately. Some never need to be abstracted at all. Focus on the ones that multiply developer effectiveness.

Measuring Platform Success

How do you know if your platform is accelerating rather than limiting?

Adoption Metrics: Are teams actually using the platform? Is adoption growing or stagnating? If teams are routing around the platform, why?

Velocity Metrics: How long from “I want to deploy a service” to “service is in production”? Has this improved? How does it compare to before the platform existed?

Satisfaction Metrics: Regular developer satisfaction surveys. NPS (Net Promoter Score) for the platform. Qualitative feedback about pain points.

Support Metrics: Volume of support requests, types of issues, time to resolution. Declining support volume often indicates improving platform quality.

Escape Hatch Usage: How often do teams go off the paved road? Why? This data is a goldmine for platform improvement.

Capability Coverage: What percentage of deployments use standard platform capabilities vs. custom solutions? Increasing coverage suggests the platform is meeting more needs.

The goal isn’t to drive every metric to 100%. The goal is continuous improvement and understanding where the platform helps versus where it struggles.

Key Takeaways

For Engineering Leaders: Treat the platform as a product with users, a roadmap, and success metrics. Measure adoption, velocity, and satisfaction rather than feature count, and fund the unglamorous work of backward compatibility and migration tooling.

For Platform Engineers: Design escape hatches into every layer. Optimize for the 80% case, pass through what you don't abstract, and treat repeated off-road trips as the signal for where the paved road should grow next.

For Both: Prefer enablement over enforcement. A platform earns adoption by being the easiest way to do the right thing, not by forbidding the alternatives.

Looking Ahead

We’ve now established the foundations: what platform engineering is, how it relates to SRE, the layered architecture of platforms, and the philosophy of building useful abstractions.

In future posts, we’ll dive deeper into specific platform capabilities - exploring deployment and release management, observability, infrastructure provisioning, and more. We’ll also examine organizational structures, team topologies, and the cultural aspects of platform adoption.

The goal is always the same: building platforms that genuinely multiply developer effectiveness and enable organizations to ship software faster and more reliably.


This is the fourth post in a series exploring platform engineering in depth. Previous posts covered platform engineering fundamentals, the relationship between SRE and platforms, and platform architecture and layers.

#platform-engineering
