Building Platforms That Accelerate: The Philosophy of Useful Abstraction
In the previous post, we explored platform architecture - how platforms are structured in layers with distinct capabilities and building blocks. But understanding the structure is only half the battle. The harder question is: how do you build platforms that genuinely accelerate organizations rather than becoming bottlenecks?
This is the central tension in platform engineering: abstraction versus capability. Abstract too much and you create a beautiful, simple interface that can’t do what people actually need. Abstract too little and you’re just wrapping existing tools without adding value. The platforms that succeed find a way to be opinionated without being limiting.
The Abstraction Paradox
Every platform faces this fundamental challenge: you’re building a layer between developers and infrastructure. That layer needs to hide complexity (otherwise why use it?) but maintain capability (otherwise it’s useless).
Consider a common scenario: Your platform provides a “deploy a web service” capability. AWS releases a new feature - perhaps a better load balancer, a new database engine, or improved auto-scaling. Now you face a dilemma:
Option 1: Don’t expose it immediately. Developers who need the new feature are blocked. They either wait for the platform team (bottleneck) or route around the platform entirely (erosion of adoption).
Option 2: Expose it quickly. But this might mean breaking your abstraction, adding complexity to your interface, or creating inconsistency with existing patterns.
This is the abstraction paradox: the very thing that makes platforms valuable (simplification) can make them limiting (inability to access underlying capabilities).
The platforms that thrive don’t try to eliminate this tension - they design for it from the beginning.
Design Philosophy: Opinionated, Not Limiting
The key insight is understanding what platforms should and shouldn’t abstract.
Platforms should abstract:
- Repetitive, undifferentiated work (setting up monitoring for every service)
- Complex integrations (wiring together CI/CD, deployment, monitoring, logging)
- Organizational standards (security policies, compliance requirements, best practices)
- Cross-cutting concerns (authentication, authorization, secrets management)
Platforms should NOT abstract:
- Business logic and application-specific decisions
- Edge cases and specialized requirements
- Emerging capabilities that aren’t yet well-understood
- Things teams are better positioned to handle themselves
The goal isn’t to wrap everything in a neat abstraction. The goal is to make the common case effortless while maintaining escape hatches for the uncommon case.
The Escape Hatch Principle
The first key to building non-limiting platforms: every layer should have escape hatches.
Interface Layer Escape Hatches
Your platform might have a beautiful CLI with commands like `platform deploy my-service`. But it should also allow developers to drop down when needed:
- `platform terraform apply` for direct Terraform access
- `platform kubectl exec` for direct Kubernetes access
- Access to raw configuration files and manifests
The high-level interface is the default, but developers who understand the implementation can bypass it when necessary.
Capability Layer Escape Hatches
Standard capabilities should handle 80% of use cases beautifully. For the other 20%, provide “bring your own” options.
Example: Your platform offers managed PostgreSQL with automatic backups, monitoring, and security configuration. This works great for most teams. But one team needs a PostgreSQL extension that’s not in your standard offering, or specific replication configuration.
Instead of blocking them, provide a path: “Here’s how to provision your own PostgreSQL instance while still integrating with the platform’s networking, secrets management, and monitoring.” They lose some convenience but gain the flexibility they need.
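As a sketch of what that path could look like, here is a hypothetical configuration. The field names (`external_database`, `platform_integrations`) are illustrative, not a real platform API; the idea is that the team owns the instance but opts back into the platform pieces that still apply.

```yaml
# Hypothetical "bring your own" database declaration; field names are illustrative
external_database:
  name: analytics-db
  managed_by: team                               # the team provisions and operates the instance
  connection_secret: analytics-db-credentials    # stored via the platform's secrets management

  # Opt back into the platform capabilities that still apply
  platform_integrations:
    networking: true      # place the instance in platform-managed subnets
    monitoring: true      # collect standard PostgreSQL metrics and alerts
    backups: false        # the team owns backups for its custom replication setup
```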
Implementation Layer Escape Hatches
The platform uses specific tools - Terraform for infrastructure, Kubernetes for orchestration, ArgoCD for deployment. Teams should be able to use these tools directly when the platform’s abstractions don’t fit.
The platform is additive, not restrictive. It provides value through integration, standards, and automation, but it doesn’t prevent teams from working at lower levels when necessary.
The Psychological Benefit
Escape hatches aren’t just technically important - they’re psychologically crucial. Developers are much more willing to use a platform when they know they’re not trapped. “I can always drop down if I need to” creates comfort that enables adoption.
Conversely, platforms that feel like cages generate resistance. Even if 95% of teams never need to escape, knowing the option exists builds trust.
The 80/20 Design Philosophy
Platforms should optimize for the common case, not try to handle every edge case.
Design your abstractions around what 80% of teams need 80% of the time. Make that path incredibly smooth - great documentation, working examples, automated scaffolding, built-in best practices, full support from the platform team.
For the other 20%, provide mechanisms to work around or extend the platform.
Practical Example: Deployment Capability
Your deployment capability might provide first-class support for:
- Standard web service (the common case) - beautifully simple
- Standard background worker - also simple
- Standard cron job - simple
- Standard batch job - simple
Someone needs a service that’s web + worker in one container, or has exotic networking requirements, or needs to run on GPU instances? That’s fine - they can write their own Kubernetes manifests. The platform still provides value through GitOps automation, monitoring integration, secret management, and so on.
You’re not trying to predict every possible configuration. You’re making the common patterns effortless and the uncommon patterns possible.
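As an illustration, a team with such a workload might hand-write a plain Kubernetes Deployment and hook into the platform through labels and annotations. The `platform.example.com` keys below are hypothetical conventions, standing in for whatever your platform actually defines:

```yaml
# Hand-written manifest for an uncommon workload (web + worker in one pod).
# The platform-specific keys are illustrative conventions, not a real API.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: hybrid-service
  labels:
    platform.example.com/managed: "false"        # not generated by the platform
    platform.example.com/team: payments          # still attributed for cost and ownership
  annotations:
    platform.example.com/scrape-metrics: "true"  # opt in to standard monitoring
spec:
  replicas: 2
  selector:
    matchLabels:
      app: hybrid-service
  template:
    metadata:
      labels:
        app: hybrid-service
    spec:
      containers:
        - name: web
          image: registry.example.com/payments/hybrid-service:1.4.0
          ports:
            - containerPort: 8080
        - name: worker
          image: registry.example.com/payments/hybrid-service:1.4.0
          args: ["worker"]
```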
The Anti-Pattern: Feature Parity
Trying to achieve feature parity with the underlying systems (AWS, Kubernetes, etc.) is a trap. You’ll never catch up, and you’ll build a complex interface that provides no real value over using the underlying tools directly.
Instead, focus on the specific patterns your organization actually uses. If 90% of your services are stateless web APIs, make deploying stateless web APIs trivial. Don’t try to also handle every possible stateful workload configuration.
Layered Abstraction Levels
Offer multiple levels of abstraction for the same capability. Let teams choose their level based on their needs and expertise.
Level 1: Highest Abstraction
“I want a web service with a database”
- Simple interface with minimal options
- Opinionated defaults based on organizational best practices
- Great for straightforward use cases and teams new to the platform
Example: `platform create service my-api --runtime python --database postgres`
The platform makes all the decisions: instance sizes, scaling configuration, monitoring setup, deployment strategy. Fast and simple.
Level 2: Medium Abstraction
“I want a web service with specific resource limits, custom autoscaling, and a database with particular configuration”
- More configuration options available
- Still guided by templates and validation
- Covers most production use cases
Example: A YAML configuration that exposes important parameters while still providing structure and validation.
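A Level 2 definition might look something like this sketch. The schema is made up for illustration; the point is that the important knobs (resources, autoscaling, database sizing) are exposed while everything unspecified still falls back to validated platform defaults.

```yaml
# Hypothetical Level 2 service definition; the schema is illustrative
service:
  name: my-api
  runtime: python
  resources:
    cpu: "500m"
    memory: "512Mi"
  autoscaling:
    min_replicas: 2
    max_replicas: 10
    target_cpu_utilization: 70
  database:
    engine: postgres
    instance_class: db.r6g.large
    storage: 200
# Anything not specified (deployment strategy, monitoring, networking)
# falls back to platform defaults and is validated on submission.
```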
Level 3: Low Abstraction
“Here’s my complete Kubernetes manifest / Terraform configuration”
- Full control over configuration
- Responsibility for integration and maintenance
- Platform still provides infrastructure (clusters, networks) and operational benefits (monitoring, GitOps)
Teams naturally flow between these levels. They might start at Level 1 to validate an idea quickly, move to Level 2 as requirements become clearer, then drop to Level 3 for one exotic service while keeping everything else at Level 2.
The key is that lower levels don’t mean leaving the platform entirely. The platform still provides value at every level, just with different trade-offs between convenience and control.
Handling Underlying Platform Evolution
Back to the original problem: AWS (or Kubernetes, or any underlying system) releases a new feature. How do platforms handle this without becoming bottlenecks?
Strategy 1: You’re Not Wrapping Everything
Fundamental mindset shift: you’re not building an abstraction over all of AWS. You’re building opinions about how your organization uses infrastructure.
When AWS releases a new feature:
- If it’s relevant to a core capability you provide (new database engine that’s clearly better, improved load balancing) - evaluate whether it improves your offering
- If it’s specialized (new ML service, IoT-specific offering, niche feature) - you probably don’t need to wrap it at all
Teams that need specialized features can use them directly. The platform provides value through integration and standards, not comprehensive coverage.
Strategy 2: Thin Wrappers with Pass-Through
For infrastructure capabilities, use thin wrappers that pass through most configuration rather than trying to abstract everything.
Instead of creating a highly abstracted “database” concept that maps to rigid, predetermined configurations, expose the essential fields and pass the rest through:
```yaml
database:
  engine: postgres
  instance_class: db.t3.medium
  storage: 100
  # Platform ensures this gets monitoring, backups, secrets, networking

  # Any other RDS parameters can be passed through
  aws_parameters:
    enable_performance_insights: true
    custom_parameter_group: my-params
    backup_retention_days: 14
```

Your platform adds value through consistency (all databases get backups, monitoring, proper networking, secret management) while allowing flexibility through pass-through (teams can use AWS features directly).
When AWS adds a new parameter, teams can use it immediately via `aws_parameters`. No platform update required. Later, if a parameter is commonly used, you might promote it to a first-class field with validation and documentation.
Strategy 3: Extensibility as First-Class
Design your platform to be extended by teams, not just by the platform team.
Plugin Architectures: Teams can add new capabilities to the platform. If someone needs a new AWS service, they can write a platform module for it following your patterns and conventions. If it proves useful, it can be contributed back to the core platform or shared across teams.
Custom Resources: In Kubernetes environments, this means CRDs (Custom Resource Definitions) and operators. Teams can define their own custom resources that integrate with platform systems. Your platform provides the infrastructure (clusters, networking, monitoring) and patterns (how to write operators, how to integrate with secrets management), but teams can extend it.
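As a concrete sketch, a team-owned custom resource might look like the following. The API group, kind, and fields are hypothetical; the pattern is that the team's operator reconciles this resource while leaning on platform-provided secrets and monitoring integration.

```yaml
# Hypothetical team-defined custom resource; group, kind, and fields are illustrative
apiVersion: data.example.com/v1alpha1
kind: StreamProcessor
metadata:
  name: clickstream-enricher
  namespace: analytics
spec:
  source_topic: clickstream.raw
  sink_topic: clickstream.enriched
  parallelism: 4
  secrets_ref: kafka-credentials   # resolved through the platform's secrets integration
  monitoring:
    dashboards: true               # the team's operator registers standard dashboards
```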
Shared Module Libraries: Instead of the platform team being the sole provider of infrastructure modules, maintain a library of blessed modules that teams can contribute to. The platform provides the framework, governance, and core modules; the community provides breadth and specialization.
This shifts the platform team from “build everything” to “enable others to build” - a much more scalable model.
Strategy 4: Feature Flags and Opt-In
When you do add new capabilities or update implementations, use feature flags and opt-in adoption.
AWS releases a better way to do something. Your platform can:
- Implement the new approach
- Put it behind a feature flag
- Let teams opt in when ready
- Eventually make it the default for new services
- Eventually migrate existing services (with communication and support)
This prevents the platform from being a blocker. Teams that need the new capability can get it immediately by opting in. Teams that don’t aren’t forced to deal with migration churn until they’re ready.
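In practice, the opt-in can be as small as a flag in the service configuration. A minimal sketch, with a made-up flag name:

```yaml
# Hypothetical opt-in to a newer implementation behind a platform feature flag
service:
  name: my-api
  features:
    next_gen_load_balancer: true   # opt in now; becomes the default in a later release
```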
Building for Evolution, Not Perfection
The platforms that last are designed for change from day one:
Versioned APIs
Platform capabilities should be versioned. Version 1 of your deployment API can coexist with version 2. Teams migrate at their own pace. This requires discipline but pays dividends in platform longevity.
Breaking changes are expensive and erode trust. When you must introduce them, versioning allows gradual migration rather than forcing everyone to change simultaneously.
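Concretely, the version can live in the configuration itself so that old and new definitions coexist in the same repository and migrate independently. A sketch, assuming a hypothetical `platform.example.com` API group:

```yaml
# Two services on different platform API versions, migrated at their own pace
apiVersion: platform.example.com/v1
kind: Service
metadata:
  name: legacy-billing
spec:
  runtime: python
---
apiVersion: platform.example.com/v2
kind: Service
metadata:
  name: checkout
spec:
  runtime: python
  rollout:
    strategy: canary    # a capability introduced in v2
```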
Backward Compatibility as a Value
Invest heavily in maintaining compatibility. When you must break compatibility, provide automated migration tools and clear migration paths. Make it as easy as possible for teams to upgrade.
The platform that frequently breaks backward compatibility trains teams to avoid depending on it. The platform that maintains compatibility builds trust and deep integration.
Observable Internals
Make it easy to see what the platform is doing under the hood. When developers understand the implementation, they can work with it rather than fighting against it.
This might mean:
- Clear documentation of what the platform does behind the scenes
- Visibility into generated configurations (Kubernetes manifests, Terraform files)
- Ability to see and understand platform-generated resources
- Debugging tools that show platform decisions
Transparency builds trust and enables teams to work effectively with the platform.
Documentation of Trade-offs
Be explicit about what your abstractions optimize for and what they don’t handle well.
“This deployment capability is great for stateless web services with standard scaling needs. For stateful applications with complex persistence requirements, you might want to use custom Kubernetes StatefulSets.”
“Our database provisioning handles standard PostgreSQL well. If you need specialized extensions or exotic configuration, here’s how to provision your own while maintaining integration with platform monitoring and networking.”
Honest documentation about limitations prevents frustration and helps teams make informed decisions.
The Governance Balance
Platforms should enable teams with guardrails, not constrain them with rigid enforcement.
Enablement: “Here’s our standard way to deploy services. It handles security, monitoring, and scaling automatically, and it’s the easiest path.”
Enforcement: “You must use our deployment system. No other approach is allowed, regardless of your requirements.”
The difference is subtle but crucial. Enablement creates value; enforcement creates resentment.
That said, some constraints are necessary:
- Security policies (all data must be encrypted at rest)
- Compliance requirements (all changes must be auditable)
- Cost controls (teams can’t spin up unlimited expensive resources without approval)
- Data governance (PII must be handled according to regulations)
The key is separating necessary constraints from unnecessary ones. Enforce what truly matters for security, compliance, or cost control. Enable everything else.
If you enforce that all deployments must go through your platform, but your platform is slow or limited, teams will route around it (and break your enforcement anyway). If you enforce that all deployments must be auditable and your platform is the easiest way to achieve that, teams will use it willingly.
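One way to keep that separation honest is to write it down. The sketch below is a hypothetical policy file, not any real tool's format; what matters is that hard constraints and soft guardrails are declared differently and reviewed on those terms.

```yaml
# Hypothetical policy configuration; the schema is illustrative
policies:
  encryption-at-rest:
    applies_to: all-data-stores
    mode: enforce            # security requirement, no exceptions
  change-auditability:
    applies_to: all-deployments
    mode: enforce            # compliance requirement
  instance-size-limits:
    applies_to: all-services
    mode: warn               # guardrail: flag for review instead of blocking
    max_monthly_cost_usd: 2000
```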
The Paved Road Pattern
Netflix and Spotify popularized this concept (as the “paved road” and the “golden path,” respectively), and it’s worth understanding deeply: paved roads, not guardrails.
The Paved Road
The paved road is the easy, supported, well-lit path. It has:
- Excellent documentation with examples
- Automated scaffolding and templates
- Built-in best practices and organizational standards
- Full platform team support and quick response to issues
- Clear upgrade paths and migration guides
Most teams will stay on the paved road because it’s the path of least resistance. It’s genuinely easier than alternatives.
Off the Paved Road
Going off the paved road is allowed, but:
- You’re responsible for your own support
- You need to document your approach for your team
- You still must meet organizational requirements (security, observability, cost governance)
- You might need to justify to leadership why the paved road didn’t work for your use case
Most teams stay on the paved road not because they’re forced to, but because it’s better. The few teams that go off-road have good reasons and accept the trade-offs.
Evolving the Road
Here’s the key: when teams repeatedly go off-road for the same reason, that’s a signal the paved road needs improvement.
If multiple teams need GPU instances and are all writing custom Kubernetes configurations for them, maybe GPU instances should become a first-class capability. If teams keep needing more control over database configuration, maybe your database abstraction needs more flexibility.
The paved road evolves based on actual usage patterns, not abstract predictions.
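Promotion into the paved road can then be as simple as adding a first-class field where teams previously hand-rolled manifests. A hypothetical before and after for the GPU case:

```yaml
# Before: every team writes raw Kubernetes resource requests for GPUs by hand.
# After: the platform promotes GPUs to a first-class (illustrative) field.
service:
  name: model-inference
  runtime: python
  gpu:
    type: nvidia-a10g
    count: 1
# The platform now handles node selection, drivers, and quota behind this field.
```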
The Anti-Pattern: The Ivory Tower
What kills platforms is building in isolation from actual needs:
Ivory Tower Platform: Platform team decides what developers need based on technical elegance or what seems like best practices. Builds beautiful abstractions. Wonders why adoption is low.
Reality-Grounded Platform: Platform team embeds with product teams, observes actual workflows, identifies friction, measures pain points. Builds capabilities that solve real problems in ways that fit actual usage patterns.
Critical Questions
Before building a capability, ask:
- Are we building this because it’s technically interesting, or because teams are struggling without it?
- Have we talked to the teams who would use this? What do they actually need?
- Can we start with a minimal version and iterate based on feedback?
- What’s the adoption plan? How will teams discover and learn this capability?
- How will we measure if this actually helps?
The difference between successful and failed platforms often comes down to starting with user needs rather than technical elegance.
Platform as Product, Revisited
This ties back to the product mindset from the first post, but with a deeper understanding. Products evolve based on user feedback. Products have roadmaps informed by user research. Products balance feature breadth with quality and maintainability.
Your platform doesn’t need to do everything on day one. It needs to:
- Solve high-pain problems really well - Focus on the biggest sources of friction
- Make the common case easy - Optimize for the 80% use case
- Provide escape hatches for the uncommon case - Don’t block edge cases
- Evolve based on actual usage patterns - Watch what teams do, not what they say
When AWS releases a feature, evaluate it through this lens:
- Does this solve a pain point our users have?
- Does it fit within the platform’s scope and mission?
- Can users access it directly if they need it before we officially support it?
- If we add it, will it serve the 80% or the 20%?
Not every new capability needs to be abstracted immediately. Some never need to be abstracted at all. Focus on the ones that multiply developer effectiveness.
Measuring Platform Success
How do you know if your platform is accelerating rather than limiting?
Adoption Metrics: Are teams actually using the platform? Is adoption growing or stagnating? If teams are routing around the platform, why?
Velocity Metrics: How long from “I want to deploy a service” to “service is in production”? Has this improved? How does it compare to before the platform existed?
Satisfaction Metrics: Regular developer satisfaction surveys. NPS (Net Promoter Score) for the platform. Qualitative feedback about pain points.
Support Metrics: Volume of support requests, types of issues, time to resolution. Declining support volume often indicates improving platform quality.
Escape Hatch Usage: How often do teams go off the paved road? Why? This is a goldmine of data for platform improvement.
Capability Coverage: What percentage of deployments use standard platform capabilities vs. custom solutions? Increasing coverage suggests the platform is meeting more needs.
The goal isn’t to drive every metric to 100%. The goal is continuous improvement and understanding where the platform helps versus where it struggles.
Key Takeaways
For Engineering Leaders:
- Platforms succeed by being opinionated without being limiting - make the common case easy, make edge cases possible
- Escape hatches aren’t just technical features; they’re psychological safety that enables adoption
- Measure platform value through developer productivity and satisfaction, not just utilization
- Platform investment should be driven by actual pain points, not abstract elegance
For Platform Engineers:
- Design for evolution from day one - versioned APIs, backward compatibility, clear upgrade paths
- Thin wrappers with pass-through beat thick abstractions that try to hide everything
- Extensibility by users scales better than trying to build every feature centrally
- When teams repeatedly work around the platform, treat that as a signal to improve, not a failure to comply
For Both:
- The abstraction serves the user, not the other way around
- Platforms accelerate organizations by reducing friction for common cases, not by preventing uncommon ones
- Trust is built through reliability, transparency, and respecting users’ intelligence
- The best platforms make the right thing the easy thing, not the only thing
Looking Ahead
We’ve now established the foundations: what platform engineering is, how it relates to SRE, the layered architecture of platforms, and the philosophy of building useful abstractions.
In future posts, we’ll dive deeper into specific platform capabilities - exploring deployment and release management, observability, infrastructure provisioning, and more. We’ll also examine organizational structures, team topologies, and the cultural aspects of platform adoption.
The goal is always the same: building platforms that genuinely multiply developer effectiveness and enable organizations to ship software faster and more reliably.
This is the fourth post in a series exploring platform engineering in depth. Previous posts covered platform engineering fundamentals, the relationship between SRE and platforms, and platform architecture and layers.