Infrastructure as Code (IaC) and GitOps: The Foundation of Modern Platforms
In the previous post, we explored internal developer portals as the interface layer that provides developers with unified access to platform capabilities. But what powers those capabilities? How do platforms actually provision infrastructure, deploy applications, and maintain consistency across environments?
The answer lies in two foundational patterns: Infrastructure as Code (IaC) and GitOps. These aren’t just technical implementations - they represent fundamental shifts in how we manage infrastructure and application deployment. Understanding these patterns is essential for both building platforms and working effectively with them.
Infrastructure as Code: The Foundation
Infrastructure as Code is the practice of defining infrastructure through machine-readable files rather than manual processes or interactive configuration. But this dry definition misses the profound philosophical shift IaC represents.
The Core Philosophy
IaC applies software engineering principles to infrastructure management. The insight: infrastructure should be treated like application code.
Version Controlled - Every infrastructure change is tracked in Git. You can see who changed what, when, and why. You have complete history and can roll back to any previous state. Infrastructure changes become auditable and reversible.
Reviewed - Infrastructure changes go through pull requests with code review. Multiple people examine changes before they reach production. The review process itself becomes documentation of decisions and rationale.
Tested - You can validate infrastructure definitions before applying them to production systems. Unit tests check individual modules. Integration tests create real infrastructure in test environments. Policy tests ensure compliance requirements are met.
Reproducible - The same definition creates identical infrastructure every time. No more “it works on my machine” for infrastructure. You can confidently recreate entire environments from definitions. Disaster recovery becomes: run the IaC.
Self-Documenting - The code itself is documentation that’s always up-to-date. Want to know how production is configured? Read the Terraform files. They can’t be out of sync with reality because they are the source of truth.
This shift from imperative (run these commands in this order) to declarative (here’s what the end state should look like) is transformative.
Declarative vs Imperative: A Critical Distinction
This distinction is fundamental to understanding how IaC works:
Imperative approaches specify the steps to achieve the desired state. You’re telling the system HOW to build what you want:
# Create VPC
aws ec2 create-vpc --cidr-block 10.0.0.0/16

# Wait for VPC to be available
aws ec2 wait vpc-available --vpc-ids vpc-123

# Create subnet
aws ec2 create-subnet --vpc-id vpc-123 --cidr-block 10.0.1.0/24

# Create route table
aws ec2 create-route-table --vpc-id vpc-123

# Associate route table with subnet
aws ec2 associate-route-table --subnet-id subnet-456 --route-table-id rtb-789
Problems with this approach:
- If something already exists, you get errors or duplicates
- You need complex logic to handle existing resources
- No automatic cleanup of resources you no longer need
- Difficult to understand current state by reading commands
- Running the script twice produces different results than running it once
Declarative approaches specify the desired end state. You’re telling the system WHAT you want:
resource "aws_vpc" "main" {
  cidr_block = "10.0.0.0/16"

  tags = {
    Name = "main-vpc"
  }
}

resource "aws_subnet" "main" {
  vpc_id     = aws_vpc.main.id
  cidr_block = "10.0.1.0/24"

  tags = {
    Name = "main-subnet"
  }
}

resource "aws_route_table" "main" {
  vpc_id = aws_vpc.main.id

  tags = {
    Name = "main-route-table"
  }
}

resource "aws_route_table_association" "main" {
  subnet_id      = aws_subnet.main.id
  route_table_id = aws_route_table.main.id
}
Benefits of declarative:
- The tool figures out HOW to achieve the desired state
- If you run this twice, it sees everything exists and does nothing
- If you change the CIDR block and run it again, the tool knows what to update
- The definition is readable - you can understand the infrastructure by reading the code
- Adding or removing resources is straightforward
Most modern IaC tools are declarative: Terraform/OpenTofu, CloudFormation, Pulumi (supports both), Kubernetes manifests, Crossplane. This declarative approach enables powerful patterns like drift detection and automatic remediation.
State Management: The Core Challenge
Declarative IaC requires understanding current infrastructure state to calculate what changes to make. This is the state problem.
Consider: You have Terraform defining 100 resources. You run terraform apply and it creates everything. Now you change one resource and run terraform apply again. How does Terraform know:
- Which resources already exist?
- Which resource you changed?
- What needs to be updated versus left alone?
- What resources should be deleted because they’re no longer in your definitions?
The answer is the state file. Terraform maintains a record of everything it previously created. When you run terraform apply, it:
- Reads your desired state (the .tf files)
- Reads the last known state (the state file)
- Queries actual infrastructure to see if reality matches the state file
- Calculates a diff (what needs to change)
- Applies the changes
- Updates the state file with the new state
State management introduces several challenges:
Lost State - If you lose the state file, Terraform doesn’t know what it created. Running terraform apply again might try to recreate everything (causing conflicts) or think nothing exists (orphaning resources that cost money).
Corrupted State - If state gets out of sync with reality (someone manually changed infrastructure, or a crash during apply), you have drift. Terraform’s view doesn’t match reality, leading to unexpected behavior.
Concurrent Modifications - Two people run terraform apply simultaneously on the same state. They both read the same initial state, calculate different changes, and try to update. Results in conflicts, lost updates, or corrupted state.
Sensitive Data - State files contain everything about your infrastructure, including sensitive values like database passwords and API keys. They need to be secured but also accessible to automation and the team.
Solutions to state management challenges:
Remote State - Store state in a shared location (S3, Azure Blob Storage, Terraform Cloud, Google Cloud Storage) instead of local files. Everyone and all automation access the same state. Enables collaboration and eliminates “works on my machine” for infrastructure.
State Locking - When someone starts a terraform apply, they acquire a lock; others wait until it is released. This prevents concurrent modifications. Locking is typically backed by DynamoDB (for S3 backends), Consul, or a cloud-native locking mechanism.
State Encryption - Encrypt state at rest and in transit. Sensitive values are protected. Most remote backends provide this automatically.
Separate State Per Environment - Don’t share state between dev, staging, and production. Each environment has its own state file. Limits blast radius of mistakes and allows environments to evolve independently.
Drift Detection - Regularly compare state to actual infrastructure. Alert when they diverge. Some tools (Terraform Cloud, Spacelift) can auto-remediate drift, reverting manual changes.
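To make drift detection concrete, here is a minimal sketch of a scheduled check using ordinary CI and Terraform's -detailed-exitcode flag (exit code 2 means the plan found differences between desired and actual state). The workflow name, schedule, and directory layout are illustrative, and the runner is assumed to already have read access to the cloud account and the remote state backend:
name: drift-detection
on:
  schedule:
    - cron: "0 6 * * *"              # Compare desired and actual state once a day
jobs:
  detect-drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - name: Check for drift
        working-directory: environments/production
        run: |
          terraform init -input=false
          terraform plan -detailed-exitcode -input=false
          # Exit code 0 = no drift, 2 = drift detected (fails the job and alerts the team)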
How Platforms Use IaC
Platforms don’t just “use Terraform” - they structure IaC in specific ways to enable self-service, enforce standards, and reduce cognitive load on developers.
Module Libraries: Encoding Best Practices
Instead of every team writing their own infrastructure code, platforms provide reusable, tested modules that encode organizational best practices:
# Without modules: Teams write their own S3 configuration every time
resource "aws_s3_bucket" "data" {
  bucket = "my-data-bucket"

  # ... 30 lines of configuration for:
  # - Encryption settings
  # - Versioning
  # - Lifecycle policies
  # - Access logging
  # - Public access blocks
  # - CORS rules
  # - Bucket policies
  # - Cost allocation tags
  # - Compliance tags
}

# With platform modules: Standards are built in
module "data_bucket" {
  source = "github.com/company/terraform-modules//s3-bucket"

  name        = "my-data-bucket"
  environment = "production"

  # Everything else is handled by the module:
  # - Encryption enabled by default
  # - Versioning enabled
  # - Access logging configured
  # - Public access blocked
  # - Cost tags applied
  # - Compliance controls enforced
}
The module handles complexity and ensures consistency. Teams get:
- Encryption by default (compliance requirement)
- Proper access logging (security requirement)
- Versioning for disaster recovery
- Lifecycle policies to control costs
- Appropriate tags for cost allocation
- Security configurations that pass audits
When security requirements change, the platform team updates the module once. All teams using the module get the improvement automatically (when they update their module version).
Composition Patterns: Higher-Level Abstractions
Platforms combine multiple modules into higher-level capabilities:
# Platform provides a "web service" module that creates everything needed
module "web_service" {
  source = "github.com/company/platform-modules//web-service"

  name        = "payment-api"
  environment = "production"
  runtime     = "python"

  # The module creates:
  # - ECS task definition with proper resource limits
  # - Application Load Balancer with SSL termination
  # - Auto-scaling policies based on CPU and request count
  # - CloudWatch dashboards with key metrics
  # - Log groups with proper retention
  # - IAM roles following least-privilege principles
  # - Security groups with minimal required access
  # - Route53 records for DNS
  # - Service discovery configuration
  # All properly integrated and following organizational standards
}
One module invocation creates a complete, production-ready service environment. The team doesn’t need to understand ECS, ALB, IAM policies, CloudWatch, or security group rules. The platform has encoded that knowledge in the module.
Policy as Code: Automated Enforcement
Platforms enforce security and compliance requirements through automated policy checks:
# Using Pulumi's policy framework
def s3_bucket_encrypted(args, report_violation):
    if args.resource_type == "aws:s3/bucket:Bucket":
        if not args.props.get("serverSideEncryptionConfiguration"):
            report_violation("S3 buckets must have encryption enabled")

def rds_backup_retention(args, report_violation):
    if args.resource_type == "aws:rds/instance:Instance":
        retention = args.props.get("backupRetentionPeriod", 0)
        if retention < 7:
            report_violation("RDS instances must retain backups for at least 7 days")
Or using Open Policy Agent with Terraform:
# Evaluated against the JSON output of terraform plan
package terraform.policies

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_s3_bucket"
  not resource.change.after.server_side_encryption_configuration
  msg := sprintf("S3 bucket %v must have encryption enabled", [resource.address])
}

deny[msg] {
  resource := input.resource_changes[_]
  resource.type == "aws_db_instance"
  resource.change.after.publicly_accessible == true
  msg := sprintf("RDS instance %v cannot be publicly accessible", [resource.address])
}
These policies run during terraform plan. Violations block the change from being applied. Security and compliance requirements are enforced automatically through code, not through documentation that people might ignore or forget.
Testing Infrastructure Code
Just like application code, infrastructure code can and should be tested:
# Using pytest with Pulumi (WebService is the platform's component resource under test)
import pulumi

@pulumi.runtime.test
def test_web_service_creates_load_balancer():
    # Create infrastructure in test mode (doesn't actually provision)
    service = WebService("test-service", environment="test")

    # Assert load balancer was created with correct configuration
    def check_load_balancer(args):
        assert args is not None
        assert args.internal == False
        assert args.enable_deletion_protection == True

    return pulumi.Output.all(service.load_balancer).apply(check_load_balancer)
Or using Terratest with Terraform to test against real infrastructure:
package test

import (
    "fmt"
    "testing"
    "time"

    http_helper "github.com/gruntwork-io/terratest/modules/http-helper"
    "github.com/gruntwork-io/terratest/modules/terraform"
    "github.com/stretchr/testify/assert"
)

func TestWebService(t *testing.T) {
    opts := terraform.Options{
        TerraformDir: "../modules/web-service",
        Vars: map[string]interface{}{
            "name":        "test-service",
            "environment": "test",
        },
    }

    // Cleanup after test
    defer terraform.Destroy(t, &opts)

    // Create actual infrastructure
    terraform.InitAndApply(t, &opts)

    // Verify outputs
    loadBalancerDNS := terraform.Output(t, &opts, "load_balancer_dns")
    assert.NotEmpty(t, loadBalancerDNS)

    // Test the actual infrastructure
    http_helper.HttpGetWithRetry(
        t,
        fmt.Sprintf("https://%s/health", loadBalancerDNS),
        nil,
        200,
        "OK",
        30,
        3*time.Second,
    )
}
Testing catches bugs before they reach production. You can test modules in isolation, validate outputs, even test against real infrastructure in ephemeral environments that are created and destroyed as part of the test suite.
IaC Tools Landscape
Different tools serve different needs:
Terraform/OpenTofu - The most widely used. Cloud-agnostic through providers (AWS, GCP, Azure, Kubernetes, and hundreds more). Large ecosystem of modules. Uses HCL (HashiCorp Configuration Language). OpenTofu is the open-source fork created after HashiCorp’s license change to BSL.
Pulumi - Infrastructure as code using real programming languages (Python, TypeScript, Go, C#, Java). Get full language features: loops, conditionals, functions, classes, testing frameworks. Good for complex logic and dynamic infrastructure.
CloudFormation - AWS-native. Deeply integrated with AWS services, often supporting new features before other tools. JSON or YAML. Free to use. Limited to AWS.
AWS CDK - Write infrastructure in programming languages; the CDK synthesizes it into CloudFormation. Higher-level constructs (“create a load-balanced Fargate service”) compile down to CloudFormation templates. Gets AWS feature support quickly since it uses CloudFormation underneath.
Crossplane - Kubernetes-native infrastructure management. Define infrastructure as Kubernetes custom resources. Infrastructure becomes part of your k8s cluster state, managed by controllers.
Ansible - Can be used for IaC but is more imperative. Better suited for configuration management than infrastructure provisioning.
The choice depends on your constraints: cloud providers, team skills, existing tooling, governance requirements, and complexity needs.
GitOps: The Workflow Pattern
Now let’s examine how changes flow through the system. GitOps is a workflow pattern that makes Git the single source of truth for both application and infrastructure state.
Core Principles
GitOps is built on four fundamental principles:
1. Declarative - Everything is described declaratively. You specify desired state, not procedures to achieve it. Kubernetes manifests, Terraform files, configuration files - all declarative.
2. Versioned and Immutable - All desired state is stored in Git. Every change is a commit. You have complete audit trail and can roll back to any previous state. Git becomes your infrastructure’s time machine.
3. Pulled Automatically - Software agents automatically pull desired state from Git and apply it to target environments. No manual kubectl apply or terraform apply. Changes happen automatically when Git changes.
4. Continuously Reconciled - Agents constantly compare actual state to desired state in Git. If they diverge (drift), the agent automatically remediates. Someone manually changes a deployment? It gets reverted to match Git within minutes.
How GitOps Works
The typical workflow:
1. Developer Makes a Change - Could be application code, could be infrastructure definition, could be configuration. Commits to Git and opens a pull request.
2. CI Runs - Automated tests, linting, security scans, policy checks. All validation happens before merge. The PR shows exactly what will change.
3. Review and Merge - Team reviews the change. After approval, PR merges to main branch. The merge is the deployment trigger.
4. GitOps Agent Detects Change - ArgoCD or FluxCD watches the Git repository and sees the new commit on its next poll (default intervals range from about a minute to a few minutes; webhooks can trigger detection almost immediately).
5. Agent Applies Change - Pulls the new manifests/definitions, compares to current state, calculates what needs to change, applies the changes to the cluster or infrastructure.
6. Agent Reports Status - Updates sync status in Git (via commit status or PR comments). Updates the developer portal with deployment state. Sends notifications if configured.
7. Continuous Reconciliation - Agent periodically re-syncs (default: every 3 minutes) to catch any drift. If someone manually changes something, it gets reverted to match Git.
GitOps for Applications
The most common GitOps use case is deploying applications to Kubernetes.
Repository Structure:
my-app/
├── app/                     # Application source code
│   ├── src/
│   └── Dockerfile
├── manifests/               # Kubernetes manifests
│   ├── deployment.yaml
│   ├── service.yaml
│   ├── ingress.yaml
│   └── configmap.yaml
└── .github/
    └── workflows/
        └── ci.yaml          # Build and update manifests
GitOps Flow:
- Developer changes application code
- CI builds new container image, tags it (e.g., v1.2.3 or sha-abc123)
- CI updates manifests/deployment.yaml with the new image tag
- CI commits the manifest change back to Git
- ArgoCD sees the manifest change
- ArgoCD applies updated deployment to cluster
- Kubernetes rolls out new version
- Developer sees deployment status in portal
The key insight: developers never run kubectl apply. All changes flow through Git. Git is the source of truth for what should be running.
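As a rough illustration of the middle steps in that flow, the ci.yaml workflow from the repository layout might contain something like this. The registry, image name, and bot identity are placeholders, registry authentication is omitted, and many teams use a dedicated image updater or a separate manifest repository instead of sed:
name: ci
on:
  push:
    branches: [main]
jobs:
  build-and-bump:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      # Build and push an image tagged with the commit SHA
      - run: |
          docker build -t registry.example.com/my-app:${GITHUB_SHA} app/
          docker push registry.example.com/my-app:${GITHUB_SHA}

      # Point the manifest at the new tag and commit it back to Git;
      # ArgoCD sees the commit and rolls it out - no kubectl involved
      - run: |
          sed -i "s|image: registry.example.com/my-app:.*|image: registry.example.com/my-app:${GITHUB_SHA}|" manifests/deployment.yaml
          git config user.name "ci-bot"
          git config user.email "ci-bot@example.com"
          git commit -am "Deploy my-app ${GITHUB_SHA}"
          git push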
ArgoCD is the most popular GitOps tool for Kubernetes. You define an Application resource:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: payment-api
  namespace: argocd
spec:
  project: default

  source:
    repoURL: https://github.com/company/payment-api
    targetRevision: main
    path: manifests

  destination:
    server: https://kubernetes.default.svc
    namespace: production

  syncPolicy:
    automated:
      prune: true      # Delete resources not in Git
      selfHeal: true   # Revert manual changes
    syncOptions:
      - CreateNamespace=true
ArgoCD watches the manifests/ directory in the repository. Any changes are automatically applied to the production namespace. If someone manually changes a deployment with kubectl, ArgoCD reverts it within 3 minutes (default reconciliation interval).
FluxCD is another popular option, more focused on GitOps primitives and extensibility. Uses Kubernetes-native resources and is lighter-weight.
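For comparison, a minimal Flux setup for the same application might look roughly like this - a sketch using Flux's GitRepository and Kustomization resources, with illustrative names and intervals:
apiVersion: source.toolkit.fluxcd.io/v1
kind: GitRepository
metadata:
  name: payment-api
  namespace: flux-system
spec:
  interval: 1m                  # How often to poll the repository for new commits
  url: https://github.com/company/payment-api
  ref:
    branch: main
---
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: payment-api
  namespace: flux-system
spec:
  interval: 10m                 # How often to reconcile the cluster against Git
  sourceRef:
    kind: GitRepository
    name: payment-api
  path: ./manifests
  prune: true                   # Delete resources that were removed from Git
  targetNamespace: production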
GitOps for Infrastructure
GitOps isn’t limited to Kubernetes - you can apply it to infrastructure:
Approach 1: Terraform + GitOps
Repository structure:
infrastructure/
├── modules/                 # Reusable Terraform modules
│   ├── vpc/
│   ├── rds/
│   └── eks/
├── environments/
│   ├── dev/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   ├── staging/
│   │   ├── main.tf
│   │   └── terraform.tfvars
│   └── production/
│       ├── main.tf
│       └── terraform.tfvars
└── .github/
    └── workflows/
        └── terraform.yaml
Workflow:
- Changes to infrastructure code are committed to Git
- CI runs terraform plan on pull requests and shows what would change
- After merge, CI (or Atlantis, or a Terraform controller) runs terraform apply
- Infrastructure changes are applied automatically
- State is stored remotely and locked during operations
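The terraform.yaml workflow referenced above might look roughly like this - a sketch assuming GitHub Actions, pre-configured cloud credentials, and a remote state backend; a real pipeline would also post the plan output to the pull request and run policy checks:
name: terraform
on:
  pull_request:
    paths: ["environments/**"]
  push:
    branches: [main]
jobs:
  terraform:
    runs-on: ubuntu-latest
    defaults:
      run:
        working-directory: environments/production
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3

      - run: terraform init -input=false                   # Remote, locked state backend
      - run: terraform plan -input=false                    # On pull requests: show what would change
        if: github.event_name == 'pull_request'
      - run: terraform apply -input=false -auto-approve     # After merge: apply automatically
        if: github.event_name == 'push'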
Approach 2: Crossplane (Kubernetes-Native)
Crossplane turns infrastructure into Kubernetes resources. Infrastructure definitions are Custom Resource Definitions (CRDs) in your cluster:
apiVersion: database.aws.crossplane.io/v1beta1
kind: RDSInstance
metadata:
  name: payment-db
  namespace: production
spec:
  forProvider:
    region: us-west-2
    dbInstanceClass: db.t3.medium
    engine: postgres
    engineVersion: "15"
    masterUsername: postgres
    allocatedStorage: 100
    storageEncrypted: true
    backupRetentionPeriod: 7
  writeConnectionSecretToRef:
    name: payment-db-connection
    namespace: production
This definition lives in Git. ArgoCD applies it to your Kubernetes cluster. Crossplane watches for RDSInstance resources and provisions actual RDS instances in AWS. Fully GitOps-native - infrastructure is just another Kubernetes resource.
When you need to change the database (increase storage, change instance class), you update the YAML in Git, commit, and Crossplane handles the update. The Kubernetes reconciliation model applies to infrastructure.
The Reconciliation Pattern
This is the magic of GitOps: continuous reconciliation.
Traditional Approach:
- Make a change
- Apply it manually or through automation
- Hope nothing changes it later
- Reality drifts from definitions over time
- Drift accumulates until the next big refactor/rebuild
GitOps Approach:
- Define desired state in Git
- Agent continuously ensures actual state matches desired state
- Manual changes are automatically reverted
- Configuration drift is detected and corrected automatically
- System self-heals
This pattern is borrowed from Kubernetes itself. The Kubernetes control plane constantly reconciles:
- You want 3 replicas of a pod
- A node dies, taking a pod with it
- ReplicaSet controller notices: 2 replicas != 3 desired
- Controller creates replacement pod
- System self-heals back to desired state
GitOps applies this reconciliation pattern to everything, not just pod replicas. Applications, infrastructure, configuration, policies - all continuously reconciled against Git.
How GitOps Connects to Platforms
Platforms use GitOps as the mechanism for applying changes safely and reliably:
Self-Service Infrastructure Provisioning:
- Developer uses portal to request a database
- Portal creates a Git commit with database definition (Terraform or Crossplane resource)
- Portal opens pull request automatically (or commits directly if review isn’t required)
- Team reviews (if policy requires) and merges
- GitOps agent sees the merge, applies the change
- Database is provisioned
- Portal queries status and shows the new database in service catalog
Application Deployment:
- Developer merges code to main branch
- CI builds container image, runs tests
- CI updates manifest repository with new image tag
- GitOps agent sees manifest change, deploys new version
- Portal shows deployment progress from ArgoCD status
Configuration Changes:
- Developer needs to change environment variable
- Updates the ConfigMap or deployment manifest in Git
- Creates pull request
- After review and merge, GitOps applies it
- No manual kubectl commands needed
Rollbacks:
- New deployment causes issues
- Team reverts the Git commit that changed the image tag
- GitOps automatically rolls back to previous version
- Or, team uses ArgoCD UI to roll back to previous revision
- ArgoCD updates Git to reflect the rollback
The platform orchestrates these workflows. Git provides the audit trail and source of truth. GitOps agents provide the execution and reconciliation.
GitOps Best Practices
Separate Application and Configuration Repositories - Application code lives in one repo (with CI/CD that builds images). Kubernetes manifests live in another repo (watched by GitOps). This separation provides:
- Different permissions (developers can change code, platform team controls production manifests)
- Cleaner audit trail for production changes
- CI can update manifests without triggering application builds
- Different review processes for application code vs. deployment configuration
Environment Promotion Patterns:
Several approaches work:
Branch per Environment - dev branch for development, staging for staging, main for production. Changes flow through branches (merge dev to staging, staging to main).
Directory per Environment - environments/dev/, environments/staging/, environments/production/. Each directory has manifests for that environment. Promotion is copying changes between directories.
Repository per Environment - Separate repos for each environment. Highest isolation, clearest audit trail, but more operational overhead.
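As one concrete (and hedged) illustration of the directory-per-environment approach, many teams use Kustomize overlays: a shared base plus a small kustomization.yaml per environment, where promotion amounts to copying an image tag or patch from one directory to the next. The layout and values below are illustrative:
# environments/production/kustomization.yaml
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
namespace: production
resources:
  - ../../base                  # Shared Deployment, Service, Ingress definitions
patches:
  - path: replica-count.yaml    # Production-specific overrides
images:
  - name: payment-api
    newTag: v1.2.3              # Promotion = updating this tag, environment by environment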
Progressive Delivery Integration:
GitOps tools integrate with progressive delivery:
- Argo Rollouts for advanced deployment strategies (canary, blue-green with automatic analysis)
- Flagger for automatic canary deployments with metric-based promotion/rollback
- Define rollout strategies in Git
- GitOps tool manages the progressive rollout automatically
- Automatic rollback if metrics degrade during canary
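As a hedged sketch of what “define rollout strategies in Git” can look like with Argo Rollouts (values are illustrative; a real setup would attach an AnalysisTemplate for automatic metric checks):
apiVersion: argoproj.io/v1alpha1
kind: Rollout
metadata:
  name: payment-api
spec:
  replicas: 5
  selector:
    matchLabels:
      app: payment-api
  template:
    metadata:
      labels:
        app: payment-api
    spec:
      containers:
        - name: payment-api
          image: registry.example.com/payment-api:v1.2.3
  strategy:
    canary:
      steps:
        - setWeight: 10             # Shift 10% of traffic to the new version
        - pause: {duration: 5m}     # Wait while metrics accumulate
        - setWeight: 50
        - pause: {duration: 5m}     # Then promote fully (or roll back if analysis fails)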
Policy Enforcement:
Policies can be stored in Git and applied alongside resources:
- OPA policies as ConfigMaps
- Kyverno policies as Kubernetes resources
- Gatekeeper constraints
- GitOps applies policies when it applies other resources
- Policies prevent invalid resources from being created
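For example, a Kyverno policy stored in Git alongside the other manifests might enforce ownership labels; this is an illustrative sketch, not a drop-in policy:
apiVersion: kyverno.io/v1
kind: ClusterPolicy
metadata:
  name: require-team-label
spec:
  validationFailureAction: Enforce    # Reject resources that fail validation
  rules:
    - name: check-team-label
      match:
        any:
          - resources:
              kinds: ["Deployment"]
      validate:
        message: "Deployments must carry a team label for ownership and cost allocation."
        pattern:
          metadata:
            labels:
              team: "?*"              # Any non-empty value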
Bringing IaC and GitOps Together
These patterns reinforce each other:
- IaC defines WHAT to create - the desired infrastructure state
- GitOps defines HOW changes flow - the workflow and reconciliation pattern
- Developer Portal provides WHERE developers interact - the unified interface
Complete Platform Flow
Example: “Deploy a new service with database”
1. Portal Interaction - Developer fills out form in portal: service name, runtime (Python), needs database (PostgreSQL).
2. Template Execution - Portal executes template that:
- Generates application skeleton code
- Generates Kubernetes manifests (using templates)
- Generates Terraform for database (using platform modules)
- Creates Git repositories for application and manifests
- Commits everything to Git
3. CI Pipeline - Triggered by initial commit:
- Builds application container image
- Runs tests and security scans
- Pushes image to registry
- Updates manifest repo with image tag
4. Infrastructure Provisioning - GitOps controller (Terraform Cloud, Atlantis, or Crossplane):
- Sees database definition in Git
- Runs Terraform to provision RDS instance
- Creates secrets with connection information
- Reports status back to portal
5. Application Deployment - ArgoCD:
- Sees new manifests in Git
- Creates Kubernetes resources (Deployment, Service, Ingress)
- Deployment pulls container image and starts pods
- Service is accessible via Ingress
- Reports sync status
6. Portal Visibility - Developer sees in portal:
- Service registered in catalog
- Link to Git repository
- CI build status
- Deployment status from ArgoCD
- Infrastructure status from Terraform
- Running pods from Kubernetes
- Metrics and logs
The platform orchestrates this entire workflow. IaC provides the execution engine for infrastructure. GitOps provides the workflow for safely applying changes. The portal provides the interface and orchestration.
Key Takeaways
For Engineering Leaders:
- IaC and GitOps are not just technical implementations; they enable audit trails, reproducibility, and safety at scale
- Investment in these foundations pays dividends through reduced incidents, faster recovery, and easier compliance
- These patterns work together - IaC without GitOps lacks workflow; GitOps without IaC lacks execution
- The combination enables true self-service while maintaining governance
For Platform Engineers:
- IaC should be structured in modules that encode organizational best practices, not just tool wrappers
- State management is critical - invest in remote state, locking, and encryption from day one
- GitOps reconciliation provides self-healing infrastructure that reverts drift automatically
- Testing infrastructure code is as important as testing application code
For Both:
- Declarative beats imperative for infrastructure - desired state is easier to reason about than procedures
- Git as single source of truth provides audit trail, rollback capability, and review process
- The reconciliation pattern (continuously comparing desired vs. actual state) is powerful beyond Kubernetes
- These patterns enable the self-service capabilities that developers access through portals
Looking Ahead
We’ve now explored the complete stack: from platform fundamentals through architecture and abstraction philosophy, to the interface layer (developer portals) and the foundational patterns (IaC and GitOps) that power platform capabilities.
In future posts, we’ll dive deeper into specific platform capabilities - deployment and release management, observability and monitoring, data platforms - exploring how they’re built using these foundations. We’ll also examine organizational structures, team topologies, and the cultural aspects of building successful platform teams.
The technical foundations are essential, but platform engineering succeeds or fails based on how well the platform serves its users. The tools and patterns we’ve covered enable the platform; organizational design and culture determine whether it’s adopted and valued.
This is the sixth post in a series exploring platform engineering in depth. Previous posts covered platform fundamentals, SRE relationships, platform architecture, building useful abstractions, and internal developer portals.