3 ways you’re screwing up platform engineering - and how to fix them

Why is it so easy to screw up platform engineering, and how do you undo the damage?

Platform engineering… it’s the trendy new buzzword for a thing we’ve been doing for years, ever since someone said “what if we re-did Solaris zones, but called it Docker instead?” It means building an internal engineering platform for your digital services or data pipelines. A good platform lets you scale teams up and down, and supercharges their ability to deliver outcomes.

But here’s a thing that nobody likes to talk about. It’s easy to totally screw up platform engineering. When that happens, there’s a huge negative impact on your engineering culture, and your teams aren’t able to achieve their goals. So how do you avoid screwing up platform engineering, and if it does happen to you, can you actually fix it?

I’ve spent years in platform leadership roles, building internal engineering platforms at different scaleups and enterprise organizations. I’ve had successes, and I’ve had failures. I’ll cover the following seemingly irreversible ways to screw up, and how to start reversing out of them:

  • Power tools - teams spend all their time configuring Kafka, Kubernetes, Istio etc., because the platform is based on overpowered tech
  • Technology anarchy - N teams do the same task in N different ways, because the platform has no opinions on tech choices, ways of working, or path to prod
  • Teams as tickets - making teams interact with the platform via tickets

And I’ll explain why all of these can be traced back to a scaling problem, the granddaddy screw up of platform engineering - your mindset is platform as a project, not as a product.

Talk Outline

This is based on my work in platform leadership roles in the following organizations:

  • 5 teams/20 μliths -> 8 teams/70 μliths over 3 years
  • 10 teams/50 μservices -> 60 teams/600 μservices over 2 years
  • 1 team/1 μservice -> 40 teams/120 μservices over 2.5 years

And a bunch of false starts in platform engineering, in other organizations!

My outline is:

  • Define what platform engineering is, and why it’s a good idea. Call out some classic bad ideas that won’t be covered e.g. on-prem platform, multi-cloud platform
  • Describe the impact of a platform engineering failure: how it can poison positivity in your engineering culture, and how it can set back delivery teams by months
  • Discuss screw up #1 - power tools - what it is, how to measure for it, some real world examples, and how to change course if you’re in it
  • Discuss screw up #2 - technology anarchy - what it is, how to measure for it, some real world examples, and how to change course if you’re in it
  • Discuss screw up #3 - teams as tickets - what it is, how to measure for it, some real world examples, and how to change course if you’re in it
  • Explain why it’s all tied to a single scaling problem: a platform as a project mindset. Even if you’ve got central funding and/or a platform product manager, you can still be thinking about and managing the platform as a project
  • Remind people that a platform is built as an enabler of business value. It’s built for the benefit of teams, not the platform itself
  • Wrap up with takeaways

Takeaways

  • Platform engineering is about creating an enabler of business value, so your teams can move faster and safer than ever before
  • Avoid using platform engineering power tools e.g. Kafka, Kubernetes, Istio. If you’re using them, start measuring unplanned work and plan to migrate away
  • Avoid being unopinionated on tech stack, ways of working, and path to prod. If you’re doing that, capture your leadership team’s expectations and track team commitments
  • Avoid only interacting with teams through a ticketing system. If you’re stuck there, start collaborating with teams on self-service paved roads
  • The key to platform engineering success is a platform as a product mindset