Startups are making their first DevOps hire too early
The moment young teams decide to hire for infrastructure, the math on what that decision costs, and what I'd do instead.
There's a specific moment when a young engineering team decides to hire its first DevOps engineer, and I've watched it happen enough times to describe it from memory. The team just raised, or someone lost a weekend to a deploy that went sideways, and in the retro somebody says the words "we need someone who owns infrastructure." Nobody pushes back, because pushing back sounds irresponsible, and a req goes up for a platform engineer at a company with eight people.
I think that moment usually comes a year or two before the workload it's reacting to actually exists. At five to fifteen engineers the infrastructure work is real, but it isn't a job yet, because it arrives in bursts: a bad week when something falls over, then months where the right amount of infrastructure work is none. Hiring a full-time person for burst work has a predictable result. A quarter later there's a Terraform repo where there wasn't one and a migration to a better deploy setup underway, and every change the product team ships now has a review queue, because there's finally a person whose job it is to be careful about these things. We wrote about that queue on the Encore blog recently, and the most consistent thing about it is that it forms around one or two specific people and the whole company ends up waiting on them.
The math deserves to be done out loud, because the anxiety that drives the hire rarely survives contact with it. A platform engineer runs somewhere between $150k and $200k fully loaded, more in the US hubs, while the realistic alternative at that company size is a managed platform bill of roughly a tenth of that. The sharper number is the opportunity cost: at eight engineers that salary is another product engineer, and there is very little an infrastructure specialist can do at that scale that moves the company the way another person shipping product does.
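If you want to run that arithmetic with your own numbers, here is a rough sketch in Python. The salary range and the one-tenth ratio are the figures from the paragraph above; the function name, the parameter names, and the choice to use the midpoint of the range are my assumptions for illustration, not a costing model.

```python
# Back-of-envelope comparison: a full-time platform hire vs. a managed platform bill.
# All defaults are illustrative; replace them with your own figures.

def hire_vs_managed(salary_low=150_000, salary_high=200_000, managed_divisor=10):
    """Return the yearly cost of the hire, the managed bill, and the gap.

    managed_divisor=10 encodes the "about a tenth" estimate (an assumption).
    Integer arithmetic keeps the output free of floating-point noise.
    """
    midpoint = (salary_low + salary_high) // 2   # middle of the fully loaded range
    managed = midpoint // managed_divisor        # managed platform bill per year
    return {
        "hire": midpoint,
        "managed": managed,
        "difference": midpoint - managed,
    }


result = hire_vs_managed()
print(result)
```

With the defaults this puts roughly $175k a year against roughly $17.5k, a gap of about $157k, and that is before counting what another product engineer would have shipped with the same budget.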
The strongest counterexample I know points the same way. Monzo built one of the best platform teams in the industry, and by every account it was worth it at their scale, with hundreds of engineers and a bank attached. It also took years and a meaningful share of their engineering budget. What I find telling is what the people who lived that build did next: the ones I've talked to who went on to start new companies describe their platform decision as finding a way to skip the build rather than repeating it.
The honest test, if you think your team is at that moment, is to write down what the person would do in week six, after the deploy pipeline is fixed and the alerts are sane. If the answer looks like "start building our internal platform," the company is about to fund a platform it doesn't need yet, staffed by someone who will reasonably want to build it.
What I'd do instead at that size is pick the most boring managed services available and refuse to feel bad about it. If the stack is on Terraform, spend the effort on making changes verifiable with policy checks and plan tests, so review stops depending on one person's full attention. AI tooling doesn't get you out of this one, by the way, and I've written about why: the model can produce the HCL just fine, but the decisions inside each line are exactly the part it can't read from your code.
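To make "policy checks" concrete, here is a minimal sketch of one, assuming you export the plan with terraform show -json and run a script over it in CI. The resource_changes list and the change.actions field are part of Terraform's JSON plan output; the rule itself, the PROTECTED_TYPES set, and the sample plan are hypothetical and only there to show the shape.

```python
# Minimal plan check: fail the build if a plan would delete a protected resource.
# PROTECTED_TYPES is a hypothetical policy; pick the types that matter to you.
PROTECTED_TYPES = {"aws_db_instance"}


def check_plan(plan):
    """Return a list of human-readable violations found in a parsed JSON plan."""
    violations = []
    for rc in plan.get("resource_changes", []):
        actions = rc.get("change", {}).get("actions", [])
        # A replacement shows up as both "delete" and "create", so it is caught too.
        if "delete" in actions and rc.get("type") in PROTECTED_TYPES:
            violations.append(
                f"{rc.get('address')}: plan would delete a protected resource"
            )
    return violations


# Hand-written sample in the shape of the real plan JSON (illustrative only).
sample_plan = {
    "resource_changes": [
        {
            "address": "aws_db_instance.main",
            "type": "aws_db_instance",
            "change": {"actions": ["delete", "create"]},
        },
        {
            "address": "aws_s3_bucket.assets",
            "type": "aws_s3_bucket",
            "change": {"actions": ["update"]},
        },
    ]
}

violations = check_plan(sample_plan)
for v in violations:
    print(v)
```

In a pipeline you would load the real file with json.load instead of the sample dict and exit non-zero when the list is not empty. The point is less this particular rule than that the rule is written down once, so the reviewer no longer has to remember it on every PR.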
I work at Encore, so weigh this paragraph accordingly. The bet we've made is that the cleanest fix is removing the separate infrastructure layer entirely, so the application code declares what it needs and the infrastructure derives from it, with no second PR and no queue. What I can report from the inside is that the teams running this way surprised me. They're companies that have been in production for years, some at meaningful revenue, that simply never made the hire, and when I ask them about it the question seems to have stopped coming up at some point rather than ever being settled in an argument.
And there is a right time. If the infrastructure work is genuinely full-time for two people and still growing, or you're staring at real compliance or multi-region requirements, make the hire. Most teams aren't anywhere near that when the req goes up. They're paying the infrastructure tax a year before it's due.