Ops

Are we ready to scale Production?

Useful for

Startup playbook, Production readiness, Operations

Introduction

Production is a different level of commitment. By this point the company is no longer just proving that the product can work. It is making an operational promise to customers, investors and itself.

The exact architecture will depend on cost, customer promises, data requirements and the maturity of the product, but the principle is that production should be designed to survive meaningful failure.

Why this stage matters

Scaling Production is not only about adding compute. As customer count, usage and dependency on the product increase, the tolerance for downtime shrinks and the cost of weak operations rises.

This gate exists because resilience should follow actual risk. Early Production may tolerate planned downtime or cold standby. As usage grows, the business may need warm standby, hot failover, active-active architecture, stronger support and more formal operating evidence.

The decision

Production is ready to scale when the business understands usage patterns, customer expectations, resilience triggers, support load and unit economics well enough to invest deliberately.

Platform resilience

Scaling Production changes the resilience conversation. Early Production may be able to tolerate planned downtime, cold recovery or out-of-hours maintenance. As usage grows, the number of people affected by downtime grows, and the acceptable recovery position can change quickly.

The purpose of this control is to make resilience investment deliberate. The company should know when cold standby is still acceptable, when warm or hot failover is justified, and when active-active or multi-region design becomes a business requirement rather than an engineering preference.

Azure Front Door, or an equivalent controlled public edge, is part of that conversation. Public ingress, the WAF, private backends and routing-layer failover should all support the promise the company is making.
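One way to make the tier decision explicit is to write the triggers down as data. The sketch below is illustrative only: the tier names come from this guidance, but the recovery times attached to them and the `cheapest_adequate_tier` helper are assumptions, not measured figures. Each company would substitute the numbers its own architecture and customer promises support.

```python
from typing import NamedTuple


class Tier(NamedTuple):
    name: str
    typical_recovery_minutes: float  # assumed figure, not a benchmark


# Ordered cheapest first. Recovery times are hypothetical placeholders.
TIERS = [
    Tier("cold standby", 24 * 60),
    Tier("warm standby", 60),
    Tier("hot failover", 5),
    Tier("active-active", 0),
]


def cheapest_adequate_tier(tolerated_downtime_minutes: float) -> Tier:
    """Return the cheapest tier whose assumed recovery time fits the tolerance."""
    if tolerated_downtime_minutes < 0:
        raise ValueError("tolerated downtime cannot be negative")
    for tier in TIERS:
        if tier.typical_recovery_minutes <= tolerated_downtime_minutes:
            return tier
    # Fallback if the table is edited so that no tier qualifies.
    return TIERS[-1]


# A business that can tolerate two days of downtime stays on cold standby;
# one that can tolerate only thirty minutes needs hot failover.
print(cheapest_adequate_tier(2 * 24 * 60).name)
print(cheapest_adequate_tier(30).name)
```

The point of the table is less the lookup than the conversation it forces: when the tolerated downtime changes, the required tier changes with it, and the business sees the trigger before the incident rather than after.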

Unit economics

Scaling should not hide weak economics. More customers can increase revenue while quietly damaging margin if infrastructure, support, payment, tenant or resilience costs grow faster than expected.

The goal is not perfect financial modelling. The goal is enough understanding to decide when extra resilience, support coverage, customer isolation or platform complexity is justified by customer value and risk reduction.
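A small worked example shows how revenue can grow while margin falls. This is a rough sketch under stated assumptions: the `MonthlySnapshot` type, the cost categories chosen and every figure in it are hypothetical, picked only to illustrate the arithmetic.

```python
from dataclasses import dataclass


@dataclass
class MonthlySnapshot:
    customers: int
    revenue: float
    infrastructure: float
    support: float
    payment_fees: float
    resilience: float

    @property
    def total_cost(self) -> float:
        return (self.infrastructure + self.support
                + self.payment_fees + self.resilience)

    @property
    def margin(self) -> float:
        """Share of revenue left after the costs tracked here."""
        if self.revenue <= 0:
            raise ValueError("revenue must be positive to compute margin")
        return (self.revenue - self.total_cost) / self.revenue

    @property
    def cost_per_customer(self) -> float:
        if self.customers <= 0:
            raise ValueError("customer count must be positive")
        return self.total_cost / self.customers


# Hypothetical figures: customers and revenue double, costs triple.
before = MonthlySnapshot(customers=100, revenue=10_000, infrastructure=2_000,
                         support=1_500, payment_fees=300, resilience=200)
after = MonthlySnapshot(customers=200, revenue=20_000, infrastructure=6_000,
                        support=4_000, payment_fees=600, resilience=1_400)

for label, snap in (("before", before), ("after", after)):
    print(f"{label}: margin {snap.margin:.0%}, "
          f"cost per customer {snap.cost_per_customer:.2f}")
```

In this made-up case the margin drops from 60% to 40% even though revenue has doubled, because cost per customer rose from 40 to 60. Tracking even this coarse a view each month is usually enough to notice when scaling is eroding the economics.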

Support and onboarding

As Production scales, support and onboarding become part of the product experience. The company must be able to bring customers on consistently, answer trust questions quickly and support the service without overloading the team.

The support model should grow with customer expectation. A small early customer base may accept limited hours and slower responses. Larger or more regulated customers may require clearer support hours, stronger incident communication and more repeatable onboarding evidence.

Incident management

Incident management has to mature as Production scales. As customer dependency grows, the team needs to prove that it can coordinate, communicate, fail over, restore and recover without inventing the process under pressure.

Rehearsals are the evidence that the incident process works. They should test the full operating model, not only the technical recovery step. The team should know who coordinates, who communicates, who makes decisions, how evidence is captured and how post-mortem actions reduce the chance of repeat incidents.

Related guidance

Container platform decisions
Cost governance and unit economics
Incident management
Contracts and support promises
Payments and billing
Multi-tenancy and customer isolation
Customer trust pack
Data protection assurance
Device and endpoint governance
Testing and release quality

Summary

The company should understand when to invest in stronger resilience, whether the economics still work, whether support can cope, and whether incident management has been rehearsed at the current level of customer dependency.

How Brokenhouse helps

Turn this into a practical plan.

I help technology teams turn this guidance into decisions, implementation plans, governance evidence and production-ready operating models.

Next guidance

Related decisions to work through

Startup playbook: from POC to Production

This is a CTO playbook for augmenting the agentic SDLC with the company work that sits around the software. Most startup writing focuses on building the product. This playbook focuses on the identity, governance, data protection, delivery, cloud and operational decisions that allow a small SaaS company to move from idea to production without creating avoidable risk.

Is the company ready?

The first few months of a software business are not just about building the product. They are about creating the conditions that allow the product to be built, deployed, governed and supported without the company tripping over its own foundations.

Can we start the POC?

Before starting the POC, there is a small amount of governance that should be put in place. This is not about slowing the team down or pretending to be an enterprise. It is about creating enough shape that the first few months do not become a mess of forgotten passwords, inconsistent names, unclear decisions and accidental access.