Ops

Incident management

Useful for

Common knowledge, Operations, Risk management

Introduction

Policies and procedures are only useful if the team can follow them under pressure. Rehearsals turn governance from paperwork into operational muscle memory.

Knowledge scope

This is common CTO knowledge. It applies beyond the startup journey, but the public playbook places it where it usually becomes important for an early-stage company.

Why it matters

An incident is not finished just because the first error has gone away. The company still needs to complete the response, recover the affected systems, capture what was communicated and to whom, record breach decisions, learn from a post-mortem and return the system to a controlled state.
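To show the difference between an error going away and an incident being closed, here is a minimal Python sketch of an incident lifecycle. The stage names, the `Incident` class and the transition table are hypothetical choices for this example, not a standard. The point it illustrates is that a mitigated incident cannot move straight to closed: recovery, review and a recorded breach decision have to happen first.

```python
from enum import Enum


class Stage(Enum):
    """Hypothetical incident lifecycle stages (illustrative only, not a standard)."""
    DETECTED = "detected"
    RESPONDING = "responding"   # containment and temporary fixes
    MITIGATED = "mitigated"     # the first error has gone away
    RECOVERING = "recovering"   # returning the system to a controlled state
    REVIEWED = "reviewed"       # post-mortem held, actions assigned
    CLOSED = "closed"


# Allowed transitions. MITIGATED has no direct path to CLOSED,
# so a temporary fix can never be mistaken for closure.
TRANSITIONS = {
    Stage.DETECTED: {Stage.RESPONDING},
    Stage.RESPONDING: {Stage.MITIGATED},
    Stage.MITIGATED: {Stage.RECOVERING, Stage.RESPONDING},  # may regress if the fault returns
    Stage.RECOVERING: {Stage.REVIEWED},
    Stage.REVIEWED: {Stage.CLOSED},
    Stage.CLOSED: set(),
}


class Incident:
    """Hypothetical incident tracker that enforces the transition table above."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.stage = Stage.DETECTED
        self.history = [Stage.DETECTED]
        self.breach_decision_recorded = False

    def advance(self, target: Stage) -> None:
        # Reject any move the transition table does not allow.
        if target not in TRANSITIONS[self.stage]:
            raise ValueError(f"cannot move from {self.stage.value} to {target.value}")
        # Closure also requires a breach decision, even if the answer was "no breach".
        if target is Stage.CLOSED and not self.breach_decision_recorded:
            raise ValueError("record a breach decision before closing")
        self.stage = target
        self.history.append(target)


# Usage: the error has gone away, but the incident is not finished.
inc = Incident("Checkout errors")
inc.advance(Stage.RESPONDING)
inc.advance(Stage.MITIGATED)
try:
    inc.advance(Stage.CLOSED)
except ValueError as err:
    print(err)  # cannot move from mitigated to closed
```

Encoding the rule in the transition table, rather than relying on people to remember it, means the tooling itself refuses premature closure.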

How it fits the playbook

This reference supports the Pilot Ready -> Pre-Production Ready stage of the startup CTO playbook. It gives the public context for the decision without exposing the deeper assessment method behind the agentic operating model.

Design considerations

  • Separate response from recovery so temporary fixes are not mistaken for closure.
  • Use an incident coordinator to manage communication, decisions and timelines.
  • Record breaches and suspected breaches, including whether the ICO (the UK Information Commissioner's Office) should be contacted and why.
  • Use runbooks for restore, failover, rollback and emergency access.
  • Use post-mortems to reduce repeat risk without blame.
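These considerations can be brought together in a single incident record owned by the coordinator. The following is a minimal Python sketch, assuming hypothetical class and field names. It keeps a timestamped timeline, the runbooks used, a breach decision with its rationale, and post-mortem actions, and it reports what is still missing before the incident can be closed.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional


@dataclass
class TimelineEntry:
    at: datetime
    author: str
    note: str


@dataclass
class BreachDecision:
    """Hypothetical record for a breach or suspected breach."""
    suspected: bool
    confirmed: bool
    notify_ico: bool
    rationale: str    # why the ICO was or was not contacted
    decided_by: str


@dataclass
class Action:
    description: str
    owner: Optional[str] = None  # every post-mortem action needs a named owner


@dataclass
class IncidentRecord:
    """Hypothetical incident record maintained by the incident coordinator."""
    title: str
    coordinator: str  # manages communication, decisions and the timeline
    timeline: List[TimelineEntry] = field(default_factory=list)
    runbooks_used: List[str] = field(default_factory=list)  # e.g. "restore", "rollback"
    breach: Optional[BreachDecision] = None
    actions: List[Action] = field(default_factory=list)

    def log(self, author: str, note: str) -> None:
        # Timestamp every entry in UTC so the timeline can be reconstructed later.
        self.timeline.append(TimelineEntry(datetime.now(timezone.utc), author, note))

    def closure_gaps(self) -> List[str]:
        """Return reasons the incident is not ready to close (empty list means ready)."""
        gaps = []
        if self.breach is None:
            gaps.append("no breach decision recorded")
        elif not self.breach.rationale.strip():
            gaps.append("breach decision has no rationale")
        gaps += [f"action without owner: {a.description}" for a in self.actions if not a.owner]
        return gaps


# Usage with illustrative values.
rec = IncidentRecord("Checkout errors", coordinator="Alex")
rec.log("Alex", "Rolled back release using the rollback runbook")
rec.runbooks_used.append("rollback")
rec.actions.append(Action("Add canary check for checkout"))
print(rec.closure_gaps())
# ['no breach decision recorded', 'action without owner: Add canary check for checkout']
```

Returning a list of gaps rather than raising on the first problem lets the coordinator see every outstanding item at once, which suits a post-mortem review.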

What good looks like

The team can coordinate under pressure, understand what actually happened and turn incident learning into owned risk reduction.

How Brokenhouse helps

Turn this into a practical plan.

I help technology teams turn this guidance into decisions, implementation plans, governance evidence and production-ready operating models.

