Casper van Elteren · Statistics · 13 min read

When Does It Break? An Introduction to Hazard Analysis

The logit models whether something happens within a window. Hazard analysis opens that window and asks when the event occurs.

One of the things I enjoy most about mathematics is that the same formal idea can keep reappearing in places that, at first glance, seem unrelated. A compact expression shows up in one problem, solves it elegantly, and then later turns out to be useful somewhere else entirely. That kind of migration is part of what makes the subject feel alive to me.

In the previous post on the logistic and logit functions, I followed one such path. The story began with population growth and resource limits, and then moved into probability models where the main question was whether an event happens within a fixed window.

This post continues that story, but it changes the question slightly. Instead of asking whether an event happens, we ask when it happens. That shift may sound small, but it changes the mathematics in an important way. Once time itself becomes part of the object we want to model, we need a language that keeps track not just of risk, but of how that risk unfolds as the system changes.

Into the forest we go!

Imagine you are a forest warden responsible for a local woodland full of squirrels, deer, raccoons, and other wildlife. Sitting in a fire watch above a quiet lake, you write down a simple question: given occasional lightning strikes, how likely is it that the forest catches fire?

How should we study a problem like this? My first instinct is usually to ask for a probability: what is the chance the forest burns this season? But the longer I sit with that question, the less satisfying it becomes. A single probability hides almost everything interesting.

One obvious factor is the rate at which lightning strikes occur. If lightning is frequent, then the chance that some strike eventually ignites a tree is higher. But lightning alone is not enough. Conditions also matter: the air may be wet or dry, leaves may be damp or brittle, and fuel may or may not be available.

That last point already tells us something important. For a fire to start, there must first be trees to burn. The state of the forest therefore depends not only on lightning, but also on the rate at which trees grow, die, and recover after disturbance.

Once we start thinking this way, the problem changes shape. The forest is not a static object with a single fire probability attached to it. It is a system driven by events unfolding through time: trees grow, lightning strikes, fires spread, and burning trees burn out. What we really need is not just a probability, but a way of describing how the system’s internal clock is being pushed around by those events.

Defining the Hazard object

A logit model maps an unrestricted predictor into a probability in $(0, 1)$. That is the right tool when each observation has a fixed outcome window. In the forest example, we might ask: what is the probability that the forest catches fire this summer, given the current dryness, tree density, and lightning conditions? Once the time window is fixed, the model returns a single bounded probability.

That is useful, but it also compresses the whole temporal story into one number. A fire probability of $p$ does not tell us whether the danger is concentrated in the first week, rises steadily as the forest dries out, or spikes only after lightning becomes frequent. To answer questions like that, we need an object that keeps time visible.

We call this object the hazard: the instantaneous event rate, conditional on the event not having happened yet. More informally, it tells us how fast the clock toward the event is running at a particular moment. Formally, writing $T$ for the event time, we define

$$h(t) = \lim_{\Delta t \to 0} \frac{P(t \le T < t + \Delta t \mid T \ge t)}{\Delta t}.$$

This definition is worth reading slowly. We look at a very small time interval $[t, t + \Delta t)$ and ask for the probability that the event happens inside that interval, given that it has not happened before time $t$. That gives us a small conditional probability, but it still depends on the width of the interval. A wider interval gives a larger probability, and a narrower one gives a smaller probability. To remove that dependence, we divide by the interval length $\Delta t$ and then let $\Delta t$ shrink to zero. The result is no longer a probability over an arbitrarily chosen window, but an instantaneous rate. That is why the hazard is not itself a probability. It is a rate: probability per unit time in the infinitesimal limit.
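To watch the limit at work numerically, here is a minimal Python sketch. It assumes a Weibull-distributed event time, chosen purely for illustration because its survival function and hazard have closed forms; the Weibull choice, the function names, and the parameter values are all assumptions and not part of the forest model.

```python
import math

def weibull_survival(t, shape, scale):
    """P(T > t) for a Weibull event time (illustrative assumption)."""
    return math.exp(-((t / scale) ** shape))

def weibull_hazard(t, shape, scale):
    """Closed-form hazard of the same Weibull distribution."""
    return (shape / scale) * (t / scale) ** (shape - 1)

def finite_window_rate(t, dt, shape, scale):
    """P(t <= T < t + dt | T >= t) / dt for a finite window dt."""
    s_now = weibull_survival(t, shape, scale)
    s_later = weibull_survival(t + dt, shape, scale)
    conditional_prob = (s_now - s_later) / s_now
    return conditional_prob / dt

t, shape, scale = 2.0, 1.5, 3.0  # hypothetical values
for dt in (1.0, 0.1, 0.001):
    print(f"dt={dt}: rate={finite_window_rate(t, dt, shape, scale):.5f}")
print(f"limit:   h(t)={weibull_hazard(t, shape, scale):.5f}")
```

As the window shrinks, the windowed rate should settle toward the closed-form hazard, which is the sense in which the hazard is a rate and not a probability.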

A high hazard means the event is likely to happen soon, given that it has not happened already. In the forest setting, the hazard would be low after heavy rain, higher during a dry spell, and perhaps higher still when dry fuel and lightning coincide.

If the hazard tells us how quickly the event is threatening to occur, then the natural companion question is: what is the probability that the event has still not occurred by time $t$? This is the survival function:

$$S(t) = P(T > t),$$

where $T$ is the time until the forest catches fire. So if $S(10) = 0.9$, that means there is a 90% chance the forest has still avoided fire by day 10.

In that sense, hazard analysis takes the bounded-probability story from the previous post and adds a clock to it. The logit asks how predictors map to risk over a fixed window. Once that window is chosen, the answer is a single number: the probability that the event occurs by the end of the interval.

Hazard analysis works one level underneath that summary. Instead of asking only for the final probability, it asks how quickly the system is moving toward the event at each moment. That is what the language of rate is meant to capture. A probability answers “how much chance is there by time $t$?” A hazard answers “how fast is that chance building right now, given that the event has not happened yet?”

This difference matters because the same overall probability can hide very different temporal stories. A forest might have a moderate chance of burning this season because danger is steadily present throughout, or because the risk is low for months and then spikes during a brief dry period with frequent lightning. A single summary probability does not distinguish those cases. A hazard function does.
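A small Python sketch can make the two stories concrete. It relies on the standard identity $S(t) = \exp\left(-\int_0^t h(u)\,du\right)$, which links survival to accumulated hazard and is not derived in this post. The 100-day season, the two hazard profiles, and all numbers are hypothetical.

```python
import math

def survival_curve(hazard, days, dt=0.01):
    """Survival probability at the end of each day, from S(t) = exp(-cumulative hazard)."""
    cumulative = 0.0
    curve = []
    steps_per_day = round(1 / dt)
    for day in range(days):
        for i in range(steps_per_day):
            t = day + (i + 0.5) * dt      # midpoint rule for the integral
            cumulative += hazard(t) * dt
        curve.append(math.exp(-cumulative))
    return curve

SEASON = 100  # days, hypothetical

def steady(t):
    """Danger present at a constant low level all season."""
    return 0.005

def late_spike(t):
    """No danger for 90 days, then a short dry spell with high hazard."""
    return 0.05 if t >= 90 else 0.0

steady_curve = survival_curve(steady, SEASON)
spike_curve = survival_curve(late_spike, SEASON)

print("P(fire by end of season):", 1 - steady_curve[-1], 1 - spike_curve[-1])
print("P(fire by day 50):       ", 1 - steady_curve[49], 1 - spike_curve[49])
```

Both profiles accumulate the same total hazard over the season, so the end-of-season probabilities agree, while the mid-season probabilities differ sharply.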

So the hazard can be read as the local speed of the event clock, while the survival function records how much event-free probability remains as that clock keeps running. The logit gives us a bounded summary of risk. The hazard and survival functions let us see how that risk accumulates, changes, and reveals itself through time.

Up to this point, I have spoken as if there were a single event time, such as the time until the forest first catches fire. To build a simulator, though, it helps to move one level down. Instead of tracking only that final event, we describe the smaller state changes that can eventually produce it: growth, ignition, spread, and burnout. This is a broader setting than classical one-event survival analysis, but it uses the same central idea. Each possible change comes with its own instantaneous rate.

Defining the System

We can now make the system more explicit. To describe the forest mathematically, we need two ingredients: a description of the current state, and a list of the event channels that can change that state.

Let $X(t)$ denote the state of the forest at time $t$. In the simplest version of the model, each cell is in one of three states:

  • empty
  • occupied by a tree
  • burning

From the current state, we can count how many cells are empty, how many contain trees, how many are burning, and how many tree-burning neighbor pairs exist. These counts matter because they feed into the rates at which events occur.
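As a sketch of how those counts might be computed, the Python snippet below assumes the forest is stored as a rectangular grid, that neighbors are the four orthogonally adjacent cells, and that the grid does not wrap at its edges. These choices, along with every name in the code, are assumptions for illustration.

```python
EMPTY, TREE, BURNING = 0, 1, 2
NAMES = {EMPTY: "empty", TREE: "tree", BURNING: "burning"}

def count_state(grid):
    """Count cells per state and tree-burning neighbor pairs (4-neighborhood)."""
    rows, cols = len(grid), len(grid[0])
    counts = {"empty": 0, "tree": 0, "burning": 0, "tree_burning_pairs": 0}
    for r in range(rows):
        for c in range(cols):
            counts[NAMES[grid[r][c]]] += 1
            # Look right and down only, so each neighbor pair is counted once.
            for dr, dc in ((0, 1), (1, 0)):
                rr, cc = r + dr, c + dc
                if rr < rows and cc < cols:
                    if {grid[r][c], grid[rr][cc]} == {TREE, BURNING}:
                        counts["tree_burning_pairs"] += 1
    return counts

grid = [
    [TREE,  BURNING, EMPTY],
    [TREE,  TREE,    EMPTY],
    [EMPTY, TREE,    BURNING],
]
print(count_state(grid))
# {'empty': 3, 'tree': 4, 'burning': 2, 'tree_burning_pairs': 3}
```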

To stay consistent with the logic above, we should define each event channel in the same way we defined the hazard: as a probability over a very short interval, divided by the interval length, and then taken in the limit as the interval shrinks to zero.

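So for an event channel $k$, we write

$$\lambda_k(x) = \lim_{\Delta t \to 0} \frac{P(\text{event } k \text{ occurs in } [t, t + \Delta t) \mid X(t) = x)}{\Delta t}.$$

This says: once the system is in state $x$, each possible event has its own instantaneous rate. For a small interval $\Delta t$, this means

$$P(\text{event } k \text{ occurs in } [t, t + \Delta t) \mid X(t) = x) \approx \lambda_k(x)\, \Delta t.$$

Only after writing the rates this way do we choose a simpler model for what the functions $\lambda_k(x)$ should actually be. A common assumption is that each possible local opportunity contributes independently and at the same per-unit-time rate. Under that assumption, total event rates become “rate per opportunity” times “number of opportunities.”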

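With that simplification, a natural choice is

$$\lambda_{\text{grow}}(x) = g\, N_E(x),$$

where $N_E(x)$ is the number of empty cells, so more empty space means more opportunities for growth;

$$\lambda_{\text{lightning}}(x) = \ell\, N_T(x),$$

where $N_T(x)$ is the number of trees, so more trees means more possible lightning targets;

$$\lambda_{\text{spread}}(x) = \beta\, N_{TB}(x),$$

where $N_{TB}(x)$ counts tree-burning neighbor pairs, so fire spread becomes more likely when burning trees sit next to unburned ones; and

$$\lambda_{\text{burnout}}(x) = \mu\, N_B(x),$$

where $N_B(x)$ is the number of burning trees, so more active fire means more chances for burnout events.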

Note that although the system evolves through time, this formulation is memoryless in the Markov sense: once the current state $x$ is known, the future event rates depend only on that state, not on the particular history by which the system arrived there.

These formulas are therefore not a new idea. They are the simplified concrete version of the same hazard logic as before. For example, if one empty cell has growth rate $g$, then in a short interval $\Delta t$ it grows with probability approximately $g\, \Delta t$. If there are $N_E$ empty cells, the total growth rate becomes $g\, N_E$.

The total rate is then

$$\Lambda(x) = \lambda_{\text{grow}}(x) + \lambda_{\text{lightning}}(x) + \lambda_{\text{spread}}(x) + \lambda_{\text{burnout}}(x).$$

This quantity is the total clock speed of the system. When $\Lambda(x)$ is large, something is likely to happen soon. When $\Lambda(x)$ is small, the system is likely to wait longer before the next event. Once we have these state-dependent rates, the next question is immediate: how do we turn them into an actual simulation of event times? We can use an algorithm from computational chemistry, the Gillespie algorithm.
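The “rate per opportunity times number of opportunities” rule and the total rate can be sketched in a few lines of Python. The dictionary keys, the function name, and the per-opportunity rate values below are hypothetical and only meant to show the bookkeeping.

```python
def channel_rates(counts, params):
    """Total rate of each event channel: per-opportunity rate times opportunity count."""
    return {
        "grow": params["grow"] * counts["empty"],
        "lightning": params["lightning"] * counts["tree"],
        "spread": params["spread"] * counts["tree_burning_pairs"],
        "burnout": params["burnout"] * counts["burning"],
    }

# Hypothetical state summary and per-opportunity rates (events per unit time).
counts = {"empty": 3, "tree": 4, "tree_burning_pairs": 3, "burning": 2}
params = {"grow": 0.1, "lightning": 0.01, "spread": 1.0, "burnout": 0.5}

rates = channel_rates(counts, params)
total_rate = sum(rates.values())  # the total clock speed of the system

for name, rate in rates.items():
    print(f"{name:10s} rate={rate:.3f}")
print(f"total rate = {total_rate:.3f}")
```

With these made-up numbers the spread channel dominates, so most of the system's clock speed comes from fire already sitting next to unburned trees.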

Turning the Formalism into a Simulator

So far, we have reduced a complicated system to its essential ingredients: discrete event types, each with its own rate. The next step is to turn that description into a simulator. In the forest example, those events are things like tree growth, lightning strikes, fire spread, and burnout. The Gillespie algorithm gives us a way to do exactly this, but it helps to first see why the waiting time to the next event takes the form it does.

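Let $\tau$ denote the waiting time until the next event occurs, measured from the moment the system entered its current state. Then

$$S(t) = P(\tau > t)$$

is the probability that no event has happened by time $t$. This is the survival function of the waiting time distribution.

Now ask a slightly more local question: what is the probability that no event has happened by the slightly later time $t + \Delta t$? That is,

$$S(t + \Delta t) = P(\tau > t + \Delta t).$$

We can rewrite this using conditional probability:

$$S(t + \Delta t) = P(\tau > t)\, P(\tau > t + \Delta t \mid \tau > t) = S(t)\, P(\tau > t + \Delta t \mid \tau > t).$$

This decomposition is the key step. It says that surviving until time $t + \Delta t$ can be broken into two pieces: first survive until time $t$, and then survive the additional small interval $\Delta t$.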

Now we use the fact that, once the current state $x$ is fixed, the event rates are constant until the next event occurs. The rates are therefore not constant for the entire trajectory, but they are constant between jumps. If the total rate in the current state is $\Lambda$, then over a sufficiently short interval $\Delta t$ we have

$$P(\tau \le t + \Delta t \mid \tau > t) \approx \Lambda\, \Delta t.$$

In words: conditional on no event having happened yet, the probability that some event occurs in the next tiny interval is approximately the total rate times the interval length.

The complementary probability is therefore

$$P(\tau > t + \Delta t \mid \tau > t) \approx 1 - \Lambda\, \Delta t.$$

Substituting this into the previous identity gives

$$S(t + \Delta t) \approx S(t)\, (1 - \Lambda\, \Delta t).$$

Rearranging,

$$S(t + \Delta t) - S(t) \approx -\Lambda\, S(t)\, \Delta t.$$

Dividing by $\Delta t$ and taking the limit $\Delta t \to 0$ gives the differential equation

$$\frac{dS}{dt} = -\Lambda\, S(t).$$

This is a first-order linear differential equation, and with the initial condition $S(0) = 1$ its solution is

$$S(t) = e^{-\Lambda t}.$$

So, conditional on the current state, the waiting time to the next event has an exponential survival function. Equivalently, the waiting time itself is exponentially distributed with rate $\Lambda$.

This gives a clear interpretation of the total rate. If $\Lambda$ is large, the survival probability decays quickly, so the next event tends to happen soon. If $\Lambda$ is small, the decay is slower, so the system tends to wait longer. If $\Lambda = 0$, then no event can occur from that state.
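The limiting argument can be checked numerically. The Python sketch below chains many short intervals, each survived with probability one minus total rate times interval length, and compares the product with the exponential form. The function name and the numbers are hypothetical.

```python
import math

def discrete_survival(total_rate, t, n_steps):
    """Survive n_steps short intervals, each with probability 1 - total_rate * dt."""
    dt = t / n_steps
    return (1.0 - total_rate * dt) ** n_steps

total_rate, t = 2.0, 1.5  # hypothetical values
exact = math.exp(-total_rate * t)

for n_steps in (10, 100, 10_000, 1_000_000):
    approx = discrete_survival(total_rate, t, n_steps)
    print(f"n_steps={n_steps:>9}: {approx:.8f}  (error {abs(approx - exact):.2e})")
print(f"exponential form:  {exact:.8f}")
```

The error should shrink as the intervals get shorter, which is the discrete counterpart of taking the limit in the differential equation.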

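Since the survival function is

$$S(t) = e^{-\Lambda t},$$

the waiting time $\tau$ has exponential density

$$f(t) = -\frac{dS}{dt} = \Lambda\, e^{-\Lambda t}.$$

This tells us how to sample the time until the next event. Because the system does not change between events, the rates remain fixed during that waiting period. If $U \sim \mathrm{Uniform}(0, 1)$, we can sample the waiting time by inverting the exponential distribution:

$$U = 1 - e^{-\Lambda \tau}.$$

Solving for $\tau$ gives

$$\tau = -\frac{\ln(1 - U)}{\Lambda}.$$

Since $1 - U$ is itself uniformly distributed on $(0, 1)$, this is usually written more simply as

$$\tau = -\frac{\ln U}{\Lambda}.$$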
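Here is a minimal Python sketch of that inversion, using only the standard library. The function name, the seed, and the rate value are hypothetical; the check at the end compares the sample mean with the mean of an exponential distribution, which is one over the rate.

```python
import math
import random

def sample_waiting_time(total_rate, rng):
    """Draw an exponential waiting time by inverting the CDF."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1] and log is safe.
    u = 1.0 - rng.random()
    return -math.log(u) / total_rate

rng = random.Random(42)   # seeded for reproducibility
total_rate = 4.0          # hypothetical total rate
samples = [sample_waiting_time(total_rate, rng) for _ in range(100_000)]

sample_mean = sum(samples) / len(samples)
still_waiting = sum(s > 0.5 for s in samples) / len(samples)

print(f"sample mean    = {sample_mean:.4f}  (theory {1 / total_rate:.4f})")
print(f"P(tau > 0.5)   = {still_waiting:.4f}  (theory {math.exp(-total_rate * 0.5):.4f})")
```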

So the logarithm appears because we are inverting the exponential cumulative distribution function. Once the waiting time has been sampled, we choose which event occurs using probabilities proportional to the individual event rates:

$$P(\text{event } k) = \frac{\lambda_k}{\Lambda}, \qquad \Lambda = \sum_j \lambda_j.$$

In practice, this is done by drawing another uniform random number and comparing it to the cumulative normalized rates. After that, we update the system state, recompute the rates, and repeat the procedure. This is the Gillespie algorithm: a direct simulator of competing hazards in continuous time.
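One step of this procedure might look like the following Python sketch. It is not the code behind the embedded simulator; the function name, the dictionary of rates, and the example values are assumptions. Comparing an unnormalized threshold against cumulative rates is equivalent to comparing a uniform draw against cumulative normalized rates.

```python
import math
import random

def gillespie_step(rates, rng):
    """Return (waiting_time, event) for one step, or None if no event can occur."""
    total = sum(rates.values())
    if total <= 0:
        return None  # total rate zero: the system is stuck in this state

    # 1. Waiting time: exponential with the total rate, via inverse transform.
    tau = -math.log(1.0 - rng.random()) / total

    # 2. Event choice: probability proportional to each channel's rate.
    threshold = rng.random() * total
    cumulative = 0.0
    chosen = None
    for event, rate in rates.items():
        cumulative += rate
        if rate > 0 and threshold < cumulative:
            chosen = event
            break
    if chosen is None:
        # Guard against floating-point round-off at the upper end.
        chosen = [e for e, r in rates.items() if r > 0][-1]
    return tau, chosen

rng = random.Random(0)
rates = {"grow": 0.3, "lightning": 0.04, "spread": 3.0, "burnout": 1.0}  # hypothetical
clock = 0.0
for _ in range(5):
    tau, event = gillespie_step(rates, rng)
    clock += tau
    print(f"t={clock:.3f}  event={event}")
```

In a full simulation the loop body would also apply the chosen event to the state and recompute `rates` before the next step; here the rates are held fixed to keep the sketch short.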

From Logistic to Hazard

The move from logit to hazard is a move from outcome to process. A logit can tell us the chance that something happens by the end of a window. A hazard asks how the event is being approached moment by moment, given that it has not happened yet. That may sound like a small change in emphasis, but it leads to a very different way of thinking. Once events are governed by rates, time is no longer hidden inside a summary probability. It becomes part of the model itself. And from there, simulation is not an extra trick added at the end, but a natural extension of the same idea.

Interactive simulator

This embedded model shows the Gillespie dynamics directly: total hazard controls the waiting time, normalized rates control which event happens next, and the heatmap highlights local ignition risk. The simulator is written in Rust, compiled to WebAssembly, and embedded here as a standalone canvas app.
