Why I Stopped Using PyTorch Lightning

For two years I reached for PyTorch Lightning on every project. It promised to remove boilerplate, and it did — right up until the day the boilerplate was the only thing standing between me and a bug I could not see.

The abstraction tax

The trouble with a framework that owns your training loop is that it owns your training loop. When the loss curve goes flat at 3am, you are no longer debugging your code. You are debugging someone else’s idea of how your code should run.

The training loop is not boilerplate. It is the experiment.

A plain loop is a handful of lines you can read top to bottom:

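model.train()                      # training-mode behavior (dropout, batch-norm updates)
for epoch in range(epochs):
    for batch in loader:
        opt.zero_grad()            # clear gradients left over from the previous step
        loss = model(batch).loss   # forward pass; assumes the model returns an object with .loss
        loss.backward()            # backpropagate
        opt.step()                 # apply the parameter update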
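The loop above relies on model, loader, opt and epochs being defined elsewhere. Here is a minimal, self-contained sketch that runs as written. It assumes a hypothetical ToyRegressor trained on synthetic linear data. The non-finite-loss check and the gradient-norm print are illustrative additions, not part of the original loop. They show the kind of probe you can drop straight into a loop you own when the loss curve goes flat.

```python
import math
from types import SimpleNamespace

import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset


class ToyRegressor(nn.Module):
    """Hypothetical stand-in model: returns an object with a .loss attribute,
    matching the model(batch).loss call in the plain loop."""

    def __init__(self, dim: int = 4):
        super().__init__()
        self.linear = nn.Linear(dim, 1)

    def forward(self, batch):
        x, y = batch
        pred = self.linear(x).squeeze(-1)
        return SimpleNamespace(loss=nn.functional.mse_loss(pred, y))


def train(epochs: int = 5, lr: float = 0.1, seed: int = 0) -> list[float]:
    torch.manual_seed(seed)
    # Synthetic data: y is an exact linear function of x, so the loss should fall.
    x = torch.randn(256, 4)
    y = x @ torch.tensor([1.0, -2.0, 0.5, 3.0])
    loader = DataLoader(TensorDataset(x, y), batch_size=32, shuffle=True)

    model = ToyRegressor()
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    epoch_losses = []

    model.train()
    for epoch in range(epochs):
        total, grad_norm = 0.0, 0.0
        for batch in loader:
            opt.zero_grad()
            loss = model(batch).loss
            # Owning the loop means we can fail fast on a NaN/inf loss.
            if not math.isfinite(loss.item()):
                raise RuntimeError(f"non-finite loss at epoch {epoch}")
            loss.backward()
            # Total L2 norm of all gradients; a norm stuck near zero is one
            # common symptom behind a flat loss curve.
            grad_norm = math.sqrt(
                sum(p.grad.pow(2).sum().item() for p in model.parameters() if p.grad is not None)
            )
            opt.step()
            total += loss.item() * len(batch[0])
        epoch_losses.append(total / len(loader.dataset))
        print(f"epoch {epoch}: loss={epoch_losses[-1]:.4f} last grad_norm={grad_norm:.3f}")
    return epoch_losses
```

Calling train() prints one line per epoch. Every probe is an ordinary line of Python in the loop body, so adding or removing one needs no callback or hook API.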

What I switched to

Nothing exotic. Raw PyTorch, a thin train.py, and a single config file. The result is more lines and far fewer surprises — which, when you are the only person paged about the run, is the trade you want.
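A thin train.py can be little more than a config loader sitting in front of the loop. The sketch below is one possible shape, not the author's actual file. It assumes a JSON config and hypothetical field names (epochs, lr, batch_size, seed), because the post does not say which format or fields are used.

```python
import argparse
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass(frozen=True)
class Config:
    # Hypothetical fields; defaults apply when a key is absent from the file.
    epochs: int = 10
    lr: float = 1e-3
    batch_size: int = 32
    seed: int = 0


def config_from_dict(raw: dict) -> Config:
    known = {fld.name for fld in fields(Config)}
    unknown = set(raw) - known
    # Fail loudly on a typo such as "learning_rate" instead of silently
    # training with the default value.
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    return Config(**raw)


def load_config(path: str) -> Config:
    with open(path) as fh:
        return config_from_dict(json.load(fh))


def main(argv: Optional[list] = None) -> None:
    parser = argparse.ArgumentParser(description="thin training entry point")
    parser.add_argument("config", nargs="?", help="path to a JSON config file")
    args = parser.parse_args(argv)
    cfg = load_config(args.config) if args.config else Config()
    print(cfg)
    # Build the model, loader and optimizer from cfg here, then run the plain loop.


if __name__ == "__main__":
    main()
```

Running `python train.py run.json` would load that file, and with no argument the script falls back to the dataclass defaults. Rejecting unknown keys is a deliberate choice. A misspelled key stops the run before training starts, so the job does not spend hours on a default you never meant to use.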