What if your OOM was not just a memory problem?

Sometimes an investigation tells a different story than the one you expected.

That’s what happened to me recently while investigating why a pod was getting OOMKilled two to three times a day.

A quick look at the offending pod’s memory usage doesn’t show the steadily rising curve typical of a memory leak. I’m missing data from just before the OOM (this kind of incident always seems to happen while your metrics system is being migrated), but judging from the rest of the day’s data, the cause seems to lie elsewhere.