Have db logs killed our PostgreSQL on AWS?
This is just the introduction to our CTO's post on LinkedIn. You can view the full version here, and for more of Krzysztof’s posts, click here.
I am not going to say we were caught off guard at the last minute, but during preparation for soak testing, one of our applications became unresponsive for no apparent reason. After a short investigation, we found that a single AWS RDS instance had run out of storage space. At first glance, we had no idea why 🤭!
Luckily, the impacted environment was used only for testing and had limited access from the outside world. When the environment went down, no load tests were being run, and we were pretty confident that no significant amount of data had been processed and stored that day.
One might be tempted to throw more 💰💰💰 at it by increasing the storage space and moving on. But what used up all the disk space? No exciting cliff-hanger here: it was a database log file!
While investigating the root cause, we were taught a lesson by AWS about its product internals. So that others can quickly benefit from what we learned, I’ve outlined a TL;DR version below (disclaimer: for brevity, we will be discussing a single-AZ setup).