Eating Crow in the Disk Queue

Note: This is a fictionalized account. If it sounds exactly like your company, that’s just because dysfunction has a standard operating procedure. “Sometimes the fix you mocked last time is the only fix this time.” We had another P1 today. Last outage, I gave the team grief for restarting the server before finding the root cause. Restarting without diagnosing felt like hitting CTRL+ALT+DEL on your career. I chided them because, in my mind, we needed to understand the why before we reached for the power button. ...

September 28, 2025 · 3 min

The Disk Explosion Holiday Special

Note: This is a fictionalized account. If it sounds exactly like your company, that’s just because dysfunction has a standard operating procedure. Most people spend Easter weekend hiding eggs. We spent ours hiding free disk space. I was on vacation when it started. The data warehouse had been sluggish for months. Queries ran long, ETLs limped through the night, dashboards arrived somewhere between breakfast and lunch. We were already nearly out of disk space, but instead of fixing the cause, someone decided the solution was obvious: rebuild everything. ...

September 21, 2025 · 3 min

Metastore Admin by Accident

Note: This is a fictionalized account. If it sounds exactly like your company, that’s just because dysfunction has a standard operating procedure. Once a month, the permissions in Databricks vanish. Not all of them, just enough to ruin your morning and make everyone wonder who angered the SCIM gods this time. The parent company’s fix? Put multiple sub-companies on the same tenant and let “random competent-looking people” inherit admin rights. Not the actual admins. Not cloud architects (we don’t have one). Just whoever looks like they’ve touched a database before. ...

September 7, 2025 · 3 min