I'm working on this problem. I think the solution is white-box testing
from the ground up, so that software encodes not just the rules about
what to do but also the specific scenarios that the programmer considered.
Because we only encode rules and not intentions, we paint ourselves
into corners where it's always easier to add more layers of code than
it is to rework existing layers. But this is hard to do with the
existing stack, because Unix was invented before we appreciated the
power of tests, and it makes it too hard to write tests for blocking
IO, or for the 'find' command, or for what happens when a program runs
out of memory or disk. What we need is a simpler OS and a comprehensive
library of fakes
(https://en.wikipedia.org/wiki/Mock_object#Mocks.2C_fakes_and_stubs)
for all OS concepts. This is what I'm working on.
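
To make the idea of a fake concrete, here is a minimal sketch in Python of what one might look like for the "out of disk" case. It is only an illustration under my own assumptions, not code from the project: FakeDisk, save_report and the test name are all hypothetical, and a real fake would need to cover far more of the filesystem's behavior.

```python
import errno


class FakeDisk:
    """Hypothetical in-memory stand-in for a disk with a fixed capacity."""

    def __init__(self, capacity_bytes):
        self.capacity_bytes = capacity_bytes
        self.files = {}  # path -> bytes

    def used_bytes(self):
        return sum(len(data) for data in self.files.values())

    def write(self, path, data):
        # Overwriting a file frees its old contents before the new ones count.
        old_size = len(self.files.get(path, b""))
        if self.used_bytes() - old_size + len(data) > self.capacity_bytes:
            raise OSError(errno.ENOSPC, "No space left on device")
        self.files[path] = data

    def read(self, path):
        if path not in self.files:
            raise FileNotFoundError(errno.ENOENT, "No such file or directory", path)
        return self.files[path]


def save_report(disk, path, text):
    """Hypothetical code under test: reports a full disk instead of crashing."""
    try:
        disk.write(path, text.encode("utf-8"))
        return True
    except OSError as err:
        if err.errno == errno.ENOSPC:
            return False
        raise


def test_save_report_when_disk_is_full():
    # The scenario the programmer had in mind, written down as a test.
    disk = FakeDisk(capacity_bytes=8)
    assert save_report(disk, "/tmp/a", "12345678") is True
    assert save_report(disk, "/tmp/b", "x") is False
    assert "/tmp/b" not in disk.files
```

The point of the sketch is that the full-disk scenario becomes cheap to reproduce: the test needs no real partition and no root access, and the intention ("a full disk must not crash the program") is recorded next to the rule that implements it.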