Around the same time, Cloudflare’s chief technology officer Dane Knecht explained in an apologetic X post that a latent bug was responsible.
“In short, a latent bug in a service underpinning our bot mitigation capability started to crash after a routine configuration change we made. That cascaded into a broad degradation to our network and other services. This was not an attack,” Knecht wrote, referring to a bug that went undetected in testing and had not previously caused a failure.


I disagree. I don’t know the details of Cloudflare’s bot detection, but there are many automated vulnerability scanners that this could protect against.
I agree. Every crash is a failure by the designers. Instead, it should be caught by the program and result in a useful error state. They probably have something like that, but it didn’t work because the crash was too severe.
I am not complaining. I am informing you that you are missing an angle in your consideration. You can never prevent every crash. So when designing your product, you have to consider what should happen if every safeguard fails and you get an uncontrolled crash. In that case you have to design for “fail open” or “fail closed”. Cloudflare fucked up. The crash should not have happened, and if it did, it should have been caught. It wasn’t. They fucked up. But I agree with the result of the fuck up being a fail closed state.
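
To make the “fail open” versus “fail closed” distinction concrete, here is a minimal sketch. Everything in it is hypothetical: `handle_request`, `score_bot`, `FailureMode`, and the threshold are names made up for illustration and have nothing to do with Cloudflare’s actual code. It only shows the design decision, i.e. what the caller does when the bot check itself blows up.

```python
from enum import Enum
from typing import Callable


class FailureMode(Enum):
    FAIL_OPEN = "open"      # if the check breaks, let traffic through
    FAIL_CLOSED = "closed"  # if the check breaks, block traffic


def handle_request(
    request: dict,
    score_bot: Callable[[dict], float],  # hypothetical scorer: higher = more bot-like
    mode: FailureMode,
    threshold: float = 0.5,              # arbitrary cutoff for this sketch
) -> bool:
    """Return True if the request is allowed, False if it is blocked."""
    try:
        return score_bot(request) < threshold
    except Exception:
        # The safeguard itself failed. The configured mode decides the outcome.
        return mode is FailureMode.FAIL_OPEN


def crashing_scorer(request: dict) -> float:
    # Stand-in for a scorer that dies on a bad configuration.
    raise RuntimeError("bad configuration")


if __name__ == "__main__":
    print(handle_request({}, crashing_scorer, FailureMode.FAIL_CLOSED))
    print(handle_request({}, crashing_scorer, FailureMode.FAIL_OPEN))
```

With `FAIL_CLOSED`, a broken scorer means every request is blocked, so the site goes down but the scanners stay out. With `FAIL_OPEN`, the site stays up but unprotected. Note this only covers a crash that can be caught in-process; a truly uncontrolled crash needs the same decision made one layer up, for example in whatever routes traffic to the service.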