Cloudflare outage post mortem

homura1650@lemmy.world · 2 months ago

panda_abyss@lemmy.ca · edit-2 2 months ago

Classic example of how dangerous Rust is.

If they had just used Python and run the whole thing in a try block with a bare except, this would never have been an issue.

Edit: this was a joke, and not well done. I thought the foolishness would come through.

Zwuzelmaus@feddit.org · 2 months ago

So you think there is no error handling possible in Rust?

Wait until you find out that Python doesn’t write the error handling by itself either…

The_Decryptor@aussie.zone · edit-2 2 months ago

Yeah, the Python equivalent would be something like this.

import sys

try:
    config = get_config()
except Exception:
    # No recovery attempted: bail out, much like a panic.
    sys.exit(1)

It’s possible to handle these things, but if you explicitly don’t then you’ll discover them at runtime.
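For contrast, here is a rough sketch of what handling it could look like: reject the bad config and keep running on the previous one. Everything in it is made up for illustration (`parse_config`, `load_config`, the `max_features` key, the `LAST_KNOWN_GOOD` default); it is not taken from the article.

```python
import json

# Hypothetical last config that was known to work.
LAST_KNOWN_GOOD = {"max_features": 200}


def parse_config(raw: str) -> dict:
    """Parse and validate a config string; raises ValueError on bad input."""
    # json.JSONDecodeError is a subclass of ValueError.
    config = json.loads(raw)
    if not isinstance(config, dict) or "max_features" not in config:
        raise ValueError("config is missing 'max_features'")
    return config


def load_config(raw: str, fallback: dict = LAST_KNOWN_GOOD) -> dict:
    """Return the parsed config, or the fallback if the new one is invalid."""
    try:
        return parse_config(raw)
    except ValueError as err:
        # Handle the error explicitly instead of exiting the process.
        print(f"bad config, keeping previous one: {err}")
        return fallback
```

Whether falling back is actually safer than crashing depends on the system, so treat this as one option rather than the fix.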

dan@upvote.au · edit-2 2 months ago

This can happen regardless of language.

The actual issue is that they should be canarying changes. Push them to a small percentage of servers, and ensure nothing bad happens before pushing them more broadly. At my workplace, config changes are automatically tested on one server, then an entire rack, then an entire cluster, before fully rolling out. The rollout process watches the core logs for things like elevated HTTP 5xx errors.
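A minimal sketch of that kind of staged rollout, assuming you can plug in your own `deploy` and `error_rate` callables. Those names, the stage labels and the 1% threshold are all hypothetical, not any real tool's API.

```python
from typing import Callable, Sequence

# Hypothetical stages, smallest blast radius first.
STAGES = ["one server", "one rack", "one cluster", "everything"]


def staged_rollout(
    stages: Sequence[str],
    deploy: Callable[[str], None],
    error_rate: Callable[[str], float],
    max_error_rate: float = 0.01,
) -> bool:
    """Deploy stage by stage; stop at the first stage that looks unhealthy."""
    for stage in stages:
        deploy(stage)
        # Check the health signal (e.g. share of HTTP 5xx responses) for this stage.
        rate = error_rate(stage)
        if rate > max_error_rate:
            print(f"halting at {stage!r}: 5xx rate {rate:.1%}")
            return False
    return True
```

A real pipeline would also wait between stages and roll back the stages already deployed; both are left out here to keep the sketch short.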

Thallium_X@feddit.org · 2 months ago

As a next step they should have wrapped everything in a while(true) loop so it automatically restarts and the program never dies.

panda_abyss@lemmy.ca · 2 months ago

Exactly, while True: try: main(); except: pass;

jimmy90@lemmy.world · 2 months ago

honestly this was a coding cock-up. there’s a code snippet in the article that unwraps on a Result which you don’t do unless you’re fine with that part of the code crashing

i think they are turning linters back to max and rooting through all their rust code as we speak

SinTan1729@programming.dev · edit-2 2 months ago

I hope you’re joking. If anything, Rust makes error handling easier by returning them as values using the Result monad. As someone else pointed out, they literally used unwrap in their code, which basically means “panic if this ever returns error”. You don’t do this unless it’s impossible to handle the error inside the program, or if panicking is the behavior you want due to e.g. security reasons.

Even as an absolute amateur, whenever I post any Rust to the public, the first thing I do is get rid of unwrap as much as possible, unless I intentionally want the application to crash. Even then, I use expect instead of unwrap to have some logging. This is definitely the work of some underpaid intern.

Also, Python is sloooowwww.

panda_abyss@lemmy.ca · 2 months ago

I was joking, but oof it did not go over well.

SinTan1729@programming.dev · 2 months ago

Ah that makes sense. To be fair tho, there’s a lot of unwarranted hate towards Rust so it can be hard to tell.

panda_abyss@lemmy.ca · 2 months ago

I should bite the bullet and learn it.

I decided to learn Zig recently. It feels like crafting artisanal software, which is what I liked C for. But it’s kinda janky in that major features come and go with each point release (see io and async).

There’s a place for engineering software, which is what Rust seems great at. It definitely seems like a tool I could/would use, as Rust is taking over many of my tool workflows.