No way at all? FB has a lot of servers and a lot of users. They have opportuniti...

byroot · on March 13, 2017

They graph the number of incidents. Even if a bad release impacts only 0.3% of the user base, it's still an incident.

They have to investigate it, revert or fix the bad code, and start the deployment process again.

rumcajz · on March 13, 2017

Ack. Also, the system introduces a new problem: if you are deploying on a weekend, most devs are not around to help solve the problem. Thus, the outages would be longer.

rb2k_ · on March 13, 2017

That is usually how all companies at that scale release code (at least the ones that I know of).

Except that it's not 365 groups, because at the size of FB each group would be several million people.

eh78ssxv2f · on March 14, 2017

They might already be doing A/B testing (or in your example A1/A2/.../A365). It's not clear what the definition of an "incident" is. At my workplace, even if a bad code push affects, say, 0.1% of users, it would be classified as an incident.

rumcajz · on March 13, 2017

That's deploying on weekends, isn't it?