I can frame this as a blameless postmortem, but a "blameless postmortem" should never, ever mean "the people with power get a pass."
Blameless #postmortems are a way of getting to the root cause. They are a way of saying "if you push a button and it brings down production, the problem in most cases is that there was a button that can bring down production, not that you pushed it."
Your conclusion can still be "the leader made the call to launch because they wanted to give good news to Reagan."
heard that Twitter is DDoSing itself (?)
This is a good opportunity to announce I specialize in software perf & scalability. Reducing hosting costs. And parachuting in to solve hard bugs or otherwise "rescue" sites or projects farked up by a prior approach
as a paid consultant
#DDoS
#Twitter
#performance
#scalability
#scaling
#tuning
#CostReduction
#ResourceMinimization
#troubleshooting
#rescues
#rewrites
#systems
#RootCauseAnalysis
#regressions
#postmortems
#architecture
#efficiency
#SRE
It's fine to use names in post-mortems (~300w)
https://blog.danslimmon.com/2023/04/20/its-fine-to-use-names-in-post-mortems/ #devops #sre #postmortem #postmortems #incidentresponse
"Eventually this customer has had enough. They leave. This represents both a sizable blow to revenue and a scathing indictment of your product’s reliability at scale. But, on the bright side, both MTTR and MTBF benefit enormously! That’ll look great on the quarterly slide deck." (~700w)
https://blog.danslimmon.com/2023/04/04/incident-metrics-tell-you-nothing-about-reliability/ #sre #devops #incidentresponse #postmortems
RT @dadideo
Another very interesting talk from @tourainetech #TNT23: @QuesnelLise's explanation of #PostMortems, an important part of #Ops, and also the famous #Feedback/#Monitor stage of the #DevOps loop
https://youtu.be/zBjBq6uxp3M
We do that anyway after incidents with #postmortems, but it's a good time to reflect on the procedures we typically follow.
Can absolutely recommend this practice, it also is a great time for the team to share past #incident stories with each other...
[4/6]
There's still value in low-technical postmortems.
What made this incident low impact? Has your team implemented various safety nets to reduce harmful effects?
How did you know that a rollback was the right thing to do?
Could you have implemented a fix-forward instead?
Who else did you need to involve? Or were you able to fully handle the incident and execute any runbooks by yourself without disrupting anyone else?
#Incidents #IncidentResponse #IncidentManagement #Postmortems #ICM #IRM
@nova @hazelweakly As a seasoned developer/etc who's also had to do devops work, I deeply appreciate your postmortems. I love the transparency with the community.
And SO well done! And I'm actually going to borrow some of the sections for our company's. 💯 ❤️
#postmortems #devops #hachyderm
pleased with this slide of mine from our monthly major incident meta-review, encouraging us towards #LearningFromIncidents and away from focusing on incident statistics
the first half says: "The insights generated from reviewing incidents are primarily qualitative, because incidents are emergent behavior"
the second half says "There is no relationship between the impact of an incident and the quality of insights generated through the review process"
#LearningFromIncidents #postmortems #sre #incidentresponse
2011, Los Alamos, at a for-profit nuclear lab:
"Technicians settled on what seemed like a surefire way to win praise from their bosses: In a hi-tech testing and manufacturing building pivotal to sustaining America's nuclear arsenal, they gathered eight rods painstakingly crafted out of plutonium, and positioned them side-by-side on a table to photograph how nice they looked."
#postmortem #postmortems #nuclearsafety
#postmortems of failures in our technology, a whole forum to enjoy, recently updated with today's Cloudflare outage and a collection of other BGP mishaps:
(boosts welcome!)
#postmortems #postmortem #bgp #cloudflare
Today we experienced an issue with outbound mail for roughly an hour. The details are too extensive to go into here, but have been posted in our #postmortems channel at chat.mxroute.com.
Jonathan Hall of the Tiny DevOps Guy podcast interviewed me for episode 2! https://www.youtube.com/watch?v=-i-zRZ8nRao I discussed how #aviation can give us lessons for tech. Some of the topics included #postmortems, human factors impacting performance, accident chains (in both aviation and IT), safe attitudes, etc.
Postmortem for Banshee outage this morning available in #postmortems channel at chat.mxroute.com (uses portal.mxroute.com login).
Arrow, Lucy, Safari, and Friday are back online. Postmortem in the #postmortems channel at chat.mxroute.com.