This week on Slight Reliability I had the honour of interviewing Courtney Nash about why mean time to recover (#MTTR) is an unhelpful metric, what she learned by analysing 10+ incident reports, and much more.
🕵🏽‍♀️ Instead of MTTR, let's focus on learning from incidents, observing patterns and themes, involving leadership, and adding an "accident investigator" lens after the fact to enhance the learning.
#MTTR #sre #devops #incidents #SlightReliability
This week on #SlightReliability I chat with Martin Thwaites about #observability during #development (#ODD). Some of my takeaways:
💻 How observability in development frees up developers to spend less time debugging and more time writing code.
🤖 That manual instrumentation is where the power is.
💰 Keeping the cost of observability data down through a combination of head- and tail-based sampling. "Keeping every span of trace data is irresponsible".
#SlightReliability #observability #development #odd
This week on #SlightReliability... how do we prevent #observability from only generating value for a small set of engineers? How do executives, product managers, and other stakeholders leverage its power?
(You can also listen to Slight Reliability via most podcast platforms, or check out
#SlightReliability #observability
Unfortunately there is no #SlightReliability episode this week... So as is tradition, I have a haiku for you. #sre
Who else is going to be at AWS Summit in London on June 7th? Would be great to meet some of the community in person. #awssummit #aws #slightreliability
#awssummit #aws #SlightReliability
This week on Slight Reliability I chat to Ivan Merrill about his experiences implementing #observability in the real world. We discuss making observability part of onboarding, discussing risk to get leadership buy-in, inviting over inflicting practices, and much more.
#observability #sre #SlightReliability #reliability
Yesterday #SlightReliability reached 1k subscribers on YouTube! Just wanted to say thank you to everyone who has listened and joined in the discussion about #sre!
This week on #SlightReliability... what is "insight" in #observability? Are tool vendors lying to us about being able to provide it? Is it science? Art? Or magic? #sre
#SlightReliability #observability #sre
This week on #SlightReliability I reminisce about my #performancetesting days, when I used to analyse complete sets of raw data using scatterplots, and ponder how we could apply this in #observability #sre
#SlightReliability #performancetesting #observability #sre
Last week on #SlightReliability I chatted to Paige Cruz from Chronosphere about cognitive overload in #SRE. We chatted about how SREs are often used as the Swiss army knives of the IT department, how as humans our RAM is maxed out, why you shouldn’t give your team a name like “The Lobsters”, and a whole lot more.
This was one of my favourite interviews I've ever done.
This week on #SlightReliability I talk about how I think #observability promises more than what we're getting. I argue that it needs to look at more than technology in order to help us negotiate the ocean of chaos in the Digital Era. #sre
#SlightReliability #observability #sre
This week on #SlightReliability... what do we do with all our #telemetry data? Should we put it all in a data lake? Or is there another way we can pull insight together? #sre #observability
#SlightReliability #telemetry #sre #observability
My second official #SlightReliability blog, focusing on my #SRE takeaways from #reinvent. I explore serverless, observability data lakes, topologies (technology maps), FinOps, and more. Oh, and lots of #mspaint art!
#SlightReliability #sre #reinvent #mspaint
What is the future of #SRE? This week on #SlightReliability I'm joined by the hosts of the @oncallmemaybe podcast @adrianamvillela and @anamedina to discuss just this.
We discuss the role of #observability in SRE, recruitment tactics, company culture and leadership buy-in, cognitive load, leveraging the scale of community, and more.
#sre #SlightReliability #observability
How do you improve yourself as an #SRE or any other role in technology? This week on #SlightReliability I share the books I read in 2022 and what I gained from each. Perhaps one of them could be useful to you?
About to log off for the year. Thank you to SquaredUp for being an awesome employer, and to everyone who tuned into #SlightReliability (or read my articles) in 2022. Looking forward to hitting the ground running in 2023.
I hope you all have a well-earned break, and if you're on call over the holiday period... may your incidents be few, and your MTTR extremely small. Oh wait, MTTR has been disproved or something, hasn't it? How about, hope it goes smoothly? #sre #observability
#SlightReliability #sre #observability
This week on #SlightReliability
Henrik Rexed (from Dynatrace) and I share our #observability new year's resolutions. We chat about #otel, continuous #profiling, using distributed #tracing in #testing, and much more. #sre
#SlightReliability #observability #otel #profiling #tracing #testing #sre
This week on #SlightReliability I chat to Gwen Berry and Steve Gill about starting an #SRE team from scratch. We discuss failing at #SLO adoption, being on-call as a junior engineer, single pane of glass #observability, and much more.
#SlightReliability #sre #slo #observability