OH HELL YEAH GUESS WHOSE CONFERENCE TALK JUST GOT ITS VIDEO UPLOADED?! THAT'S RIGHT!
Check it out!
"Hacking the Pachyderm: Scaling Servers and People" in all of its glorious beauty.
https://www.youtube.com/watch?v=qmt0ouHFgwY
(here are the slides + abstract for those curious: https://www.usenix.org/conference/srecon23americas/presentation/weakly)
Thanks again for doing this with me, @esk! It was awesome!
The final look of #srecon
This was a phenomenal conference, and I loved every moment of it. Meeting all my friends in person for the first time was such a fun experience. Making new friends was fantastic too!
I know I didn't get to say goodbye to everyone, but I'm sure I'll see y'all again and I'm looking forward to it! I'm definitely coming next year, and I can't wait to see what's in store ❤️
I had so much fun hanging out with Kelly today! It's so awesome to vibe and just chat about sociotechnical systems, security, runtimes, chaos, resilience, and so much more fun stuff ❤️
I think I accidentally got her interested in TLA+ too, whoops...
#srecon #hazelatsrecon #extrovertsadoptingintroverts
If incident response is a team sport, I want some cheerleaders in my next incident 👀
@norootcause is giving an absolutely baller talk right now. So many connections firing at once
@dmagliola You can find more about queues at https://github.com/dmagliola/happy_queues
@dmagliola This would've been excellent for hachyderm as well. We would get queue latencies of hours because the database was slow and the file system was jacked. Having something like "jobs must finish in X time" would've given us upper bounds to judge the system's performance against, and it would've helped us narrow down more unknowns, particularly the unreasonably long query times.
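Here's a rough sketch of what checking that kind of upper bound could look like. The queue names and targets are made up, but Sidekiq::Queue#latency is real Sidekiq API:

```ruby
require "sidekiq/api"

# Hypothetical targets: the longest a job may wait in each queue
# before starting (seconds). Names and numbers are illustrative.
LATENCY_TARGETS = {
  "default" => 5 * 60,   # 5 minutes
  "mailers" => 60 * 60,  # 1 hour
}

LATENCY_TARGETS.each do |name, target|
  # Sidekiq::Queue#latency returns how long (in seconds) the oldest
  # job in the queue has been waiting to run.
  latency = Sidekiq::Queue.new(name).latency
  if latency > target
    warn "queue #{name} is #{latency.round}s behind (target: #{target}s)"
  end
end
```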
@dmagliola Making queues built around latency also lets you enforce the contract however you can, which also means evicting jobs whenever they violate it.
If a job wants to run very soon, it has to start up fast and complete fast.
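A rough sketch of what that eviction could look like as Sidekiq server middleware; the DEADLINES table and queue names are my own invention, not from the talk:

```ruby
require "sidekiq"

# Queue name => promised start deadline in seconds (assumed values).
DEADLINES = { "within_30_seconds" => 30, "within_5_minutes" => 300 }

class EvictLateJobs
  def call(_worker, job, queue)
    deadline = DEADLINES[queue]
    # "enqueued_at" is the epoch timestamp (seconds, in Sidekiq 7)
    # that Sidekiq stamps on every job when it's pushed.
    age = Time.now.to_f - job["enqueued_at"]
    if deadline && age > deadline
      # The contract is already broken; log the job and drop it
      # instead of running it late.
      Sidekiq.logger.warn("evicting #{job['class']} from #{queue}: #{age.round}s old")
      return
    end
    yield
  end
end

Sidekiq.configure_server do |config|
  config.server_middleware do |chain|
    chain.add EvictLateJobs
  end
end
```

(Dropping outright is one choice; you could just as well shunt late jobs somewhere for later inspection instead.)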
@dmagliola I particularly like the latency thing here because it ties directly into knowing how to build metrics and alerts around things. One thing hachyderm ran into was an inability to figure out (for a long time) how bad "bad" was for various queues.
We figured it out eventually, but we did it by black box testing, and it was awful
@dmagliola name queues after their latency
within_X_time
And then keep that promise!
(This would make mastodon's queues so much easier to understand and optimize, omg)
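Something like this, where the queue name is the promise (job class and numbers are illustrative, not from the talk):

```ruby
require "sidekiq"

class WelcomeEmailJob
  include Sidekiq::Job
  # Anyone reading this knows exactly what "late" means here: a
  # welcome email still sitting in the queue after 5 minutes is broken.
  sidekiq_options queue: "within_5_minutes"

  def perform(user_id)
    # ... send the email ...
  end
end

# In sidekiq.yml, list the strictest promises first so they get
# polled first:
#
#   :queues:
#     - within_30_seconds
#     - within_5_minutes
#     - within_1_hour
```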
@dmagliola
"The one thing we care about is latency"
Yes! We ran into this the hard way when figuring out what worked for hachyderm.
Latency or bust.
@dmagliola "queues are broken" really means:
A job didn't run...
... Yet (but I think it should've)
... So it's late (in my opinion)
Ergo, "broken"
Queues should themselves indicate what it means for them to be performing as expected or not. The vocabulary we use traps us in locally bad decisions.
@dmagliola Why tell a story about obvious mistakes? Because it happens!
A series of obvious steps can lead to things that look wrong in hindsight
@dmagliola answer to people sticking everything in the "important" queues: create queues for purpose, not for priority
@dmagliola "no matter what we do the queue problem keeps happening"
Lol. Yes. Too real.
Listening to @dmagliola talk about happy Sidekiq queues
Could've used some of this a while ago 👀
@paigerduty Paige talking about "ACKing the page"
My dumb ass: "hi Paige! Hiii"
Hey everyone! So stoked! Day one of the conference let's gooooo
#srecon #srecon23 #hazelatsrecon