Awesome: A curated list of Site Reliability and Production Engineering resources.
A collection of postmortems. Sorry for the delay in merging PRs!
A curated collection of publicly available resources on how technology and tech-savvy organizations around the world practice Site Reliability Engineering (SRE)
Compilation of public failure/horror stories related to Kubernetes
A collection of postmortem templates
Awesome: A curated list of Site Reliability and Production Engineering tools.
An Incident Management Process / Post Mortem Template
How to run effective incident post-mortems
Selection of Development Templates
💀 🔥 ❄️ A basic analyzer for memory dumps containing managed code
Compilation of public incident/interesting/horror stories related to Kafka operations
Perform post-mortem Linux baselining and forensic analysis.
Post-mortem debugging tool for Python that provides direct access to variables and frames after exceptions occur. Rich tracebacks, frame inspection, and context execution without separate interactive ...