Resilience testing

  • Intentionally cause problems during the work day and see how the tools and the team react.
  • Randomly kill processes and compute servers in production to see how the monitoring system and the whole team react.
  • Do this often during work hours to reduce the risk of such things happening at night.
  • Fix any issues. Learn.
  • Netflix Chaos Monkey
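
The idea in the list above can be sketched in a few lines of Python. This is a minimal illustration, not how Chaos Monkey itself is implemented: the work-hours window (weekdays 09:00 to 17:00), the function names, and the `terminate` callback are all assumptions made for the example. The callback is where a real setup would stop a process or a server; here it only records the choice, so the sketch is safe to run.

```python
import random
from datetime import datetime, time

# Assumed work-hours window; adjust to the team's actual schedule.
WORK_START = time(9, 0)
WORK_END = time(17, 0)


def in_work_hours(now):
    """Return True on weekdays (Mon-Fri) between WORK_START and WORK_END."""
    return now.weekday() < 5 and WORK_START <= now.time() < WORK_END


def pick_victim(targets, now, rng):
    """Pick one random target, but only while the team is at work."""
    if not targets or not in_work_hours(now):
        return None
    return rng.choice(targets)


def run_experiment(targets, now, rng, terminate):
    """Run one experiment: pick a victim and hand it to `terminate`.

    `terminate` is a hypothetical callback supplied by the caller; in a
    real setup it would kill a process or shut down a server.
    """
    victim = pick_victim(targets, now, rng)
    if victim is not None:
        terminate(victim)
    return victim


if __name__ == "__main__":
    # Dry run: print the choice instead of killing anything.
    servers = ["web-1", "web-2", "worker-1"]  # hypothetical names
    chosen = run_experiment(
        servers,
        datetime.now(),
        random.Random(),
        lambda name: print("would terminate", name),
    )
    if chosen is None:
        print("outside work hours, nothing terminated")
```

Passing the clock and the random generator in as arguments keeps the selection logic easy to test, and restricting it to work hours means failures are triggered only when people are around to notice them, fix them, and learn from them.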