
Site Reliability Engineering (SRE) & CloudOps Runbooks Automation

CloudOps Automation to spend less time doing operations
  • Challenges
    • Application support and operations take up to ~30%? of developer time
    • Documentation has advantages, but it takes X months to kick off and ~10%? of time on an ongoing basis
  • Goals:
    • Faster Resolution of Issues
    • Simpler Escalations
    • Easier Onboarding
    • Better Training
    • Better Discipline
    • Automation:
      • Improve Outcomes & Lower MTTR (mean time to repair)
      • Reduce manual DevOps ‘toil’
      • Auto Remediations
      • Increase Observability
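
As a rough illustration of the MTTR goal above, the metric can be computed as the average time between detecting and resolving an incident. This is a minimal sketch: the function name mean_time_to_repair and the incident timestamps are hypothetical, not taken from any real incident log.

```python
from datetime import datetime, timedelta


def mean_time_to_repair(incidents):
    """Return the average (resolved - detected) duration over all incidents."""
    if not incidents:
        raise ValueError("need at least one incident")
    # Sum the repair durations, starting from a zero timedelta.
    total = sum(
        (resolved - detected for detected, resolved in incidents),
        timedelta(),
    )
    return total / len(incidents)


# Hypothetical incident log: (detected_at, resolved_at) pairs.
incidents = [
    (datetime(2024, 1, 5, 9, 0), datetime(2024, 1, 5, 9, 30)),
    (datetime(2024, 1, 9, 14, 0), datetime(2024, 1, 9, 15, 30)),
]
mttr = mean_time_to_repair(incidents)
```

Tracking this number before and after introducing automated runbooks is one way to check whether the automation is actually lowering MTTR.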

Why CloudOps Runbooks Automation Is Based on Python & Jupyter Notebooks

  • Online – Collaborative → Improve team collaboration
  • CloudOps: Python, JupyterLab
  • Cloud Governance as Code: CloudCustodian (open-source rules engine)
  • Documentation via text/markdown
  • Easy Automation
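
To make "Cloud Governance as Code" more concrete, the sketch below builds a small Cloud Custodian-style policy document from a notebook cell. The policy name and the Owner tag are assumptions chosen for illustration; the overall shape (a policies list whose entries have name, resource, filters and actions) follows Cloud Custodian's policy layout, but the project's documentation should be checked before running anything like this against a real account.

```python
import json

# Hypothetical policy: stop EC2 instances that have no "Owner" tag.
# The policy name and the tag key are illustrative assumptions.
policy_document = {
    "policies": [
        {
            "name": "ec2-stop-untagged-owner",
            "resource": "aws.ec2",
            "filters": [{"tag:Owner": "absent"}],
            "actions": ["stop"],
        }
    ]
}

# Serialize the policy so it can be reviewed and versioned like any other code.
policy_text = json.dumps(policy_document, indent=2)
print(policy_text)
```

Keeping the policy as data in the same repository as the runbooks means rule changes go through the same review process as code changes.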

CloudOps Runbooks

  • Runbooks
  • Configurations
    • SSO Credentials:
    • Input Params: region, AMI ID, etc. → Name | Description | Value | Required columns
    • Output Params for each Action
  • Actions
  • Others: GitOps
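
The runbook structure above (input params with Name, Description, Value and Required columns, plus output params for each action) could be modeled along these lines. This is a minimal sketch: the class names InputParam, Action and Runbook, the describe_target action, and the sample values are all hypothetical and do not correspond to any existing runbook library.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional


@dataclass
class InputParam:
    """One row of the input table: Name | Description | Value | Required."""
    name: str
    description: str
    value: Optional[Any] = None
    required: bool = False


@dataclass
class Action:
    """A named step that maps the runbook inputs to its own output params."""
    name: str
    run: Callable[[Dict[str, Any]], Dict[str, Any]]


@dataclass
class Runbook:
    title: str
    params: List[InputParam]
    actions: List[Action] = field(default_factory=list)

    def execute(self) -> Dict[str, Dict[str, Any]]:
        # Refuse to run if any required input has no value.
        missing = [p.name for p in self.params if p.required and p.value is None]
        if missing:
            raise ValueError(f"missing required input params: {missing}")
        inputs = {p.name: p.value for p in self.params}
        # Run the actions in order and keep the output params per action.
        return {action.name: action.run(inputs) for action in self.actions}


def describe_target(inputs: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder action: a real one might call a cloud API here.
    return {"summary": f"{inputs['ami_id']} in {inputs['region']}"}


runbook = Runbook(
    title="Inspect AMI",
    params=[
        InputParam("region", "AWS region", "us-east-1", required=True),
        InputParam("ami_id", "AMI ID to inspect", "ami-EXAMPLE", required=True),
    ],
    actions=[Action("describe_target", describe_target)],
)
results = runbook.execute()
```

Validating required inputs before any action runs keeps a half-configured runbook from making partial changes.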

Cloud Custodian

AWS

Azure