<p><strong>Summary</strong></p><p>We are seeking an experienced Site Reliability Engineer II to help build, maintain, and scale our cloud‑native (Azure) environment. This role partners closely with development and operations teams to ensure high reliability, scalability, security, and efficiency. The ideal engineer is passionate about automation, observability, cloud infrastructure, and SRE best practices.</p><p> </p><p><strong>Key Responsibilities</strong></p><ul><li>Design, implement, and manage Azure cloud infrastructure using Terraform and Terragrunt</li><li>Maintain, monitor, and optimize Kubernetes clusters (AKS)</li><li>Build and manage CI/CD pipelines using GitHub Actions/Workflows and ArgoCD in a GitOps model</li><li>Enhance reliability through monitoring, alerting, and observability using Grafana (experience with Prometheus, Loki, and Tempo is a plus)</li><li>Automate operational tasks to reduce manual toil</li><li>Participate in on-call rotations, incident response, and post-mortem reviews</li><li>Collaborate with development teams to improve application reliability, performance, and scalability</li><li>Implement and advocate for SRE practices, including SLIs, SLOs, and error budgets</li><li>Continuously improve infrastructure performance, cost efficiency, and security posture</li></ul><p>Please note: this is a hybrid role based in Alpharetta, GA, with three days per week on-site. It includes an on-call rotation 2-3 times per month; on-call duties never extend past 10:00 pm ET.</p>