What You’ll Do

Define SLIs/SLOs, maintain error budgets, and drive platform reliability.
Implement safe CI/CD with automated tests, blue/green & canary rollouts (Argo Rollouts) and auto-rollbacks.
Harden security: image signing, SBOM, secrets management, PodSecurity, NetworkPolicies, and just-in-time access.
Improve observability: OpenTelemetry pipelines, logs/traces correlation, dashboards, and SLO reporting.
Optimize costs: right-size resources, Karpenter provisioning, HPA/VPA tuning, FinOps practices.
Lead incidents and postmortems; create runbooks, templates, and training.
Partner with Product, Backend, and Security teams on capacity, compliance, and roadmap planning.

Tech You’ll Work With

AWS, EKS, Argo CD & Rollouts, Terraform/Terragrunt, GitHub Actions, Prometheus/Grafana, OpenTelemetry, Elastic APM, Secrets Manager, Cilium, Aurora/DynamoDB, SQS/SNS/Kafka.