Required Skills - advanced knowledge of:
AWS — 3+ years of hands-on experience
Infrastructure as Code (Terraform)
Ansible automation
Kubernetes — 2+ years of in-depth experience with container orchestration and deploying production applications
CI/CD (GitLab, Jenkins or Bamboo)
Python or Golang
IT operations best practices for always-on, highly available, mission-critical services
Desired Experience:
Implementing observability and monitoring in AWS using Splunk, ELK, or similar tools
EKS, ECS, ECR
Working in an agile environment focused on rapid release cycles and continuous delivery (CD)
Supporting, analyzing, and troubleshooting large-scale distributed mission-critical systems
Building software and/or platforms where security, regulatory compliance, and high availability are critical
Responsibilities:
Set up, integrate, and maintain a scalable, stable set of CI/CD tools to support development, testing, and security scanning
Be accountable for a large-scale SaaS application with a mission-critical customer base
Manage multiple tools, infrastructure, and roles in a fast-paced environment
Own the availability of our SaaS infrastructure and application
Implement best-in-class AWS solutions using infrastructure as code
Collaborate with engineering and product to continuously improve service availability and quality
Be involved in the entire production lifecycle: code deployments, infrastructure management, and troubleshooting
Share ownership with the Dev team: own service availability and proactive issue prevention, and use structured troubleshooting to mitigate issues
Work closely with our Dev and DevOps teams to ensure that our production services are secure, scalable, performant, and resilient