We invite a Senior DevOps Engineer to join an ambitious project in Abu Dhabi focused on transforming a high-load Autonomics platform into a multi-region active/active architecture. You will collaborate closely with the core development team to design and implement highly available, disaster-resilient infrastructure using open-source technologies across multiple clouds (AWS, Azure, GCP).
Requirements
- 8+ years of experience in DevOps or Infrastructure Engineering
- Proven delivery of active/active multi-region architectures under tight deadlines
- Expert in Kubernetes multi-cluster/multi-region deployment and configuration
- Deep knowledge of distributed systems, CAP theorem, and conflict resolution
- Experience ensuring SLAs of 99.9%+ with disaster recovery and chaos testing
- Hands-on experience with Percona MySQL multi-master replication
- Expertise with Apache Cassandra, OpenSearch, and Redis in production-scale distributed setups
- Strong background in cross-region network design, load balancing, and DNS failover
- Proficient in Infrastructure as Code tools (Terraform, Helm)
- Experience in performance/load testing, chaos engineering, and observability setup
- Skilled in developing runbooks and executing knowledge transfer
- Fluent English and excellent communication skills
Responsibilities
- Deliver full active/active multi-region deployment within 12 weeks
- Transform a single-region platform into a resilient cross-region system
- Implement user routing, disaster failover, and latency optimization
- Design and maintain high-availability database clusters with multi-region replication
- Collaborate with developers and architects to validate infrastructure design
- Ensure observability, monitoring, and alerting for all regions
- Conduct chaos testing and performance validation
- Update CI/CD pipelines for active/active deployments
- Prepare comprehensive documentation and train core team for handover
Preferred Qualifications
- Certified Kubernetes Administrator (CKA) or similar certification
- Background in financial services or other mission-critical industries
- Familiarity with service mesh solutions like Istio or Linkerd
- Experience with automated testing and CI pipelines
- Practical knowledge of multi-cloud strategies (AWS, Azure, GCP)