DevOps Engineer - Observability & AIOps
Location: Philadelphia
Department: Software Development & Engineering - Observability & AIOps
About the Role
We're looking for a skilled DevOps Engineer to join our Observability & AIOps team. This role is at the heart of ensuring that our client's large-scale distributed systems are reliable, observable, and intelligently automated. You'll design, build, and maintain platforms that provide deep visibility into services, leverage AI/ML for operational insights, and drive automated incident response.
Key Responsibilities
- Build & Maintain Observability Infrastructure
  - Deploy, configure, and manage tools for metrics, logs, and traces (e.g., Prometheus, Grafana, ELK stack, OpenTelemetry, Jaeger, Datadog, Splunk).
  - Ensure telemetry data is complete, accurate, and accessible across systems and environments.
- Automation & CI/CD Integration
  - Integrate observability tools with CI/CD pipelines (Jenkins, GitLab CI, ArgoCD).
  - Automate deployment and scaling of monitoring agents using infrastructure-as-code (Terraform, Ansible, Helm).
- AIOps & Intelligent Alerting
  - Collaborate with data scientists and platform engineers to feed clean observability data into AI/ML pipelines.
  - Implement anomaly detection, alert deduplication, and predictive maintenance solutions.
- Incident Management & SRE Practices
  - Partner with SRE teams to define and monitor SLIs/SLOs.
  - Reduce mean time to detect (MTTD) and mean time to resolve (MTTR) through automation and intelligent alerting.
  - Contribute to incident response playbooks and post-incident reviews.
- Dashboards & Developer Experience
  - Build and maintain custom dashboards that visualize service health and performance.
  - Provide self-service observability tools that empower development and operations teams.
  - Treat observability as a product, focusing on usability, reliability, and scalability.
Qualifications
- 3+ years of experience as a DevOps Engineer, SRE, or in a similar role in large-scale, cloud-based environments.
- Solid knowledge of observability concepts (metrics, logs, traces) and tools (Prometheus, Grafana, ELK stack, OpenTelemetry, etc.).
- Hands-on experience with cloud platforms (AWS, GCP, or Azure), Kubernetes, and Docker.
- Proficiency with automation and IaC tools (Terraform, Ansible, Helm).
- Familiarity with incident management tools (PagerDuty, OpsGenie, ServiceNow).
- Strong scripting skills (Python, Bash, or similar).
- Excellent problem-solving and communication skills, with an ability to work across teams.
Preferred Skills
- Experience applying AI/ML techniques to IT operations or monitoring.
- Knowledge of SRE practices (SLIs/SLOs, error budgets).
- Background in high-scale, distributed systems.
Example Projects You Might Work On
- Rolling out distributed tracing with OpenTelemetry across microservices.
- Building anomaly detection models for latency and error rates.
- Automating remediation for common failure modes to enable self-healing systems.
- Reducing alert fatigue with intelligent noise suppression and correlation.
GCS is acting as an Employment Business in relation to this vacancy.