Sr. Site Reliability Engineer III (6448)

6 days, 4 hours ago
Full-time
Senior
Software Development
MetroStar

MetroStar builds innovative technology solutions designed to enhance and accelerate the missions of government agencies, leveraging a rich legacy of expertise in the digital age.

IT Services
251-1K employees
Founded 1999

Description

  • Design, deploy, and maintain mission-critical application workloads in virtualized or containerized environments such as VMware or Kubernetes.
  • Develop and sustain automated CI/CD pipelines, monitoring, and configuration management workflows across development, integration, staging, and production environments.
  • Provision, configure, and maintain developer environments and toolchains that support secure and efficient software delivery.
  • Identify friction across the software development lifecycle and implement solutions that improve the developer experience.
  • Establish and maintain customer trust through deep technical expertise and mission-focused problem solving.
  • Support operational observability and reliability for highly available production systems.
  • Participate in incident response, root cause analysis, and continuous improvement activities.

Requirements

  • Active Top Secret clearance or higher.
  • Certification meeting DoD 8140 requirements, such as Security+ or higher.
  • Bachelor’s degree in Computer Science or a related engineering field preferred; relevant experience may substitute.
  • 7+ years of experience in software development, systems engineering, or operations roles focused on availability, performance, and reliability.
  • Experience blending software engineering and systems administration practices to support highly available, scalable applications.
  • Experience designing and managing monitoring, alerting, and observability solutions to meet Service Level Objectives.
  • Experience with Ansible and Desired State Configuration.
  • Experience with GitLab CI/CD automation and Bash scripting.
  • Experience with Kubernetes, including container-native storage and object storage solutions such as MinIO, S3-compatible services, or Portworx.
  • Experience with enterprise load-balancing solutions such as F5 or similar platforms.
  • Ability to contribute immediately with minimal ramp-up in a mission-critical operational environment.
  • Designation as essential personnel, which may require working during government shutdowns, emergencies, or other critical situations.

Benefits

  • Salary range of $185,000 to $230,000.
  • Eligible for performance-based bonuses and additional incentives based on individual and company performance.
  • Company-paid training and/or certifications.
  • Referral bonuses.
  • Health, dental, and vision insurance.
  • 401(k) retirement plan with company match.
  • Paid time off and holidays.
  • Parental leave, dependent care, flexible work arrangements, professional development opportunities, and employee assistance and wellness programs.
