micro1.

Site Reliability Engineer (LInE)

Full time, remote, mid-level

Job Description

Job Title: Site Reliability Engineer


Job Type: Contractor


Location: Remote


Job Summary:

Join our customer's team as an expert Site Reliability Engineer and play a pivotal role in ensuring the performance, reliability, and scalability of mission-critical infrastructure. You'll leverage your deep expertise in Linux, Kubernetes, and Prometheus to architect, monitor, and enhance robust systems supporting innovative applications.


Key Responsibilities:

  • Design, implement, and maintain scalable infrastructure using Linux, Kubernetes, and Prometheus.
  • Monitor system health, analyze performance metrics, and proactively address bottlenecks or potential failures.
  • Automate operational processes to minimize manual intervention and increase system reliability.
  • Respond swiftly to incidents, conduct root cause analysis, and drive continuous improvements in incident response procedures.
  • Collaborate closely with development and operations teams to deliver seamless deployments and high system availability.
  • Create comprehensive documentation and clear runbooks for operational excellence and knowledge sharing.
  • Champion best practices in SRE, security, and compliance across the customer's ecosystem.


Required Skills and Qualifications:

  • Expert-level hands-on experience with Linux system administration and troubleshooting.
  • Advanced proficiency with Kubernetes, including cluster deployment, operations, and management.
  • Deep knowledge of Prometheus for monitoring, metrics collection, and alerting.
  • Strong scripting abilities (Bash, Python, or similar) for automation and tooling.
  • Excellent written and verbal communication skills, with the ability to document and share knowledge effectively.
  • Proven track record in site reliability engineering or similar roles in high-availability environments.
  • Demonstrated commitment to proactive problem-solving and collaborative teamwork.


Preferred Qualifications:

  • Experience with other cloud-native tools (e.g., Grafana, Helm, Istio, or similar).
  • Certifications in Kubernetes, Linux, or cloud platforms.
  • Background in high-growth or large-scale production environments.