Lead Site Reliability Engineer (SRE)

Full time Full day

Location: This position is open to all cities, with the option of remote working. 


We are seeking a talented and experienced Lead Site Reliability Engineer to join our growing team. As the Lead SRE, you will be responsible for overseeing the reliability, scalability, and performance of our company's infrastructure and applications. You will play a crucial role in designing, implementing, and maintaining systems and processes to ensure high availability and operational efficiency.

#### Responsibilities:

  • Lead a team of Site Reliability Engineers and provide technical guidance, mentorship, and support to ensure the team's success.

  • Collaborate with cross-functional teams, including Development, Operations, and Quality Assurance, to drive the adoption of best practices in reliability engineering.

  • Design and implement monitoring, alerting, and logging systems to proactively identify and resolve potential issues before they impact production.

  • Develop and maintain incident response and disaster recovery plans, and participate in on-call rotations to address critical incidents.

  • Perform system capacity planning and optimize infrastructure to handle current and future traffic demands.

  • Continuously improve system reliability, performance, and efficiency through automation, infrastructure as code, and other relevant techniques.

  • Conduct regular performance and security audits to identify areas for improvement and implement remedial actions.

  • Stay up to date with the latest industry trends, tools, and technologies related to Site Reliability Engineering, and make recommendations for their adoption.

#### Qualifications:

  • Bachelor's degree or above in Computer Science, Engineering, or a related field (or equivalent practical experience).

  • 6+ years of experience as a Site Reliability Engineer or in a similar role (e.g., Cloud Architect, DevOps Engineer), with a proven track record of managing and leading technical teams.

  • Deep understanding of distributed systems, cloud architecture, and containerization technologies (e.g., Kubernetes, Docker).

  • Proficiency in scripting and automation using languages such as Python, Ruby, or Bash.

  • Strong experience with configuration management tools (e.g., Ansible, Puppet, Chef) and infrastructure-as-code frameworks (e.g., Terraform).

  • Knowledge of cloud platforms (e.g., AWS, Azure, GCP) and experience in building and maintaining scalable and resilient cloud-based applications.

  • Familiarity with monitoring, logging, and alerting tools (e.g., Prometheus, Grafana, DataDog, PagerDuty) to ensure system visibility and detect anomalies.

  • Excellent problem-solving and troubleshooting skills, with the ability to analyze complex issues and provide effective solutions.

  • Strong communication and leadership skills, with the ability to collaborate with diverse teams and communicate technical concepts to non-technical stakeholders.

#### Preferred Qualifications:

  • Advanced certifications in cloud platforms or related technologies (e.g., AWS Certified DevOps Engineer, Certified Kubernetes Administrator).

  • Experience with implementing and managing CI/CD pipelines.

  • Knowledge of security best practices and experience in implementing security controls in a production environment.

  • Familiarity with agile methodologies and project management frameworks.

Job Type: Full-time
