Site Reliability Engineer III
F5
Date: 1 day ago
City: Hyderabad
Contract type: Full time
At F5, we strive to bring a better digital world to life. Our teams empower organizations across the globe to create, secure, and run applications that enhance how we experience our evolving digital world. We are passionate about cybersecurity, from protecting consumers from fraud to enabling companies to focus on innovation.
Everything we do centers around people. That means we obsess over how to make the lives of our customers, and their customers, better. And it means we prioritize a diverse F5 community where each individual can thrive.
Position Summary
Software engineering is a core discipline at F5 for many roles. As a software engineer specializing in site reliability, you will bring a software engineering and automated solution mindset to your work.
The Site Reliability Engineer III will be responsible for ensuring the reliability, availability, and scalability of critical systems and SaaS platforms. Systems under the care of an SRE III must operate effectively and reliably through scalable builds and deployments, frequent releases, and complex architectures that encompass modern technologies. You will work closely with technical and non-technical teams throughout the organization to facilitate the design and implementation of scalable solutions, drive automation initiatives, and monitor and maintain the performance of critical systems.
What You’ll Do
- Apply modern engineering principles and practices to operational functions throughout the full system lifecycle, from initial concept and architecture through deployment, daily operation, and overall optimization, and use the same practices to refine existing systems
- Support and maintain technology systems to ensure optimal performance, reliability, and security
- Scale systems sustainably through mechanisms such as automation and evolve systems by fostering changes that improve velocity
- Troubleshoot and resolve complex issues, including systems failures, connectivity problems, and performance bottlenecks
- Partner with cross-functional teams to design and implement scalable and robust system architecture to improve services on an ongoing basis
- Investigate open source and proprietary technologies, components, libraries, and tools, and help build a highly available, highly scalable, and easily manageable system
- Apply observability and data skills to proactively measure system performance, diagnose service needs, and quickly identify solutions
- Participate in service operation and RCA activities and assist with defining SLOs and SLIs for business stakeholders
- Implement and enforce security best practices to protect our systems, data, and infrastructure against unauthorized access, cyber threats, and vulnerabilities
- Create and maintain comprehensive knowledge bases for system documentation, including standard operating procedures, configurations, and troubleshooting guides, to support end-users' ability to use the systems effectively
- Participate in on-call rotation
- Uphold F5’s Business Code of Ethics and promptly report violations of the Code or other company policies
- Perform other related duties as assigned
- A code-first approach to managing resources across cloud and SaaS platforms
- Expertise in managing Docker container applications and orchestrating Kubernetes clusters
- Proficiency in Agile methodologies, DevOps principles, SRE practices, and associated tools and technologies
- Strong capability to support web applications running on Tomcat, Apache, NGINX, and Node.js
- Comprehensive administration skills for core platforms, including backups, recovery, monitoring, maintenance, and upgrades
- Experience in scripting and automation with Infrastructure as Code (IaC) tools such as Azure Resource Manager, AWS CloudFormation, Ansible, or Terraform
- Proficiency in writing YAML code to build and manage Azure DevOps pipelines
- Practical experience with CI/CD pipelines and tools, specifically in writing YAML code for Azure DevOps
- Expertise in Azure resource management and operations
- Strong familiarity with Linux system internals and administration
- Understanding of compliance and regulatory guidelines
- Solid grasp of cybersecurity principles and best practices
- Demonstrated ability to work independently and collaboratively as an integral member of an agile team
- Experience with observability tooling, including logging infrastructure, time series metrics databases, tracing systems, and alert definitions
- Proficient communication, planning, problem-solving, troubleshooting, and organizational skills
- Flexibility to adapt to changing project requirements and timelines
- BS/BA or equivalent work experience
- 5+ years' experience as a software engineer specializing in site reliability, or in a similar role, in a technology environment
- Technical confidence and familiarity with DevOps tools and SRE practices
- Strong proficiency in scripting and/or programming languages (Python, Bash, TypeScript or Java preferred)
- Hands-on experience with technology systems, tools, protocols, and platforms