Senior DB SRE

Designation: Data Tier SREs

Roles and Responsibilities:

Engage, influence, and promote SRE practices with development, operational, and product groups to align technology service/solution delivery.
Drive quality accountability within the organization with well-defined processes, metrics, and goals.
Manage availability, latency, scalability, and efficiency of Shared Services development by instilling engineering reliability into our development life cycle with a focus on fault-tolerant approaches.
Define and report progress on strategic initiatives and project-level tasks to all stakeholders, including senior executives and clients, using a communication approach suited to each constituency.
Implement metrics-driven processes to ensure service quality targets are met.
Manage system availability, health, and service levels (SLAs, SLOs) of large-scale cloud infrastructure running in AWS and GCP.
Proactively monitor, diagnose, and analyze failures, and support software engineers in debugging production issues across microservices and distributed platforms. Work with the development team to resolve the issues found.
Participate in the on-call rotation and resolve issues in a multi-cloud (AWS/GCP) environment.
Monitor metrics and performance of applications and cloud infrastructure.
Manage code releases, i.e., push code and patches to the cloud.
Own the entire incident lifecycle (incident management), including reporting, analyzing, and handling incidents through to closure, and writing RCAs.

Qualification:

Bachelor’s or Master’s degree in Computer Science, Information Science, or Electronics and Communication.
Minimum 6-7 years of DevOps/SRE experience.
3+ years of hands-on experience with AWS or GCP, including EC2 (GCE), IAM, S3 (GCS), Docker, Kubernetes pods, Jenkins, Prometheus, CloudWatch (Stackdriver), Linux, and Ansible.
3+ years of experience deploying code and infrastructure in AWS or GCP using continuous integration/continuous delivery (CI/CD) tools in production environments.
3+ years of automation experience using Python, Go, and/or shell scripting.
4+ years of prior experience developing metrics to monitor the health of infrastructure and applications.
3+ years of experience managing SaaS application infrastructure, with REST-based test automation experience using Python.
Thorough understanding of networking fundamentals (TCP/IP, UDP, DHCP, DNS, ICMP, ARP, routing and switching).
General understanding of distributed systems.
Understanding of data management technologies including relational and non-relational databases.

Additional Information:

AWS or similar cloud certification is a big plus.
Knowledge of build pipelines and infrastructure such as Jenkins, GitHub, and CI/CD would be an added advantage.
Work in an agile and highly collaborative environment with our globally distributed engineering teams, architecture, product management, and operations.
Maintain excellent written and verbal communications with clients, employees, and management chain, including status reports, project plans, presentations, etc.
Basic understanding of Terraform, CloudFormation, or another IaC tool is preferred.
Ideally, a detailed understanding of IP routing, security, and cloud services such as CGNAT, IPsec, IDP, and SD-WAN/SDN for different customer use cases.

Time zone interactions: US and Tokyo time zones

Location: Bengaluru

Karnataka, INDIA

red_red1982@gmail.com