Manager Site Reliability Engineer
Auckland, New Zealand (Hybrid) · Full-time
- Experience
- 3+ yrs
- Salary
- —
- Openings
- 1
- Posted
- 1 hour ago
- Work mode
- Hybrid
- Education
- Bachelor’s degree in Computer Science, Engineering, or a related field
- Eligibility
- Experienced engineering leaders with a background in SRE or database reliability engineering, especially those who have managed large-scale database systems and distributed infrastructure. Applicants should hold a bachelor’s degree in a relevant technical discipline and be able to work in a flexibl…
- Resume
- Required to apply
Job description
About Workday
Workday is a global company known for its AI platform that helps organizations manage people, money, and agents. The business is focused on making work more rewarding for employees, customers, and the wider community. Its culture emphasizes integrity, empathy, and collaboration, with a strong belief in solving major problems through thoughtful ideas and genuine care. Team members are encouraged to be curious, optimistic, and bold while doing meaningful work in an environment that supports growth, learning, and long-term success.
About the Team
The Database Engineering group is responsible for delivering reliable, high-performance data services across Workday’s broad infrastructure. The team supports thousands of production and non-production databases spread across multiple data centers, cloud environments, and regions, with a strong focus on availability, scalability, and smooth day-to-day operations.
About the Role
Workday is looking for an experienced Engineering Manager to lead its Database Reliability Engineering function. This role is centered on building the next generation of data infrastructure by treating infrastructure like software and using cloud-native and open-source technologies at scale. You will guide a team of engineers focused on resilience, security, and performance across the data layer.
The position calls for someone who can replace manual database operations with automated, self-healing systems and who enjoys solving complex distributed systems challenges. The ideal leader combines software engineering and database expertise, can drive technical excellence, and helps create a strong team culture built on trust, accountability, and continuous improvement.
Responsibilities
- Lead a database reliability engineering team responsible for the uptime, stability, and performance of large-scale database environments.
- Shape and improve the architecture of data infrastructure with a focus on automation, resilience, and scalability.
- Reduce manual intervention by building self-healing and automated operational workflows.
- Guide incident response for major data outages, drive root-cause analysis, and help prevent repeat failures.
- Oversee observability practices for database systems using monitoring, alerting, and service-level tracking.
- Manage and prioritize SRE and reliability work, including toil reduction and backlog execution within Agile practices.
- Mentor senior engineers and support a high-performing, psychologically safe team environment.
- Lead deep technical troubleshooting involving database internals, Linux, networking, latency, and distributed systems behavior.
- Work across cloud platforms and database hosting models to support production workloads.
- Contribute to improving team effectiveness and overall engineering performance.
Requirements
- At least 3 years of leadership experience in SRE or database engineering teams focused on reliability, availability, and performance.
- 8+ years of experience in software or systems engineering, including 4+ years in SRE/DBRE roles.
- Strong knowledge of database internals such as engine tuning, replication design, and query optimization.
- Hands-on experience managing databases in Kubernetes using operators or StatefulSets.
- Background in leading responses to critical outages and improving MTTR through structured RCA practices.
- Experience with monitoring and observability stacks such as Prometheus, Grafana, Datadog, or PMM.
- Solid understanding of Agile/Scrum and continuous improvement methods for reducing toil and automating operational work.
- Proven ability to coach and develop senior engineers while maintaining a high-performance culture.
- Capability to troubleshoot complex issues involving Linux internals, networking, and distributed latency.
- Experience operating database workloads on AWS and GCP.
- Understanding of team performance concepts and methods to improve effectiveness.
- Bachelor’s degree in Computer Science, Engineering, or a related discipline.
Flexible Work Arrangement
Workday uses a flexible work model that combines in-person collaboration with remote work. Employees in this setup spend at least 50% of each quarter working from the office or in the field with customers, prospects, or partners, depending on the role. This allows for a flexible schedule while still preserving regular opportunities for connection and teamwork. Remote home-office roles may also join office gatherings for key moments when needed.
Additional Information
Workday is committed to an inclusive and accessible hiring process. If you need support or a workplace accommodation during any stage of recruitment, you can contact the company by email for assistance.
If you were referred by someone at Workday, you should ask that person about the employee referral process.
For privacy and security reasons, Workday states that candidates should only apply through its official careers channels. The company does not ask applicants to use unofficial websites, and it does not request recruiting fees or payments for consulting or coaching in order to apply.