

Title:  Senior System Reliability Engineer (SRE), FCORE




Requisition ID: 118457

Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture.


The Team 
As a member of the Financial Crimes and Operational Risk Engineering (FCORE) Systems Reliability team, the System Reliability Engineer (SRE) will collaborate with application teams, infrastructure teams, and business partners to continuously improve the stability and reliability of FCORE systems. The work applies Site Reliability Engineering (SRE) practices, including continuous people, process and technology enhancements (“automating all the things”), in support of our rapidly changing technology product portfolio.



The Role 
You will work cross-functionally with a variety of teams and contribute to all significant engineering services and solutions delivered to FCORE business partners. You will also understand what could go wrong, help solve complex problems, and have a flair for communicating and participating in discussions with technical and business partners. You will work directly with our Software Engineering teams to maintain and operate our existing technology and to help manage end-to-end performance for all applications, workloads and environments.

Some of the key accountabilities include:
•    Champions a customer-focused culture to deepen client relationships and leverage broader Bank relationships, systems and knowledge
•    Champions Stability and Reliability across a portfolio of applications and services:
o    Work with and coach service owner teams on continuously improving system reliability, measured by reduced downtime and lower MTTR
o    Work closely with service owner teams to lead troubleshooting of our most severe incidents – including leading senior stakeholder communication or driving problem-solving (e.g., log analysis, non-invasive tests)
o    Participate in major incident root cause analysis and blameless post-mortem activities to ensure we take action to avoid similar problems in the future
o    Lead in-depth technical and data analysis to gauge service trends and drive improvements
o    Contribute to prioritization of reliability features
o    Contribute to the design, development and delivery of effective tooling, alerts, and automated responses to identify and address reliability risks.
o    Champion operational and release engineering processes and enforce structure, including documentation, training, runbooks, escalations, RCAs and post-mortems, to ensure systems are well understood, work smoothly, and recover gracefully from unexpected failure
•    SLO Adoption: 
o    Contribute to the management of Service Level Objectives (SLOs) with senior engineering and business leads, which may include enhancements in the way availability, latency and overall system health are measured and monitored. Create SLI/SLO/SLA models and defect budgets
o    Contribute to the proactive communication of reliability and stability results (based on SLOs), service health, and key reliability risks to senior business and technology stakeholders, in order to prioritize activity and direct investment



What You Will Bring to Succeed 
Must Haves:
•    Degree in Computer Science, Engineering, or equivalent experience. 
•    8+ years of experience in software development and/or infrastructure administration, with at least 3 years in a leadership capacity
•    Experience working with large-scale distributed systems
•    Experience with analyzing and troubleshooting systems
•    Experience with Unix-based operating system internals (e.g., filesystems, system calls), and/or with networking or cloud systems
•    Excellent communication (both verbal and written). The ability to communicate confidently and clearly on conference calls, in meetings, via email, etc. at all levels of the organization is essential
•    Ability to quickly and clearly communicate incident status via email in business friendly language
•    Experience with ITSM tools (ServiceNow is a plus) and a strong understanding of SRE and service management principles
Nice to Haves:
•    Systematic problem-solving approach, coupled with effective communication skills and a sense of drive.
•    Experience with Performance and Capacity Management (PCM) tools (e.g., Dynatrace, Splunk)
•    Well-rounded broad knowledge of OS platforms (Linux/UNIX), Networking, Web Systems and IT Ops
•    ITIL Foundation Certification


The Workplace 
•    We are technology partners who help the business transform how our employees around the world work 
•    We have an inclusive and collaborative working environment that encourages creativity and curiosity, and celebrates success! 
•    You'll get to work with and learn from diverse industry leaders who hail from top technology companies around the world 



Location(s):  Canada : Ontario : Toronto 

Scotiabank is a leading bank in the Americas. Guided by our purpose: "for every future", we help our customers, their families and their communities achieve success through a broad range of advice, products and services, including personal and commercial banking, wealth management and private banking, corporate and investment banking, and capital markets.  

At Scotiabank, we value the unique skills and experiences each individual brings to the Bank, and are committed to creating and maintaining an inclusive and accessible environment for everyone. If you require accommodation (including, but not limited to, an accessible interview site, alternate format documents, ASL Interpreter, or Assistive Technology) during the recruitment and selection process, please let our Recruitment team know. If you require technical assistance, please click here. Candidates must apply directly online to be considered for this role. We thank all applicants for their interest in a career at Scotiabank; however, only those candidates who are selected for an interview will be contacted.

Job Segment: Software Engineer, Cloud, Systems Engineer, Computer Science, Engineer, Engineering, Technology