

Title: Senior Manager, Site Reliability Engineering, Scotia Digital (Vancouver Hub)




Requisition ID: 156320

Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture.


As a member of the Scotia Digital Engineering Operations team, the Senior Manager of Site Reliability Engineering (SRE) will collaborate closely with application teams, infrastructure teams, other technology operations groups and business partners to continuously improve the stability, reliability and efficiency of digital banking systems. This work will be guided by SRE principles and practices, including continuous people, process and technology enhancements, in support of our rapidly changing technology product portfolio.


You will work cross-functionally with a variety of technology and business teams and be a core contributor to all significant digital banking services or solutions delivered to key stakeholders. You will also have a technical understanding of what could go wrong, contribute to solving complex problems and have a flair for communicating and leading discussions with technical and business partners. You will work directly with multiple engineering teams both to maintain and operate our existing technology and to transition to our next generation of technologies. You will leverage your deep experience with IT Service Delivery and IT Service Management to standardize and improve operations, analysis and service levels across the digital banking portfolio.


Is this role right for you?


  • Work in collaboration with Application Development, Infrastructure and Network teams to champion SRE culture and practices
  • Collaborate with multiple core technology teams to ensure service stability, engineer solutions and lead resiliency workstreams for core digital services
  • Work closely with Development and Operations teams to lead troubleshooting of our most severe incidents, leading senior stakeholder communication, driving problem-solving (e.g., log analysis, non-invasive tests) and debugging with best-practice techniques
  • Contribute to the definition of application Service Level Indicators (SLIs) and the management of Service Level Objectives (SLOs) with senior development and engineering teams
  • Contribute to initiatives that continuously refine our plan, build and deploy practices for improved stability, reliability, efficiency, repeatability and security. You'll create plans and collaborate with other SREs and engineering team members, coordinating activity with development and business leads to increase service levels, lower costs and support stability or resilience objectives
  • Have a strong sense of urgency from problem detection to recovery, with an instinct for data-driven decision-making
  • Participate in the continuous improvement and the timely, high-quality execution of major incident root cause analysis and blameless post-mortems, to ensure we take action to avoid similar problems in the future
  • Contribute to the prioritization of reliability features and to the design, development and delivery of effective tooling, alerts and automated responses that identify and address reliability risks
  • Lead in-depth technical and data analysis to gauge service trends and drive improvements
  • Contribute to proactive technical communication of reliability, stability and efficiency results (based on Service Level Objectives), service health (via dashboards), and key reliability risks and issues to senior business and technology stakeholders, in order to prioritize activity (based on trend analysis) and direct investment and action
  • Non-standard hours and overtime are occasionally required to meet the operational demands of 24/7 digital products and services


Do you have the skills that will enable you to succeed in this role?


  • Superb communication skills, both verbal and written. The ability to communicate confidently and clearly on conference calls, in meetings, via email, etc. at all levels of the organization is essential
  • Ability to quickly and clearly communicate incident status via multiple channels, including to senior business and technology executives
  • 5+ years' hands-on experience in large-scale IT operations and distributed systems
  • Degree in Computer Science, Engineering, or equivalent experience
  • ITIL v3/v4 Foundation certification in ITSM is an asset
  • Experience with ITSM tools (ServiceNow experience is a plus) and a strong understanding of SRE and service management principles
  • Hands-on experience with engineering operations tools such as Dynatrace, Splunk, Stackdriver, Grafana and LaunchDarkly
  • Good understanding of hybrid environment architecture, with an emphasis on GCP/GKE and Azure/PCF
  • Well-rounded, broad knowledge of on-premises infrastructure and operating systems
  • Advanced understanding of SOA or microservices architecture concepts, in relation to digital platforms
  • Advanced understanding of continuous integration systems and toolsets
  • Experience working in an Agile environment
  • Strong organizational skills and the ability to effectively manage multiple tasks simultaneously
  • Capable of working in a complex and fast-paced environment
  • Ability to maintain calm during stressful situations


What's in it for you?


  • We have an inclusive and collaborative working environment that encourages creativity, curiosity and celebrates success!
  • Dress codes don't apply here; being comfortable does
  • We provide you with the tools and technology needed to create meaningful customer experiences.
  • An onsite cafeteria for the days you work onsite.
  • We offer a competitive total rewards package that includes a base salary, a performance bonus, company matching programs (on pension & profit sharing), generous vacation, personal & sick days, personal development funding, maternity leave top-up, parental leave, and more.
  • Access to thousands of online and in-person courses so you can hone your current skills or learn new ones.


*Some of our perks & onsite offerings will be offline as we continue to monitor federal and provincial regulations around COVID-19.


Working conditions: Hybrid/Remote




Location(s): Canada : British Columbia : Vancouver || Canada : Ontario : Toronto

Scotiabank is a leading bank in the Americas. Guided by our purpose: "for every future", we help our customers, their families and their communities achieve success through a broad range of advice, products and services, including personal and commercial banking, wealth management and private banking, corporate and investment banking, and capital markets.  

At Scotiabank, we value the unique skills and experiences each individual brings to the Bank, and are committed to creating and maintaining an inclusive and accessible environment for everyone. If you require accommodation (including, but not limited to, an accessible interview site, alternate format documents, ASL Interpreter, or Assistive Technology) during the recruitment and selection process, please let our Recruitment team know. Candidates must apply directly online to be considered for this role. We thank all applicants for their interest in a career at Scotiabank; however, only those candidates who are selected for an interview will be contacted.