
Title:  Site Reliability Engineer




Requisition ID: 186406

Join a purpose-driven winning team, committed to results, in an inclusive and high-performing culture.


The Digital Engineering Operations SRE team comprises Site Reliability Engineers and Software Developers who work to improve the availability, scalability, performance, and reliability of Scotia Digital production services. The team proactively looks for ways to improve application monitoring, addresses production issues, and investigates and assists with customer inquiries.


What will you do?


  • Provide operational technical triage for Storefront and Marketing technology applications, which includes:
    • acknowledge and respond to incident tickets in line with the Storefront and Marketing technology service level agreement
    • perform initial incident triage and troubleshooting
    • recreate production issues for troubleshooting and problem determination
    • perform the necessary technical steps, in collaboration with stakeholder teams, to determine the root cause of production issues
    • engage other support teams and/or technical development teams to resolve production issues
    • be the point of contact for support communications, keeping stakeholders informed with accurate and up-to-date information by email, call bridge, Slack, MS Teams or other communication methods
    • reassign tickets to appropriate teams/queues
  • monitor the mailbox and service support requests from customers
  • provide on-going operational support on Storefront and Marketing technology applications:
    • SSL/TLS x.509 digital certificates lifecycle management for Storefront and Marketing technology in-scope applications and web sites
    • perform production validation of integrated dependency systems in production
  • monitor, troubleshoot, and resolve production issues of GMT applications using Scotiabank-approved tools (e.g., Slack, ThousandEyes, GEMS, Splunk, Dynatrace, etc.)
  • ensure all access required to perform in-scope operational support has been acquired, including successfully completing all regular access renewal requests before expiry
  • lead and deliver the post-incident report (PIR) on production operational incidents when required
  • define and deliver documentation on operational processes for applications within the scope of Storefront and Marketing technology
  • ensure one operational support person is on the production incident bridge at all times until the issue has been resolved
  • provide business hours (9am-5pm) operational support
    • facilitate and host emergency calls to triage applications and resolve issues as necessary
    • act as primary support responsible for system maintenance and issue resolution to ensure the operational health of production middleware/systems under the team's system accountability: AEM, Storefront and Marketing technology application infrastructure
    • perform post-production development operational/administrative tasks as provided by the development runbook
  • problem management: lead initiatives to perform deep analysis into recurring or complex problems
  • lead initiatives to perform analysis to achieve lasting application reliability
  • create and modify documentation as appropriate for the support team
  • enhance steady-state operations, alarms, real-time monitoring and operational visibility, and support processes, with direction from the product owner and the Storefront and Marketing technology management team
  • pager duties for 24x7 active pager support
    • expected on a rotational basis, carrying a physical device
    • at least one person is on-call at all times
    • SLA of 15 minutes to report and respond to requestor


What do you need to succeed?


  • Be self-motivated, autonomous and a team player in a fast-paced environment. 
  • Good understanding of Networking concepts: TCP/IP, DNS, HTTP, TLS, OSI Model.
  • Good understanding of multi-tier applications.
  • Working knowledge of one or more programming languages (Java, NodeJS, Python, etc.).
  • Basic knowledge of one or more scripting languages (Python, Bash, etc.).
  • 1-2 years of experience in developing and/or supporting complex, large-scale customer-facing platforms.
  • Proficiency with fundamental front-end stack: HTML, CSS and JavaScript.
  • Strong working experience with incident management and setting up monitoring alerts.
  • Proficient understanding of code versioning tools, such as Git/Bitbucket.
  • Knowledge of building a highly automated production monitoring and support model, with hands-on experience integrating Splunk, Dynatrace, StackDriver, ThousandEyes, or equivalents.
  • Proven ability to translate ideas into technical and business realities and map technology to business problems.
  • Experience with private/public cloud services and platforms.
  • Superior verbal and written communication skills with the ability to influence decision-making with stakeholders.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
  • Exceptional written and verbal communication skills
  • Excellent problem-solving skills
  • Flexible approach to work and the ability to adapt to change
  • Prior production support or SRE experience.
  • Proficient with MS suite


Nice to have:


  • Experience working with scalable containerized systems in the public cloud (Azure and GCP).
  • Experience with Docker (or other container runtimes) and Kubernetes.
  • Experience in building public and internal REST APIs.
  • Experience with CI/CD tools such as Jenkins.
  • Experience working with database technology such as Sybase, Oracle, and MongoDB.
  • Experience with the Atlassian tools (Bitbucket, JIRA, Confluence).


What's in it for you?


  • We have an inclusive and collaborative working environment that encourages creativity and curiosity and celebrates success
  • We provide you with the tools and technology needed to create meaningful customer experiences
  • You’ll get to work with and learn from diverse industry leaders, who have hailed from top technology companies around the world
  • We offer a competitive total rewards package, including a performance bonus, company matching programs (on pension & profit sharing), and generous vacation.




Working location condition: Hybrid 






Location(s):  Canada : Ontario : Toronto 

Scotiabank is a leading bank in the Americas. Guided by our purpose: "for every future", we help our customers, their families and their communities achieve success through a broad range of advice, products and services, including personal and commercial banking, wealth management and private banking, corporate and investment banking, and capital markets.  

At Scotiabank, we value the unique skills and experiences each individual brings to the Bank, and are committed to creating and maintaining an inclusive and accessible environment for everyone. If you require accommodation (including, but not limited to, an accessible interview site, alternate format documents, ASL Interpreter, or Assistive Technology) during the recruitment and selection process, please let our Recruitment team know. Candidates must apply directly online to be considered for this role. We thank all applicants for their interest in a career at Scotiabank; however, only those candidates who are selected for an interview will be contacted.

Job Segment: Cloud, Web Design, Front End, Investment Banking, Software Engineer, Technology, Creative, Engineering, Finance