Title:  Site Reliability Engineer, Scotia Digital

 

 

 

Requisition ID: 151741

Join a purpose driven winning team, committed to results, in an inclusive and high-performing culture.

 

The Digital Engineering Operations SRE team comprises Site Reliability Engineers and Software Developers who work to improve the availability, scalability, performance, and reliability of Scotia Digital production services. The team proactively looks for ways to improve application monitoring, address production issues, and investigate and assist with customer inquiries.

 

 Is this role right for you?

 

Are you passionate about improving automation and ensuring the resiliency of technology? Do you get your energy from delivering technology solutions as part of a team? We are currently seeking a Site Reliability Engineer who is curious and derives insights from massive-scale data in real time. Specifically, we are searching for someone who brings fresh ideas, demonstrates a unique and informed viewpoint, and enjoys collaborating with a cross-functional team to investigate and help resolve recurring and major issues and to improve the performance of our supported applications. This role requires participation in a 24/7 on-call rotation.

 

  • You will run the production environment by monitoring availability and taking a holistic view of system health.
  • You will improve the reliability, quality, and time-to-market of our suite of software solutions.
  • Measure and optimize system performance to push our capabilities forward, get ahead of customer needs, and innovate to improve continually.
  • You will provide primary operational support and engineering for multiple large, distributed software applications.
  • Participate in defining SLIs, SLOs and SLAs for Enterprise Systems.
  • Gather and analyze metrics from both applications and infrastructure to assist in performance tuning and fault finding.
  • Partner with development teams to improve services through rigorous testing and release procedures.
  • Participate in system design consulting, release management, and capacity planning.
  • Create sustainable systems and services through automation and process improvements.
  • Balance feature development speed and reliability with well-defined service level objectives.
  • Monitor the health of multiple applications and discover optimization opportunities in a large, complex, continuously growing hybrid environment.
  • Lead on-call problem escalation and outage recovery efforts, which are not limited to code fixes in the presentation and integration layers but also include infrastructure-level investigation and support where necessary.
  • Lead post-incident technical retrospectives to discover and implement remediation actions.
  • You will be part of a 24/7 on-call rotation and support multiple applications and occasional weekend releases.
  • You will perform troubleshooting, deploy systems or execute maintenance tasks as necessary to meet the specified SLOs.

 

  Do you have the skills that will enable you to succeed in this role?

 

  • Be self-motivated, autonomous and a team player in a fast-paced environment.
  • Good understanding of networking concepts: TCP/IP, DNS, HTTP, TLS, and the OSI model.
  • Good understanding of multi-tier applications.
  • Working knowledge of one or more programming languages (Java, NodeJS, Python, etc.).
  • Basic knowledge of one or more scripting languages (Python, Bash, etc.).
  • 1-2 years of experience in developing and/or supporting complex, large-scale customer-facing platforms.
  • Proficiency with fundamental front-end stack: HTML, CSS and JavaScript.
  • Strong working experience with incident management and setting up monitoring alerts.
  • Proficient understanding of code versioning tools, such as Git/Bitbucket.
  • Knowledge of building a highly automated production monitoring and support model, with hands-on experience integrating Splunk, Dynatrace, StackDriver, ThousandEyes, PagerDuty.com, or equivalents.
  • Proven ability to translate ideas into technical and business realities and map technology to business problems.
  • Experience with private/public cloud services and platforms.
  • Superior verbal and written communication skills with the ability to influence decision-making with stakeholders.
  • A proactive approach to spotting problems, areas for improvement, and performance bottlenecks.
  • Exceptional written and verbal communication skills.
  • Excellent problem-solving skills.
  • Flexible approach to work and the ability to adapt to change.
  • Prior production support or SRE experience.
  • Proficiency with the MS Office suite.

 

Nice to have:

 

  • Experience working with scalable containerized systems in the public cloud (Azure and GCP).
  • Experience with Docker (or other container runtimes) and Kubernetes.
  • Experience in building public and internal REST APIs.
  • Experience with CI/CD tools such as Jenkins.
  • Experience working with database technology such as Sybase, Oracle, and MongoDB.
  • Experience with the Atlassian tools (Bitbucket, JIRA, Confluence).

 

What's in it for you?

 

  • We have an inclusive and collaborative working environment that encourages creativity and curiosity and celebrates success!
  • Dress codes don't apply here; being comfortable does.
  • We provide you with the tools and technology needed to create meaningful customer experiences.
  • Onsite cafeteria for when you work onsite.
  • We offer a competitive total rewards package that includes a base salary, a performance bonus, company matching programs (on pension & profit sharing), generous vacation, personal & sick days, personal development funding, maternity leave top-up, parental leave, and more.
  • Access to thousands of online and in-person courses so you can hone your current skills or learn new ones.

 

Working Arrangement: Remote / Hybrid

 

*Some of our perks & onsite offerings will be offline as we continue to monitor federal and provincial regulations around COVID-19.

 

Location(s):  Canada : Ontario : Toronto 

Scotiabank is a leading bank in the Americas. Guided by our purpose: "for every future", we help our customers, their families and their communities achieve success through a broad range of advice, products and services, including personal and commercial banking, wealth management and private banking, corporate and investment banking, and capital markets.  

At Scotiabank, we value the unique skills and experiences each individual brings to the Bank, and are committed to creating and maintaining an inclusive and accessible environment for everyone. If you require accommodation (including, but not limited to, an accessible interview site, alternate format documents, ASL Interpreter, or Assistive Technology) during the recruitment and selection process, please let our Recruitment team know. Candidates must apply directly online to be considered for this role. We thank all applicants for their interest in a career at Scotiabank; however, only those candidates who are selected for an interview will be contacted.


Job Segment: Test Engineer, Cloud, Testing, Web Design, Sustainability, Engineering, Technology, Creative, Energy