Senior Systems Reliability Engineer
Date: Jun 5, 2025
Location: Lima, LIM, PE
Company: Scotiabank
Requisition ID: 227737
Thank you for your interest in joining Scotiabank Perú; we appreciate your application. We are looking for talented people who want to grow and help achieve our organization's objectives. We wish you every success in this process!
Senior Systems Reliability Engineer
-Business Line: Operations & Technology
-Unit: SRO
-Level: 7.2
-Contract Type: (Indefinite)
-Deadline for receiving CVs: June 15
Mission:
As a member of the Global Systems Reliability team, the Senior Systems Reliability Engineer will oversee Site Reliability Engineering (SRE). This role involves working collaboratively with management, peers, and business partners to enhance the stability, reliability, and efficiency of our global systems. The focus will be on applying SRE/DevOps principles and practices to drive continuous improvements in people, processes, and technology, supporting a rapidly evolving technology product portfolio.
What do we expect from you?
*Degree in Computer Science, Engineering, or equivalent experience
*6 years’ experience in IT
*2-3 years of professional coding experience in one or more of the following would be an asset: C, C++, Java.
*Mastery of one or more scripting languages for automating systems (e.g. Bash, Python, Ansible) would be an asset.
*ITIL v3 Foundation Certificate in IT Service Management (ITSM)
*Experience with ITSM tools (ServiceNow is a plus) and a strong understanding of SRE and service management principles
*Well-rounded, broad knowledge of OS platforms (Linux/UNIX), networking, web systems, and IT operations
*Experience working with large-scale distributed systems and an understanding of SOA or microservices architecture; experience with Jenkins, Bamboo, or other CI tools; advanced experience with GCP/AWS services
*Understanding of serverless architecture (Lambda) and IaaS, data structures, algorithms, best practices, and containerization using Docker or similar
What challenges will you face?
*Develop and implement system reliability strategies aligned with corporate objectives and SRE principles. Design and execute the technical implementation of comprehensive strategies aimed at improving the reliability and performance of critical systems. Collaborate with various teams to integrate these strategies into daily operations, continuously refining them based on performance metrics and evolving business needs.
*Proactively manage high-priority incidents and problems (classified as 411/911). Take a senior role in managing the technical resolution of major incidents and problems impacting the organization. Coordinate with relevant stakeholders, ensure timely resolution, and conduct thorough post-mortem analysis to understand root causes, implement corrective measures to prevent recurrence, and ensure accurate documentation in the ServiceNow platform.
*Define, monitor, and optimize Service Level Objectives (SLOs) and Service Level Indicators (SLIs). Establish and manage SLOs and SLIs to maintain high service quality. Define these metrics, monitor performance closely, and adjust as needed to optimize system performance. Ensure service delivery consistently meets or exceeds defined standards using data and trend analysis to guide improvements.
*Oversee key performance indicators (KPIs) such as Mean Time to Recovery (MTTR), the timely closure of problem tickets, playbooks published and the impact of changes.
*Guarantee efficient and effective operations within respective areas. Ensure technical operations are conducted efficiently and effectively, adhering to established business controls and regulatory requirements. Address operational risks and ensure compliance with anti-money laundering and counter-terrorism financing regulations.
*Design and develop advanced data analysis and visualizations. Create sophisticated data analyses and visualizations to provide actionable insights into system performance and reliability. Leverage data to identify trends, understand system behaviors, and drive continuous improvements, informing strategic decisions and optimizing business performance.
*Provide essential information to support areas regarding major incidents or root cause analysis reports. Ensure key stakeholders receive timely and accurate information about major incidents and root cause analysis. Prepare technical incident reports, participate in Major Problem Review (MPR) sessions, and review post-mortem reports to ensure compliance with regulatory requirements.
*Regularly update Playbooks with detailed information. Maintain and update Playbooks with comprehensive details on system architecture, functionalities, availability schedules, and escalation procedures. Ensure updates are reflected in the Application Portfolio Management (APM) system.
*Attend and contribute to key committees such as Risks (NIRA), Architecture (ARB), Demand Management, Operational Readiness (OR), and Digital Planning Week. Participate in these committees to align system reliability strategies with broader organizational goals. Provide technical insights and recommendations to support decision-making and strategic planning.
*Actively participate in daily escalation sessions with GTEP. Engage in daily escalation sessions to address recent impacts and coordinate responses. Collaborate with global and regional teams to provide updates, discuss ongoing issues, and ensure effective resolution of critical incidents.
*Participate in regional SRO committees to establish and execute LATAM directives for the Systems Reliability Office. Execute directives for system reliability across LATAM. Support the execution of new guidelines and practices to establish consistent standards and procedures.
*Address inquiries from internal and external auditors, as well as regulatory inspections. Manage and provide accurate and timely information to facilitate audits and inspections related to IT applications and systems.
*Evaluate local and global changes to maintain system stability and reliability. Assess proposed changes at local and global levels to ensure system stability. Implement measures to manage risks and maintain system reliability.
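For readers less familiar with the metrics named above (SLIs, SLOs, and MTTR), the following is a minimal illustrative sketch of how they are commonly computed. It is not Scotiabank's tooling or method; every function name and figure is hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def availability_sli(good_events: int, total_events: int) -> float:
    """SLI: fraction of events that met the success criterion."""
    if total_events == 0:
        return 1.0  # assumption: no traffic counts as fully available
    return good_events / total_events


def error_budget_remaining(sli: float, slo_target: float) -> float:
    """Share of the error budget left.

    1.0 means untouched, 0.0 means fully spent, negative means the SLO is breached.
    """
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    allowed_failure = 1.0 - slo_target
    observed_failure = 1.0 - sli
    return 1.0 - observed_failure / allowed_failure


def mean_time_to_recovery(incidents: list[tuple[datetime, datetime]]) -> timedelta:
    """MTTR: average of (resolved - detected) across incidents."""
    if not incidents:
        raise ValueError("at least one incident is required")
    total = sum((resolved - detected for detected, resolved in incidents), timedelta())
    return total / len(incidents)


# Hypothetical figures, for illustration only.
sli = availability_sli(good_events=9_990, total_events=10_000)
budget = error_budget_remaining(sli, slo_target=0.995)
start = datetime(2025, 6, 1, 9, 0)
mttr = mean_time_to_recovery([
    (start, start + timedelta(minutes=30)),
    (start, start + timedelta(minutes=90)),
])
print(sli, budget, mttr)
```

In this sketch the SLO is the target, the SLI is the measurement, and the error budget expresses how much unreliability remains before the target is missed.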
Location(s): Peru : Lima : San Isidro
We appreciate your interest; however, only candidates selected for an interview will be contacted.
** Scotiabank Peru is an inclusive company that respects diversity and does not discriminate in any way.
Job area:
Compliance, Data Analyst, Computer Science, Financial Analyst, Strategic Planning, Legal, Data, Finance, Strategy, Technology