As a System Monitoring Engineer, you will:
KRA 1: System and Infrastructure Monitoring (50%)
- Take ownership of monitoring the system health of the INFRA hosting customer-facing environments, including infrastructure health, application performance, and transaction queues across platforms.
- Ensure continuous 24/7 monitoring to maintain system availability, meet service level agreements (SLAs), and minimize downtime.
- Utilize monitoring tools to identify anomalies, performance degradation, and outages in real time, ensuring high availability and reliability.
- Proactively seek support from other teams to address and resolve complex issues that cannot be solved by the GM team.
- Coordinate and follow up on the entire problem lifecycle, from identification to resolution.
KRA 2: Incident Response and Resolution (25%)
- Follow established runbooks to address common incidents and system issues.
- Escalate complex technical issues to the appropriate teams in accordance with defined protocols.
- Contribute to incident documentation by collecting all relevant details, including time of occurrence, impact, and actions taken.
- Participate in shift handovers to ensure continuous coverage and awareness of ongoing issues.
- Support post-incident analysis efforts to help prevent future occurrences.
- Follow communication protocols when incidents occur, ensuring timely updates to stakeholders.
- Adhere to defined SLAs for incident response and communication.
KRA 3: Reporting and Team Support (25%)
- Assist in collecting data for weekly and monthly uptime reports.
- Help compile data on incidents, including resolution times and impact, for leadership review.
- Collaborate closely with the System Operation team to stay informed on system updates, such as server changes, service status, deployments, maintenance, and migrations.
- Support the development and refinement of monitoring runbooks based on operational experience.
- Provide feedback on monitoring tools and processes to improve effectiveness.
- Contribute to knowledge transfer within the team to build collective expertise.
- Participate in knowledge-sharing sessions to enhance technical skills.
- Gain a deeper understanding of the monitoring program’s scope and future expansion plans.
Requirements for Success:
- Bachelor’s degree in Information Technology or a related field.
- 1–3 years of experience in the IT industry.
- Basic understanding of IT infrastructure and monitoring concepts. (Experience with CIP-TA or the Foundry platform is highly preferred.)
- Willingness to learn troubleshooting techniques and system monitoring practices.
- Familiarity with documentation standards and strong attention to detail.
- Basic analytical skills to support issue identification and resolution.
- Fluency in Vietnamese and English to collaborate effectively with a global team and direct resources.
- Ability to work in a shift-based environment, including evenings, nights, or weekends as required.
- Adaptable, eager to learn, and team-oriented.
- Working time: rotating Morning (7 AM – 4 PM), Swing (1 PM – 10 PM), and Night (10 PM – 7 AM) shifts to provide 24/7/365 coverage, with the flexibility to support a global organization.
- Work location: Helios Building, 5th Floor, Quang Trung Software, Dist. 12, Ho Chi Minh City.