Site Reliability Engineer (SRE)

Full Time

Ha Noi

Closed

With about 200 employees, OpenCommerce Group is a leading technology organization that offers e-commerce products, with offices in China and Hanoi. We are fortunate to have a team that can target vast and vibrant markets, with the benefit of ten years in the e-commerce industry and over one million online-sales customers in most nations around the world, including the United States and China. The aim of OpenCommerce Group is to create a product ecosystem that facilitates and improves e-commerce in general, and cross-border commerce in particular, and that serves as a launching pad for entrepreneurs to start, expand, and perform in the online world. We're expanding rapidly, and we're searching for top talent to help us create a global Commerce Community. Join OpenCommerce Group to expand the scope of your work.

JOB DESCRIPTION

Manage and improve system reliability through SLO, SLI, and SLA practices.
Design and implement observability systems (metrics, logs, tracing, alerting) using tools like Prometheus, Grafana, ELK, etc.
Build and automate CI/CD pipelines and Infrastructure as Code (IaC) using tools such as Terraform, Ansible, Pulumi, and Helm.
Collaborate in the analysis, design, and deployment of systems and processes to ensure reliability, observability, and scalability.
Optimize system cost, performance (latency, throughput), and security.
Operate and optimize Kubernetes clusters (EKS); strong knowledge of Docker, Kubernetes, Helm is required.
Develop internal tools to automate workflows and support other teams.
Participate in incident response, root cause analysis, and postmortem reviews, and improve incident handling processes.
Support and coordinate with NOC (Network Operations Center) teams.
Be part of the on-call rotation when needed.

REQUIREMENTS

2–5 years of experience in SRE / DevOps / Platform Engineering.
Hands-on experience with monitoring and alerting systems (Prometheus, Grafana, ELK, Loki, etc.).
Proficient in CI/CD tools (GitLab CI, Jenkins) and familiar with Git workflows.
Experience in deploying and managing Kubernetes (EKS is a plus).
Understanding of gRPC and the ability to optimize nginx connections and network stacks.
Strong Linux background with deep knowledge of the kernel, network stack, file systems, and processes.
Excellent troubleshooting skills — able to analyze issues from OS to application layer.
System-thinking mindset, focus on automation, and ability to mentor teammates.
Proactive, responsible, and able to work under pressure during incident response.

Nice to Have

Experience with AWS (EKS, EC2, RDS, CloudWatch).
Strong understanding of networking concepts (TCP/IP, DNS, Load Balancing, CDN).
Experience with high availability and distributed systems.
Previously built a complete observability stack.
Experience in building or optimizing Golang SDKs or internal frameworks.
Knowledge of cloud-native networking (CNI, overlay, BGP, eBPF-based load balancing).

BENEFITS

You'll find this place irresistible

Enjoy top-tier compensation, including:

Compensation & Rewards

Competitive monthly NET salary, transparent and fully take-home
Up to 16 months’ salary per year, including a 13th-month salary, quarterly incentives, and annual performance bonuses

Work Flexibility & Time Off

24 remote working days per year, enabling a healthy work–life balance
12 days of paid annual leave, in addition to public holidays
Flexible working hours, Monday to Friday – weekends are fully yours

Well-being & Employee Care

Annual health check-ups
Full social insurance coverage (BHXH) in compliance with Vietnamese labor regulations
Company-sponsored sports clubs to support both physical and mental well-being
Regular company trips and team bonding activities

Career Growth & Work Environment

Be part of a fast-growing global B2B SaaS organization
Clear and accelerated career development and promotion pathways
Collaborate with talented, diverse, and high-performing teams across regions
Work in a modern, open, and empowering environment where individuality is respected and potential is nurtured

We are not just building products — we are building a workplace where people can grow, perform at their best, and create long-term impact.