High-Availability Game Server Platform Architecture
Professional case study
A cloud-native foundation that absorbs launch-day traffic and that a small team can run
- SZ Code Lab
- ASP.NET Core
- Node.js
- TypeScript
- Docker
- Kubernetes (EKS)
- AWS
- Terraform
- Photon Engine
- CloudWatch
- Splunk
- ELK
- Availability: Zero-downtime rolling updates
- Traffic: HPA-driven pod scale-out for spikes
- Environments: Dev · Staging · Prod (IaC)
- Ops team: Operated by a small engineering team
Challenge
Two projects arrived around the same time with very different server-side asks.
- City of Holdem — Unify the client and server so feature developers could write cross-platform content logic in one language.
- Golden Mango Casino — Survive global-launch and marketing-campaign traffic spikes, and stay manageable by a small engineering team.
Solution
City of Holdem — a unified client/server workflow
- Prototyping: Abstracted the network module behind an Adapter, started development against a mock server, and moved to Photon Engine for real-time multiplayer prototyping.
- Live architecture: Rewrote the server side in ASP.NET Core so client and server share the same C# content logic. The same engineer can implement a feature end-to-end without context-switching languages.
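The Adapter step above can be sketched roughly as follows. This is an illustration only, not the project's code: the original client was written in C#, while the sketch uses TypeScript, the language of the other project's servers, so that the examples here stay in one language. Every name in it (`NetworkAdapter`, `MockServerAdapter`, `LobbyClient`, the `joinTable` event) is hypothetical. The point is that gameplay code only sees the interface, so a mock transport can later be swapped for a Photon-backed or HTTP-backed one without touching feature logic.

```typescript
// All names below are hypothetical; the original code is not public.
type MessageHandler = (event: string, payload: unknown) => void;

// The only network surface gameplay code is allowed to see.
interface NetworkAdapter {
  send(event: string, payload: unknown): void;
  onMessage(handler: MessageHandler): void;
}

// Stand-in used before any real backend exists: it records what was sent
// and echoes each message straight back to the registered handlers.
class MockServerAdapter implements NetworkAdapter {
  readonly sent: Array<{ event: string; payload: unknown }> = [];
  private handlers: MessageHandler[] = [];

  send(event: string, payload: unknown): void {
    this.sent.push({ event, payload });
    for (const handler of this.handlers) handler(event, payload);
  }

  onMessage(handler: MessageHandler): void {
    this.handlers.push(handler);
  }
}

// Gameplay code depends on the interface, never on a concrete transport.
class LobbyClient {
  joinedTableId: string | null = null;
  private net: NetworkAdapter;

  constructor(net: NetworkAdapter) {
    this.net = net;
    this.net.onMessage((event, payload) => {
      if (event === "joinTable") {
        this.joinedTableId = (payload as { tableId: string }).tableId;
      }
    });
  }

  joinTable(tableId: string): void {
    this.net.send("joinTable", { tableId });
  }
}

const mock = new MockServerAdapter();
const lobby = new LobbyClient(mock);
lobby.joinTable("table-42");
console.log(lobby.joinedTableId); // prints "table-42"
```

A real transport would implement the same two methods, so moving from the mock to a real-time backend becomes a one-line change at the place where the adapter is constructed.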
Golden Mango Casino — cloud-native infrastructure
graph TD
  Terraform["Terraform (IaC)<br/>VPC · EKS · IAM · ECR<br/>Git-tracked, version-pinned"]
  AWS["AWS<br/>VPC / EKS / ELB / CloudWatch"]
  EKS["Kubernetes (EKS)<br/>Rolling updates · HPA"]
  Pods["Game API Pods<br/>Node.js / TypeScript<br/>Dockerized"]
  Telemetry["Telemetry<br/>CloudWatch + Splunk + ELK"]
  Envs["Environments<br/>Dev / Staging / Prod"]
  Terraform --> AWS
  Terraform --> Envs
  AWS --> EKS
  EKS --> Pods
  Pods --> Telemetry
  Envs --> EKS
- Containerization: TypeScript/Node.js API servers were packaged into Docker images, so the runtime environment was identical from a dev laptop to production.
- IaC with Terraform: VPC, EKS, IAM, and ECR are all declared as code. Dev, Staging, and Prod differ only in variable values, which kills the silent environment drift that normally rots multi-env setups.
- Orchestration on Kubernetes: Distribution, scaling, and recovery are delegated to Kubernetes. Predictable spikes such as marketing campaigns are absorbed by the Horizontal Pod Autoscaler (HPA), and rolling updates keep deploys zero-downtime.
- Telemetry layered intentionally: CloudWatch and Splunk cover infrastructure load, while ELK covers application and logic errors, so each channel has a single, clear responsibility.
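To make the HPA behaviour concrete, here is a minimal sketch of the replica calculation described in the Kubernetes documentation: the desired count is the current count scaled by the ratio of observed to target metric, rounded up, skipped when the ratio is within a tolerance (0.1 by default), and clamped to the configured bounds. The function and field names are hypothetical, the numbers in the usage lines are made up for illustration, and the sketch deliberately ignores stabilization windows, pod readiness, and multi-metric handling. It does not reflect the project's actual targets.

```typescript
// Simplified model of the documented HPA formula:
//   desired = ceil(currentReplicas * currentMetric / targetMetric)
// Field names and example values are illustrative assumptions.
interface HpaInput {
  currentReplicas: number;
  currentMetric: number; // e.g. average CPU utilization in percent
  targetMetric: number;  // the utilization the autoscaler aims for
  minReplicas: number;
  maxReplicas: number;
  tolerance?: number;    // Kubernetes defaults to 0.1
}

function desiredReplicas(input: HpaInput): number {
  const tolerance = input.tolerance ?? 0.1;
  const ratio = input.currentMetric / input.targetMetric;

  // Within tolerance of the target: leave the replica count alone.
  if (Math.abs(ratio - 1) <= tolerance) return input.currentReplicas;

  const raw = Math.ceil(input.currentReplicas * ratio);
  // Never scale outside the configured bounds.
  return Math.min(input.maxReplicas, Math.max(input.minReplicas, raw));
}

const bounds = { targetMetric: 60, minReplicas: 2, maxReplicas: 20 };

// Campaign spike: 4 pods at 90% against a 60% target.
console.log(desiredReplicas({ ...bounds, currentReplicas: 4, currentMetric: 90 })); // 6
// Traffic drains: 10 pods at 30%.
console.log(desiredReplicas({ ...bounds, currentReplicas: 10, currentMetric: 30 })); // 5
// Extreme spike is capped at maxReplicas.
console.log(desiredReplicas({ ...bounds, currentReplicas: 8, currentMetric: 300 })); // 20
```

Because the bounds and target are plain values, they fit naturally with the variables-only approach to environment differences: Dev, Staging, and Prod can share the same definition and differ only in these numbers.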
Achievements
- Zero-downtime live updates and elastic traffic handling. Kubernetes rolling updates plus HPA absorbed major marketing-driven spikes with no service interruptions.
- A small team running a complex cloud. Terraform-based IaC discipline kept the entire ecosystem manageable, extensible, and reproducible.
- Faster feature cycles thanks to client/server unification. In City of Holdem, a content change is one flow in one language instead of two parallel PRs in two stacks.
This case touches operational data, so I don’t share code excerpts or log captures publicly. The summary stays at the structural and decision level; concrete numbers and trade-offs are best discussed in an interview.