Professional case study

High-Availability Game Server Platform Architecture

A cloud-native foundation that absorbs launch-day traffic and that a small team can run

SZ Code Lab

ASP.NET Core
Node.js
TypeScript
Docker
Kubernetes (EKS)
AWS
Terraform
Photon Engine
CloudWatch
Splunk
ELK

Availability: Zero-downtime rolling updates
Traffic: HPA-driven pod scale-out for spikes
Environments: Dev · Staging · Prod (IaC)
Ops team: Small engineering team

Challenge

Two projects arrived around the same time with very different server-side asks.

City of Holdem — Unify the client and server so feature developers could write cross-platform content logic in one language.
Golden Mango Casino — Survive global-launch and marketing-campaign traffic spikes, and stay manageable by a small engineering team.

Solution

City of Holdem — a unified client/server workflow

Prototyping: Abstracted the network module behind an Adapter, started development against a mock server, and moved to Photon Engine for real-time multiplayer prototyping.
Live architecture: Rewrote the server side in ASP.NET Core so client and server share the same C# content logic. The same engineer can implement a feature end-to-end without context-switching languages.
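
The adapter idea from the prototyping step can be sketched as follows. This is an illustration only, not project code: the live stack used C# and Photon Engine, while this sketch uses TypeScript, and every name in it (`NetworkAdapter`, `MockServerAdapter`, `LobbyClient`, the `listTables` message) is hypothetical. It shows feature code depending on a small interface, with an in-memory mock standing in for the server.

```typescript
// Hypothetical message shape; the real protocol is not public.
interface GameMessage {
  type: string;
  payload: unknown;
}

type MessageHandler = (msg: GameMessage) => void;

// The only networking surface that feature code depends on.
interface NetworkAdapter {
  connect(): void;
  send(msg: GameMessage): void;
  onMessage(handler: MessageHandler): void;
}

// In-memory stand-in for a server: answers each request from canned replies.
class MockServerAdapter implements NetworkAdapter {
  private handlers: MessageHandler[] = [];
  private connected = false;
  private replies: Record<string, unknown>;

  constructor(replies: Record<string, unknown>) {
    this.replies = replies;
  }

  connect(): void {
    this.connected = true;
  }

  send(msg: GameMessage): void {
    if (!this.connected) throw new Error("send() called before connect()");
    const reply: GameMessage = {
      type: `${msg.type}:reply`,
      payload: this.replies[msg.type] ?? null,
    };
    // Delivered synchronously to keep the sketch deterministic.
    for (const handler of this.handlers) handler(reply);
  }

  onMessage(handler: MessageHandler): void {
    this.handlers.push(handler);
  }
}

// Feature code: knows about NetworkAdapter, not about any concrete transport.
class LobbyClient {
  tables: string[] = [];
  private net: NetworkAdapter;

  constructor(net: NetworkAdapter) {
    this.net = net;
    net.onMessage((msg) => {
      if (msg.type === "listTables:reply" && Array.isArray(msg.payload)) {
        this.tables = msg.payload.map(String);
      }
    });
  }

  refresh(): void {
    this.net.send({ type: "listTables", payload: null });
  }
}
```

Under this assumption, moving from the mock to a real-time transport means writing one more class that implements `NetworkAdapter`; `LobbyClient` stays unchanged.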

Golden Mango Casino — cloud-native infrastructure

graph TD
  Terraform["Terraform (IaC)<br/>VPC · EKS · IAM · ECR<br/>Git-tracked, version-pinned"]
  AWS["AWS<br/>VPC / EKS / ELB / CloudWatch"]
  EKS["Kubernetes (EKS)<br/>Rolling updates · HPA"]
  Pods["Game API Pods<br/>Node.js / TypeScript<br/>Dockerized"]
  Telemetry["Telemetry<br/>CloudWatch + Splunk + ELK"]
  Envs["Environments<br/>Dev / Staging / Prod"]
  Terraform --> AWS
  Terraform --> Envs
  AWS --> EKS
  EKS --> Pods
  Pods --> Telemetry
  Envs --> EKS

Containerization: TypeScript/Node.js API servers were packaged into Docker images so the runtime environment was identical from a dev laptop to production.
IaC with Terraform: VPC, EKS, IAM, and ECR were all declared as code. Differences between Dev / Staging / Prod live in variables only, which prevents the silent environment drift that normally erodes multi-environment setups.
Orchestration on Kubernetes: Distribution, scaling, and recovery were delegated to K8s. Predictable spikes (marketing campaigns) are absorbed by HPA scale-out, and rolling updates keep deploys zero-downtime.
Telemetry layered intentionally: CloudWatch + Splunk cover infrastructure load; ELK covers application and logic errors. Each channel has one clearly scoped job.
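
To make the HPA behaviour concrete, here is a minimal sketch of the core scaling rule that the Kubernetes documentation describes for the Horizontal Pod Autoscaler: desired replicas = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance (0.1 by default) and clamped to the configured minimum and maximum. The function name, field names, and all numbers below are illustrative assumptions, not values from this project. The real controller also accounts for unready pods, missing metrics, and stabilization windows, which this sketch omits.

```typescript
// Inputs to the simplified HPA rule. All field names are illustrative.
interface HpaInput {
  currentReplicas: number;
  currentMetric: number; // e.g. average CPU utilisation (%) across pods
  targetMetric: number;  // the utilisation the autoscaler aims for
  minReplicas: number;
  maxReplicas: number;
  tolerance?: number;    // assumed default of 0.1, as in upstream Kubernetes
}

function desiredReplicas(input: HpaInput): number {
  const { currentReplicas, currentMetric, targetMetric, minReplicas, maxReplicas } = input;
  const tolerance = input.tolerance ?? 0.1;
  if (targetMetric <= 0) throw new Error("targetMetric must be positive");

  const ratio = currentMetric / targetMetric;

  // Within tolerance of the target: keep the current replica count.
  const unclamped =
    Math.abs(ratio - 1) <= tolerance
      ? currentReplicas
      : Math.ceil(currentReplicas * ratio);

  // Never go below minReplicas or above maxReplicas.
  return Math.min(maxReplicas, Math.max(minReplicas, unclamped));
}

// Example: 4 pods at 90% average CPU with a 60% target asks for 6 pods.
const duringSpike = desiredReplicas({
  currentReplicas: 4,
  currentMetric: 90,
  targetMetric: 60,
  minReplicas: 2,
  maxReplicas: 20,
});
console.log(duringSpike); // 6
```

Because the rule is proportional, a campaign that pushes load to 1.5 times the target asks for roughly 1.5 times the pods, up to `maxReplicas`.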

Achievements

Zero-downtime live updates and elastic traffic handling. Kubernetes rolling updates plus HPA absorbed major marketing-driven spikes with no service interruptions.
A small team running a complex cloud. Declaring the infrastructure in Terraform kept the entire ecosystem manageable, extensible, and reproducible for a small engineering team.
Faster feature cycles thanks to client/server unification. In City of Holdem, a content change is one flow in one language instead of two parallel PRs in two stacks.
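
The "one flow in one language" point can be pictured as a single rule module that both sides import. The sketch below is hypothetical and written in TypeScript for illustration; the actual project shared C# logic, and the betting rules here are deliberately simplified rather than taken from the game. `BetContext`, `BetResult`, and `validateBet` are invented names.

```typescript
// Simplified betting context. Field names are hypothetical.
interface BetContext {
  stack: number;    // chips the player still holds
  toCall: number;   // chips needed to match the current bet
  minRaise: number; // minimum raise on top of the call
}

type BetResult =
  | { ok: true; kind: "check" | "call" | "raise" | "all-in" }
  | { ok: false; reason: string };

// One pure function: the client can call it for instant UI feedback,
// and the server can call the same function as the authoritative check.
function validateBet(ctx: BetContext, amount: number): BetResult {
  if (!Number.isInteger(amount) || amount < 0) {
    return { ok: false, reason: "amount must be a non-negative integer" };
  }
  if (amount > ctx.stack) {
    return { ok: false, reason: "amount exceeds stack" };
  }
  if (amount === ctx.stack && amount > 0) {
    return { ok: true, kind: "all-in" };
  }
  if (amount === ctx.toCall) {
    return { ok: true, kind: amount === 0 ? "check" : "call" };
  }
  if (amount < ctx.toCall) {
    return { ok: false, reason: "amount is below the call" };
  }
  if (amount < ctx.toCall + ctx.minRaise) {
    return { ok: false, reason: "raise is below the minimum" };
  }
  return { ok: true, kind: "raise" };
}
```

With the rule in one place, a change to it ships as one edit that both client and server pick up, which is the effect the unification was after.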

This case touches operational data, so I don’t share code excerpts or log captures publicly. The summary stays at the structural and decision level; concrete numbers and trade-offs are best discussed in an interview.
