Senior Consultant Specialist (Model Hosting/Inference Optimization)
Location: Guangzhou, Guangdong
Job Type: Full-time
Category: Computer Occupations
Posted: June 17, 2026
Some careers have more impact than others.
If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.
We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist.
Business: CTO
Location: Guangzhou
Job ID: 48324
Principal responsibilities
- Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware.
- Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching).
- Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware.
- Own inference health and performance monitoring: lat...