Senior Consultant Specialist (Model Hosting/Inference Optimization)

HSBC Global Services Limited • Guangzhou, Guangdong, China • Posted June 17, 2026

Location: Guangzhou, Guangdong
Job Type: Full-time
Category: Computer Occupations

Some careers have more impact than others.

If you’re looking for a career where you can make a real impression, join HSBC and discover how valued you’ll be.

We are currently seeking an experienced professional to join our team in the role of Senior Consultant Specialist.

Business: CTO

Location: Guangzhou

Job ID: 48324

Principal responsibilities

  • Design, build, and operate scalable, reliable model hosting platforms for LLMs, embeddings, and STT/TTS across heterogeneous hardware. 
  • Drive inference optimisation for latency, throughput, and cost (quantisation, KV-cache optimisation, dynamic/continuous batching). 
  • Evaluate, integrate, and tailor inference frameworks (e.g., vLLM, TensorRT-LLM, SGLang) to maximise performance on target hardware. 
  • Own inference health and performance monitoring: lat...
