Data Scientist with a production engineering side. I work on AI systems that have to run reliably at inference time — not just in notebooks.
Current focus: multimodal AI serving (TTS, OCR, ASR, LLM) on GPU infrastructure.
vLLM-based serving repos with real benchmark data on A100 / H200.
All expose OpenAI-compatible APIs and ship with Docker.
- ASR / TTS: speech systems, streaming inference, latency optimization
- Vision: OCR, face recognition, document AI
- LLM: RAG pipelines, serving infra, OpenAI-compatible APIs
- Data: ClickHouse, PostgreSQL, ETL pipelines, AQI/weather data systems
Python · vLLM · FastAPI · CUDA · Docker · Kubernetes
ClickHouse · PostgreSQL · Java · SQL
