I’m an engineer who aspires to be a scientist. I work on multimodal AI, with a strong focus on vision-language models, speech systems, and efficient on-device inference.
I currently work at Hugging Face, where I lead our multimodal research and contribute to projects spanning:
- Vision-Language Models (VLMs)
- Speech-to-speech and conversational systems
- Multimodal research with an emphasis on efficiency and real-world deployment
- Robotics-facing AI systems
I enjoy building things that are both technically solid and actually usable, from research code to demos and production-ready tools. On this profile you'll find:
- Research prototypes and experimental ideas
- Open-source tools and demos
- Projects spanning multimodal models, audio, and vision
- Occasional side projects
A bit about my background:
- PhD in applied machine learning (speech and generative models)
- Former senior ML engineer at Unity
- Interested in small, fast, and well-engineered models
Feel free to explore, fork, or reach out.