Benjamin Muller

PhD student in Natural Language Processing


Sorbonne Université

I am a third-year PhD student at Sorbonne Université and INRIA Paris in the Almanach research team. My research focuses on understanding the behavior of large-scale language models and applying them efficiently in multilingual settings. I have interned at Apple AI/ML and Amazon Alexa AI. I am also the main instructor for the Machine Learning for NLP course at ENSAE Paris.


  • Natural Language Processing
  • Deep Learning


  • PhD in Machine Learning applied to Natural Language Processing, 2022

    INRIA & Sorbonne Université

  • MSc Data Science, 2016

    Ecole Polytechnique, ENSAE, Télécom Paris

  • BSc in Statistics, 2014



(2021). Cross-Lingual GENQA: A Language-Agnostic Generative Question Answering Approach for Open-Domain Question Answering. arXiv.


(2021). When Being Unseen from mBERT is just the Beginning: Handling New Languages With Multilingual Language Models. NAACL.


(2021). First Align, then Predict: Understanding the Cross-Lingual Ability of Multilingual BERT. EACL.


(2020). CamemBERT: a Tasty French Language Model. ACL.


(2019). Enhancing BERT for Lexical Normalization. WNUT.



2019-2022 - Lecturer and Main Instructor at ENSAE Paris for the Machine Learning for Natural Language Processing course.

2018-2019 - Teaching assistant for Master's students in the Initiation to Research in Statistics course at Université Paris Descartes.

2014-2016 - Teaching assistant in Mathematics for Bachelor-level students in Prépa ECS at Lycée Ipécom, Paris.


2022 - Mentor for the Fatima Fellowship program, mentoring three students analysing cultural biases in multilingual language models.

2018-2022 - Supervising MSc students’ research projects, focusing on the implementation, evaluation and analysis of static word embedding techniques and language modeling.

2019 - Supervising a 6-month internship on Domain Adaptation for Non-Canonical Data.