Within the span of just six years (2017-2023), the field of Natural Language Processing (NLP) has been profoundly transformed by advances in general-purpose neural architectures, which are used both to learn deep representations of linguistic units and to generate high-quality text. These architectures are now ubiquitous in NLP applications: trained at scale, these "large language models" (LLMs) offer multiple services (summarisation, writing assistance, translation) within a single model, accessed through human-like conversation and prompting techniques.
In this project, our aim is to analyse the new state of play from the perspective of machine translation (MT) and ask two main questions:
The consortium comprises two academic research teams, ISIR/MLIA (Sorbonne-Université and CNRS) and ALMAnaCH (Inria, Paris), and one SME, SYSTRAN.
ISIR is a joint laboratory of Sorbonne-Université and CNRS. Within ISIR, the MLIA team conducts research in Statistical Machine Learning (ML), with an emphasis on algorithmic aspects and on applications involving semantic data analysis and the modelling of complex physical systems.
Inria is the French National Institute for Research in Digital Science and Technology. ALMAnaCH is an Inria Paris research team working on Natural Language Processing and Computational Humanities.
Since its creation in 1968, SYSTRAN has been a pioneer in MT technology. An industry leader, the company has developed over the years many innovative solutions now commonly used by businesses and the general public. Strongly focused on research and development, SYSTRAN has approximately 100 employees and an annual turnover of €20M.
The French National Research Agency (ANR) is a public administrative institution under the authority of the French Ministry of Higher Education, Research and Innovation. Project number: ANR-23-IAS1-0006.