Discovery of multiword expressions, their meaning and their linguistic properties in texts using large language models

The ATILF research unit (Computer Processing and Analysis of the French Language) offers a postdoctoral position in natural language processing (NLP).

  • Topic: Discovery of multiword expressions, their meaning and their linguistic properties in texts using large language models
  • Location: ATILF, Nancy, France
  • Starting date: from February 2024
  • Duration: 12 months (with a possible extension of one more year)
  • Supervisors: Mathieu Constant (Univ. Lorraine, France) and Agata Savary (Univ. Paris-Saclay, France)
  • Salary: depends on post-PhD experience and on salary grids, from 3070 euros (less than 2 years of experience) to 4465 euros (more than 7 years of experience) before tax
  • Application deadline: 5th December 2023

Subject

The term multiword expression refers to a combination of multiple lexical items that displays irregular composition, possibly on different linguistic levels (morphology, syntax, semantics, …). Multiword expressions cover a large variety of phenomena, such as idioms (run around in circles), support verb constructions (take a walk), nominal compounds (dry run), and complex function units (in spite of). They have been the subject of extensive research in the NLP community over the last 50 years.

The goal of this post-doc position is to investigate new methods for discovering multiword expressions, their meaning and their linguistic properties in texts, in order to enrich an induced semantic lexicon with new multiword entries, definitions, argumental structure, and other properties. The emergence of large language models (LLMs) opens promising new perspectives for the study of multiword expressions, not only regarding their semantic compositionality but also their linguistic characterization. The methods will primarily be tested on French, but other languages may also be considered.

Context

The position is part of the SELEXINI project (2022-2026), funded by the French National Research Agency (ANR). The goal of the SELEXINI project is to develop next-generation lexicon induction methods for natural language processing. The induced lexicons will not only cluster word usages according to their senses, but also contain multiword expressions, argumental structure, generated definitions, etc. The project combines the power of large pre-trained language models with existing lexical resources to address the lack of interpretability and diversity in current language technology. The hired researcher will be fully integrated into the project team.

Requirements

Applicants should hold a PhD in computer science, applied mathematics, natural language processing, or computational linguistics. Applications from PhD students planning to defend by December 31, 2023 are also welcome. The hired post-doc researcher should have the following skills:

  • expertise in deep learning for NLP, notably large language models
  • excellent programming skills
  • good linguistic skills
  • good knowledge of French would be a plus
  • team spirit

Application

Applicants should submit a cover letter, a CV including their publications, a list of references for recommendation, and a transcript of Master's grades on the official website. Applications should be submitted no later than December 5, 2023.