31 Aug 2023 19:00 - 20:00
datacraft –
13 Rue des Arquebusiers
75003 Paris, France

STATE-OF-THE-ART – Frugal AI: Knowledge Extraction for Species Description

This workshop will be led by:

  • Maya Sahraoui, PhD student at Sorbonne University, ISIR and MNHN

Workshop overview
Join us for a presentation of Maya's latest research on the potential of frugal AI.

Working with only a small amount of annotated data, Maya adapted a promising self-training method, the teacher-student architecture, to propagate relevant annotations to new unlabelled data with confidence. She also devised a comprehensive test protocol to assess the quality of those annotations. Her approach can be applied in many fields (marketing, maintenance…) where a knowledge base has to be built from a small amount of annotated data.

For more details:
Maya's work focuses on improving knowledge extraction models for biological species descriptions. Her research introduces a distantly supervised model for Named Entity Recognition (NER) and outlines a robust protocol for constructing knowledge graphs from entity labeling.
To ensure rigorous evaluation, she devised a comprehensive test protocol consisting of two datasets. The first contains entities encountered during training, while the second contains only entities never seen in training, which tests how well her models generalize.
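As a rough illustration of such a two-part test protocol, the minimal Python sketch below partitions annotated test examples into a "seen" set (every entity appeared in training) and an "unseen" set (no entity appeared in training). The function name, the data layout and the choice to drop mixed examples are assumptions made for this example, not details of Maya's protocol.

```python
def split_by_entity_novelty(train_entities, test_examples):
    """Partition test examples into (seen, unseen) by entity overlap with training.

    train_entities: iterable of entity strings observed during training.
    test_examples: list of (text, entities) pairs, entities being a list of strings.
    Examples without entities, or mixing known and new entities, are dropped
    (an assumption of this sketch).
    """
    known = {e.lower() for e in train_entities}
    seen, unseen = [], []
    for text, entities in test_examples:
        if not entities:
            continue
        hits = [e.lower() in known for e in entities]
        if all(hits):
            seen.append((text, entities))
        elif not any(hits):
            unseen.append((text, entities))
    return seen, unseen


# Hypothetical data, for illustration only.
train_entities = ["leaf", "petal"]
test_examples = [
    ("Leaf ovate", ["leaf"]),
    ("Sepal free", ["sepal"]),
    ("Petal and stamen present", ["petal", "stamen"]),
]
seen, unseen = split_by_entity_novelty(train_entities, test_examples)
```

Reporting scores separately on the two sets makes it possible to tell memorization of training entities apart from generalization to new ones.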
During her investigation, she encountered two significant scientific challenges: the specificity of the vocabulary and phrasing, and missing annotations. She will share how she tackled them by proposing a language model pre-training technique that improves NER precision on both datasets. She will also demonstrate the efficacy of her teacher-student architecture, formulated as self-training, which achieved high recall on both test sets.
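The general idea of teacher-student self-training can be sketched in a few lines of Python. This is a toy under stated assumptions, not Maya's model: the "teacher" is a simple count-based token classifier over word and suffix features instead of a neural language model, and the labels, the confidence threshold and all function names are hypothetical.

```python
from collections import Counter, defaultdict


def features(token):
    """Toy features: the lowercased word and its last three characters."""
    t = token.lower()
    return [f"word={t}", f"suffix={t[-3:]}"]


def train(examples):
    """Count how often each feature co-occurs with each label."""
    counts = defaultdict(Counter)
    for token, label in examples:
        for f in features(token):
            counts[f][label] += 1
    return counts


def predict(model, token):
    """Return (label, confidence); confidence is the winning share of votes."""
    votes = Counter()
    for f in features(token):
        votes.update(model.get(f, {}))
    if not votes:
        return None, 0.0
    label, n = votes.most_common(1)[0]
    return label, n / sum(votes.values())


def self_train(labeled, unlabeled, threshold=0.9, rounds=3):
    """Teacher labels the pool; confident pseudo-labels train the next model."""
    teacher = train(labeled)
    pool, pseudo = list(unlabeled), []
    for _ in range(rounds):
        confident, remaining = [], []
        for token in pool:
            label, conf = predict(teacher, token)
            if conf >= threshold:
                confident.append((token, label))
            else:
                remaining.append(token)
        if not confident:
            break
        pseudo.extend(confident)
        pool = remaining
        # The student trained on gold + pseudo-labels becomes the next teacher.
        teacher = train(labeled + pseudo)
    return teacher, pseudo


# Hypothetical labels and tokens, for illustration only.
labeled = [("petals", "ORGAN"), ("sepals", "ORGAN"),
           ("glabrous", "DESCRIPTOR"), ("pubescent", "DESCRIPTOR")]
student, pseudo = self_train(labeled, ["tepals", "fibrous", "stamen"])
```

The threshold is what keeps the propagation conservative: a token the teacher cannot label with confidence stays unlabelled rather than adding noise to the student's training set.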
Her findings highlight the essential role of recent language models in interpreting complex and specialized texts. The results should benefit researchers in species diversity and evolution, and offer potential applications in comparative morphology and biodiversity informatics.


