Unsupervised Learning for Handling Code-Mixed Data: A Case Study on POS Tagging of North-African Arabizi Dialect

Abstract

Pretrained language model representations are now ubiquitous in Natural Language Processing. In this work, we present first results on adapting these models to out-of-domain textual data. Using part-of-speech tagging as our case study, we analyze the ability of BERT to model NArabizi, a complex North-African dialect.

Publication
EurNLP
Benjamin Muller
Researcher in Natural Language Processing