Times are given in Irish Standard Time (IST), i.e., either UTC+0 or UTC+1 (Daylight Saving).

Teanga NLP Seminar Series #1 – Dr Jonathan Dunn (University of Illinois Urbana-Champaign)

February 22 @ 5:00 pm – 6:00 pm GMT

Linguistic Diversity in the Digital World: From Perception to Production

The Unit for Linguistic Data at the Insight SFI Research Centre for Data Analytics / Data Science Institute, University of Galway, is delighted to welcome Dr Jonathan Dunn, an associate professor at the University of Illinois Urbana-Champaign, as the first speaker of our seminar series. In this talk, he will speak about linguistic diversity on the internet.

Abstract

This talk looks at linguistic diversity on the internet, where linguistic diversity covers both (i) the languages that are used and (ii) the internal variants or dialectal features that are used. First, we start with data-driven language mapping to see where digital corpora come from. Second, we turn to computational dialectology to see what this source of global-scale observations can tell us about variation within languages. Third, we consider whether a geographic skew in training data has a downstream impact on pre-trained LLMs. And fourth, we end by asking whether machine-assisted production in digital contexts serves to erase variants and level dialects.

About the speaker

Dr Jonathan Dunn is an associate professor at the University of Illinois Urbana-Champaign. Previously, he worked as a senior lecturer at the University of Canterbury and as leader of the Language Technology Theme at the New Zealand Institute for Language, Brain and Behaviour. He was also a visiting scientist at the National Geospatial-Intelligence Agency. Dr Dunn is a computational linguist whose research uses data science and large multilingual corpora to model the emergence of grammatical structure and variation in that structure. His recent work focuses on the impacts of linguistic variation on NLP models and low-resource contexts. He has published over 35 papers, and Cambridge University Press has published his first book. His interdisciplinary teaching experience includes a MOOC that has taught over 14,000 students about NLP.

About the host

The seminar series is led by the Teanga project team. The Teanga project aims to provide an NLP platform for minority and historical languages, offering a single interface and data model for NLP data and services using Python. The team aims to hold the seminar series quarterly to connect researchers working to alleviate challenges around language resources and technologies for minority, historical, indigenous and lesser-resourced languages across the globe. The series will give us a platform to discuss the problems researchers face in their work and to share views on how to solve them.