I am an assistant professor in ‘Humanities and AI’ at the wonderful Leiden University Centre for Linguistics, affiliated with the Leiden HumAN (Humanities AI & NLP) group and SAILS (Society, AI & Life Sciences)!
I investigate language and curiosity, using methods from experimental and computational linguistics.
Ongoing research
- Using questions on social media as a window on our collective curiosity. Does encountering a question on social media make us more sensitive to subsequent (dis)information that answers it?
- WetSuite, unlocking government/legal data for NLP research, and unlocking NLP for legal researchers.
- Using NLP to predict implicit questions, or ‘questions under discussion’ (continuing from this early attempt).
- Why are disjunctions like “it’s a lion or a mammal” a bit weird? (Aiming to get this manuscript published at some point.)
- I am supervising excellent BA and MA students writing theses about questions, intonation (focus, rising declaratives), discourse structure, framing and moral judgments.
Teaching
Come study Language and Cognition and/or Computational Linguistics in Leiden! I teach the following courses:
- Python for Linguists 1
- Python for Linguists 2
- Language and logic
- Advanced Semantics
- Seminar BA-thesis Language and Cognition
Tools
Some bare-bones command-line tools to ‘get stuff done’:
- SpaCy-wrap: A wrapper around the spaCy library for Natural Language Processing.
- jsonlined: If you often find yourself extracting values from a .jsonl file, doing something to them, and re-inserting them.
- sentclass: For classifying sentences along dimensions like concreteness and sentiment.
- sembed: For directly computing sentence embeddings.
- spanbed: For directly computing embeddings for a span in a context.
- cleval: Easily evaluate a classifier, comparing targets vs. predictions.
- rmnl: For removing superfluous line endings when e.g. copying text from PDF.
- strample: Data exploration tool, does stratified sampling to generate HTML tables for manual inspection.
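To illustrate the kind of round trip that jsonlined is described as handling (extract a value from each line of a .jsonl file, do something to it, re-insert it), here is a minimal Python sketch. It is not the tool's actual interface or implementation: the function name `transform_jsonl`, the `text` key and the example records are all hypothetical.

```python
import json


def transform_jsonl(lines, key, func):
    """Yield JSON lines with func applied to the value stored under key.

    Hypothetical helper for illustration only; not part of jsonlined.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        if key in record:
            # extract the value, transform it, and re-insert it
            record[key] = func(record[key])
        yield json.dumps(record, ensure_ascii=False)


# Made-up example records, one JSON object per line.
example = [
    '{"id": 1, "text": "Hello World"}',
    '{"id": 2, "text": "Why are questions interesting?"}',
]

result = list(transform_jsonl(example, "text", str.lower))
for line in result:
    print(line)
```

The same function works on an open file handle, since it only iterates over lines.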
Some datasets:
- TED-talks with the questions they evoke, for research into curiosity and discourse structure – also check the more convenient representation here.
- Dutch intonation – scraped dataset with audio examples with intonation transcriptions, in the ToDI framework.
- ManyNames dataset – crowdsourced image labels, many per image. I collaborated on the English portion.
Selected research papers
Implicit questions and curiosity
- M. Westera, J. Amidei & L. Mayol (2020). Similarity or deeper understanding? Analyzing the TED-Q dataset of evoked questions. In Proceedings of the 28th International Conference on Computational Linguistics (CoLing). 📃
- M. Westera, L. Mayol & H. Rohde (2020). TED-Q: TED Talks and the Questions they Evoke. In Proceedings of Language Resources and Evaluation Conference (LREC). 📃
- M. Westera & A. Brasoveanu (2014). Ignorance in context: the interaction of modified numerals and QUDs. In Proceedings of Semantics and Linguistic Theory (SALT) 24. 📃
Semantics and deep learning
- M. Westera, A. Gupta, G. Boleda & S. Padó (2021). Distributional models of category concepts based on names of category members. Cognitive Science 45 (9). 📃
- L. Aina, X. Liao, G. Boleda & M. Westera (2021). Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution. In Proceedings of the 25th Conference on Computational Natural Language Learning. 📃
- C. Silberer, S. Zarrieß, M. Westera & G. Boleda (2020). Humans Meet Models on Object Naming: A New Dataset and Analysis. In Proceedings of the 28th International Conference on Computational Linguistics (CoLing). 📃
- M. Westera & G. Boleda (2020). A closer look at scalar diversity using contextualized semantic similarity. In Proceedings of Sinn und Bedeutung 24 (SuB). 📃
- L. Aina, C. Silberer, I. Sorodoc, M. Westera & G. Boleda (2019). What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue. In Proceedings of NAACL-HLT. 📃
- M. Westera & G. Boleda (2019). Don’t blame distributional semantics if it can’t do entailment. In Proceedings of the 13th International Conference on Computational Semantics (IWCS). 📃
Attention and indirect communication
- M. Westera (2022). Attentional Pragmatics: a new pragmatic approach to exhaustivity. Semantics and Pragmatics Vol.15. 📃
- M. Westera (2022). Alternatives. In D. Altshuler (ed.), Linguistics meets philosophy. 📃 (or the official paywalled version)
- M. Westera (2019). Implying or implicating not both in declaratives and interrogatives. In Proceedings of Sinn und Bedeutung 24 (SuB). 📃
- M. Westera (2018). An attention-based explanation for some exhaustivity operators. In Proceedings of Sinn und Bedeutung 21. 📃
- M. Westera (2017). QUDs, brevity, and the asymmetry of alternatives. In Proceedings of the Amsterdam Colloquium 21. 📃
- M. Westera (2017). Exhaustivity and intonation: a unified theory. PhD dissertation, Institute for Logic, Language and Computation, University of Amsterdam. 📃
The meaning of intonation
- M. Westera, D. Goodhue & C. Gussenhoven (2020). Meanings of tones and tunes. In C. Gussenhoven & A. Chen (eds.), The Oxford Handbook of Language Prosody. 📃 (or the official paywalled version)
- M. Westera (2019). Rise-fall-rise as a marker of secondary QUDs. In D. Gutzmann & K. Turgay (eds.), Secondary content: the linguistics of side issues. 📃 (or the official paywalled version)
- M. Westera (2018). Rising declaratives of the Quality-suspending kind. Glossa: a journal of general linguistics 3(1), 121. 📃
- M. Westera (2014). Grounding topic and focus in biological codes. In Proceedings of Tonal Aspects of Languages (TAL). 📃
- M. Westera (2013). ‘Attention, I’m violating a maxim!’ - a unifying account of the final rise. In Proceedings of SemDial. 📃