I investigate language and curiosity, using methods from experimental and computational linguistics.
I am an assistant professor in Humanities and AI at the wonderful Leiden University Centre for Linguistics, affiliated with the Leiden HumAN (Humanities AI & NLP) group, SAILS (Society, AI & Life Sciences) and the CHAOS group (Cyber-Humanities for the Advancement of Society).
Research
- Using questions on social media as a window on our collective curiosity. Does encountering a question on social media make us more sensitive to subsequent (dis)information that answers it?
- WetSuite, unlocking government/legal data for NLP research, and unlocking NLP for legal researchers.
- Why are disjunctions like “it’s a lion or a mammal” a bit weird? (Aiming to get this manuscript published at some point.)
- I have developed an extensive theory of exhaustivity implicature and the meaning of prosody.
- I am supervising excellent PhDs on turn taking, quantity expressions and legal tech, and BA and MA students writing theses about questions, intonation (focus, rising declaratives), discourse structure, framing, and moral judgments.
Teaching
Come study Language and Cognition and/or Computational Linguistics in Leiden! I teach the following courses:
- Python for Linguists 1
- Python for Linguists 2
- Language and logic
- Advanced Semantics
- Seminar BA-thesis Language and Cognition
Tools
Some bare-bones command-line tools to ‘get stuff done’:
- QuoteLLM: A command-line tool (wrapping around transformers) to constrain LLMs to only generate literal quotes from a given passage (countering hallucination).
- ChoiceLLM: A command-line tool (wrapping around transformers) to more easily get scalar ratings, comparative judgments, and multiple-choice judgments from LLMs (local and OpenAI).
- Strample: A data exploration tool that does stratified sampling to generate HTML tables for manual inspection.
- SpaCy-wrap: A wrapper around the spaCy library for Natural Language Processing.
- Jsonlined: For when you often find yourself extracting values from a .jsonl file, doing something to them, and re-inserting them (though I have mostly shifted to jq now).
- cleval: Easily evaluate a classifier, comparing targets vs. predictions.
- htmlreport: Add just one line of code, and your print statements and matplotlib plots end up on a simple HTML page.
Some datasets:
- TED-talks with the questions they evoke, for research into curiosity and discourse structure – also check the more convenient representation here.
- Dutch intonation – scraped dataset with audio examples with intonation transcriptions, in the ToDI framework.
- ManyNames dataset – crowdsourced image labels, many per image. I collaborated on the English portion.
Selected research papers
Implicit questions and curiosity
- M. Westera, J. Amidei & L. Mayol (2020). Similarity or deeper understanding? Analyzing the TED-Q dataset of evoked questions. In Proceedings of 28th International Conference on Computational Linguistics (CoLing). 📃
- M. Westera, L. Mayol & H. Rohde (2020). TED-Q: TED Talks and the Questions they Evoke. In Proceedings of Language Resources and Evaluation Conference (LREC). 📃
- M. Westera & A. Brasoveanu (2014). Ignorance in context: the interaction of modified numerals and QUDs. In Proceedings of Semantics and Linguistic Theory (SALT) 24. 📃
Semantics and deep learning
- M. Westera, A. Gupta, G. Boleda & S. Padó (2021). Distributional models of category concepts based on names of category members. Cognitive Science 45 (9). 📃
- L. Aina, X. Liao, G. Boleda & M. Westera (2021). Does referent predictability affect the choice of referential form? A computational approach using masked coreference resolution. In Proceedings of the 25th Conference on Computational Natural Language Learning. 📃
- C. Silberer, S. Zarrieß, M. Westera & G. Boleda (2020). Humans Meet Models on Object Naming: A New Dataset and Analysis. In Proceedings of 28th International Conference on Computational Linguistics (CoLing). 📃
- M. Westera & G. Boleda (2020). A closer look at scalar diversity using contextualized semantic similarity. In Proceedings of Sinn und Bedeutung 24 (SuB). 📃
- L. Aina, C. Silberer, I. Sorodoc, M. Westera & G. Boleda (2019). What do Entity-Centric Models Learn? Insights from Entity Linking in Multi-Party Dialogue. In Proceedings of NAACL-HLT. 📃
- M. Westera & G. Boleda (2019). Don’t blame distributional semantics if it can’t do entailment. In Proceedings of the 13th International Conference on Computational Semantics (IWCS). 📃
Attention and indirect communication
- M. Westera (2022). Attentional Pragmatics: a new pragmatic approach to exhaustivity. Semantics and Pragmatics Vol.15. 📃
- M. Westera (2022). Alternatives. In D. Altshuler (ed.), Linguistics meets philosophy. 📃 (or the official paywalled version)
- M. Westera (2019). Implying or implicating not both in declaratives and interrogatives. In Proceedings of Sinn und Bedeutung 24 (SuB). 📃
- M. Westera (2018). An attention-based explanation for some exhaustivity operators. In Proceedings of Sinn und Bedeutung 21. 📃
- M. Westera (2017). QUDs, brevity, and the asymmetry of alternatives. In Proceedings of the Amsterdam Colloquium 21. 📃
- M. Westera (2017). Exhaustivity and intonation: a unified theory. PhD dissertation, Institute for Logic, Language and Computation, University of Amsterdam. 📃
The meaning of intonation
- M. Westera, D. Goodhue & C. Gussenhoven (2020). Meanings of tones and tunes. In C. Gussenhoven & A. Chen (eds.), The Oxford Handbook of Language Prosody. 📃 (or the official paywalled version)
- M. Westera (2019). Rise-fall-rise as a marker of secondary QUDs. In D. Gutzmann & K. Turgay (eds.), Secondary content: the linguistics of side issues. 📃 (or the official paywalled version)
- M. Westera (2018). Rising declaratives of the Quality-suspending kind. Glossa: a journal of general linguistics 3(1), 121. 📃
- M. Westera (2014). Grounding topic and focus in biological codes. In Proceedings of Tonal Aspects of Languages (TAL). 📃
- M. Westera (2013). ‘Attention, I’m violating a maxim!’ - a unifying account of the final rise. In Proceedings of SemDial. 📃