Linglib

A community library for formal linguistics in Lean 4. Built so theories of meaning, structure, sound, and use can be stated precisely, compared rigorously, and held accountable to data.

Contribute View on GitHub

Why formalize linguistics?

Formal linguistics has accumulated decades of careful work — generative syntax, model-theoretic semantics, optimality theory, distributed morphology, dynamic semantics, rational speech-act pragmatics. The results live in prose scattered across thousands of papers, with assumptions often left implicit. Linglib is an attempt to gather the machinery in one place so a proof assistant can do the bookkeeping.

Three things become possible when the bookkeeping is mechanical. First, breakage becomes detectable. Change your semantics for attitude verbs and Lean tells you exactly which downstream theorems about conditionals, questions, or pragmatic inference no longer follow — no more discovering an inconsistency from a reviewer.

Second, predictions can be checked. Theories are often stated in notation ambiguous enough to hide gaps between what is claimed and what actually follows from the definitions. Lean won’t let a proof go through unless the prediction genuinely follows from the theory.

Third, theories become comparable. When two accounts both claim to handle the same data, we can formally characterize where they agree and where they diverge, rather than arguing past each other in different formalisms. The library is organized so that the same phenomenon is often formalized under multiple competing frameworks — not as a contest, but to make the interconnections visible.

What's here

2940 Lean files · 83 phenomena · 123 languages · 732 theory files · 860 study files

Semantics

392 files

Compositional, dynamic, intensional, probabilistic — Montague through RSA.

Syntax

127 files

Minimalism, HPSG, CCG, dependency grammar, formal language theory.

Pragmatics

61 files

Implicature, speech acts, Gricean and rational speech-act models.

Phonology

53 files

Optimality theory, harmonic grammar, autosegmental, stratal, tier-based locality.

Morphology

26 files

Distributed Morphology, vocabulary insertion, allomorphy, paradigm structure.

Processing

9 files

Surprisal, dependency locality, memory-based parsing, cue-based retrieval.

How it’s organized

Linglib separates phenomena (what we observe) from theories (what explains it). Theory-neutral observations — judgments, experimental results, distributional patterns — live under Phenomena/ and are written so any framework can consume them. Frameworks live under Theories/. Each Studies/ file connects a specific paper’s account to the phenomenon, proving the predictions the paper makes.

The connection across files is made by import, not stipulation: when a study about scope freezing needs c-command, it imports the same Core/ definition that every other study uses. We call this interconnection density — and the dependency map shows it visually: a small number of mathematical foundations, with most of the interesting linguistic theory living on the bridges between them.

Contribute

Linglib is a community library. There’s room for many people in many roles, and most contributions don’t require deep Lean expertise to start. Four good entry points:

Formalize a paper from your subfield. Pick a paper you know well, write a study file under Phenomena/X/Studies/AuthorYear.lean, and prove the predictions it makes. The contributor guide walks through what a study file looks like.
Add a Fragment for an undercovered language. We have entries for over a hundred languages, but most are sparse. If you work on a language whose typological profile isn’t well represented, even a small fragment is valuable. (Guide)
Pick something off the wishlist. The wishlist is a curated list of theorems, papers, languages, and tools we’d love to see formalized — modeled on Lean’s 1000+ theorems. Items are tagged by difficulty.
Write a post or report a problem. Critical readings are welcome — if a definition seems wrong, or a theorem trivializes the empirical content, open an issue. The blog is open to contributors.

Whether you’re an undergrad working through your first formal-semantics course, a postdoc with strong opinions about CCG, a computational linguist with a probabilistic framework to formalize, or a Lean enthusiast looking for a domain to apply theorem proving — there’s something here you can do.

Resources

Wishlist — theorems, papers, languages, and tools we'd love help with.
Blog — development notes and essays.
Map — interactive dependency graph of the mathematical foundations.
Bibliography — every paper cited in the formalization, with reverse index.
Roadmap — north stars, key conjectures, and short-term work.
API documentation — generated reference for every declaration.
GitHub — source, issues, and pull requests.