I have always had a broad view of mathematics, refusing the artificial separation between algebra and analysis, pure and applied math, traditional and formal mathematics. I have dreamed of a unified database of all mathematics, which is easily searchable, and where all connections are recorded explicitly. We built it! TheoremGraph unites arXiv and 25 #Lean repos, including #mathlib into one large dependency graph. We have the dataset, the API, and the MCP server.</p>\n<p>Very proud of the team: Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, and myself. Of course, this couldn't have been possible without the UW Math AI lab!</p>\n<p>Webpage: <a href=\"https://www.theoremsearch.com/theorem-graph\" rel=\"nofollow\">https://www.theoremsearch.com/theorem-graph</a><br>Stellar explainer video by Simon Kurgan: <a href=\"https://www.youtube.com/watch?v=H2z3VRRd4aQ\" rel=\"nofollow\">https://www.youtube.com/watch?v=H2z3VRRd4aQ</a> (must watch!!!)</p>\n","updatedAt":"2026-06-30T07:30:04.691Z","author":{"_id":"66bff1188c3816c563f42d20","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bff1188c3816c563f42d20/G_GDUjvn9fRV7Sjc5W_yG.jpeg","fullname":"Vasily Ilin","name":"Vilin97","type":"user","isPro":false,"isHf":false,"isHfAdmin":false,"isMod":false,"followerCount":3,"isUserFollowing":false}},"numEdits":0,"identifiedLanguage":{"language":"en","probability":0.9165665507316589},"editors":["Vilin97"],"editorAvatarUrls":["https://cdn-avatars.huggingface.co/v1/production/uploads/66bff1188c3816c563f42d20/G_GDUjvn9fRV7Sjc5W_yG.jpeg"],"reactions":[],"isReport":false}}],"primaryEmailConfirmed":false,"paper":{"id":"2606.25363","authors":[{"_id":"6a4198750dbbc53604b66929","name":"Simon Kurgan","hidden":false},{"_id":"6a4198750dbbc53604b6692a","name":"Evan Wang","hidden":false},{"_id":"6a4198750dbbc53604b6692b","name":"Eric Leonen","hidden":false},{"_id":"6a4198750dbbc53604b6692c","name":"Sophie Szeto","hidden":false},{"_id":"6a4198750dbbc53604b6692d","name":"Luke Alexander","hidden":false},{"_id":"6a4198750dbbc53604b6692e","name":"Artemii Remizov","hidden":false},{"_id":"6a4198750dbbc53604b6692f","name":"Jarod Alper","hidden":false},{"_id":"6a4198750dbbc53604b66930","name":"Giovanni Inchiostro","hidden":false},{"_id":"6a4198750dbbc53604b66931","name":"Vasily Ilin","hidden":false}],"mediaUrls":["https://cdn-uploads.huggingface.co/production/uploads/66bff1188c3816c563f42d20/lNnymRwXBwTOFgeN8Nc5G.jpeg"],"publishedAt":"2026-06-24T00:00:00.000Z","submittedOnDailyAt":"2026-06-30T00:00:00.000Z","title":"TheoremGraph: Bridging Formal and Informal Mathematics","submittedOnDailyBy":{"_id":"66bff1188c3816c563f42d20","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bff1188c3816c563f42d20/G_GDUjvn9fRV7Sjc5W_yG.jpeg","isPro":false,"fullname":"Vasily Ilin","user":"Vilin97","type":"user","name":"Vilin97"},"summary":"Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.","upvotes":5,"discussionId":"6a4198760dbbc53604b66932","projectPage":"https://www.theoremsearch.com/theorem-graph","ai_summary":"A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects.","ai_keywords":["TheoremGraph","LeanGraph","formal libraries","informal papers","statement-level dependency graph","natural-language slogans","semantic space","LLM judge","concept retrieval","graph expansion","LeanSearch"],"ai_summary_model":"Qwen/Qwen2.5-Coder-32B-Instruct","organization":{"_id":"68eee0cdc7ef8ecec4212270","name":"uw-math-ai","fullname":"University of Washington Math AI Lab","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bff1188c3816c563f42d20/l6mZEpaxnWVpVyZWtQKLM.png"}},"canReadDatabase":false,"canManagePapers":false,"canSubmit":false,"hasHfLevelAccess":false,"upvoted":false,"upvoters":[{"_id":"66bff1188c3816c563f42d20","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bff1188c3816c563f42d20/G_GDUjvn9fRV7Sjc5W_yG.jpeg","isPro":false,"fullname":"Vasily Ilin","user":"Vilin97","type":"user"},{"_id":"69fcc7759844753a490022aa","avatarUrl":"/avatars/01fb3c2c7377b7e6f331fc27763f5359.svg","isPro":false,"fullname":"Emily meng","user":"pickle05","type":"user"},{"_id":"699f82020b26cc395dc98154","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/gXFSk4HCNRl1zyvOlSF0r.jpeg","isPro":false,"fullname":"Simon Kurgan","user":"simku22","type":"user"},{"_id":"69081136e77c7a753da7718e","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/noauth/vBNafgaLtam0aOHRiZaVf.png","isPro":false,"fullname":"Simon Chess","user":"sgvtc","type":"user"},{"_id":"63c1699e40a26dd2db32400d","avatarUrl":"https://cdn-avatars.huggingface.co/v1/production/uploads/63c1699e40a26dd2db32400d/3N0-Zp8igv8-52mXAdiiq.jpeg","isPro":false,"fullname":"Chroma","user":"Chroma111","type":"user"}],"acceptLanguages":["en"],"dailyPaperRank":0,"organization":{"_id":"68eee0cdc7ef8ecec4212270","name":"uw-math-ai","fullname":"University of Washington Math AI Lab","avatar":"https://cdn-avatars.huggingface.co/v1/production/uploads/66bff1188c3816c563f42d20/l6mZEpaxnWVpVyZWtQKLM.png"},"markdownContentUrl":"https://huggingface.co/buckets/huggingchat/papers-content/resolve/2606/2606.25363.md","query":{}}">
TheoremGraph: Bridging Formal and Informal Mathematics
Abstract
A unified mathematical dependency graph connects informal and formal mathematics through semantic embedding and automated extraction from arXiv papers and Lean projects.
Mathematical knowledge is organized around statements and their dependencies, but this structure is exposed unevenly: informal papers cite mostly at the document level, while formal libraries record fine-grained dependencies over a much smaller body of mathematics. We introduce TheoremGraph, a unified statement-level dependency graph spanning both informal and formal mathematics. On the informal side, we parse 11.7M theorem-like environments from mathematics arXiv and recover 18.3M candidate directed dependencies, each labeled by the extractor that proposed it so downstream users can trade coverage for precision. On the formal side, we release LeanGraph, a Lean 4 elaborator-level extractor producing 388,105 declaration nodes and 11.3M typed edges across 25 Lean projects. We bridge the two graphs by embedding generated natural-language slogans into a shared semantic space, linking related statements across papers and across the informal/formal divide; an LLM judge affirms 47,952 such matches above a 0.8 cosine floor, with the judge-acceptance rate rising from 48% across the floor to 87% in the >=0.9 tier. On formal concept retrieval, our name-and-signature representation with graph expansion comes within 0.5pp of LeanSearch v2's reranked Recall@10 (0.775 vs. 0.780) without an LM reranker. We release the dataset, extractors, HTTP API, and MCP interface as infrastructure for mathematical search, attribution, and retrieval-augmented reasoning, available at theoremsearch.com and huggingface.co/datasets/uw-math-ai/theorem-matching.
Community
I have always had a broad view of mathematics, refusing the artificial separation between algebra and analysis, pure and applied math, traditional and formal mathematics. I have dreamed of a unified database of all mathematics, which is easily searchable, and where all connections are recorded explicitly. We built it! TheoremGraph unites arXiv and 25 #Lean repos, including #mathlib into one large dependency graph. We have the dataset, the API, and the MCP server.
Very proud of the team: Simon Kurgan, Evan Wang, Eric Leonen, Sophie Szeto, Luke Alexander, Artemii Remizov, Jarod Alper, Giovanni Inchiostro, and myself. Of course, this couldn't have been possible without the UW Math AI lab!
Webpage: https://www.theoremsearch.com/theorem-graph
Stellar explainer video by Simon Kurgan: https://www.youtube.com/watch?v=H2z3VRRd4aQ (must watch!!!)
Upload images, audio, and videos by dragging in the text input, pasting, or clicking here.
Tap or paste here to upload images
Cite arxiv.org/abs/2606.25363 in a model README.md to link it from this page.
Cite arxiv.org/abs/2606.25363 in a Space README.md to link it from this page.
Discussion (0)
Sign in to join the discussion. Free account, 30 seconds — email code or GitHub.
Sign in →No comments yet. Sign in and be the first to say something.