01 Start with the transcription, not a theory
The pipeline uses the Landini–Stolfi IVTFF archive and filters to the Takahashi (;H) transcription lines: 5,207 parsed lines across 225 pages. That textual substrate is cross-checked against Zandbergen’s public transliteration and the IVTFF documentation before any modeling begins.
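A minimal sketch of that filtering step, assuming a local IVTFF file; the regex is a simplification of the IVTFF locus header (e.g. `<f1r.P1.1;H>`), and the function name is illustrative, not the pipeline's actual code:

```python
import re
from pathlib import Path

# Simplified IVTFF locus header: page, locus, transcriber code (";H" = Takahashi).
LOCUS_RE = re.compile(
    r"^<(?P<page>f[^.>]+)\.(?P<locus>[^;>]+);(?P<transcriber>[A-Z])>\s*(?P<text>.*)$"
)

def takahashi_lines(ivtff_path):
    """Yield (page, locus, text) for Takahashi (;H) lines in an IVTFF file.

    Skips comment lines (starting with '#') and bare page headers like
    '<f1r>', keeping only lines whose transcriber code is 'H'.
    """
    for raw in Path(ivtff_path).read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        m = LOCUS_RE.match(line)
        if m and m.group("transcriber") == "H":
            yield m.group("page"), m.group("locus"), m.group("text")
```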
02 EVA becomes STA
EVA is good for transcription, but analysis benefits from grouping glyphs into families. The Super Transliteration Alphabet is René Zandbergen’s public processing alphabet; this framework uses STA-style family/member codes (q-like prefixes, gallows, vowel clusters, and terminal forms) to test structure without pretending we know phonetics.
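A toy illustration of the idea: the family labels below are made up for the example and are not the real STA tables, but they show the shape of an EVA-to-family mapping.

```python
# Illustrative family labels only; the actual STA codes are Zandbergen's.
EVA_FAMILIES = {
    "q": "Q",                                   # q-like prefix
    "k": "G", "t": "G", "p": "G", "f": "G",     # gallows
    "a": "V", "o": "V", "e": "V", "y": "V", "i": "V",  # vowel-like cluster
    "n": "T", "m": "T", "g": "T",               # common terminal forms
}

def to_families(eva_token: str) -> str:
    """Map an EVA token to a coarse family string, one code per glyph.

    Unknown glyphs map to 'X' so nothing is silently dropped.
    """
    return "".join(EVA_FAMILIES.get(ch, "X") for ch in eva_token)

# Example: to_families("qokeedy") == "QVGVVXV" under these toy families.
```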
03 Words become graph flow
Tokens are treated as nodes and adjacent token transitions as edges. This turns a page, paragraph, or full manuscript into a transition system: what follows what, how often, how constrained, and how different the regimes are.
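One way to build that transition system, assuming tokens have already been split out of each line (for EVA, typically on the '.' word separator); the helper name and representation are illustrative:

```python
from collections import Counter
from itertools import pairwise  # Python 3.10+

def transition_graph(lines):
    """Count directed token-to-token transitions within each line.

    `lines` is an iterable of token lists. The result is a Counter
    mapping (token_a, token_b) -> frequency, i.e. the weighted edge
    set of the transition system.
    """
    edges = Counter()
    for tokens in lines:
        edges.update(pairwise(tokens))
    return edges

# Example: transition_graph([["daiin", "ol", "daiin"]])
#   -> Counter({("daiin", "ol"): 1, ("ol", "daiin"): 1})
```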
04 Hodge enters as a filter
Hodge-style graph analysis separates flow into structured recurrence versus noisy drift. In plain English: it asks whether the text has stable circulating patterns that survive when you compare sections, hands, and controls.
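A minimal HodgeRank-style sketch of that split, assuming net flows have already been computed per edge (count of u→v minus count of v→u). It fits a least-squares node potential and treats everything the potential cannot explain as the circulating part; a full Hodge decomposition would further separate that residual into local curl and harmonic components, and the framework's actual computation may differ.

```python
import numpy as np

def hodge_gradient_split(nodes, edge_flows):
    """Split a net edge flow into a gradient part and a cyclic residual.

    `edge_flows` maps (u, v) -> net flow from u to v. The gradient part
    is B @ s for the least-squares node potential s; the residual holds
    the circulating structure that no potential ordering can explain.
    """
    idx = {n: i for i, n in enumerate(nodes)}
    edges = list(edge_flows.items())
    B = np.zeros((len(edges), len(nodes)))      # edge-node incidence matrix
    y = np.zeros(len(edges))                    # observed net flow per edge
    for row, ((u, v), flow) in enumerate(edges):
        B[row, idx[v]] = 1.0
        B[row, idx[u]] = -1.0
        y[row] = flow
    s, *_ = np.linalg.lstsq(B, y, rcond=None)   # node potentials (drift ordering)
    gradient = B @ s                            # flow explained by potentials
    cyclic = y - gradient                       # structured circulation + noise
    return s, gradient, cyclic
```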
05 High-dimensional features, then projection
The framework can lift token windows into high-dimensional feature space — STA families, positions, transitions, entropy, graph features, and embedding modes — then project them back into interpretable measurements. The point is not mysticism; it is controlled compression.
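An illustrative version of the projection step, using a plain centered SVD as a stand-in for whatever reduction the framework actually applies; the feature matrix is assumed to be one row per token window, one column per feature.

```python
import numpy as np

def project_features(feature_matrix, n_components=2):
    """Project a (windows x features) matrix onto its top components.

    Centers each feature, takes the leading singular directions, and
    returns low-dimensional coordinates plus explained-variance ratios,
    so the compression stays inspectable rather than mystical.
    """
    X = np.asarray(feature_matrix, dtype=float)
    X = X - X.mean(axis=0)                          # center each feature
    U, S, Vt = np.linalg.svd(X, full_matrices=False)
    coords = U[:, :n_components] * S[:n_components]
    explained = (S ** 2) / (S ** 2).sum()
    return coords, explained[:n_components]
```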
06 Claims must survive holdout
Any candidate “meaning” or key has to generalize across folios, sections, scribal hands, and image context. If it only works after patching rules page-by-page, it fails.
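One hedged way to operationalize that test, assuming scikit-learn, numpy arrays, and folio/section/hand labels as the grouping variable; the estimator and scoring are placeholders for whatever candidate rule is being evaluated.

```python
from sklearn.model_selection import GroupKFold

def grouped_holdout_scores(model, X, y, groups, n_splits=5):
    """Score a candidate rule with folios (or sections/hands) held out whole.

    `groups` tags each sample with its folio/section/hand; GroupKFold
    never lets the same group appear in both train and test, so a rule
    that only works after page-by-page patching collapses here.
    """
    scores = []
    gkf = GroupKFold(n_splits=n_splits)
    for train_idx, test_idx in gkf.split(X, y, groups):
        model.fit(X[train_idx], y[train_idx])
        scores.append(model.score(X[test_idx], y[test_idx]))
    return scores
```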