-
Notifications
You must be signed in to change notification settings - Fork 101
Description
Topic
Formalize one-dimensional cellular automata (CA) for language recognition in CSLib, starting with core definitions and building toward real-time recognition results.
This is based on an existing Lean 4 formalization (hediet/lean-cellular-automata, ~9.4k LOC of definitions and proofs, 10 sorry-free results + conjectures) that I'm adapting to CSLib conventions. The original project covers CA definitions, acceptance schemes, quiescent/dead border properties, OCA↔CA equivalence, speedup theorems, RT transducer composition, and a novel advice framework.
Scope
Phase 1: Core Definitions (this PR)
CellAutomaton α β Q— one-dimensional CA with radius-1 neighborhood, parameterized by input alphabetα, output alphabetβ, and internal state typeQ.Config,Trace— configurations overℤand temporal traces.- Transition operations:
next,step,comp,trace. LCellAutomaton— language-accepting CA (Option αinput,Booloutput).AcceptanceScheme— parameterized acceptance (time step and position as functions of input length), withrt,rtRight, andltschemes.QuiescentandDeadstate predicates, withDead → Quiescent.
Phase 2: Base Results (future PRs)
- OCA ↔ CA equivalence (left-independent simulation, 2x time factor)
- Quiescent border and dead border constructions (proving the unconstrained border is WLOG)
-
$k$ -step speedup theorems - Exponential word length is RT-recognizable
Design Decisions vs. Standard Literature
The formalization makes several deliberate generalizations over the standard definitions (e.g. Kutrib 2009, Smith 1972). A detailed literature review with annotated paper references is available here. Key decisions:
Transducer view (embed/project maps)
Standard: Input alphabet
Ours: Separate embed : α → Q and project : Q → β maps.
This is more general — the standard approach is the special case where embed is an inclusion and project is membership in
Unconstrained border
Standard: Permanent boundary # ∉ S (Kutrib 2009) or quiescent border
Ours: embed(none) has no a priori constraints.
This is conservative — Phase 2 results prove that quiescent and dead borders can always be imposed without changing the recognized language. Many papers assume this without proof; ours makes it explicit. The motivation for leaving the border unconstrained is that it simplifies the core definitions and avoids baking in an assumption that would then need to be threaded through every construction. Instead, the border properties become standalone results that users can apply when needed.
Parameterized acceptance schemes
Standard: Leftmost cell enters accepting state (Kutrib 2009), sometimes rightmost for OCA (Dyer 1980).
Ours: AcceptanceScheme(t, p) with arbitrary time and position functions.
This unifies real-time, linear-time, and rightmost-cell acceptance into a single framework. It also cleanly separates the CA dynamics from the acceptance mechanism: the CA itself is a pure transducer (CellAutomaton), and the AcceptanceScheme is an external observer that reads a cell at a chosen time. This separation naturally accommodates CAs that decide on their own when to halt — such models can be expressed by having the CA signal acceptance through its output (e.g. a dedicated "done" symbol), with the scheme reading accordingly. The key insight is that the halting logic is part of the CA's state machine, not the acceptance framework.
Configurations over ℤ (0-indexed)
Standard: Usually 1-based indexing (cells
Ours: 0-based (cells ℤ → Q.
Convention difference only. At time
Potential Discussion Points
-
Reuse of existing CSLib abstractions: CellAutomaton vs. LTS/FLTS
Should
CellAutomatoninstantiateLTS,FLTS, or another existing CSLib framework? After detailed analysis, no — the mismatch is fundamental, not cosmetic.What LTS provides.
LTS State Label(LTS/Basic.lean) defines a labelled transition relationTr : State → Label → State → Prop, with infrastructure for multistep transitions (MTroverList Label), executions, reachability (CanReach), bisimulation, simulation, and trace equivalence.FLTS(FLTS/Basic.lean) is the deterministic specialization:tr : State → Label → State.Where the CA model diverges:
-
Global synchronous step vs. per-state transition. In an LTS, a "state" is an atomic element and a transition is a single step from one state to another. In a CA, the meaningful "state" is an entire configuration (
Config Q = ℤ → Q) — an infinite assignment of cell states. The transitionnext : Config Q → Config Qupdates all cells simultaneously based on their neighborhoods. There is no meaningful per-cell LTS because cell$i$ 's update depends on cells$i-1$ and$i+1$ ; the cells don't transition independently. -
No labels. LTS transitions are labelled; the label type drives the multistep theory (label lists), trace equivalence (label sequences), and bisimulation (label matching). A CA step has no natural label — it's a deterministic iteration with no choice or observable action. The only encoding would be
Label = Unit, which makesList Labelisomorphic toℕ, and all the label-based machinery (trace equivalence, bisimulation up to label matching) becomes vacuous overhead. CA already hasstep c t = next^[t] cwhich is the right abstraction for deterministic iteration. -
Different acceptance model. LTS has no native acceptance; its theory is about behavioral equivalence. The CSLib
Acceptorframework (underComputability.Automata.Acceptors) uses runs and accepting states — also not applicable. CA acceptance reads a specific cell at a specific time step viaAcceptanceScheme(t, p), which has no LTS analogue. -
State space mismatch. LTS infrastructure (image-finiteness, finitely-branching, finite-state) assumes the state type is manageable.
Config Q = ℤ → Qis uncountable for$|Q| \geq 2$ . None of the LTS classification results would apply.
What an instantiation would look like. Technically possible:
def CellAutomaton.toFLTS (C : CellAutomaton α β Q) : FLTS (Config Q) Unit where tr := fun c _ => C.next c
This gives us
FLTS.mtr(fold overList Unit= iteratenext), which duplicatesstep. We'd inherit nothing useful from bisimulation or simulation theory — bisimulation onConfig QwithUnitlabels just says two configurations evolve identically, which is trivially decidable from equality.Conclusion.
CellAutomatonshould remain a standalone structure. The CA ↔ LTS connection is a theorem to prove (e.g., "a CA induces a deterministic dynamical system on configurations"), not a design to impose. Forcing the CA into an LTS interface would add complexity (Unit labels, relation-vs-function encoding, useless infrastructure) without enabling any existing LTS results to be applied. -
-
Naming and placement: Currently under
Cslib.Computability.CellularAutomata(sibling ofAutomata). An alternative would beCslib.Computability.Automata.CellularAutomata, grouping all automata together. However, CAs differ substantially from finite-word automata (global synchronous step, infinite configuration space overℤ, no runs/final-state acceptance), and the existingAutomatasubtree shares abstractions (Acceptors, language recognition via runs) that don't apply to CA. The current placement mirrorsMachines.SingleTapeTuring, which is also a sibling ofAutomatarather than nested under it. -
Generalization of definitions: The current formalization is restricted to 1D radius-1 neighborhoods. Some papers (e.g. Delacourt & Poupet 2015) work with arbitrary dimension
$d$ and general neighborhoods$N \subseteq \mathbb{Z}^d$ . Should the definitions be parameterized over dimension/neighborhood from the start, or is it better to formalize 1D first and generalize later? The 1D case covers the core language recognition theory, and most results (speedup, OCA↔CA, acceptance schemes) are specific to 1D. -
Qas parameter vs. field — universe polymorphism: Currently the CSLib version takesQas a parameter:structure CellAutomaton (α β Q : Type*). The original formalization instead makesQa field:structure CellAutomaton (α β : Type) where Q : Type; .... This is a significant design difference.With
Qas a field (original),CellAutomaton α βis a sigma type — each CA bundles its own state type. This makes existential statements natural: language classes are defined as sets oftCellAutomaton α, andℒ(CA_rt)is simply{ L | ∃ C ∈ CA_rt, L = C.lang }. Constructions that transform CAs (product, speedup, OCA↔CA conversion) can freely choose differentQtypes — e.g.Q₁ × Q₂,Fin k → Q,Option Q— without changing the external type signatureCellAutomaton α β. This is essential for the composition pipeline where each stage has a different state space.With
Qas a parameter (CSLib version), existential statements require explicit∃ Q, ∃ C : CellAutomaton α β Q, ...with universe quantification. Language classes can't be simple sets of CAs — they need to quantify overQ. Constructions that change the state type produce a different type of CA, making composition chains harder to express uniformly.However, the field approach requires
Qto carry typeclass constraints (the original uses[Alphabet Q]requiringDecidableEq,Fintype,Inhabited), which adds overhead. The parameter approach is more conventional in Lean/Mathlib and gives better type inference whenQis known.This choice primarily affects Phase 2+ (where constructions compose CAs with different state types). For Phase 1 definitions, both approaches work. Feedback on the preferred direction would be valuable before porting the heavier constructions.
Related: the original uses
Type(inType 0), while the CSLib version usesType*(universe-polymorphic). For CA language theory all relevant types are finite in practice, soTypesuffices, butType*is consistent with CSLib/Mathlib conventions.
Execution Plan
- ✅ Phase 1 PR (WIP): Core definitions —
Basic.lean,Defs.lean,Language.lean,State.lean - Port base results (Results 1–6) adapting constructions from the original formalization
- Each phase as a separate PR after the previous is merged
References
- Smith III, A. R. (1972). Real-time language recognition by one-dimensional cellular automata. JCSS 6(3), 233–253.
- Kutrib, M. (2009). Cellular automata and language theory. Encyclopedia of Complexity and Systems Science, 800–823.
- Dyer, C. R. (1980). One-way bounded cellular automata. Information and Control 44(3), 261–281.
- Grandjean, A., Richard, G. & Terrier, V. (2012). Linear functional classes over cellular automata. EPTCS 90, 185–197.