Skip to content

feat(CellularAutomata): One-Dimensional Cellular Automata for Language Recognition #407

@hediet

Description

@hediet

Topic

Formalize one-dimensional cellular automata (CA) for language recognition in CSLib, starting with core definitions and building toward real-time recognition results.

This is based on an existing Lean 4 formalization (hediet/lean-cellular-automata, ~9.4k LOC of definitions and proofs, 10 sorry-free results + conjectures) that I'm adapting to CSLib conventions. The original project covers CA definitions, acceptance schemes, quiescent/dead border properties, OCA↔CA equivalence, speedup theorems, RT transducer composition, and a novel advice framework.

Scope

Phase 1: Core Definitions (this PR)

  • CellAutomaton α β Q — one-dimensional CA with radius-1 neighborhood, parameterized by input alphabet α, output alphabet β, and internal state type Q.
  • Config, Trace — configurations over and temporal traces.
  • Transition operations: next, step, comp, trace.
  • LCellAutomaton — language-accepting CA (Option α input, Bool output).
  • AcceptanceScheme — parameterized acceptance (time step and position as functions of input length), with rt, rtRight, and lt schemes.
  • Quiescent and Dead state predicates, with Dead → Quiescent.

Phase 2: Base Results (future PRs)

  • OCA ↔ CA equivalence (left-independent simulation, 2x time factor)
  • Quiescent border and dead border constructions (proving the unconstrained border is WLOG)
  • $k$-step speedup theorems
  • Exponential word length is RT-recognizable

Design Decisions vs. Standard Literature

The formalization makes several deliberate generalizations over the standard definitions (e.g. Kutrib 2009, Smith 1972). A detailed literature review with annotated paper references is available here. Key decisions:

Transducer view (embed/project maps)

Standard: Input alphabet $A \subseteq S$, acceptance via $F \subseteq S$ (Kutrib 2009).
Ours: Separate embed : α → Q and project : Q → β maps.

This is more general — the standard approach is the special case where embed is an inclusion and project is membership in $F$. The transducer view follows Grandjean, Richard & Terrier (2012).

Unconstrained border

Standard: Permanent boundary # ∉ S (Kutrib 2009) or quiescent border $\delta(q, q, q) = q$ (Smith 1972, Martin 1994).
Ours: embed(none) has no a priori constraints.

This is conservative — Phase 2 results prove that quiescent and dead borders can always be imposed without changing the recognized language. Many papers assume this without proof; ours makes it explicit. The motivation for leaving the border unconstrained is that it simplifies the core definitions and avoids baking in an assumption that would then need to be threaded through every construction. Instead, the border properties become standalone results that users can apply when needed.

Parameterized acceptance schemes

Standard: Leftmost cell enters accepting state (Kutrib 2009), sometimes rightmost for OCA (Dyer 1980).
Ours: AcceptanceScheme(t, p) with arbitrary time and position functions.

This unifies real-time, linear-time, and rightmost-cell acceptance into a single framework. It also cleanly separates the CA dynamics from the acceptance mechanism: the CA itself is a pure transducer (CellAutomaton), and the AcceptanceScheme is an external observer that reads a cell at a chosen time. This separation naturally accommodates CAs that decide on their own when to halt — such models can be expressed by having the CA signal acceptance through its output (e.g. a dedicated "done" symbol), with the scheme reading accordingly. The key insight is that the halting logic is part of the CA's state machine, not the acceptance framework.

Configurations over (0-indexed)

Standard: Usually 1-based indexing (cells $1..n$).
Ours: 0-based (cells $0..n-1$), configurations ℤ → Q.

Convention difference only. At time $n-1$, information from position $n-1$ has just reached position 0.

Potential Discussion Points

  1. Reuse of existing CSLib abstractions: CellAutomaton vs. LTS/FLTS

    Should CellAutomaton instantiate LTS, FLTS, or another existing CSLib framework? After detailed analysis, no — the mismatch is fundamental, not cosmetic.

    What LTS provides. LTS State Label (LTS/Basic.lean) defines a labelled transition relation Tr : State → Label → State → Prop, with infrastructure for multistep transitions (MTr over List Label), executions, reachability (CanReach), bisimulation, simulation, and trace equivalence. FLTS (FLTS/Basic.lean) is the deterministic specialization: tr : State → Label → State.

    Where the CA model diverges:

    • Global synchronous step vs. per-state transition. In an LTS, a "state" is an atomic element and a transition is a single step from one state to another. In a CA, the meaningful "state" is an entire configuration (Config Q = ℤ → Q) — an infinite assignment of cell states. The transition next : Config Q → Config Q updates all cells simultaneously based on their neighborhoods. There is no meaningful per-cell LTS because cell $i$'s update depends on cells $i-1$ and $i+1$; the cells don't transition independently.

    • No labels. LTS transitions are labelled; the label type drives the multistep theory (label lists), trace equivalence (label sequences), and bisimulation (label matching). A CA step has no natural label — it's a deterministic iteration with no choice or observable action. The only encoding would be Label = Unit, which makes List Label isomorphic to , and all the label-based machinery (trace equivalence, bisimulation up to label matching) becomes vacuous overhead. CA already has step c t = next^[t] c which is the right abstraction for deterministic iteration.

    • Different acceptance model. LTS has no native acceptance; its theory is about behavioral equivalence. The CSLib Acceptor framework (under Computability.Automata.Acceptors) uses runs and accepting states — also not applicable. CA acceptance reads a specific cell at a specific time step via AcceptanceScheme(t, p), which has no LTS analogue.

    • State space mismatch. LTS infrastructure (image-finiteness, finitely-branching, finite-state) assumes the state type is manageable. Config Q = ℤ → Q is uncountable for $|Q| \geq 2$. None of the LTS classification results would apply.

    What an instantiation would look like. Technically possible:

    def CellAutomaton.toFLTS (C : CellAutomaton α β Q) : FLTS (Config Q) Unit where
      tr := fun c _ => C.next c

    This gives us FLTS.mtr (fold over List Unit = iterate next), which duplicates step. We'd inherit nothing useful from bisimulation or simulation theory — bisimulation on Config Q with Unit labels just says two configurations evolve identically, which is trivially decidable from equality.

    Conclusion. CellAutomaton should remain a standalone structure. The CA ↔ LTS connection is a theorem to prove (e.g., "a CA induces a deterministic dynamical system on configurations"), not a design to impose. Forcing the CA into an LTS interface would add complexity (Unit labels, relation-vs-function encoding, useless infrastructure) without enabling any existing LTS results to be applied.

  2. Naming and placement: Currently under Cslib.Computability.CellularAutomata (sibling of Automata). An alternative would be Cslib.Computability.Automata.CellularAutomata, grouping all automata together. However, CAs differ substantially from finite-word automata (global synchronous step, infinite configuration space over , no runs/final-state acceptance), and the existing Automata subtree shares abstractions (Acceptors, language recognition via runs) that don't apply to CA. The current placement mirrors Machines.SingleTapeTuring, which is also a sibling of Automata rather than nested under it.

  3. Generalization of definitions: The current formalization is restricted to 1D radius-1 neighborhoods. Some papers (e.g. Delacourt & Poupet 2015) work with arbitrary dimension $d$ and general neighborhoods $N \subseteq \mathbb{Z}^d$. Should the definitions be parameterized over dimension/neighborhood from the start, or is it better to formalize 1D first and generalize later? The 1D case covers the core language recognition theory, and most results (speedup, OCA↔CA, acceptance schemes) are specific to 1D.

  4. Q as parameter vs. field — universe polymorphism: Currently the CSLib version takes Q as a parameter: structure CellAutomaton (α β Q : Type*). The original formalization instead makes Q a field: structure CellAutomaton (α β : Type) where Q : Type; .... This is a significant design difference.

    With Q as a field (original), CellAutomaton α β is a sigma type — each CA bundles its own state type. This makes existential statements natural: language classes are defined as sets of tCellAutomaton α, and ℒ(CA_rt) is simply { L | ∃ C ∈ CA_rt, L = C.lang }. Constructions that transform CAs (product, speedup, OCA↔CA conversion) can freely choose different Q types — e.g. Q₁ × Q₂, Fin k → Q, Option Q — without changing the external type signature CellAutomaton α β. This is essential for the composition pipeline where each stage has a different state space.

    With Q as a parameter (CSLib version), existential statements require explicit ∃ Q, ∃ C : CellAutomaton α β Q, ... with universe quantification. Language classes can't be simple sets of CAs — they need to quantify over Q. Constructions that change the state type produce a different type of CA, making composition chains harder to express uniformly.

    However, the field approach requires Q to carry typeclass constraints (the original uses [Alphabet Q] requiring DecidableEq, Fintype, Inhabited), which adds overhead. The parameter approach is more conventional in Lean/Mathlib and gives better type inference when Q is known.

    This choice primarily affects Phase 2+ (where constructions compose CAs with different state types). For Phase 1 definitions, both approaches work. Feedback on the preferred direction would be valuable before porting the heavier constructions.

    Related: the original uses Type (in Type 0), while the CSLib version uses Type* (universe-polymorphic). For CA language theory all relevant types are finite in practice, so Type suffices, but Type* is consistent with CSLib/Mathlib conventions.

Execution Plan

  1. ✅ Phase 1 PR (WIP): Core definitions — Basic.lean, Defs.lean, Language.lean, State.lean
  2. Port base results (Results 1–6) adapting constructions from the original formalization
  3. Each phase as a separate PR after the previous is merged

References

  • Smith III, A. R. (1972). Real-time language recognition by one-dimensional cellular automata. JCSS 6(3), 233–253.
  • Kutrib, M. (2009). Cellular automata and language theory. Encyclopedia of Complexity and Systems Science, 800–823.
  • Dyer, C. R. (1980). One-way bounded cellular automata. Information and Control 44(3), 261–281.
  • Grandjean, A., Richard, G. & Terrier, V. (2012). Linear functional classes over cellular automata. EPTCS 90, 185–197.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions