feat(CellularAutomata): One-Dimensional Cellular Automata for Language Recognition

## Topic

Formalize one-dimensional cellular automata (CA) for language recognition in CSLib, starting with core definitions and building toward real-time recognition results.

This is based on an existing Lean 4 formalization ([hediet/lean-cellular-automata](https://github.com/hediet/lean-cellular-automata), ~9.4k LOC of definitions and proofs, 10 sorry-free results + conjectures) that I'm adapting to CSLib conventions. The original project covers CA definitions, acceptance schemes, quiescent/dead border properties, OCA↔CA equivalence, speedup theorems, RT transducer composition, and a novel advice framework.

## Scope

### Phase 1: Core Definitions (this PR)

- `CellAutomaton α β Q` — one-dimensional CA with radius-1 neighborhood, parameterized by input alphabet `α`, output alphabet `β`, and internal state type `Q`.
- `Config`, `Trace` — configurations over `ℤ` and temporal traces.
- Transition operations: `next`, `step`, `comp`, `trace`.
- `LCellAutomaton` — language-accepting CA (`Option α` input, `Bool` output).
- `AcceptanceScheme` — parameterized acceptance (time step and position as functions of input length), with `rt`, `rtRight`, and `lt` schemes.
- `Quiescent` and `Dead` state predicates, with `Dead → Quiescent`.

### Phase 2: Base Results (future PRs)

- OCA ↔ CA equivalence (left-independent simulation, 2x time factor)
- Quiescent border and dead border constructions (proving the unconstrained border is WLOG)
- $k$-step speedup theorems
- Exponential word length is RT-recognizable

## Design Decisions vs. Standard Literature

The formalization makes several deliberate generalizations over the standard definitions (e.g. Kutrib 2009, Smith 1972). A detailed literature review with annotated paper references is available [here](https://github.com/hediet/cslib/blob/cslib-cellular-automata-docs/lean-literature.md). Key decisions:

### Transducer view (`embed`/`project` maps)

Standard: Input alphabet $A \subseteq S$, acceptance via $F \subseteq S$ (Kutrib 2009).
Ours: Separate `embed : α → Q` and `project : Q → β` maps.

This is more general — the standard approach is the special case where `embed` is an inclusion and `project` is membership in $F$. The transducer view follows Grandjean, Richard & Terrier (2012).

### Unconstrained border

Standard: Permanent boundary `# ∉ S` (Kutrib 2009) or quiescent border $\delta(q, q, q) = q$ (Smith 1972, Martin 1994).
Ours: `embed(none)` has no a priori constraints.

This is conservative — Phase 2 results prove that quiescent and dead borders can always be imposed without changing the recognized language. Many papers assume this without proof; ours makes it explicit. The motivation for leaving the border unconstrained is that it simplifies the core definitions and avoids baking in an assumption that would then need to be threaded through every construction. Instead, the border properties become standalone results that users can apply when needed.

### Parameterized acceptance schemes

Standard: Leftmost cell enters accepting state (Kutrib 2009), sometimes rightmost for OCA (Dyer 1980).
Ours: `AcceptanceScheme(t, p)` with arbitrary time and position functions.

This unifies real-time, linear-time, and rightmost-cell acceptance into a single framework. It also cleanly separates the CA dynamics from the acceptance mechanism: the CA itself is a pure transducer (`CellAutomaton`), and the `AcceptanceScheme` is an external observer that reads a cell at a chosen time. This separation naturally accommodates CAs that decide on their own when to halt — such models can be expressed by having the CA signal acceptance through its output (e.g. a dedicated "done" symbol), with the scheme reading accordingly. The key insight is that the halting logic is part of the CA's state machine, not the acceptance framework.

### Configurations over `ℤ` (0-indexed)

Standard: Usually 1-based indexing (cells $1..n$).
Ours: 0-based (cells $0..n-1$), configurations `ℤ → Q`.

Convention difference only. At time $n-1$, information from position $n-1$ has just reached position 0.

## Potential Discussion Points

1. **Reuse of existing CSLib abstractions: CellAutomaton vs. LTS/FLTS**

   Should `CellAutomaton` instantiate `LTS`, `FLTS`, or another existing CSLib framework? After detailed analysis, **no** — the mismatch is fundamental, not cosmetic.

   **What LTS provides.** `LTS State Label` (`LTS/Basic.lean`) defines a labelled transition *relation* `Tr : State → Label → State → Prop`, with infrastructure for multistep transitions (`MTr` over `List Label`), executions, reachability (`CanReach`), bisimulation, simulation, and trace equivalence. `FLTS` (`FLTS/Basic.lean`) is the deterministic specialization: `tr : State → Label → State`.

   **Where the CA model diverges:**

   - **Global synchronous step vs. per-state transition.** In an LTS, a "state" is an atomic element and a transition is a single step from one state to another. In a CA, the meaningful "state" is an entire *configuration* (`Config Q = ℤ → Q`) — an infinite assignment of cell states. The transition `next : Config Q → Config Q` updates *all* cells simultaneously based on their neighborhoods. There is no meaningful per-cell LTS because cell $i$'s update depends on cells $i-1$ and $i+1$; the cells don't transition independently.

   - **No labels.** LTS transitions are labelled; the label type drives the multistep theory (label lists), trace equivalence (label sequences), and bisimulation (label matching). A CA step has no natural label — it's a deterministic iteration with no choice or observable action. The only encoding would be `Label = Unit`, which makes `List Label` isomorphic to `ℕ`, and all the label-based machinery (trace equivalence, bisimulation up to label matching) becomes vacuous overhead. CA already has `step c t = next^[t] c` which is the right abstraction for deterministic iteration.

   - **Different acceptance model.** LTS has no native acceptance; its theory is about behavioral equivalence. The CSLib `Acceptor` framework (under `Computability.Automata.Acceptors`) uses runs and accepting states — also not applicable. CA acceptance reads a *specific cell at a specific time step* via `AcceptanceScheme(t, p)`, which has no LTS analogue.

   - **State space mismatch.** LTS infrastructure (image-finiteness, finitely-branching, finite-state) assumes the state type is manageable. `Config Q = ℤ → Q` is uncountable for $|Q| \geq 2$. None of the LTS classification results would apply.

   **What an instantiation would look like.** Technically possible:
   ```lean
   def CellAutomaton.toFLTS (C : CellAutomaton α β Q) : FLTS (Config Q) Unit where
     tr := fun c _ => C.next c
   ```
   This gives us `FLTS.mtr` (fold over `List Unit` = iterate `next`), which duplicates `step`. We'd inherit nothing useful from bisimulation or simulation theory — bisimulation on `Config Q` with `Unit` labels just says two configurations evolve identically, which is trivially decidable from equality.

   **Conclusion.** `CellAutomaton` should remain a standalone structure. The CA ↔ LTS connection is a *theorem* to prove (e.g., "a CA induces a deterministic dynamical system on configurations"), not a *design* to impose. Forcing the CA into an LTS interface would add complexity (Unit labels, relation-vs-function encoding, useless infrastructure) without enabling any existing LTS results to be applied.

2. **Naming and placement**: Currently under `Cslib.Computability.CellularAutomata` (sibling of `Automata`). An alternative would be `Cslib.Computability.Automata.CellularAutomata`, grouping all automata together. However, CAs differ substantially from finite-word automata (global synchronous step, infinite configuration space over `ℤ`, no runs/final-state acceptance), and the existing `Automata` subtree shares abstractions (`Acceptors`, language recognition via runs) that don't apply to CA. The current placement mirrors `Machines.SingleTapeTuring`, which is also a sibling of `Automata` rather than nested under it.

3. **Generalization of definitions**: The current formalization is restricted to 1D radius-1 neighborhoods. Some papers (e.g. Delacourt & Poupet 2015) work with arbitrary dimension $d$ and general neighborhoods $N \subseteq \mathbb{Z}^d$. Should the definitions be parameterized over dimension/neighborhood from the start, or is it better to formalize 1D first and generalize later? The 1D case covers the core language recognition theory, and most results (speedup, OCA↔CA, acceptance schemes) are specific to 1D.

4. **`Q` as parameter vs. field — universe polymorphism**: Currently the CSLib version takes `Q` as a parameter: `structure CellAutomaton (α β Q : Type*)`. The original formalization instead makes `Q` a field: `structure CellAutomaton (α β : Type) where Q : Type; ...`. This is a significant design difference.

   With `Q` as a **field** (original), `CellAutomaton α β` is a sigma type — each CA bundles its own state type. This makes existential statements natural: language classes are defined as sets of `tCellAutomaton α`, and `ℒ(CA_rt)` is simply `{ L | ∃ C ∈ CA_rt, L = C.lang }`. Constructions that transform CAs (product, speedup, OCA↔CA conversion) can freely choose different `Q` types — e.g. `Q₁ × Q₂`, `Fin k → Q`, `Option Q` — without changing the external type signature `CellAutomaton α β`. This is essential for the composition pipeline where each stage has a different state space.

   With `Q` as a **parameter** (CSLib version), existential statements require explicit `∃ Q, ∃ C : CellAutomaton α β Q, ...` with universe quantification. Language classes can't be simple sets of CAs — they need to quantify over `Q`. Constructions that change the state type produce a different *type* of CA, making composition chains harder to express uniformly.

   However, the field approach requires `Q` to carry typeclass constraints (the original uses `[Alphabet Q]` requiring `DecidableEq`, `Fintype`, `Inhabited`), which adds overhead. The parameter approach is more conventional in Lean/Mathlib and gives better type inference when `Q` is known.

   This choice primarily affects Phase 2+ (where constructions compose CAs with different state types). For Phase 1 definitions, both approaches work. Feedback on the preferred direction would be valuable before porting the heavier constructions.

   Related: the original uses `Type` (in `Type 0`), while the CSLib version uses `Type*` (universe-polymorphic). For CA language theory all relevant types are finite in practice, so `Type` suffices, but `Type*` is consistent with CSLib/Mathlib conventions.

## Execution Plan

1. ✅ Phase 1 PR (WIP): Core definitions — `Basic.lean`, `Defs.lean`, `Language.lean`, `State.lean`
2. Port base results (Results 1–6) adapting constructions from the original formalization
3. Each phase as a separate PR after the previous is merged

## References

- Smith III, A. R. (1972). *Real-time language recognition by one-dimensional cellular automata.* JCSS 6(3), 233–253.
- Kutrib, M. (2009). *Cellular automata and language theory.* Encyclopedia of Complexity and Systems Science, 800–823.
- Dyer, C. R. (1980). *One-way bounded cellular automata.* Information and Control 44(3), 261–281.
- Grandjean, A., Richard, G. & Terrier, V. (2012). *Linear functional classes over cellular automata.* EPTCS 90, 185–197.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

feat(CellularAutomata): One-Dimensional Cellular Automata for Language Recognition #407

Topic

Scope

Phase 1: Core Definitions (this PR)

Phase 2: Base Results (future PRs)

Design Decisions vs. Standard Literature

Transducer view (`embed`/`project` maps)

Unconstrained border

Parameterized acceptance schemes

Configurations over `ℤ` (0-indexed)

Potential Discussion Points

Execution Plan

References

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

feat(CellularAutomata): One-Dimensional Cellular Automata for Language Recognition #407

Description

Topic

Scope

Phase 1: Core Definitions (this PR)

Phase 2: Base Results (future PRs)

Design Decisions vs. Standard Literature

Transducer view (embed/project maps)

Unconstrained border

Parameterized acceptance schemes

Configurations over ℤ (0-indexed)

Potential Discussion Points

Execution Plan

References

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Transducer view (`embed`/`project` maps)

Configurations over `ℤ` (0-indexed)