Improve CasToComparableText #444

@reckart

Description

Is your feature request related to a problem? Please describe.
When inspecting or diffing CAS contents in tests, we frequently rely on a simple CSV stringification that:

  • offers no rich, human-friendly output (HTML) for easier visual inspection,
  • lacks configurable columns (anchor, covered text, indexed status),
  • produces unstable ordering for multi-valued/annotation features and ambiguous anchors,
  • offers no convenient way to exclude noisy features/types or treat empty strings specially,
  • and forces long covered text into the output, making diffs noisy.

I'm often frustrated when test failures produce long, hard‑to‑scan CAS dumps or when small, irrelevant differences (e.g., non-deterministic anchor numbering or list order) make comparisons brittle.

Describe the solution you'd like
Add an enhanced CAS -> comparable text utility with the following capabilities:

  • Output formats: Keep CSV but add an HTML renderer for nicer human-readable tables.
  • Configurable columns: enable/disable an anchor column, an indexed column, and a covered‑text column (with configurable max length and middle-abbreviation).
  • Anchor formatting: anchors include type short name, optional annotation offsets, optional sofa id, optional indexing marker, and stable disambiguation suffixes for duplicate anchors; support optional anchor feature hash suffix.
  • Stable ordering: when multi‑valued features hold annotations, optionally sort them by begin (asc), then end (desc), then type name to provide deterministic set‑like ordering.
  • Index awareness: mark FSs as indexed and optionally add a dedicated <INDEXED> column; use indexed status as a tie-breaker when ordering.
  • Exclusions: allow regex patterns to exclude specific features or types from rendering (cache regex compilation for performance).
  • Null/empty handling: configurable nullValue, and an option to treat empty strings as null so empty values don’t clutter diffs.
  • Multi‑valued rendering: robust handling of array/list FSs and primitive arrays, rendering them as bracketed lists; handle nested multi-valued structures recursively.
  • Rendering options: omit the XML declaration in HTML output and use minimal inline styling so the HTML is self-contained.
  • Public API knobs: setters/getters for all of the above flags so callers can tune output for different use cases (compact machine diffs vs human inspection).
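
To illustrate the stable ordering item, here is a minimal sketch of a comparator that sorts by begin (ascending), then end (descending), then type name. `Span` and `StableOrder` are hypothetical stand-ins invented for this example; they are not UIMA classes and not the proposed API, and a real implementation would operate on annotations from the CAS.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for an annotation (assumption; not a UIMA class).
final class Span {
    final int begin;
    final int end;
    final String typeName;

    Span(int begin, int end, String typeName) {
        this.begin = begin;
        this.end = end;
        this.typeName = typeName;
    }

    @Override
    public String toString() {
        return typeName + "[" + begin + "-" + end + "]";
    }
}

// Hypothetical helper showing the ordering described in the list above.
final class StableOrder {
    // begin ascending, then end descending (longer spans first), then type name.
    static final Comparator<Span> BY_OFFSETS_THEN_TYPE =
            Comparator.comparingInt((Span s) -> s.begin)
                    .thenComparing(Comparator.comparingInt((Span s) -> s.end).reversed())
                    .thenComparing((Span s) -> s.typeName);

    // Returns a sorted copy so the caller's list is left untouched.
    static List<Span> sorted(List<Span> spans) {
        List<Span> copy = new ArrayList<>(spans);
        copy.sort(BY_OFFSETS_THEN_TYPE);
        return copy;
    }
}
```

With this ordering, the spans `Token[5-9]`, `Token[0-3]`, `Sentence[0-10]`, `Lemma[0-3]` come out as `Sentence[0-10], Lemma[0-3], Token[0-3], Token[5-9]` regardless of their input order. Indexed status as an additional tie-breaker is left out of this sketch.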

This would produce a single, stable, configurable comparable representation useful for both automated assertions and human debugging.
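
Two of the smaller items above, regex-based exclusion with cached pattern compilation and middle-abbreviation of covered text, could look roughly like the following. This is only a sketch under assumptions: the class name `RenderHelpers`, both method names, the `...` marker, and the choice of full-match semantics for exclusion patterns are invented here and are not part of the request.

```java
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.regex.Pattern;

// Hypothetical helpers (assumption; names are not from the proposed API).
final class RenderHelpers {
    // Compiled patterns are cached so each regex is compiled only once.
    private static final Map<String, Pattern> PATTERN_CACHE = new ConcurrentHashMap<>();

    // True if the feature or type name fully matches any exclusion regex.
    static boolean isExcluded(String name, List<String> exclusionRegexes) {
        for (String regex : exclusionRegexes) {
            Pattern pattern = PATTERN_CACHE.computeIfAbsent(regex, Pattern::compile);
            if (pattern.matcher(name).matches()) {
                return true;
            }
        }
        return false;
    }

    // Shortens text to at most maxLength characters by replacing the middle
    // with "...". Text is returned unchanged if it already fits or if
    // maxLength leaves no room for anything besides the marker.
    static String abbreviateMiddle(String text, int maxLength) {
        String marker = "...";
        if (text == null || text.length() <= maxLength || maxLength <= marker.length()) {
            return text;
        }
        int keep = maxLength - marker.length();
        int head = (keep + 1) / 2; // the head gets the extra character if keep is odd
        int tail = keep / 2;
        return text.substring(0, head) + marker + text.substring(text.length() - tail);
    }
}
```

For example, `RenderHelpers.abbreviateMiddle("The quick brown fox", 11)` yields `The ... fox`, which keeps both ends of the covered text visible in a diff while bounding the column width.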
