Let stage components select their own distribution #76

@ddeboer

Problem

Currently Stage.run(dataset, distribution) receives a single pre-resolved Distribution and passes it to all selectors and executors. This works when every component in a stage needs the same distribution type (e.g. a SPARQL endpoint), but breaks down when components need different types — for example, a SPARQL Anything stage would need a CSV distribution, not a SPARQL endpoint.

Proposed direction

  1. Component-level distribution selection: each selector/executor declares what distribution type it needs and receives it from the resolved context, rather than having a single distribution pushed in from Stage.run().
  2. Symmetric selector/executor interface: StageSelector should receive the dataset/context at iteration time (like Executor.execute() does), instead of being fully configured at construction time.
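
A rough sketch of how the two points could fit together, to make the discussion concrete. Everything here is hypothetical and only for illustration: `DistributionKind`, `StageContext`, `StageComponent`, `ExampleSelector`, `ExampleCsvExecutor` and `ExampleStage` are invented names, not the project's current types, and the real interfaces are likely async and richer than this.

```typescript
// Hypothetical types for illustration; not the project's actual API.
type DistributionKind = "sparql-endpoint" | "csv";

interface Distribution {
  kind: DistributionKind;
  accessUrl: string;
}

interface Dataset {
  iri: string;
  distributions: Distribution[];
}

// Resolved context handed to every component when it runs.
class StageContext {
  readonly dataset: Dataset;

  constructor(dataset: Dataset) {
    this.dataset = dataset;
  }

  // Look up the distribution a component asked for; fail loudly if absent.
  distributionOf(kind: DistributionKind): Distribution {
    const match = this.dataset.distributions.find((d) => d.kind === kind);
    if (match === undefined) {
      throw new Error(`Dataset ${this.dataset.iri} has no ${kind} distribution`);
    }
    return match;
  }
}

// Selectors and executors share one shape: each declares what it needs and
// receives the context at run time instead of a URL at construction time.
interface StageComponent {
  readonly needs: DistributionKind;
  run(context: StageContext): string;
}

class ExampleSelector implements StageComponent {
  readonly needs: DistributionKind = "sparql-endpoint";

  run(context: StageContext): string {
    return `select from ${context.distributionOf(this.needs).accessUrl}`;
  }
}

class ExampleCsvExecutor implements StageComponent {
  readonly needs: DistributionKind = "csv";

  run(context: StageContext): string {
    return `transform ${context.distributionOf(this.needs).accessUrl}`;
  }
}

// The stage no longer takes a pre-resolved distribution; it only builds
// the context and lets each component pick its own.
class ExampleStage {
  private readonly components: StageComponent[];

  constructor(components: StageComponent[]) {
    this.components = components;
  }

  run(dataset: Dataset): string[] {
    const context = new StageContext(dataset);
    return this.components.map((component) => component.run(context));
  }
}

const dataset: Dataset = {
  iri: "https://example.org/dataset/1",
  distributions: [
    { kind: "sparql-endpoint", accessUrl: "https://example.org/sparql" },
    { kind: "csv", accessUrl: "https://example.org/data.csv" },
  ],
};

const results = new ExampleStage([
  new ExampleSelector(),
  new ExampleCsvExecutor(),
]).run(dataset);
```

In this sketch a single stage mixes a component that wants a SPARQL endpoint with one that wants a CSV file, which the current single-distribution signature cannot express. Whether a missing distribution should throw, skip the component, or skip the dataset is left open.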

Context

This came up during #75, where SparqlSelector takes a bare URL at construction while SparqlConstructExecutor receives the Distribution at execution time — requiring manual FROM clause construction for the selector.

Related: #18 (SPARQL Anything for non-RDF origin data) would benefit from this, since those stages need non-SPARQL distributions.
