Summary
Add a CommandExecutor that runs an external CLI command as a pipeline step, reading quads from stdin or a file and producing quads (or other output) on stdout or to a file.
Context
Some pipeline steps are best handled by external tools rather than in-process Node.js code. For example, loda-pipeline's post-processing chain:
- Deduplication: Jena's `sparql` CLI running a CONSTRUCT query against an N-Triples file
- SHACL validation: Jena's `shacl validate` against EDM shapes
- Format conversion: Jena's `riot --output=rdfxml` for N-Triples → RDF/XML
- EDM XML packaging: a custom Java tool (`Rdf2EdmCl`) that splits RDF/XML into individual EDM XML files in a ZIP
These don't fit the SPARQL CONSTRUCT executor model — they're file-in, file-out CLI invocations. A CommandExecutor would make them composable as pipeline steps.
Approach
A CommandExecutor implements the same interface as SparqlConstructExecutor but delegates to an external process:
class CommandExecutor implements Executor {
  constructor(options: {
    command: string; // e.g. 'riot'
    args: string[]; // e.g. ['--output=rdfxml']
    // How to pass input: stdin pipe, file path substitution, or no input
    input?: 'stdin' | 'file';
    // How to read output: stdout pipe or file path
    output?: 'stdout' | 'file';
  });
}

This is also the pattern proposed for SparqlAnythingExecutor in the design doc: spawning java -jar sparql-anything.jar per query and parsing N-Triples from stdout.
Uses @lde/task-runner for process lifecycle management (Docker or host execution).
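To make the stdin/file and stdout/file modes concrete, here is a minimal sketch of how such an executor might behave. It is not the proposed implementation, and it rests on several assumptions: it uses Node's built-in node:child_process instead of @lde/task-runner (whose API is not described here), it passes raw text rather than parsed quads, and it buffers everything in memory instead of streaming. The names CommandExecutorSketch, run, substitutePaths and runProcess, and the {input}/{output} placeholders used for file path substitution, are all hypothetical.

```typescript
import { spawn } from 'node:child_process';
import { mkdtemp, readFile, rm, writeFile } from 'node:fs/promises';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Option shape taken from the proposal above.
interface CommandExecutorOptions {
  command: string;
  args: string[];
  input?: 'stdin' | 'file';
  output?: 'stdout' | 'file';
}

// Assumption: args may contain the literal placeholders {input} and {output},
// which are replaced with temporary file paths before the command runs.
function substitutePaths(
  args: string[],
  paths: { input: string; output: string },
): string[] {
  return args.map((arg) =>
    arg.split('{input}').join(paths.input).split('{output}').join(paths.output),
  );
}

// Spawn the command, write `stdin` to it, and resolve with everything it
// printed on stdout. Rejects on spawn failure or a non-zero exit code.
function runProcess(command: string, args: string[], stdin: string): Promise<string> {
  return new Promise((resolve, reject) => {
    const child = spawn(command, args);
    const stdout: Buffer[] = [];
    const stderr: Buffer[] = [];
    child.stdout.on('data', (chunk: Buffer) => stdout.push(chunk));
    child.stderr.on('data', (chunk: Buffer) => stderr.push(chunk));
    child.on('error', reject);
    child.on('close', (code) => {
      if (code === 0) {
        resolve(Buffer.concat(stdout).toString('utf8'));
      } else {
        const details = Buffer.concat(stderr).toString('utf8');
        reject(new Error(`${command} exited with code ${code}: ${details}`));
      }
    });
    // Ignore write errors (e.g. EPIPE) from commands that never read stdin.
    child.stdin.on('error', () => {});
    child.stdin.end(stdin);
  });
}

class CommandExecutorSketch {
  private readonly options: CommandExecutorOptions;

  constructor(options: CommandExecutorOptions) {
    this.options = options;
  }

  // Takes serialized input (e.g. N-Triples text) and returns the command's output.
  async run(data: string): Promise<string> {
    const { command, args, input = 'stdin', output = 'stdout' } = this.options;
    const dir = await mkdtemp(join(tmpdir(), 'command-executor-'));
    const paths = { input: join(dir, 'input'), output: join(dir, 'output') };
    try {
      if (input === 'file') await writeFile(paths.input, data);
      const stdout = await runProcess(
        command,
        substitutePaths(args, paths),
        input === 'stdin' ? data : '',
      );
      return output === 'file' ? await readFile(paths.output, 'utf8') : stdout;
    } finally {
      // Always remove the temporary directory, even if the command failed.
      await rm(dir, { recursive: true, force: true });
    }
  }
}
```

Under these assumptions, the format conversion step could be configured in file-input mode as new CommandExecutorSketch({ command: 'riot', args: ['--output=rdfxml', '{input}'], input: 'file' }), with the RDF/XML read from stdout. A real executor would also need to serialize incoming quads and parse the command's output back into quads where applicable.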
Subsumes #18
SparqlAnythingExecutor is a specialised CommandExecutor — it spawns java -jar sparql-anything.jar and parses N-Triples from stdout. Rather than building a one-off executor for SPARQL Anything, the generic CommandExecutor covers it and any other CLI tool.
Relates to
- Data Pipelines Framework #78
- SPARQL Anything for non-RDF origin data #18 (SPARQL Anything — a specific case of CommandExecutor)