
CommandExecutor for external CLI tools as pipeline steps #81

@ddeboer


Summary

Add a CommandExecutor that runs an external CLI command as a pipeline step, reading quads from stdin or a file and producing quads (or other output) on stdout or to a file.

Context

Some pipeline steps are best handled by external tools rather than in-process Node.js code. For example, loda-pipeline's post-processing chain:

  1. Deduplication — Jena's sparql CLI running a CONSTRUCT query against an N-Triples file
  2. SHACL validation — Jena's shacl validate against EDM shapes
  3. Format conversion — Jena's riot --output=rdfxml for N-Triples → RDF/XML
  4. EDM XML packaging — a custom Java tool (Rdf2EdmCl) that splits RDF/XML into individual EDM XML files in a ZIP

These don't fit the SPARQL CONSTRUCT executor model — they're file-in, file-out CLI invocations. A CommandExecutor would make them composable as pipeline steps.

Approach

A CommandExecutor implements the same interface as SparqlConstructExecutor but delegates to an external process:

interface CommandExecutorOptions {
  command: string;       // e.g. 'riot'
  args: string[];        // e.g. ['--output=rdfxml']
  // How to pass input: stdin pipe or file path substitution; omit for no input
  input?: 'stdin' | 'file';
  // How to read output: stdout pipe or file path
  output?: 'stdout' | 'file';
}

// Declaration only: the constructor body is not part of this proposal yet
declare class CommandExecutor implements Executor {
  constructor(options: CommandExecutorOptions);
}

This is also the pattern proposed for SparqlAnythingExecutor in the design doc — spawning java -jar sparql-anything.jar per query, parsing N-Triples from stdout.

The executor uses @lde/task-runner for process lifecycle management (Docker or host execution).
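
To make the stdin-in, stdout-out case concrete, here is a minimal sketch of the process handling such an executor might do internally. It is an illustration under assumptions, not the proposed implementation: it uses Node's built-in child_process module directly rather than @lde/task-runner, and runCommand is a hypothetical helper name.

```typescript
import { spawn } from 'node:child_process';

// Hypothetical helper (not part of any existing package): runs a command,
// pipes `input` to its stdin, and resolves with everything written to stdout.
function runCommand(
  command: string,
  args: string[],
  input?: string,
): Promise<string> {
  return new Promise((resolve, reject) => {
    // stdin and stdout are piped; stderr is passed through to the parent.
    const child = spawn(command, args, { stdio: ['pipe', 'pipe', 'inherit'] });
    const chunks: Buffer[] = [];

    child.stdout.on('data', (chunk: Buffer) => chunks.push(chunk));
    child.on('error', reject); // e.g. the command was not found
    child.on('close', (code) => {
      if (code === 0) {
        resolve(Buffer.concat(chunks).toString('utf8'));
      } else {
        reject(new Error(`${command} exited with code ${code}`));
      }
    });

    // Ignore write errors if the child exits before reading all input;
    // a failure still surfaces through the exit code above.
    child.stdin.on('error', () => {});
    child.stdin.end(input ?? '');
  });
}
```

With the example values from the options above, a call could look like runCommand('riot', ['--output=rdfxml'], ntriples). Any further arguments the tool needs to read from stdin are left out here. A real executor would likely stream quads instead of buffering the full output in memory.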

Subsumes #18

SparqlAnythingExecutor is a specialised CommandExecutor — it spawns java -jar sparql-anything.jar and parses N-Triples from stdout. Rather than building a one-off executor for SPARQL Anything, the generic CommandExecutor covers it and any other CLI tool.
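
As a rough illustration of that relationship, SPARQL Anything could be expressed as a preset of CommandExecutor options rather than as a separate class. This is a sketch under assumptions: sparqlAnythingOptions is a hypothetical helper, the options shape mirrors the one sketched under Approach, and the query-specific arguments are left to the caller because this issue does not specify them.

```typescript
// Mirrors the options shape proposed under Approach.
interface CommandExecutorOptions {
  command: string;
  args: string[];
  input?: 'stdin' | 'file';
  output?: 'stdout' | 'file';
}

// Hypothetical preset: builds the options for running SPARQL Anything.
// `queryArgs` stands in for whatever arguments select the query; they are
// deliberately not spelled out here.
function sparqlAnythingOptions(queryArgs: string[]): CommandExecutorOptions {
  return {
    command: 'java',
    args: ['-jar', 'sparql-anything.jar', ...queryArgs],
    // N-Triples are parsed from stdout, as described above.
    output: 'stdout',
  };
}
```

The result would be passed to the CommandExecutor constructor. Nothing specific to SPARQL Anything remains beyond the argument list.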

Relates to
