arunaengine/rocrate-indexer

RO-Crate Indexer

rocrate-indexer is a tool for indexing and searching Research Object Crates (RO-Crates). It provides both a Command Line Interface (CLI) and a REST API to manage and query metadata from various RO-Crate sources.

Powered by the Tantivy search engine, it enables full-text and structured searching across complex RO-Crate hierarchies.

Features

  • Flexible Ingestion: Add RO-Crates from local directories, ZIP archives, or remote URLs.
  • Automatic Subcrate Discovery: Recursively detects and indexes nested subcrates.
  • Deep Indexing: Indexes each RO-Crate and all of its entities individually.
  • Tantivy Search: High-performance search with support for complex queries (boolean, nested properties, etc.).
  • Dual Interface:
    • rocrate-idx: Powerful CLI for local management and search.
    • rocrate-server: Web server with a REST API and built-in Swagger UI.

Installation

Prerequisites

  • Rust (latest stable; 1.85 or newer is needed for Edition 2024)

From Source

git clone https://github.com/arunaengine/rocrate-indexer.git
cd rocrate-indexer
cargo build --release

The binaries will be available in target/release/rocrate-idx and target/release/rocrate-server.

CLI Usage (rocrate-idx)

The CLI provides several commands to manage your index:

  • add <source>: Add an RO-Crate from a path or URL.
  • search <query>: Search for crates matching a query.
  • list: List all indexed crate IDs.
  • show <crate_id>: Show full metadata JSON for a crate.
  • info <crate_id>: Show summarized info for a crate.
  • remove <crate_id>: Remove a crate from the index.

CLI Examples

# Add from URL
rocrate-idx add https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json

# Add from a local metadata file
rocrate-idx add ./ro-crate-metadata.json

# Add from local ZIP
rocrate-idx add ./ro-crate.zip

# Search for Person entities
rocrate-idx search "entity_type:Person"

# Search with boolean query
rocrate-idx search "name:reference-genome.fasta.gz AND entity_type:File"
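
The remaining commands (list, info, show, remove) follow the same pattern. A minimal sketch, assuming at least one crate has already been indexed: CRATE_ID is a placeholder variable introduced here for illustration, and the guard around the calls only keeps the snippet harmless on machines where rocrate-idx is not on the PATH.

```shell
# Placeholder: replace with an ID printed by `rocrate-idx list`.
CRATE_ID="${CRATE_ID:-replace-with-a-crate-id}"

# Run only if the CLI is on PATH.
if command -v rocrate-idx >/dev/null 2>&1; then
  rocrate-idx list                 # list all indexed crate IDs
  rocrate-idx info "$CRATE_ID"     # summarized info for one crate
  rocrate-idx show "$CRATE_ID"     # full metadata JSON
  rocrate-idx remove "$CRATE_ID"   # remove the crate from the index
fi
```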

Server Usage (rocrate-server)

The server provides a REST API to manage the index remotely.

# Start the server
rocrate-server

By default, the server listens on http://127.0.0.1:3000. The following environment variables configure the address and logging:

  • PORT: Set the port (default: 3000)
  • BIND_ADDR: Set the bind address (default: 127.0.0.1)
  • RUST_LOG: Set log level (e.g., RUST_LOG=info)
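
As an illustration, the variables can be exported before launching the server. This is a sketch under assumptions: the values 8080 and 0.0.0.0 are arbitrary choices rather than project defaults, and the guard simply skips the launch where the binary is not on the PATH.

```shell
# Override the documented defaults for this shell session.
export PORT=8080           # default: 3000
export BIND_ADDR=0.0.0.0   # default: 127.0.0.1; 0.0.0.0 conventionally means all IPv4 interfaces
export RUST_LOG=info       # log level

# Start the server only if the binary is on PATH.
if command -v rocrate-server >/dev/null 2>&1; then
  rocrate-server
fi
```

Binding to 0.0.0.0 exposes the API to other hosts on the network, so keep the default 127.0.0.1 for purely local use.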

API Documentation

Once the server is running, visit http://127.0.0.1:3000/swagger-ui to explore the interactive API documentation.

Search Query Syntax

The search engine supports Tantivy query syntax:

  • e.coli: Simple full-text search.
  • entity_type:Person: Search by a specific type.
  • author.name:Smith: Search by nested property.
  • name:reference-genome.fasta.gz AND entity_type:File: Boolean combination.
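
Tantivy's query parser also accepts OR, quoted phrases, and the + (must) and - (must not) prefixes. The sketch below tries a few such searches through the CLI. The run_search helper and the field values (Organization, "reference genome", draft) are illustrative assumptions, and whether a phrase query matches depends on how the indexer tokenizes each field, which is not documented here.

```shell
# Hypothetical helper: run a search if the CLI is installed; otherwise print the command.
run_search() {
  if command -v rocrate-idx >/dev/null 2>&1; then
    rocrate-idx search "$1"
  else
    printf "rocrate-idx search '%s'\n" "$1"
  fi
}

run_search 'entity_type:Person OR entity_type:Organization'  # either type
run_search 'name:"reference genome"'                         # quoted phrase
run_search '+entity_type:File -name:draft'                   # must be a File, must not match "draft"
```

Single quotes around each query keep the shell from interpreting the inner double quotes and special characters.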

Demo

An extensive demo file is available for testing: https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json

Add it easily using the CLI:

rocrate-idx add https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json

Or with a running server:

curl -X POST \
     -H 'Content-Type: application/json' \
     -d '{"url": "https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json"}' \
     'http://localhost:3000/crates/url'
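
For scripting, the request above can be wrapped in small shell functions. This is a sketch, not part of the project: crate_payload, add_crate_url, and the ROCRATE_SERVER variable are hypothetical names, and only the /crates/url endpoint and the JSON body shape are taken from the curl example.

```shell
# Base URL of a running server (hypothetical variable; defaults to the documented address).
ROCRATE_SERVER="${ROCRATE_SERVER:-http://localhost:3000}"

# Build the JSON body. Assumes the URL contains no double quotes or backslashes.
crate_payload() {
  printf '{"url": "%s"}' "$1"
}

# POST the payload; -f makes curl exit non-zero on HTTP error responses.
add_crate_url() {
  curl -fsS -X POST \
       -H 'Content-Type: application/json' \
       -d "$(crate_payload "$1")" \
       "$ROCRATE_SERVER/crates/url"
}
```

With a server running, `add_crate_url https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json` would send the same request as the curl command.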

License

The API is licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option. Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Aruna by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.

Contributing

If you have any ideas, suggestions, or issues, please don't hesitate to open an issue or a PR. Contributions to this project are always welcome! We appreciate your help in making this project better.
