rocrate-indexer is a tool for indexing and searching Research Object Crates (RO-Crates). It provides both a Command Line Interface (CLI) and a REST API to manage and query metadata from various RO-Crate sources.
Powered by the Tantivy search engine, it enables full-text and structured searching across complex RO-Crate hierarchies.
- Flexible Ingestion: Add RO-Crates from local directories, ZIP archives, or remote URLs.
- Automatic Subcrate Discovery: Recursively detects and indexes nested subcrates.
- Deep Indexing: Indexes each RO-Crate and every entity it contains as an individual document.
- Tantivy Search: High-performance search with support for complex queries (boolean, nested properties, etc.).
- Dual Interface:
  - rocrate-idx: Powerful CLI for local management and search.
  - rocrate-server: Web server with a REST API and built-in Swagger UI.
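To illustrate what indexing entities individually means: an RO-Crate's `ro-crate-metadata.json` is a JSON-LD document whose `@graph` array holds one object per entity. The sketch below flattens a small hand-written graph into one record per entity. It is an illustration only, not rocrate-indexer's actual implementation. The function `entity_documents` and the record fields `crate_id`, `entity_id`, and `name` are hypothetical; only `entity_type` is borrowed from the query examples in this README.

```python
import json

# A tiny, hand-written stand-in for an ro-crate-metadata.json file.
METADATA = json.loads("""
{
  "@context": "https://w3id.org/ro/crate/1.1/context",
  "@graph": [
    {"@id": "ro-crate-metadata.json", "@type": "CreativeWork", "about": {"@id": "./"}},
    {"@id": "./", "@type": "Dataset", "name": "Example crate"},
    {"@id": "reference-genome.fasta.gz", "@type": "File", "name": "reference-genome.fasta.gz"},
    {"@id": "script.py", "@type": ["File", "SoftwareSourceCode"], "name": "script.py"}
  ]
}
""")


def entity_documents(metadata, crate_id):
    """Yield one flat record per entity in the crate's @graph (hypothetical shape)."""
    for entity in metadata.get("@graph", []):
        types = entity.get("@type", [])
        if isinstance(types, str):  # "@type" may be a single string or a list
            types = [types]
        yield {
            "crate_id": crate_id,
            "entity_id": entity["@id"],
            "entity_type": types,
            "name": entity.get("name"),  # None when the entity has no name
        }


docs = list(entity_documents(METADATA, crate_id="example"))
for doc in docs:
    print(doc["entity_id"], doc["entity_type"])
```

Keeping the crate identifier on every record is one way to let a match on a single entity be traced back to the crate that contains it.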
- Rust (latest stable, 1.85+ recommended for Edition 2024)
git clone https://github.com/your-repo/rocrate-indexer.git
cd rocrate-indexer
cargo build --release
The binaries will be available in target/release/rocrate-idx and target/release/rocrate-server.
The CLI provides several commands to manage your index:
- add <source>: Add an RO-Crate from a path or URL.
- search <query>: Search for crates matching a query.
- list: List all indexed crate IDs.
- show <crate_id>: Show full metadata JSON for a crate.
- info <crate_id>: Show summarized info for a crate.
- remove <crate_id>: Remove a crate from the index.
# Add from URL
rocrate-idx add https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json
# Add from local
rocrate-idx add ./ro-crate-metadata.json
# Add from local ZIP
rocrate-idx add ./ro-crate.zip
# Search for Person entities
rocrate-idx search "entity_type:Person"
# Search with boolean query
rocrate-idx search "name:reference-genome.fasta.gz AND entity_type:File"
The server provides a REST API to manage the index remotely.
# Start the server
rocrate-server
By default, the server runs on http://127.0.0.1:3000. You can change this using environment variables:
- PORT: Set the port (default: 3000)
- BIND_ADDR: Set the bind address (default: 127.0.0.1)
- RUST_LOG: Set log level (e.g., RUST_LOG=info)
Once the server is running, visit http://127.0.0.1:3000/swagger-ui to explore the interactive API documentation.
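For scripted access, the same server can be reached from Python's standard library. The sketch below builds the request that the curl example later in this README sends to `/crates/url`, and derives the base URL from `BIND_ADDR` and `PORT` with the documented defaults. The helper names (`server_base_url`, `build_add_crate_request`, `add_crate`) are hypothetical, and the response format is not assumed: the Swagger UI remains the authoritative reference for request and response schemas.

```python
import json
import os
import urllib.request


def server_base_url(env=os.environ):
    """Derive the base URL from BIND_ADDR and PORT, using the documented defaults.

    Assumes BIND_ADDR is an address a client can connect to.
    """
    host = env.get("BIND_ADDR", "127.0.0.1")
    port = env.get("PORT", "3000")
    return f"http://{host}:{port}"


def build_add_crate_request(crate_url, base_url):
    """Build (but do not send) a POST to /crates/url with a JSON body {"url": ...}."""
    body = json.dumps({"url": crate_url}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/crates/url",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def add_crate(crate_url, base_url=None):
    """Send the request; needs a running rocrate-server. Returns (status, raw body)."""
    request = build_add_crate_request(crate_url, base_url or server_base_url())
    with urllib.request.urlopen(request) as response:
        return response.status, response.read().decode("utf-8")
```

With a server running, `add_crate("https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json")` would submit the demo crate. Separating request construction from sending keeps the payload easy to inspect without network access.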
The search engine supports Tantivy query syntax:
- e.coli: Simple full-text search.
- entity_type:Person: Search by a specific type.
- author.name:Smith: Search by nested property.
- name:reference-genome.fasta.gz AND entity_type:File: Boolean combination.
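When queries are generated by a script, the patterns above can be composed programmatically. The following is a minimal sketch with hypothetical helpers (`build_query`, `search_command`): it joins `field:value` pairs with a boolean operator and prepares an argument list for `rocrate-idx search`. It does not quote or escape values, so values containing spaces or query-syntax characters would need additional handling.

```python
def build_query(terms, operator="AND"):
    """Join (field, value) pairs into one query string.

    A field of None produces a plain full-text term; dotted fields such as
    "author.name" address nested properties.
    """
    if operator not in ("AND", "OR"):
        raise ValueError(f"unsupported operator: {operator!r}")
    parts = [value if field is None else f"{field}:{value}" for field, value in terms]
    return f" {operator} ".join(parts)


def search_command(query):
    """Return an argv list for the CLI; passing the query as a single
    argument avoids shell quoting issues."""
    return ["rocrate-idx", "search", query]


query = build_query([("name", "reference-genome.fasta.gz"), ("entity_type", "File")])
print(query)
print(search_command(query))
```

The argument list can be handed to `subprocess.run` once `rocrate-idx` is on the `PATH`.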
An extensive demo file is available for testing:
https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json
Add it easily using the CLI:
rocrate-idx add https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json
Or with a running server:
curl -X POST \
-H 'Content-Type: application/json' \
-d '{"url": "https://rocrate.s3.computational.bio.uni-giessen.de/ro-crate-metadata.json"}' \
'http://localhost:3000/crates/url'
The API is licensed under either of
- Apache License, Version 2.0 (LICENSE-APACHE or http://www.apache.org/licenses/LICENSE-2.0)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
at your option. Unless you explicitly state otherwise, any contribution intentionally submitted for inclusion in Aruna by you, as defined in the Apache-2.0 license, shall be dual licensed as above, without any additional terms or conditions.
If you have any ideas, suggestions, or issues, please don't hesitate to open an issue or a PR. Contributions to this project are always welcome! We appreciate your help in making this project better.