
Feat/rdf: Add optional module for ROCrate <-> rdf interop #3

Closed
St4NNi wants to merge 30 commits into main from feat/rdf


@St4NNi (Member) commented Dec 23, 2025

This feature adds an RDF <-> RO-Crate interoperability module that can export RO-Crates as RDF graphs and import RO-Crates from RDF graphs. All conversion is funneled through the RoCrate struct.

  • Add RDF serialization/deserialization support via a new rdf feature flag
  • Implement bidirectional conversion: RoCrate struct ↔ RDF triples
  • Support multiple RDF formats for export and import: Turtle, N-Triples, RDF/XML (powered by oxrdf/oxrdfio)
  • Add JSON-LD context resolution with embedded RO-Crate 1.1/1.2 schema support
  • Compaction: contexts are used to compact expanded terms back into a valid RO-Crate
  • Testing: add Python (rdflib) compatibility tests to verify semantic equivalence between the Rust and Python RDF representations

Future implications: fully expanding an RO-Crate into RDF triples could be used to improve semantic validation of RO-Crates.

Completes: https://github.com/arunaengine/project-orga/issues/26

This requires the changes from #2 and should be rebased once #2 is merged.

@St4NNi force-pushed the feat/rdf branch 2 times, most recently from 88b7876 to fecf2bf on January 8, 2026 09:31
@lfbrehm (Member) left a comment

I don't understand all of it, especially the RDF semantics, but I did my best to provide a reasonably thorough review. I mostly skipped the tests and examples because I trust that you ran the tests beforehand.

Comment on lines +76 to +79
let base_rebuilt = base_parts.join("/");
if base_rebuilt.is_empty() {
    return rel_remaining.to_string();
}
Member

Does this mean that if the relative part is, for example, ../../../ but the absolute base is only /root/second-level, it returns ../? That should not be a valid result. Shouldn't this produce an error?

Member Author

You are correct, I hadn't thought about that. It will be fixed in the next commit by returning an error!
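A minimal sketch of the agreed fix, assuming the base path is treated as a directory (as joining base_parts in the snippet suggests) and that only leading "../" and "./" segments need handling. The names resolve_relative_path and ResolveError are hypothetical, not the PR's code. Note that RFC 3986 (section 5.2.4, remove_dot_segments) would silently clamp excess ".." segments at the root instead; returning an error is the stricter behavior agreed on here.

```rust
/// Hypothetical error type for this sketch.
#[derive(Debug, PartialEq)]
enum ResolveError {
    /// The relative path climbs above the root of the base path.
    EscapesBase,
}

/// Resolves a relative path against an absolute base path, treating the base
/// as a directory. Errors instead of clamping when "../" goes past the root.
fn resolve_relative_path(base: &str, relative: &str) -> Result<String, ResolveError> {
    // "/root/second-level" -> ["root", "second-level"]
    let mut parts: Vec<&str> = base.split('/').filter(|s| !s.is_empty()).collect();
    let mut rest = relative;

    loop {
        if let Some(r) = rest.strip_prefix("../") {
            // Each "../" drops one base segment; running out is an error.
            if parts.pop().is_none() {
                return Err(ResolveError::EscapesBase);
            }
            rest = r;
        } else if let Some(r) = rest.strip_prefix("./") {
            // "./" refers to the current directory and is simply skipped.
            rest = r;
        } else {
            break;
        }
    }

    // Rebuild "/seg1/seg2/" and append whatever remains of the relative part.
    let mut out = String::from("/");
    for part in &parts {
        out.push_str(part);
        out.push('/');
    }
    out.push_str(rest);
    Ok(out)
}
```

With base /root/second-level, "../file.txt" resolves to /root/file.txt, while "../../../" returns Err(ResolveError::EscapesBase).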

|| term.starts_with("../")
|| term.starts_with('#')
|| term.starts_with('/')
|| term.contains('/')
Member

Why is this treated as a relative IRI rather than an absolute one? If it contains a /, it could also be absolute, because the (absolute) base is split in resolve_relative_iri.

Member Author

This check runs after absolute IRIs have already been filtered out, so the behavior is correct, but I will rename the function and add a check for absolute IRIs to prevent misuse!
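A sketch of the proposed guard, assuming that any term with a syntactically valid RFC 3986 scheme (a letter, then letters, digits, "+", "-" or ".", then ":") counts as absolute. The names has_scheme and is_relative_iri_reference are hypothetical. The conditions include the ones visible in the snippet plus a "./" check, which is an assumption since the start of that expression is not shown. Compact IRIs such as schema:name look like absolute IRIs syntactically, so they would need to be expanded via the context before this check.

```rust
/// Returns true if `term` starts with an RFC 3986 scheme such as "http:" or "urn:".
fn has_scheme(term: &str) -> bool {
    match term.split_once(':') {
        Some((scheme, _)) => {
            let mut chars = scheme.chars();
            // The scheme must start with a letter...
            matches!(chars.next(), Some(c) if c.is_ascii_alphabetic())
                // ...followed only by letters, digits, '+', '-' or '.'.
                && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '+' | '-' | '.'))
        }
        None => false,
    }
}

/// Hypothetical renamed check: only relative IRI references return true.
fn is_relative_iri_reference(term: &str) -> bool {
    // Guard against misuse: an absolute IRI is never treated as relative,
    // even though it usually contains '/'.
    if has_scheme(term) {
        return false;
    }
    term.starts_with("./")
        || term.starts_with("../")
        || term.starts_with('#')
        || term.starts_with('/')
        || term.contains('/')
}
```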

Comment on lines +300 to +302
if best.is_none() || len > best.unwrap().2 {
    best = Some((prefix, namespace, len));
}
Member

I see that this cannot fail, but wouldn't it be better to rewrite this with a match guard to avoid the unwrap?

match best {
    Some((_, _, l)) if len > l => best = Some((prefix, namespace, len)),
    None => best = Some((prefix, namespace, len)),
    _ => {}
}

Member Author

Yes, even better is map_or:

if best.map_or(true, |(_, _, best_len)| len > best_len) {
    best = Some((prefix, namespace, len));
}
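For context, the loop under discussion picks the longest matching namespace when compacting an IRI. Below is a self-contained sketch of that loop using the map_or form, assuming prefixes are given as (prefix, namespace) pairs; find_best_prefix and compact_with_prefixes are hypothetical names, not the PR's functions. On Rust 1.82 and later, best.is_none_or(|(_, _, best_len)| len > best_len) expresses the same condition, and newer clippy versions may suggest it over map_or(true, ...).

```rust
/// Finds the (prefix, namespace, namespace_len) whose namespace is the
/// longest one that `iri` starts with.
fn find_best_prefix<'a>(
    iri: &str,
    prefixes: &[(&'a str, &'a str)],
) -> Option<(&'a str, &'a str, usize)> {
    let mut best: Option<(&'a str, &'a str, usize)> = None;
    for &(prefix, namespace) in prefixes {
        if !iri.starts_with(namespace) {
            continue;
        }
        let len = namespace.len();
        // Take this candidate if there is no best yet or it is strictly longer.
        if best.map_or(true, |(_, _, best_len)| len > best_len) {
            best = Some((prefix, namespace, len));
        }
    }
    best
}

/// Compacts `iri` to "prefix:suffix", or returns it unchanged if no namespace matches.
fn compact_with_prefixes(iri: &str, prefixes: &[(&str, &str)]) -> String {
    match find_best_prefix(iri, prefixes) {
        Some((prefix, _, len)) => format!("{prefix}:{}", &iri[len..]),
        None => iri.to_string(),
    }
}
```

Because only a strictly longer namespace replaces the current best, a more specific namespace such as http://example.org/vocab/ wins over http://example.org/ regardless of order.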

Comment on lines +196 to +199
if subject_str.contains("ro-crate-metadata.json") {
    metadata_iri = Some(subject_str.to_string());
    break;
}
Member

If there are multiple subcrates, possibly defined with ./prefix/ro-crate-metadata.json, this would also match them, but this function should only find the root crate.

Member

The RO-Crate spec also states that the root metadata descriptor must have the @id ro-crate-metadata.json (or ro-crate-metadata.jsonld for legacy crates).

Member Author

This behavior comes directly from the spec:

https://www.researchobject.org/ro-crate/specification/1.2/appendix/relative-uris.html#finding-ro-crate-root-in-rdf-triple-stores

But yes, it can match multiple results (including some that are not the root crate). We should probably check whether the matched entity is a subcrate and, if so, skip it. By the way, contains("ro-crate-metadata.json") also matches "ro-crate-metadata.jsonld".
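A sketch of the stricter lookup discussed in this thread, assuming the set of subcrate directory IRIs is already known. Comparing only the last path segment avoids the accidental matches of contains("ro-crate-metadata.json") (against ".jsonld", or against names like x-ro-crate-metadata.json), and candidates inside a subcrate directory are skipped. The function names and the subcrate_dirs input are hypothetical, not the PR's API.

```rust
/// True if the last path segment of `iri` is exactly a metadata descriptor
/// file name (current or legacy).
fn is_metadata_descriptor_iri(iri: &str) -> bool {
    let last = iri.rsplit('/').next().unwrap_or(iri);
    last == "ro-crate-metadata.json" || last == "ro-crate-metadata.jsonld"
}

/// Picks the root metadata descriptor among `subjects`, skipping descriptors
/// that live in a known subcrate directory (IRIs ending in '/').
fn find_root_metadata_iri<'a>(subjects: &[&'a str], subcrate_dirs: &[&str]) -> Option<&'a str> {
    subjects
        .iter()
        .copied()
        .filter(|&s| is_metadata_descriptor_iri(s))
        .filter(|&s| {
            // Directory part of the IRI, including the trailing '/'.
            let dir = &s[..s.rfind('/').map_or(0, |i| i + 1)];
            !subcrate_dirs.contains(&dir)
        })
        // Prefer "ro-crate-metadata.json" over the legacy ".jsonld" name.
        .min_by_key(|s| s.ends_with(".jsonld"))
}
```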

Comment on lines +29 to +31
/// Resolve relative IRIs against the provided base IRI.
/// This takes precedence over @base defined in the context.
WithBase(String),
Member

Don't you also want the opposite option, where @base takes precedence and the provided base is only used if @base cannot be resolved?

Member Author

You are correct, will be fixed.
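Both precedence rules can coexist as separate variants. A minimal sketch: only WithBase appears in the PR; the enum name BaseOption, the variants ContextOnly and FallbackBase, and the effective_base helper are hypothetical.

```rust
/// Hypothetical options for choosing the base IRI used to resolve relative IRIs.
#[derive(Debug, Clone)]
enum BaseOption {
    /// Use only the @base from the JSON-LD context, if any.
    ContextOnly,
    /// The provided base overrides any @base in the context.
    WithBase(String),
    /// The context's @base wins; the provided base is only a fallback.
    FallbackBase(String),
}

/// Returns the effective base IRI given the context's @base (if present).
fn effective_base(option: &BaseOption, context_base: Option<&str>) -> Option<String> {
    match option {
        BaseOption::ContextOnly => context_base.map(str::to_string),
        BaseOption::WithBase(base) => Some(base.clone()),
        BaseOption::FallbackBase(base) => Some(context_base.unwrap_or(base.as_str()).to_string()),
    }
}
```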

Comment on lines +817 to +829
        properties.entry(property_name).or_default().push(value);
    }
}

// Convert Vec<EntityValue> to EntityValue (single or array)
let mut result = HashMap::new();
for (key, values) in properties {
    if values.len() == 1 {
        result.insert(key, values.into_iter().next().unwrap());
    } else if values.len() > 1 {
        result.insert(key, EntityValue::EntityVec(values));
    }
}
Member

Same as above: why not do this while using the entry API of the HashMap, to avoid looping over the results a second time?

Member Author

The entry API won't really work here, since I want to iterate over all elements of the list. But we could make it neater with an iterator and filter_map.
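A sketch of the iterator version, using a simplified stand-in for EntityValue (the real type has more variants; only EntityVec is taken from the snippet). It collapses each property's values into a single value or an EntityVec and drops empty lists in a single pass. collapse_properties is a hypothetical name.

```rust
use std::collections::HashMap;

/// Simplified stand-in for the crate's EntityValue; only the variants needed here.
#[derive(Debug, Clone, PartialEq)]
enum EntityValue {
    Text(String),
    EntityVec(Vec<EntityValue>),
}

/// Collapses grouped property values: one value stays single, several become
/// an EntityVec, and empty lists are dropped.
fn collapse_properties(
    properties: HashMap<String, Vec<EntityValue>>,
) -> HashMap<String, EntityValue> {
    properties
        .into_iter()
        .filter_map(|(key, mut values)| match values.len() {
            0 => None,
            1 => values.pop().map(|value| (key, value)),
            _ => Some((key, EntityValue::EntityVec(values))),
        })
        .collect()
}
```

Using pop() on the single-element case also avoids the unwrap from the original loop.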

Comment on lines +855 to +859
let compacted_id = if metadata_iri.contains("ro-crate-metadata.json") {
    "ro-crate-metadata.json".to_string()
} else {
    context.compact_iri(metadata_iri)
};
Member

This will probably also match subcrates, right?

Member Author

Not a problem: this only runs after the correct metadata descriptor has already been found, so it will not target subcrates!

Comment on lines +983 to +987
/// Internal implementation: Convert RDF triples to RoCrate using a ResolvedContext.
fn rdf_to_rocrate_with_context(
    triples: Vec<Triple>,
    context: ResolvedContext,
) -> Result<RoCrate, RdfError> {
Member

Why not merge this function and rdf_graph_to_ro_crate, if you only ever call it through the wrapper anyway?

Member Author

There are two wrappers, rdf_graph_to_rocrate and rdf_to_rocrate, and both use this function (with different contexts).

Comment on lines +1116 to +1119
let base_iri = base
    .map(|b| b.to_string())
    .or_else(|| infer_base_from_metadata(&metadata_iri))
    .unwrap_or_else(|| "http://example.org/".to_string());
Member

Is http://example.org really meant as a default fallback value or just a placeholder that never got replaced?

Member Author

Oops, yes, this is a leftover from a testing session. Fixed.
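The actual fix is not shown here. One way to drop the placeholder, assuming a missing base should be reported as an error rather than silently replaced, is sketched below. infer_base_from_metadata here is only a guess at what the PR's helper of the same name does (strip the descriptor file name and keep the directory IRI); MissingBaseIri and resolve_base_iri are hypothetical.

```rust
/// Hypothetical error: no base IRI was given and none could be inferred.
#[derive(Debug, PartialEq)]
struct MissingBaseIri;

/// Guessed behavior: infers the crate root by stripping the descriptor file name,
/// e.g. ".../crate/ro-crate-metadata.json" -> ".../crate/".
fn infer_base_from_metadata(metadata_iri: &str) -> Option<String> {
    ["ro-crate-metadata.json", "ro-crate-metadata.jsonld"]
        .iter()
        .find_map(|name| metadata_iri.strip_suffix(*name))
        // Only accept results that still look like a directory IRI.
        .filter(|dir| dir.ends_with('/'))
        .map(str::to_string)
}

/// Explicit base first, then the inferred one; otherwise an error instead of
/// falling back to a placeholder such as http://example.org/.
fn resolve_base_iri(base: Option<&str>, metadata_iri: &str) -> Result<String, MissingBaseIri> {
    base.map(str::to_string)
        .or_else(|| infer_base_from_metadata(metadata_iri))
        .ok_or(MissingBaseIri)
}
```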

Comment on lines +65 to +72
Self {
    cache: HashMap::new(),
    allow_remote: false,
    client: reqwest::blocking::Client::builder()
        .timeout(std::time::Duration::from_secs(30))
        .build()
        .expect("Failed to create HTTP client"),
}
Member

Why initialize a client when allow_remote is false?

Member Author

Fair point, I will make it optional and initialize it only if allow_remote is set to true. Thinking about it, allow_remote should drop the boolean parameter, so that calling allow_remote simply sets it to true!
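A sketch of the lazily created client, assuming the client becomes an Option and its presence replaces the allow_remote flag. ContextResolver, HttpClient, allows_remote and resolve are hypothetical; HttpClient stands in for reqwest::blocking::Client so the example has no dependencies. With reqwest, Client::builder().timeout(...).build() returns a Result, so allow_remote could also return Result<Self, reqwest::Error> instead of calling expect.

```rust
use std::collections::HashMap;
use std::time::Duration;

/// Stand-in for reqwest::blocking::Client so the sketch has no dependencies.
#[derive(Debug)]
struct HttpClient {
    timeout: Duration,
}

/// Hypothetical context resolver: the client exists only when remote
/// fetching is enabled, so no separate allow_remote flag is needed.
struct ContextResolver {
    cache: HashMap<String, String>,
    client: Option<HttpClient>,
}

impl ContextResolver {
    /// Offline by default: no HTTP client is created.
    fn new() -> Self {
        Self { cache: HashMap::new(), client: None }
    }

    /// Enables remote fetching; there is no boolean argument, calling it is the opt-in.
    fn allow_remote(mut self) -> Self {
        self.client = Some(HttpClient { timeout: Duration::from_secs(30) });
        self
    }

    fn allows_remote(&self) -> bool {
        self.client.is_some()
    }

    /// Looks up a cached context; without a client, uncached contexts are an error.
    fn resolve(&self, iri: &str) -> Result<&str, String> {
        if let Some(context) = self.cache.get(iri) {
            return Ok(context.as_str());
        }
        match &self.client {
            None => Err(format!("{iri} is not cached and remote fetching is disabled")),
            // A real implementation would send the HTTP request here.
            Some(_) => Err(format!("fetching {iri} is not part of this sketch")),
        }
    }
}
```

Usage would be ContextResolver::new() for offline resolution, or ContextResolver::new().allow_remote() to opt in to network access.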

St4NNi pushed a commit that referenced this pull request Jan 14, 2026
microchange: Fix typo on "Research" !
@St4NNi requested a review from lfbrehm on January 15, 2026 13:47
@St4NNi (Member Author) commented Feb 23, 2026

Will merge this into upstream in a separate PR!

@St4NNi closed this on Feb 23, 2026

2 participants