WIP: Source Links Fix #440
MaximilianSoerenPollak wants to merge 27 commits into eclipse-score:main
Conversation
License Check Results 🚀
The license check job ran with the Bazel command: `bazel run --lockfile_mode=error //src:license-check`
The created documentation from the pull request is available at: docu-html
Pull request overview
This PR reworks how “source code links” are generated and injected into Sphinx needs, aiming to support both local builds (derive Git hash from the current repo) and combo/ref-integration builds (use per-module metadata such as repo URL + commit hash).
Changes:
- Add metadata support (module name / repo URL / commit hash) to NeedLink JSON handling and Bazel CLI generation/merge scripts.
- Introduce grouping of needs by module and a new module-grouped cache JSON.
- Update Sphinx extension flow to read the new caches and generate GitHub links using either git-derived or metadata-derived refs.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 15 comments.
Summary per file:
| File | Description |
|---|---|
| src/helper_lib/additional_functions.py | Changes GitHub link generation API to require module metadata. |
| src/extensions/score_source_code_linker/needlinks.py | Extends NeedLink with metadata fields; adds new metadata-aware JSON format/load path. |
| src/extensions/score_source_code_linker/need_source_links.py | Moves group_by_need into this module. |
| src/extensions/score_source_code_linker/module_source_links.py | New module-grouped cache format and grouping logic. |
| src/extensions/score_source_code_linker/metadata.py | New TypedDict + TypeGuard for metadata records. |
| src/extensions/score_source_code_linker/generate_source_code_links_json.py | Refactors extraction helper signature and logging (currently inconsistent call sites). |
| src/extensions/score_source_code_linker/init.py | Adds module-linker stage and changes injection to use module-grouped cache + metadata-based link generation. |
| scripts_bazel/merge_sourcelinks.py | Merges per-module sourcelinks and enriches with known-good repo/hash metadata. |
| scripts_bazel/generate_sourcelinks_cli.py | Emits sourcelinks JSON with a leading metadata dict and uses updated extraction helper signature. |
| docs.bzl | Adds optional known_good wiring into the sourcelinks merge genrule and the public docs() macro. |
```python
store_source_code_links_with_metadata_json(
    file=args.output, metadata=metadata, needlist=all_need_references
)
```
This switches the generated JSON format from a plain list of NeedLinks to a list whose first element is a metadata dict. Any existing consumers/tests that expect the old schema will now fail. Consider either keeping the old format as default (with an opt-in flag for metadata), or updating all in-repo consumers and tests in the same PR to avoid a partially-migrated state.
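For illustration, one way to avoid the partially-migrated state is a loader that accepts both schemas by sniffing the first element. This is a hypothetical sketch (the helper name and the assumption that metadata records carry a `module_name` key are mine, not the PR's actual API):

```python
import json
from pathlib import Path
from typing import Any


def load_sourcelinks(path: Path) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Load a sourcelinks JSON file in either schema.

    Old schema: a plain list of NeedLink dicts.
    New schema: the same list, but with a metadata dict (carrying a
    'module_name' key) as its first element.
    Returns (metadata, needlinks); metadata is empty for the old schema.
    """
    data = json.loads(path.read_text())
    assert isinstance(data, list), f"expected a JSON list, got {type(data).__name__}"
    if data and isinstance(data[0], dict) and "module_name" in data[0]:
        return data[0], data[1:]
    return {}, data
```

With a dispatching loader like this, in-repo consumers keep working during migration, and the opt-in rename discussed below becomes purely cosmetic.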
@a-zw Valid issue here.
Do you think it would be better to rename this specific cache that comes from here a bit, so it is clear it carries metadata?
Like 'scl_metadata_cache.json' or whatever? So that the name makes it clear to use the metadata reader?
```python
    metadata: moduleInfo,
    link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
    if link is None:
        link = DefaultNeedLink()
    if not metadata.hash:
        # Local path (//:docs)
        return get_github_link_from_git(link)
    # Ref-Integration path (//:docs_combo..)
    return get_github_link_from_json(metadata, link)
```
get_github_link now requires a metadata argument, but there are existing call sites in the repo that still call get_github_link(link) (e.g. in tests and xml_parser.py). This is a breaking API change that will raise TypeError. Consider keeping backwards compatibility by making metadata optional with a sensible default (e.g. infer local/git behavior when metadata is omitted) or providing a new function name for the metadata-based behavior.
Suggested change (make `metadata` optional so old-style `get_github_link(link)` calls keep working):

```python
    metadata: moduleInfo
    | NeedLink
    | DataForTestLink
    | DataOfTestCase
    | None = None,
    link: NeedLink | DataForTestLink | DataOfTestCase | None = None,
) -> str:
    """
    Return a GitHub link for the given location.

    Backwards-compatible calling conventions:
    - New style: get_github_link(metadata, link)
    - Old style: get_github_link(link)  # metadata omitted
    """
    # Distinguish between new-style and old-style calls.
    if isinstance(metadata, moduleInfo):
        actual_metadata: moduleInfo | None = metadata
        actual_link = link
    else:
        # Old-style: the first argument is actually the link (or None).
        actual_metadata = None
        # If both arguments are provided but metadata is not a moduleInfo,
        # prefer the explicitly provided link argument.
        actual_link = link if link is not None else metadata
    if actual_link is None:
        actual_link = DefaultNeedLink()
    # If no metadata is available or metadata.hash is falsy, fall back to the git-based link.
    if actual_metadata is None or not getattr(actual_metadata, "hash", None):
        # Local path (//:docs)
        return get_github_link_from_git(actual_link)
    # Ref-Integration path (//:docs_combo..)
    return get_github_link_from_json(actual_metadata, actual_link)
```
Pull request overview
Copilot reviewed 9 out of 9 changed files in this pull request and generated 8 comments.
src/extensions/score_source_code_linker/generate_source_code_links_json.py (outdated, resolved)
Just realized that testlinks also need the module name earlier than the source_code_linker extension's end state. I need to rethink this approach a bit and see what I can adapt to make this more convenient. I think I have an idea, but I'm unsure.

Still a bit to do here, but the architecture is 'working', though not pretty.
Pull request overview
Copilot reviewed 18 out of 18 changed files in this pull request and generated 14 comments.
src/extensions/score_source_code_linker/generate_source_code_links_json.py (outdated, resolved)
src/extensions/score_source_code_linker/generate_source_code_links_json.py (resolved)
```python
# ─────────────────────[ Roundtrip Tests ]───────────────────


def test_roundtrip_standard_format(tmp_path: Path):
```
Wait, how is this different from this function:
test_store_and_load_source_code_links
=> Double-check, this might have been a copy-paste error (copying from an older version where the top tests were not yet in).
They indeed were duplicates. Deleted test_store_and_load...
```python
def test_parse_info_from_known_good_no_module_in_json(tmp_path: Path):
    """Test that assertion works when module not in top level keys"""
```
| """Test that assertion works when module not in top level keys""" | |
| """Test that assertion works when 'module' not in top level keys""" |
```diff
@@ -21,7 +21,6 @@
 # This whole directory implements the above mentioned tool requirements
```
I don't like the namings in this file anymore.
It was built with 'SourceCodeLinks' as the final step before.
Now that isn't the case anymore, so we can most likely make this nicer / more readable again.
```python
logger = logging.getLogger(__name__)


def clean_external_prefix(path: Path) -> Path:
```
Wonder if this can be used in the testlink cleaning too, or if they are too different.
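For reference, a generic version of such a prefix cleaner might look like this. This is a hypothetical sketch of the idea, not the PR's actual implementation; it assumes Bazel-style paths where an `external/<repo>/` segment precedes the repo-relative path:

```python
from pathlib import Path


def clean_external_prefix(path: Path) -> Path:
    """Strip a Bazel 'external/<repo>/' prefix so paths are repo-relative.

    Hypothetical sketch: the real helper's behavior may differ.
    E.g. bazel-out/k8-fastbuild/bin/external/some_module/src/a.py -> src/a.py
    """
    parts = path.parts
    if "external" in parts:
        idx = parts.index("external")
        # Drop 'external' and the repository directory that follows it.
        return Path(*parts[idx + 2:])
    return path
```

Whether testlink cleaning can share this depends on whether testlink paths also follow the `external/<repo>/...` shape.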
```python
metadata: MetaData = {
    "module_name": "",
    "hash": "",
    "url": "",
}
```
I think it makes sense to make a 'default metadata' function or something.
This setup seems to be repeated a lot.
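A minimal sketch of such a factory, assuming a `MetaData` TypedDict that mirrors the one in metadata.py (the factory name is a suggestion, not existing API). Returning a fresh dict per call also avoids accidentally sharing one mutable default across call sites:

```python
from typing import TypedDict


class MetaData(TypedDict):
    module_name: str
    hash: str
    url: str


def default_metadata() -> MetaData:
    """Return a fresh empty metadata record (local-build defaults)."""
    return {"module_name": "", "hash": "", "url": ""}
```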
| if "known_good.json" not in str(file_path) and not metadata_set: | ||
| metadata["module_name"] = parse_module_name_from_path(file_path) | ||
| metadata_set = True |
There was a problem hiding this comment.
| if "known_good.json" not in str(file_path) and not metadata_set: | |
| metadata["module_name"] = parse_module_name_from_path(file_path) | |
| metadata_set = True | |
| if "known_good.json" not in str(file_path) and not metadata_set: | |
| # Doing this so we only have to parse the module name once, not every file. | |
| metadata["module_name"] = parse_module_name_from_path(file_path) | |
| metadata_set = True |
scripts_bazel/merge_sourcelinks.py (outdated)
| """ | ||
| if bazel-out/k8-fastbuild/bin/external/ in file_path => module is external | ||
| otherwise it's local | ||
| if local => module_name & hash == empty | ||
| if external => parse thing for module_name => look up known_good json for hash & url | ||
| """ |
Remove debug strings here.
```python
)

args = parser.parse_args()
all_files = [x for x in args.files if "known_good.json" not in str(x)]
```
Add a comment:
# known_good.json is passed in 'args.files' as well.
# Therefore we filter it out here to make the following logic cleaner.
```python
# If the file is empty e.g. '[]' there is nothing to parse, we continue
if not data:
    continue
metadata = data[0]
```
Do we need a worldview assertion here?
E.g. check that data is a list?
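A minimal sketch of what such a worldview assertion could look like (the helper name is hypothetical). It fails loudly on a non-list root while still letting the existing `if not data: continue` handle empty files:

```python
def assert_sourcelinks_list(data: object, source: str) -> None:
    """Worldview assertion before indexing into a parsed sourcelinks file.

    Verifies the JSON root is a list and, when non-empty, that the first
    element is a dict, so 'data[0]' can safely be treated as metadata.
    """
    assert isinstance(data, list), (
        f"{source}: expected a JSON list, got {type(data).__name__}"
    )
    if data:
        assert isinstance(data[0], dict), (
            f"{source}: first element should be a metadata dict"
        )
```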
```python
logger.warning(
    f"Unexpected schema in sourcelinks file '{json_file}': "
    "expected first element to be a metadata dict "
    "with a 'module_name' key. "
)
```
Should this be an error instead?
```python
for d in data[1:]:
    d.update(metadata)
assert isinstance(data, list), repr(data)
merged.extend(data[1:])
```
Add a comment:
# Taking only the needlinks. As data[0] = metadata, we start from 1.
```python
# COMBO BUILD

if str(path).startswith("external/"):
```
Redundant?
```python
@dataclass
class ModuleInfo:
```
Module => Repo
```python
# # Pouplate Metadata
# # Since all metadata inside the Codelinks is the same
# # we can just arbitrarily grab the first one
# module_name=need_links.CodeLinks[0].module_name,
# hash=need_links.CodeLinks[0].hash,
# url=need_links.CodeLinks[0].url,
```
Delete
a-zw left a comment

Looks good enough to merge to me.
Documentation is still missing, but we can address that in a separate PR.

Let me just make sure all the consumer tests & ref-integration pass now.
Still in progress, do not merge.
It works, but I need to fix tests and clean it up a lot.
Still TODO: