feat: Support $ref to load from external JSON/YAML file#105
Conversation
Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
$ref to load from external JSON/YAML file
Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
Signed-off-by: Jinzhe Zeng <jinzhe.zeng@ustc.edu.cn>
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## master #105 +/- ##
==========================================
+ Coverage 74.22% 75.14% +0.91%
==========================================
Files 9 9
Lines 811 853 +42
==========================================
+ Hits 602 641 +39
- Misses 209 212 +3 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
njzjz-bot
left a comment
There was a problem hiding this comment.
Thanks for implementing $ref support! However, there is a security concern that should be addressed before merging.
Security Issue: Arbitrary File Read
The current implementation allows reading arbitrary files from the filesystem:
def _load_ref(ref_path: str) -> dict:
with open(ref_path) as f: # No path validation!
return json.load(f)An attacker who can control the input JSON (e.g., from untrusted user input) could read sensitive files:
{"$ref": "/home/user/.config/credentials.json"}Suggested Fix: Add allow_ref Parameter
I recommend adding an allow_ref parameter (defaulting to False) to explicitly enable this feature. This follows the principle of secure by default.
The parameter would need to be added to:
Argument.check(),Argument.normalize(),Argument.check_value(),Argument.normalize_value()Argument._traverse_sub()and_resolve_ref()- Potentially
Variantmethods as well
Example implementation:
def _resolve_ref(d: dict, allow_ref: bool = False) -> None:
if "$ref" not in d:
return
if not allow_ref:
raise ValueError(
"$ref is not allowed by default. "
"Pass allow_ref=True to enable loading from external files."
)
ref_path = d.pop("$ref")
# ... rest of the implementationThis way, existing code remains secure, and users who need $ref can explicitly opt-in.
Would you be open to adding this safeguard?
@copilot Please also scan other methods that call these methods. |
Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
Done in 12d1d49. Added
Users must now explicitly opt in with |
|
@copilot you may also need to add this argument to CLI and notebook. |
Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
Done in 782d6e8. CLI ( Notebook ( |
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
There was a problem hiding this comment.
Pull request overview
Adds opt-in support for resolving "$ref" keys by loading and merging external JSON/YAML files before validation/normalization.
Changes:
- Implemented
$refloading/resolution helpers and threadedallow_refthrough traversal, public APIs, CLI, and notebook rendering. - Added YAML optional dependency via a new
[yaml]extra and extended type-check workflow extras. - Added documentation and new unit tests for secure-by-default behavior and supported formats.
Reviewed changes
Copilot reviewed 10 out of 10 changed files in this pull request and generated 8 comments.
Show a summary per file
| File | Description |
|---|---|
dargs/dargs.py |
Core $ref loader/resolver and allow_ref plumbing through traversal/check/normalize. |
dargs/check.py |
Adds allow_ref parameter and passes it through normalize/check helpers. |
dargs/cli.py |
Adds --allow-ref flag and threads it into CLI check flow. |
dargs/notebook.py |
Adds allow_ref to notebook helpers and resolves $ref during rendering. |
pyproject.toml |
Introduces [yaml] extra (pyyaml) and includes it in [test]. |
.github/workflows/type-check.yml |
Adds yaml extra to type-check install set. |
tests/test_ref.py |
New tests for $ref resolution across check/normalize/value APIs and formats. |
docs/ref.md |
New documentation page for $ref feature and allow_ref=True. |
docs/index.rst |
Adds ref page to Sphinx toctree. |
README.md |
Mentions $ref feature in additional features list. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| f"key `{path[-1]}` gets wrong value type, " | ||
| f"requires dict but {type(value).__name__} is given", | ||
| ) | ||
| _resolve_ref(value, allow_ref) |
There was a problem hiding this comment.
Argument.check() / check_value() now mutate user-provided data when allow_ref=True because _resolve_ref() runs inside traversal and modifies dicts in place (pops $ref, merges loaded keys). Checking is typically expected to be side-effect free; consider resolving refs on a copy for check/check_value (e.g., deepcopy at entry when allow_ref=True) or adding an inplace/mutate option to _resolve_ref so validation can use a merged view without modifying the input.
dargs/dargs.py
Outdated
| ext = os.path.splitext(ref_path)[1].lower() | ||
| if ext == ".json": | ||
| with open(ref_path) as f: | ||
| return json.load(f) | ||
| elif ext in (".yml", ".yaml"): | ||
| try: | ||
| import yaml | ||
| except ImportError as e: | ||
| raise ImportError( | ||
| "pyyaml is required to load YAML files referenced by $ref. " | ||
| "Install it with: pip install pyyaml" | ||
| ) from e | ||
| with open(ref_path) as f: | ||
| return yaml.safe_load(f) |
There was a problem hiding this comment.
_load_ref() is documented to return a dict, but json.load() / yaml.safe_load() can legally return non-dicts (e.g., list/scalar) or None (empty YAML). This will raise at merge time ({**loaded, **d}) with a TypeError instead of a clear error. Validate the loaded value is a dict and raise a ValueError with a helpful message when the referenced file doesn't contain a mapping at the top level.
dargs/dargs.py
Outdated
| loaded = _load_ref(ref_path) | ||
| # Merge: loaded content as base, local keys take precedence | ||
| merged = {**loaded, **d} |
There was a problem hiding this comment.
_load_ref() is documented to return a dict, but json.load() / yaml.safe_load() can legally return non-dicts (e.g., list/scalar) or None (empty YAML). This will raise at merge time ({**loaded, **d}) with a TypeError instead of a clear error. Validate the loaded value is a dict and raise a ValueError with a helpful message when the referenced file doesn't contain a mapping at the top level.
| ) | ||
|
|
||
|
|
||
| def _resolve_ref(d: dict, allow_ref: bool = False) -> None: |
There was a problem hiding this comment.
_resolve_ref() only resolves a single $ref occurrence and does not re-run if the loaded content itself contains $ref. This can leave $ref keys in the final merged dict, which can break strict-mode checks (undefined key) and prevents ref-chaining. Consider resolving in a loop (while \"$ref\" in d) and add cycle detection (e.g., track visited ref paths) to avoid infinite self-references.
dargs/dargs.py
Outdated
| ref_path = d.pop("$ref") | ||
| loaded = _load_ref(ref_path) | ||
| # Merge: loaded content as base, local keys take precedence | ||
| merged = {**loaded, **d} | ||
| d.clear() | ||
| d.update(merged) | ||
|
|
||
|
|
There was a problem hiding this comment.
_resolve_ref() only resolves a single $ref occurrence and does not re-run if the loaded content itself contains $ref. This can leave $ref keys in the final merged dict, which can break strict-mode checks (undefined key) and prevents ref-chaining. Consider resolving in a loop (while \"$ref\" in d) and add cycle detection (e.g., track visited ref paths) to avoid infinite self-references.
| ref_path = d.pop("$ref") | |
| loaded = _load_ref(ref_path) | |
| # Merge: loaded content as base, local keys take precedence | |
| merged = {**loaded, **d} | |
| d.clear() | |
| d.update(merged) | |
| # Track visited refs to avoid infinite loops in case of cyclic references. | |
| visited_refs: set[str] = set() | |
| # Resolve chained `$ref`s until no more are present at the top level. | |
| while "$ref" in d: | |
| ref_path = d.pop("$ref") | |
| if ref_path in visited_refs: | |
| raise ValueError(f"Cyclic $ref detected for path: {ref_path!r}") | |
| visited_refs.add(ref_path) | |
| loaded = _load_ref(ref_path) | |
| if not isinstance(loaded, dict): | |
| raise ValueError( | |
| f"Referenced file {ref_path!r} must contain a mapping/object." | |
| ) | |
| # Merge: loaded content as base, local keys take precedence | |
| merged = {**loaded, **d} | |
| d.clear() | |
| d.update(merged) |
dargs/dargs.py
Outdated
| """ | ||
| ext = os.path.splitext(ref_path)[1].lower() | ||
| if ext == ".json": | ||
| with open(ref_path) as f: |
There was a problem hiding this comment.
File reads in _load_ref() rely on platform default encoding. To make behavior consistent across environments (especially for YAML), open the referenced files with an explicit encoding (commonly encoding=\"utf-8\").
dargs/dargs.py
Outdated
| "pyyaml is required to load YAML files referenced by $ref. " | ||
| "Install it with: pip install pyyaml" | ||
| ) from e | ||
| with open(ref_path) as f: |
There was a problem hiding this comment.
File reads in _load_ref() rely on platform default encoding. To make behavior consistent across environments (especially for YAML), open the referenced files with an explicit encoding (commonly encoding=\"utf-8\").
dargs/notebook.py
Outdated
| and isinstance(self.arg, Argument) | ||
| and not (self.arg.repeat and not self.repeat) | ||
| ): | ||
| _resolve_ref(self.data, self.allow_ref) |
There was a problem hiding this comment.
ArgumentData resolves $ref by mutating self.data in place during notebook rendering. For a display/helper API (JSON() / print_html()), mutating the caller’s original data can be surprising. Consider copying the dict before resolving refs (or resolving into a derived structure) so rendering has no side effects.
|
@copilot apply changes based on the comments in this thread |
…ection, no mutation Co-authored-by: njzjz <9496702+njzjz@users.noreply.github.com>
Done in 193d0f9. Applied all review comments:
New tests added for: no-mutation of caller data, non-dict file content, cyclic ref detection, and chained |
|
pre-commit.ci autofix |
for more information, see https://pre-commit.ci
$ref to load from external JSON/YAML file$ref to load from external JSON/YAML file
Adds support for loading dict values from external JSON or YAML files via a
"$ref"key. Before validation or normalization, dargs resolves any"$ref"key by loading the referenced file and merging its contents into the dict, with local keys taking precedence over loaded ones. Chained$refvalues (where a loaded file itself contains$ref) are fully resolved, with cycle detection to prevent infinite loops.Loading from external files is disabled by default (secure by default). Users must explicitly opt in by passing
allow_ref=Trueto the relevant method.Changes Made
dargs/dargs.py: Added_load_refand_resolve_refhelper functions._load_refopens files with explicitencoding="utf-8"and validates the loaded value is adict, raising a clearValueErrorfor non-mapping content._resolve_refraisesValueErrorwhen$refis found butallow_ref=False; resolves chained$refvalues in a loop with cycle detection.checkandcheck_valuedeepcopy input whenallow_ref=Trueso the caller's data is never mutated.allow_refparameter is threaded through_traverse_sub,traverse_value,traverse, and all public methods (check,check_value,normalize,normalize_value).dargs/check.py: Addedallow_ref=Falseparameter to thecheckhelper, passed through tonormalize_valueandcheck_value.dargs/cli.py: Added--allow-refflag to thedargs checkCLI subcommand.dargs/notebook.py: Addedallow_ref=FalsetoJSON(),print_html(), andArgumentData; resolves$refon a copy of the data in_init_subdatato avoid side effects on the caller's data.pyproject.toml: Addedpyyamlas an optional dependency under a new[yaml]extra; also added to[test]extra.tests/test_ref.py: New tests covering secure-by-default behavior, JSON loading, YAML loading (.yml/.yaml), local key override, unsupported file extension, non-dict file content, cyclic ref detection, chained ref resolution, no-mutation of caller data incheck/check_value, andnormalize_value.docs/ref.md: New dedicated documentation page explaining the$reffeature,allow_ref=Truerequirement, supported formats, and a usage example.docs/index.rst: Addedrefto the Sphinx toctree.README.md: Replaced the full$refsection with a single bullet point in "Additional features" linking to the docs.Supported File Formats
.json.yaml,.ymlpip install pyyamlorpip install dargs[yaml]Usage
CLI:
Original prompt
$refto load from external JSON file #104✨ Let Copilot coding agent set things up for you — coding agent works faster and does higher quality work when set up for your repo.