fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts by scottschreckengaust · Pull Request #120 · awslabs/agent-plugins

scottschreckengaust · 2026-04-07T07:15:32Z

Summary

Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal sequences (e.g., ../../etc/passwd.drawio) in hook input from escaping the expected directory
Move symlink check before resolve() — resolve() follows symlinks, so is_symlink() on the resolved path would always be False
Use resolved path object for all post-resolve file I/O (never reconstruct from the tainted file_path string)
Exit non-zero (1) when refusing symlink .drawio files — a symlink is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file". The hook harness (validate-drawio.sh) uses || true so this is safe there, but CLI callers and CI get an honest signal that the file was refused
Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep)
Addresses Semgrep finding ai.ai-best-practices.hooks-path-traversal

Exit code behavior

Condition	Exit code	Rationale
Not a `.drawio` file	0	Expected no-op — the PostToolUse hook fires on every `Edit`/`Write` call, so non-drawio files are the 99.9% case. Exit 0 means "not my file, nothing to do" which is the standard hook contract for irrelevant files.
Symlink `.drawio` file	1	Security rejection — a symlink drawio file is unusual/suspicious (potential attack vector), qualitatively different from "not my file." CLI callers and CI get an honest signal; the hook harness absorbs it via `\|\| true`.
File too large	0	Policy skip — not an error in the file itself
Parse error	1	Actual processing failure

Test plan

Verify normal .drawio validation still works with both relative and absolute paths
Verify symlink rejection exits non-zero: ln -s /tmp/target.drawio test.drawio && python3 post_process_drawio.py test.drawio; echo $? → should print 1
Run mise run build — passes cleanly

Generated with Claude Code

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

…ook scripts Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py. This prevents path traversal sequences (e.g., "../../etc/passwd.drawio") in hook input from escaping the expected directory. Key ordering: symlink check runs on the original unresolved path (before resolve(), which follows symlinks and would make is_symlink() always False), then resolve() canonicalizes, then all subsequent file I/O uses only the resolved `path` object — never the raw `file_path`. Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

- Canonicalize file_path with Path.resolve() before file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal (CWE-22) from hook input - Move symlink check before resolve() (resolve follows symlinks, so is_symlink() on resolved path is always False) - Use resolved `path` object for all post-resolve file I/O - Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep) Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

A symlink .drawio file is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file." Exit 1 gives CLI callers and CI an honest signal that the file was refused. The hook harness (validate-drawio.sh) uses || true so this is safe there. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* add diagram skill * fix: resolve CI failures — use defusedxml for safe XML parsing, fix markdown lint Replace xml.etree.ElementTree with defusedxml.ElementTree in all Python scripts to address Bandit B314/B405 and Semgrep XXE findings. Add nosec suppression to subprocess import in drawio_url.py (browser URL opener only). Fix 106 markdownlint errors (MD022/MD031/MD032 blank lines, MD001 heading levels) across 7 reference files and SKILL.md. Trim SKILL.md from 515 to 495 lines to satisfy the 500-line SKILL001 limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add architecture diagram skill * fix: split SKILL.md into references to pass skill-length lint rule Move detailed XML generation rules and layout guidelines from SKILL.md (494 lines) into references/xml-generation-rules.md and references/layout-guidelines.md. SKILL.md is now 219 lines, well under the 300-line warning threshold enforced by SKILL001. * fix: apply dprint formatting to pass fmt:check Run dprint fmt on 7 files (JSON and Markdown) that were not conforming to the project's formatting rules. * fix(diagram-skill): remove defusedxml dependency, fix code quality issues - Replace defusedxml with stdlib xml.etree.ElementTree in all 5 Python scripts and validate-drawio.sh (XML is self-generated, no XXE risk) - Remove pip3 auto-install from PostToolUse hook - Remove unused import sys in fix_step_badges.py - Remove unused style variable in post_process_drawio.py fix_legend_size - Add explanatory comment to except block in post_process_drawio.py * refactor(diagram-skill): consolidate references, delete outdated examples Delete 5 files (-2039 lines): - example-diagrams.md (803 lines): outdated XML examples conflicting with .drawio refs - aws-diagram-guidelines.md (536 lines): internal wiki copy, all rules in style-guide.md - xml-structure.md (209 lines): merged unique parts into group-styles.md + layout-guidelines.md - xml-generation-rules.md (186 lines): trimmed and renamed to xml-rules.md (89 lines) - aws4-shapes.md (231 lines): split into aws4-shapes-services.md (69) + aws4-shapes-resources.md (100) All reference files now under 100-line DESIGN_GUIDELINES limit. All shape names preserved in the split files (zero shape deletions). All cross-references updated to point to new file names. * refactor(diagram-skill): restructure SKILL.md for progressive disclosure - Remove eager "Study" paragraph that loaded ~135k tokens upfront - Move reference loading into Step 3 where references are actually needed - Add example selection table: agent picks 1-2 most relevant .drawio examples per diagram type (reduces ~95k to ~10-18k tokens) - Update cross-references to renamed files (xml-rules.md, aws4-shapes-services.md) - SKILL.md: 226 lines (under 300 limit) * fix(diagram-skill): remove white background and edge label backgrounds for dark mode - Remove background="#FFFFFF" from mxGraphModel XML template in SKILL.md - Change rule from "ALWAYS set background=#FFFFFF" to "NEVER set background" - Change edge label rule from labelBackgroundColor=#ffffff to labelBackgroundColor=none in layout-guidelines.md and xml-rules.md - Both white backgrounds break dark mode adaptive contrast * feat(diagram-skill): support non-AWS architecture diagrams - New references/general-icons.md: maps 40+ non-AWS technologies (PostgreSQL, Docker, Kubernetes, CoreML, HuggingFace, GitHub, etc.) to draw.io shapes with category-appropriate tint colors - Mode A codebase analysis now detects non-AWS tech when no AWS infrastructure files are found (Dockerfiles, DB configs, ML frameworks) - Mixed architectures supported: AWS icons for AWS services, general icons for non-AWS, same layout rules for both - Step 3 loads general-icons.md only when non-AWS services are present * fix(diagram-skill): restore defusedxml, add README, fix markdown lint - Restore defusedxml.ElementTree imports (CI scanners require it) - Replace silent pip3 auto-install with informative error message when defusedxml is missing — no more modifying user's environment - Add scripts/requirements.txt declaring defusedxml>=0.7.1 - Create plugin README.md per DESIGN_GUIDELINES (prerequisites, skills, MCP servers, installation, examples) - Document defusedxml dependency in post-processing.md - Fix 20 markdownlint errors (MD022, MD029, MD031, MD032, MD060) - Apply dprint formatting to table separators * fix(diagram-skill): apply dprint formatting to group-styles.md table * fix(diagram-skill): align validator with skill rules, fix template and post-processor bugs - Reject hardcoded background attribute on mxGraphModel (breaks dark mode) - Reject bare <mxGraphModel> without <mxfile> wrapper per skill contract - Lower font size threshold from 11px to 10px to match skill's font hierarchy - Remove stale fontColor=#232F3E warning (redundant with style guide) - Fix legend-panel parent ID typo in xml-templates-structure.md - Fix .upper() bug in badge skip condition (dead code) - Replace blunt "Do NOT read .drawio" rule with Reference Priority section - Remove duplicate developer_tools entry in aws4-shapes.json that shadowed xray * fix(diagram-skill): harden scripts against injection, error suppression, and resource exhaustion - Replace JSON escaping fallback with safe static message (CWE-838) - Use importlib.util for sibling imports instead of sys.path manipulation (CWE-426) - Fix defusedxml ET.indent() bug by using stdlib for write-time formatting - Replace bare except with specific exception types in hook JSON parser (CWE-396) - Add 2MB file size guard to URL generator before reading into memory (CWE-400) * fix(diagram-skill): add nosec/nosemgrep suppression for stdlib indent() import The xml.etree.ElementTree import is only used for indent() (write-time formatting), not XML parsing. All parsing uses defusedxml. Add inline suppression for Bandit B405 and Semgrep XXE rules to match the existing pattern in drawio_url.py. * fix(diagram-skill): use bare nosemgrep suppression for stdlib indent() import The specific rule ID nosemgrep didn't match because Semgrep uses a composite rule ID. Bare # nosemgrep suppresses all rules on the line. The justification is in the comment: only used for indent(), not parsing. * fix(diagram-skill): use full semgrep rule IDs for nosemgrep suppression * fix(diagram-skill): remove stdlib xml.etree.ElementTree import entirely Drop the indent() call and StdET import that triggered Bandit B405 and Semgrep XXE rules. XML indentation is cosmetic — draw.io and agents handle non-indented XML fine. Added a comment noting that a custom indent helper can be added later if human-readable XML is needed. * fix(diagram-skill): sanitize XML attributes in validator error messages Prevent prompt injection via crafted draw.io cell attributes that flow through the PostToolUse hook into agent systemMessage context. Adds _sanitize_attr() which strips control characters, collapses whitespace, restricts to safe characters, and truncates at 80 chars. Applied to all user-controlled attribute values in error/warning messages. * fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage (#119) Follow-on security hardening for the architecture diagram skill (PR#103): - Add 2MB file size checks before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py guard) - Add iterparse-based XML depth (50) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files - Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow write attacks - Wrap fixer pipelines in top-level try/except to prevent unhandled exception tracebacks from leaking file paths and source code into the hook systemMessage (stderr captured via 2>&1) - Add timeout wrappers (10s/5s) around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts (#120) * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py. This prevents path traversal sequences (e.g., "../../etc/passwd.drawio") in hook input from escaping the expected directory. Key ordering: symlink check runs on the original unresolved path (before resolve(), which follows symlinks and would make is_symlink() always False), then resolve() canonicalizes, then all subsequent file I/O uses only the resolved `path` object — never the raw `file_path`. Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths and fix nosemgrep rule IDs - Canonicalize file_path with Path.resolve() before file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal (CWE-22) from hook input - Move symlink check before resolve() (resolve follows symlinks, so is_symlink() on resolved path is always False) - Use resolved `path` object for all post-resolve file I/O - Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep) Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): exit non-zero when refusing symlink drawio files A symlink .drawio file is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file." Exit 1 gives CLI callers and CI an honest signal that the file was refused. The hook harness (validate-drawio.sh) uses || true so this is safe there. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): use os.path.realpath() for semgrep-recognized taint sanitization Replace Path.resolve() with os.path.realpath() for path canonicalization. Both are functionally identical, but semgrep's hooks-path-traversal rule only recognizes os.path.realpath() as a taint sanitizer. This eliminates the CI semgrep finding without needing nosemgrep inline suppressions. Also replaces Path.is_symlink() with os.path.islink() for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): sync Codex plugin manifest with v1.2.0 diagram skill changes * Update plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> --------- Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Sam Castro <scoropeza@gmail.com> Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

scottschreckengaust requested review from a team as code owners April 7, 2026 07:15

scottschreckengaust requested review from krokoko and theagenticguy April 7, 2026 07:15

scottschreckengaust force-pushed the fix/diagram-path-traversal branch from cc60c6b to eceb39a Compare April 7, 2026 07:22

scottschreckengaust requested a review from a team as a code owner April 7, 2026 07:29

scottschreckengaust force-pushed the fix/diagram-path-traversal branch from fb523ef to 5728d7d Compare April 7, 2026 07:33

krokoko approved these changes Apr 7, 2026

View reviewed changes

scottschreckengaust requested review from mayakost and scoropeza April 7, 2026 16:00

mayakost merged commit dbc12d6 into feat/diagram-skill Apr 7, 2026
11 checks passed

mayakost deleted the fix/diagram-path-traversal branch April 7, 2026 16:54

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts#120

fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts#120
mayakost merged 3 commits intofeat/diagram-skillfrom
fix/diagram-path-traversal

scottschreckengaust commented Apr 7, 2026 •

edited

Loading

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Conversation

scottschreckengaust commented Apr 7, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Summary

Exit code behavior

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

scottschreckengaust commented Apr 7, 2026 •

edited

Loading