Skip to content

fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts#120

Merged
mayakost merged 3 commits intofeat/diagram-skillfrom
fix/diagram-path-traversal
Apr 7, 2026
Merged

fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts#120
mayakost merged 3 commits intofeat/diagram-skillfrom
fix/diagram-path-traversal

Conversation

@scottschreckengaust
Copy link
Copy Markdown
Member

@scottschreckengaust scottschreckengaust commented Apr 7, 2026

Summary

  • Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal sequences (e.g., ../../etc/passwd.drawio) in hook input from escaping the expected directory
  • Move symlink check before resolve()resolve() follows symlinks, so is_symlink() on the resolved path would always be False
  • Use resolved path object for all post-resolve file I/O (never reconstruct from the tainted file_path string)
  • Exit non-zero (1) when refusing symlink .drawio files — a symlink is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file". The hook harness (validate-drawio.sh) uses || true so this is safe there, but CLI callers and CI get an honest signal that the file was refused
  • Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep)
  • Addresses Semgrep finding ai.ai-best-practices.hooks-path-traversal

Exit code behavior

Condition Exit code Rationale
Not a .drawio file 0 Expected no-op — the PostToolUse hook fires on every Edit/Write call, so non-drawio files are the 99.9% case. Exit 0 means "not my file, nothing to do" which is the standard hook contract for irrelevant files.
Symlink .drawio file 1 Security rejection — a symlink drawio file is unusual/suspicious (potential attack vector), qualitatively different from "not my file." CLI callers and CI get an honest signal; the hook harness absorbs it via || true.
File too large 0 Policy skip — not an error in the file itself
Parse error 1 Actual processing failure

Test plan

  • Verify normal .drawio validation still works with both relative and absolute paths
  • Verify symlink rejection exits non-zero: ln -s /tmp/target.drawio test.drawio && python3 post_process_drawio.py test.drawio; echo $? → should print 1
  • Run mise run build — passes cleanly

Generated with Claude Code


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

…ook scripts

Canonicalize file_path with Path.resolve() before any file operations
in post_process_drawio.py and validate_drawio.py. This prevents path
traversal sequences (e.g., "../../etc/passwd.drawio") in hook input
from escaping the expected directory.

Key ordering: symlink check runs on the original unresolved path
(before resolve(), which follows symlinks and would make is_symlink()
always False), then resolve() canonicalizes, then all subsequent file
I/O uses only the resolved `path` object — never the raw `file_path`.

Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust force-pushed the fix/diagram-path-traversal branch from cc60c6b to eceb39a Compare April 7, 2026 07:22
@scottschreckengaust scottschreckengaust requested a review from a team as a code owner April 7, 2026 07:29
- Canonicalize file_path with Path.resolve() before file operations
  in post_process_drawio.py and validate_drawio.py to prevent path
  traversal (CWE-22) from hook input
- Move symlink check before resolve() (resolve follows symlinks, so
  is_symlink() on resolved path is always False)
- Use resolved `path` object for all post-resolve file I/O
- Fix nosemgrep inline comments to use full rule IDs (truncated IDs
  were silently ignored by semgrep)

Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust force-pushed the fix/diagram-path-traversal branch from fb523ef to 5728d7d Compare April 7, 2026 07:33
A symlink .drawio file is a suspicious/unusual condition (potential
attack), not an expected no-op like "not a drawio file." Exit 1 gives
CLI callers and CI an honest signal that the file was refused. The
hook harness (validate-drawio.sh) uses || true so this is safe there.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@mayakost mayakost merged commit dbc12d6 into feat/diagram-skill Apr 7, 2026
11 checks passed
@mayakost mayakost deleted the fix/diagram-path-traversal branch April 7, 2026 16:54
github-merge-queue bot pushed a commit that referenced this pull request Apr 7, 2026
* add diagram skill

* fix: resolve CI failures — use defusedxml for safe XML parsing, fix markdown lint

Replace xml.etree.ElementTree with defusedxml.ElementTree in all Python
scripts to address Bandit B314/B405 and Semgrep XXE findings. Add nosec
suppression to subprocess import in drawio_url.py (browser URL opener only).
Fix 106 markdownlint errors (MD022/MD031/MD032 blank lines, MD001 heading
levels) across 7 reference files and SKILL.md. Trim SKILL.md from 515 to
495 lines to satisfy the 500-line SKILL001 limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add architecture diagram skill

* fix: split SKILL.md into references to pass skill-length lint rule

Move detailed XML generation rules and layout guidelines from SKILL.md
(494 lines) into references/xml-generation-rules.md and
references/layout-guidelines.md. SKILL.md is now 219 lines, well under
the 300-line warning threshold enforced by SKILL001.

* fix: apply dprint formatting to pass fmt:check

Run dprint fmt on 7 files (JSON and Markdown) that were not
conforming to the project's formatting rules.

* fix(diagram-skill): remove defusedxml dependency, fix code quality issues

- Replace defusedxml with stdlib xml.etree.ElementTree in all 5 Python
  scripts and validate-drawio.sh (XML is self-generated, no XXE risk)
- Remove pip3 auto-install from PostToolUse hook
- Remove unused import sys in fix_step_badges.py
- Remove unused style variable in post_process_drawio.py fix_legend_size
- Add explanatory comment to except block in post_process_drawio.py

* refactor(diagram-skill): consolidate references, delete outdated examples

Delete 5 files (-2039 lines):
- example-diagrams.md (803 lines): outdated XML examples conflicting with .drawio refs
- aws-diagram-guidelines.md (536 lines): internal wiki copy, all rules in style-guide.md
- xml-structure.md (209 lines): merged unique parts into group-styles.md + layout-guidelines.md
- xml-generation-rules.md (186 lines): trimmed and renamed to xml-rules.md (89 lines)
- aws4-shapes.md (231 lines): split into aws4-shapes-services.md (69) + aws4-shapes-resources.md (100)

All reference files now under 100-line DESIGN_GUIDELINES limit.
All shape names preserved in the split files (zero shape deletions).
All cross-references updated to point to new file names.

* refactor(diagram-skill): restructure SKILL.md for progressive disclosure

- Remove eager "Study" paragraph that loaded ~135k tokens upfront
- Move reference loading into Step 3 where references are actually needed
- Add example selection table: agent picks 1-2 most relevant .drawio
  examples per diagram type (reduces ~95k to ~10-18k tokens)
- Update cross-references to renamed files (xml-rules.md,
  aws4-shapes-services.md)
- SKILL.md: 226 lines (under 300 limit)

* fix(diagram-skill): remove white background and edge label backgrounds for dark mode

- Remove background="#FFFFFF" from mxGraphModel XML template in SKILL.md
- Change rule from "ALWAYS set background=#FFFFFF" to "NEVER set background"
- Change edge label rule from labelBackgroundColor=#ffffff to
  labelBackgroundColor=none in layout-guidelines.md and xml-rules.md
- Both white backgrounds break dark mode adaptive contrast

* feat(diagram-skill): support non-AWS architecture diagrams

- New references/general-icons.md: maps 40+ non-AWS technologies
  (PostgreSQL, Docker, Kubernetes, CoreML, HuggingFace, GitHub, etc.)
  to draw.io shapes with category-appropriate tint colors
- Mode A codebase analysis now detects non-AWS tech when no AWS
  infrastructure files are found (Dockerfiles, DB configs, ML frameworks)
- Mixed architectures supported: AWS icons for AWS services, general
  icons for non-AWS, same layout rules for both
- Step 3 loads general-icons.md only when non-AWS services are present

* fix(diagram-skill): restore defusedxml, add README, fix markdown lint

- Restore defusedxml.ElementTree imports (CI scanners require it)
- Replace silent pip3 auto-install with informative error message
  when defusedxml is missing — no more modifying user's environment
- Add scripts/requirements.txt declaring defusedxml>=0.7.1
- Create plugin README.md per DESIGN_GUIDELINES (prerequisites,
  skills, MCP servers, installation, examples)
- Document defusedxml dependency in post-processing.md
- Fix 20 markdownlint errors (MD022, MD029, MD031, MD032, MD060)
- Apply dprint formatting to table separators

* fix(diagram-skill): apply dprint formatting to group-styles.md table

* fix(diagram-skill): align validator with skill rules, fix template and post-processor bugs

- Reject hardcoded background attribute on mxGraphModel (breaks dark mode)
- Reject bare <mxGraphModel> without <mxfile> wrapper per skill contract
- Lower font size threshold from 11px to 10px to match skill's font hierarchy
- Remove stale fontColor=#232F3E warning (redundant with style guide)
- Fix legend-panel parent ID typo in xml-templates-structure.md
- Fix .upper() bug in badge skip condition (dead code)
- Replace blunt "Do NOT read .drawio" rule with Reference Priority section
- Remove duplicate developer_tools entry in aws4-shapes.json that shadowed xray

* fix(diagram-skill): harden scripts against injection, error suppression, and resource exhaustion

- Replace JSON escaping fallback with safe static message (CWE-838)
- Use importlib.util for sibling imports instead of sys.path manipulation (CWE-426)
- Fix defusedxml ET.indent() bug by using stdlib for write-time formatting
- Replace bare except with specific exception types in hook JSON parser (CWE-396)
- Add 2MB file size guard to URL generator before reading into memory (CWE-400)

* fix(diagram-skill): add nosec/nosemgrep suppression for stdlib indent() import

The xml.etree.ElementTree import is only used for indent() (write-time
formatting), not XML parsing. All parsing uses defusedxml. Add inline
suppression for Bandit B405 and Semgrep XXE rules to match the existing
pattern in drawio_url.py.

* fix(diagram-skill): use bare nosemgrep suppression for stdlib indent() import

The specific rule ID nosemgrep didn't match because Semgrep uses a
composite rule ID. Bare # nosemgrep suppresses all rules on the line.
The justification is in the comment: only used for indent(), not parsing.

* fix(diagram-skill): use full semgrep rule IDs for nosemgrep suppression

* fix(diagram-skill): remove stdlib xml.etree.ElementTree import entirely

Drop the indent() call and StdET import that triggered Bandit B405 and
Semgrep XXE rules. XML indentation is cosmetic — draw.io and agents
handle non-indented XML fine. Added a comment noting that a custom
indent helper can be added later if human-readable XML is needed.

* fix(diagram-skill): sanitize XML attributes in validator error messages

Prevent prompt injection via crafted draw.io cell attributes that flow
through the PostToolUse hook into agent systemMessage context. Adds
_sanitize_attr() which strips control characters, collapses whitespace,
restricts to safe characters, and truncates at 80 chars. Applied to all
user-controlled attribute values in error/warning messages.

* fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage (#119)

Follow-on security hardening for the architecture diagram skill (PR#103):

- Add 2MB file size checks before XML parsing in validate_drawio.py and
  post_process_drawio.py (matching existing drawio_url.py guard)
- Add iterparse-based XML depth (50) and element count (50,000) pre-flight
  check to prevent C-stack overflow from deeply nested files
- Reject symlinks in post_process_drawio.py before write-back to prevent
  symlink-follow write attacks
- Wrap fixer pipelines in top-level try/except to prevent unhandled
  exception tracebacks from leaking file paths and source code into the
  hook systemMessage (stderr captured via 2>&1)
- Add timeout wrappers (10s/5s) around all Python subprocess calls in
  validate-drawio.sh to prevent runaway processes from blocking the hook

Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts (#120)

* fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts

Canonicalize file_path with Path.resolve() before any file operations
in post_process_drawio.py and validate_drawio.py. This prevents path
traversal sequences (e.g., "../../etc/passwd.drawio") in hook input
from escaping the expected directory.

Key ordering: symlink check runs on the original unresolved path
(before resolve(), which follows symlinks and would make is_symlink()
always False), then resolve() canonicalizes, then all subsequent file
I/O uses only the resolved `path` object — never the raw `file_path`.

Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): resolve file paths and fix nosemgrep rule IDs

- Canonicalize file_path with Path.resolve() before file operations
  in post_process_drawio.py and validate_drawio.py to prevent path
  traversal (CWE-22) from hook input
- Move symlink check before resolve() (resolve follows symlinks, so
  is_symlink() on resolved path is always False)
- Use resolved `path` object for all post-resolve file I/O
- Fix nosemgrep inline comments to use full rule IDs (truncated IDs
  were silently ignored by semgrep)

Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): exit non-zero when refusing symlink drawio files

A symlink .drawio file is a suspicious/unusual condition (potential
attack), not an expected no-op like "not a drawio file." Exit 1 gives
CLI callers and CI an honest signal that the file was refused. The
hook harness (validate-drawio.sh) uses || true so this is safe there.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): use os.path.realpath() for semgrep-recognized taint sanitization

Replace Path.resolve() with os.path.realpath() for path canonicalization.
Both are functionally identical, but semgrep's hooks-path-traversal rule
only recognizes os.path.realpath() as a taint sanitizer. This eliminates
the CI semgrep finding without needing nosemgrep inline suppressions.

Also replaces Path.is_symlink() with os.path.islink() for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): sync Codex plugin manifest with v1.2.0 diagram skill changes

* Update plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com>

---------

Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Sam Castro <scoropeza@gmail.com>
Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com>
Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com>
Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants