fix(deploy-on-aws): hardening diagram by scottschreckengaust · Pull Request #119 · awslabs/agent-plugins

scottschreckengaust · 2026-04-07T07:04:54Z

Summary

Follow-on security hardening for the architecture diagram skill (#103), addressing defense-in-depth gaps identified during security review:

DoS prevention: Add 2MB file size guards before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py), plus iterparse-based XML depth (50 levels) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files
Symlink write protection: Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow attacks that could overwrite arbitrary XML files
Info leakage prevention: Wrap fixer pipelines in top-level try/except that emits generic errors instead of Python tracebacks (which would leak file paths and source code into the hook systemMessage via 2>&1 capture)
Timeout guards: Add timeout 10/timeout 5 wrappers around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook indefinitely

Files changed

File	Changes
`validate_drawio.py`	File size check, `_check_xml_limits()` iterparse pre-flight, top-level traceback suppression
`post_process_drawio.py`	File size check, symlink rejection, top-level traceback suppression
`validate-drawio.sh`	`timeout` wrappers on all 3 Python subprocess calls

Security findings addressed

Category	Severity	Fix
Unbounded XML parse (DoS)	Low	2MB size cap + depth/count limits
Symlink-follow write	Low	`path.is_symlink()` check
Traceback info leakage (CWE-209)	Low	Generic error messages in top-level except
Hook process hang (DoS)	Low	Shell `timeout` wrappers

Test plan

Verify normal .drawio validation still works: python3 plugins/deploy-on-aws/scripts/lib/validate_drawio.py <valid-file>.drawio
Verify oversized file rejection: create a >2MB .drawio file and confirm it's rejected with a clear error
Verify deep nesting rejection: create a .drawio with >50 nested elements and confirm the pre-flight check catches it
Verify symlink rejection: ln -s /tmp/target.drawio test.drawio and confirm post-processor skips it
Run mise run build — passes cleanly (lint, manifests, cross-refs, formatting, all security scans)

Generated with Claude Code

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

…s, and info leakage Follow-on security hardening for the architecture diagram skill (PR#103): - Add 2MB file size checks before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py guard) - Add iterparse-based XML depth (50) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files - Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow write attacks - Wrap fixer pipelines in top-level try/except to prevent unhandled exception tracebacks from leaking file paths and source code into the hook systemMessage (stderr captured via 2>&1) - Add timeout wrappers (10s/5s) around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* add diagram skill * fix: resolve CI failures — use defusedxml for safe XML parsing, fix markdown lint Replace xml.etree.ElementTree with defusedxml.ElementTree in all Python scripts to address Bandit B314/B405 and Semgrep XXE findings. Add nosec suppression to subprocess import in drawio_url.py (browser URL opener only). Fix 106 markdownlint errors (MD022/MD031/MD032 blank lines, MD001 heading levels) across 7 reference files and SKILL.md. Trim SKILL.md from 515 to 495 lines to satisfy the 500-line SKILL001 limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add architecture diagram skill * fix: split SKILL.md into references to pass skill-length lint rule Move detailed XML generation rules and layout guidelines from SKILL.md (494 lines) into references/xml-generation-rules.md and references/layout-guidelines.md. SKILL.md is now 219 lines, well under the 300-line warning threshold enforced by SKILL001. * fix: apply dprint formatting to pass fmt:check Run dprint fmt on 7 files (JSON and Markdown) that were not conforming to the project's formatting rules. * fix(diagram-skill): remove defusedxml dependency, fix code quality issues - Replace defusedxml with stdlib xml.etree.ElementTree in all 5 Python scripts and validate-drawio.sh (XML is self-generated, no XXE risk) - Remove pip3 auto-install from PostToolUse hook - Remove unused import sys in fix_step_badges.py - Remove unused style variable in post_process_drawio.py fix_legend_size - Add explanatory comment to except block in post_process_drawio.py * refactor(diagram-skill): consolidate references, delete outdated examples Delete 5 files (-2039 lines): - example-diagrams.md (803 lines): outdated XML examples conflicting with .drawio refs - aws-diagram-guidelines.md (536 lines): internal wiki copy, all rules in style-guide.md - xml-structure.md (209 lines): merged unique parts into group-styles.md + layout-guidelines.md - xml-generation-rules.md (186 lines): trimmed and renamed to xml-rules.md (89 lines) - aws4-shapes.md (231 lines): split into aws4-shapes-services.md (69) + aws4-shapes-resources.md (100) All reference files now under 100-line DESIGN_GUIDELINES limit. All shape names preserved in the split files (zero shape deletions). All cross-references updated to point to new file names. * refactor(diagram-skill): restructure SKILL.md for progressive disclosure - Remove eager "Study" paragraph that loaded ~135k tokens upfront - Move reference loading into Step 3 where references are actually needed - Add example selection table: agent picks 1-2 most relevant .drawio examples per diagram type (reduces ~95k to ~10-18k tokens) - Update cross-references to renamed files (xml-rules.md, aws4-shapes-services.md) - SKILL.md: 226 lines (under 300 limit) * fix(diagram-skill): remove white background and edge label backgrounds for dark mode - Remove background="#FFFFFF" from mxGraphModel XML template in SKILL.md - Change rule from "ALWAYS set background=#FFFFFF" to "NEVER set background" - Change edge label rule from labelBackgroundColor=#ffffff to labelBackgroundColor=none in layout-guidelines.md and xml-rules.md - Both white backgrounds break dark mode adaptive contrast * feat(diagram-skill): support non-AWS architecture diagrams - New references/general-icons.md: maps 40+ non-AWS technologies (PostgreSQL, Docker, Kubernetes, CoreML, HuggingFace, GitHub, etc.) to draw.io shapes with category-appropriate tint colors - Mode A codebase analysis now detects non-AWS tech when no AWS infrastructure files are found (Dockerfiles, DB configs, ML frameworks) - Mixed architectures supported: AWS icons for AWS services, general icons for non-AWS, same layout rules for both - Step 3 loads general-icons.md only when non-AWS services are present * fix(diagram-skill): restore defusedxml, add README, fix markdown lint - Restore defusedxml.ElementTree imports (CI scanners require it) - Replace silent pip3 auto-install with informative error message when defusedxml is missing — no more modifying user's environment - Add scripts/requirements.txt declaring defusedxml>=0.7.1 - Create plugin README.md per DESIGN_GUIDELINES (prerequisites, skills, MCP servers, installation, examples) - Document defusedxml dependency in post-processing.md - Fix 20 markdownlint errors (MD022, MD029, MD031, MD032, MD060) - Apply dprint formatting to table separators * fix(diagram-skill): apply dprint formatting to group-styles.md table * fix(diagram-skill): align validator with skill rules, fix template and post-processor bugs - Reject hardcoded background attribute on mxGraphModel (breaks dark mode) - Reject bare <mxGraphModel> without <mxfile> wrapper per skill contract - Lower font size threshold from 11px to 10px to match skill's font hierarchy - Remove stale fontColor=#232F3E warning (redundant with style guide) - Fix legend-panel parent ID typo in xml-templates-structure.md - Fix .upper() bug in badge skip condition (dead code) - Replace blunt "Do NOT read .drawio" rule with Reference Priority section - Remove duplicate developer_tools entry in aws4-shapes.json that shadowed xray * fix(diagram-skill): harden scripts against injection, error suppression, and resource exhaustion - Replace JSON escaping fallback with safe static message (CWE-838) - Use importlib.util for sibling imports instead of sys.path manipulation (CWE-426) - Fix defusedxml ET.indent() bug by using stdlib for write-time formatting - Replace bare except with specific exception types in hook JSON parser (CWE-396) - Add 2MB file size guard to URL generator before reading into memory (CWE-400) * fix(diagram-skill): add nosec/nosemgrep suppression for stdlib indent() import The xml.etree.ElementTree import is only used for indent() (write-time formatting), not XML parsing. All parsing uses defusedxml. Add inline suppression for Bandit B405 and Semgrep XXE rules to match the existing pattern in drawio_url.py. * fix(diagram-skill): use bare nosemgrep suppression for stdlib indent() import The specific rule ID nosemgrep didn't match because Semgrep uses a composite rule ID. Bare # nosemgrep suppresses all rules on the line. The justification is in the comment: only used for indent(), not parsing. * fix(diagram-skill): use full semgrep rule IDs for nosemgrep suppression * fix(diagram-skill): remove stdlib xml.etree.ElementTree import entirely Drop the indent() call and StdET import that triggered Bandit B405 and Semgrep XXE rules. XML indentation is cosmetic — draw.io and agents handle non-indented XML fine. Added a comment noting that a custom indent helper can be added later if human-readable XML is needed. * fix(diagram-skill): sanitize XML attributes in validator error messages Prevent prompt injection via crafted draw.io cell attributes that flow through the PostToolUse hook into agent systemMessage context. Adds _sanitize_attr() which strips control characters, collapses whitespace, restricts to safe characters, and truncates at 80 chars. Applied to all user-controlled attribute values in error/warning messages. * fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage (#119) Follow-on security hardening for the architecture diagram skill (PR#103): - Add 2MB file size checks before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py guard) - Add iterparse-based XML depth (50) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files - Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow write attacks - Wrap fixer pipelines in top-level try/except to prevent unhandled exception tracebacks from leaking file paths and source code into the hook systemMessage (stderr captured via 2>&1) - Add timeout wrappers (10s/5s) around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts (#120) * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py. This prevents path traversal sequences (e.g., "../../etc/passwd.drawio") in hook input from escaping the expected directory. Key ordering: symlink check runs on the original unresolved path (before resolve(), which follows symlinks and would make is_symlink() always False), then resolve() canonicalizes, then all subsequent file I/O uses only the resolved `path` object — never the raw `file_path`. Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths and fix nosemgrep rule IDs - Canonicalize file_path with Path.resolve() before file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal (CWE-22) from hook input - Move symlink check before resolve() (resolve follows symlinks, so is_symlink() on resolved path is always False) - Use resolved `path` object for all post-resolve file I/O - Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep) Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): exit non-zero when refusing symlink drawio files A symlink .drawio file is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file." Exit 1 gives CLI callers and CI an honest signal that the file was refused. The hook harness (validate-drawio.sh) uses || true so this is safe there. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): use os.path.realpath() for semgrep-recognized taint sanitization Replace Path.resolve() with os.path.realpath() for path canonicalization. Both are functionally identical, but semgrep's hooks-path-traversal rule only recognizes os.path.realpath() as a taint sanitizer. This eliminates the CI semgrep finding without needing nosemgrep inline suppressions. Also replaces Path.is_symlink() with os.path.islink() for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): sync Codex plugin manifest with v1.2.0 diagram skill changes * Update plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> --------- Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Sam Castro <scoropeza@gmail.com> Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

scottschreckengaust requested review from a team as code owners April 7, 2026 07:04

scottschreckengaust requested review from krokoko and theagenticguy April 7, 2026 07:04

scottschreckengaust changed the title ~~fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage~~ fix(deploy-on-aws): hardening diagram Apr 7, 2026

scottschreckengaust merged commit 575876c into feat/diagram-skill Apr 7, 2026
8 checks passed

scottschreckengaust deleted the fix/diagram-security-hardening branch April 7, 2026 07:08

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(deploy-on-aws): hardening diagram#119

fix(deploy-on-aws): hardening diagram#119
scottschreckengaust merged 1 commit intofeat/diagram-skillfrom
fix/diagram-security-hardening

scottschreckengaust commented Apr 7, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

scottschreckengaust commented Apr 7, 2026

Summary

Files changed

Security findings addressed

Test plan

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant