fix(deploy-on-aws): hardening diagram#119
Merged
scottschreckengaust merged 1 commit intofeat/diagram-skillfrom Apr 7, 2026
Merged
fix(deploy-on-aws): hardening diagram#119scottschreckengaust merged 1 commit intofeat/diagram-skillfrom
scottschreckengaust merged 1 commit intofeat/diagram-skillfrom
Conversation
…s, and info leakage Follow-on security hardening for the architecture diagram skill (PR#103): - Add 2MB file size checks before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py guard) - Add iterparse-based XML depth (50) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files - Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow write attacks - Wrap fixer pipelines in top-level try/except to prevent unhandled exception tracebacks from leaking file paths and source code into the hook systemMessage (stderr captured via 2>&1) - Add timeout wrappers (10s/5s) around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
github-merge-queue bot
pushed a commit
that referenced
this pull request
Apr 7, 2026
* add diagram skill * fix: resolve CI failures — use defusedxml for safe XML parsing, fix markdown lint Replace xml.etree.ElementTree with defusedxml.ElementTree in all Python scripts to address Bandit B314/B405 and Semgrep XXE findings. Add nosec suppression to subprocess import in drawio_url.py (browser URL opener only). Fix 106 markdownlint errors (MD022/MD031/MD032 blank lines, MD001 heading levels) across 7 reference files and SKILL.md. Trim SKILL.md from 515 to 495 lines to satisfy the 500-line SKILL001 limit. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * Add architecture diagram skill * fix: split SKILL.md into references to pass skill-length lint rule Move detailed XML generation rules and layout guidelines from SKILL.md (494 lines) into references/xml-generation-rules.md and references/layout-guidelines.md. SKILL.md is now 219 lines, well under the 300-line warning threshold enforced by SKILL001. * fix: apply dprint formatting to pass fmt:check Run dprint fmt on 7 files (JSON and Markdown) that were not conforming to the project's formatting rules. * fix(diagram-skill): remove defusedxml dependency, fix code quality issues - Replace defusedxml with stdlib xml.etree.ElementTree in all 5 Python scripts and validate-drawio.sh (XML is self-generated, no XXE risk) - Remove pip3 auto-install from PostToolUse hook - Remove unused import sys in fix_step_badges.py - Remove unused style variable in post_process_drawio.py fix_legend_size - Add explanatory comment to except block in post_process_drawio.py * refactor(diagram-skill): consolidate references, delete outdated examples Delete 5 files (-2039 lines): - example-diagrams.md (803 lines): outdated XML examples conflicting with .drawio refs - aws-diagram-guidelines.md (536 lines): internal wiki copy, all rules in style-guide.md - xml-structure.md (209 lines): merged unique parts into group-styles.md + layout-guidelines.md - xml-generation-rules.md (186 lines): trimmed and renamed to xml-rules.md (89 lines) - aws4-shapes.md (231 lines): split into aws4-shapes-services.md (69) + aws4-shapes-resources.md (100) All reference files now under 100-line DESIGN_GUIDELINES limit. All shape names preserved in the split files (zero shape deletions). All cross-references updated to point to new file names. * refactor(diagram-skill): restructure SKILL.md for progressive disclosure - Remove eager "Study" paragraph that loaded ~135k tokens upfront - Move reference loading into Step 3 where references are actually needed - Add example selection table: agent picks 1-2 most relevant .drawio examples per diagram type (reduces ~95k to ~10-18k tokens) - Update cross-references to renamed files (xml-rules.md, aws4-shapes-services.md) - SKILL.md: 226 lines (under 300 limit) * fix(diagram-skill): remove white background and edge label backgrounds for dark mode - Remove background="#FFFFFF" from mxGraphModel XML template in SKILL.md - Change rule from "ALWAYS set background=#FFFFFF" to "NEVER set background" - Change edge label rule from labelBackgroundColor=#ffffff to labelBackgroundColor=none in layout-guidelines.md and xml-rules.md - Both white backgrounds break dark mode adaptive contrast * feat(diagram-skill): support non-AWS architecture diagrams - New references/general-icons.md: maps 40+ non-AWS technologies (PostgreSQL, Docker, Kubernetes, CoreML, HuggingFace, GitHub, etc.) to draw.io shapes with category-appropriate tint colors - Mode A codebase analysis now detects non-AWS tech when no AWS infrastructure files are found (Dockerfiles, DB configs, ML frameworks) - Mixed architectures supported: AWS icons for AWS services, general icons for non-AWS, same layout rules for both - Step 3 loads general-icons.md only when non-AWS services are present * fix(diagram-skill): restore defusedxml, add README, fix markdown lint - Restore defusedxml.ElementTree imports (CI scanners require it) - Replace silent pip3 auto-install with informative error message when defusedxml is missing — no more modifying user's environment - Add scripts/requirements.txt declaring defusedxml>=0.7.1 - Create plugin README.md per DESIGN_GUIDELINES (prerequisites, skills, MCP servers, installation, examples) - Document defusedxml dependency in post-processing.md - Fix 20 markdownlint errors (MD022, MD029, MD031, MD032, MD060) - Apply dprint formatting to table separators * fix(diagram-skill): apply dprint formatting to group-styles.md table * fix(diagram-skill): align validator with skill rules, fix template and post-processor bugs - Reject hardcoded background attribute on mxGraphModel (breaks dark mode) - Reject bare <mxGraphModel> without <mxfile> wrapper per skill contract - Lower font size threshold from 11px to 10px to match skill's font hierarchy - Remove stale fontColor=#232F3E warning (redundant with style guide) - Fix legend-panel parent ID typo in xml-templates-structure.md - Fix .upper() bug in badge skip condition (dead code) - Replace blunt "Do NOT read .drawio" rule with Reference Priority section - Remove duplicate developer_tools entry in aws4-shapes.json that shadowed xray * fix(diagram-skill): harden scripts against injection, error suppression, and resource exhaustion - Replace JSON escaping fallback with safe static message (CWE-838) - Use importlib.util for sibling imports instead of sys.path manipulation (CWE-426) - Fix defusedxml ET.indent() bug by using stdlib for write-time formatting - Replace bare except with specific exception types in hook JSON parser (CWE-396) - Add 2MB file size guard to URL generator before reading into memory (CWE-400) * fix(diagram-skill): add nosec/nosemgrep suppression for stdlib indent() import The xml.etree.ElementTree import is only used for indent() (write-time formatting), not XML parsing. All parsing uses defusedxml. Add inline suppression for Bandit B405 and Semgrep XXE rules to match the existing pattern in drawio_url.py. * fix(diagram-skill): use bare nosemgrep suppression for stdlib indent() import The specific rule ID nosemgrep didn't match because Semgrep uses a composite rule ID. Bare # nosemgrep suppresses all rules on the line. The justification is in the comment: only used for indent(), not parsing. * fix(diagram-skill): use full semgrep rule IDs for nosemgrep suppression * fix(diagram-skill): remove stdlib xml.etree.ElementTree import entirely Drop the indent() call and StdET import that triggered Bandit B405 and Semgrep XXE rules. XML indentation is cosmetic — draw.io and agents handle non-indented XML fine. Added a comment noting that a custom indent helper can be added later if human-readable XML is needed. * fix(diagram-skill): sanitize XML attributes in validator error messages Prevent prompt injection via crafted draw.io cell attributes that flow through the PostToolUse hook into agent systemMessage context. Adds _sanitize_attr() which strips control characters, collapses whitespace, restricts to safe characters, and truncates at 80 chars. Applied to all user-controlled attribute values in error/warning messages. * fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage (#119) Follow-on security hardening for the architecture diagram skill (PR#103): - Add 2MB file size checks before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py guard) - Add iterparse-based XML depth (50) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files - Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow write attacks - Wrap fixer pipelines in top-level try/except to prevent unhandled exception tracebacks from leaking file paths and source code into the hook systemMessage (stderr captured via 2>&1) - Add timeout wrappers (10s/5s) around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts (#120) * fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts Canonicalize file_path with Path.resolve() before any file operations in post_process_drawio.py and validate_drawio.py. This prevents path traversal sequences (e.g., "../../etc/passwd.drawio") in hook input from escaping the expected directory. Key ordering: symlink check runs on the original unresolved path (before resolve(), which follows symlinks and would make is_symlink() always False), then resolve() canonicalizes, then all subsequent file I/O uses only the resolved `path` object — never the raw `file_path`. Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): resolve file paths and fix nosemgrep rule IDs - Canonicalize file_path with Path.resolve() before file operations in post_process_drawio.py and validate_drawio.py to prevent path traversal (CWE-22) from hook input - Move symlink check before resolve() (resolve follows symlinks, so is_symlink() on resolved path is always False) - Use resolved `path` object for all post-resolve file I/O - Fix nosemgrep inline comments to use full rule IDs (truncated IDs were silently ignored by semgrep) Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): exit non-zero when refusing symlink drawio files A symlink .drawio file is a suspicious/unusual condition (potential attack), not an expected no-op like "not a drawio file." Exit 1 gives CLI callers and CI an honest signal that the file was refused. The hook harness (validate-drawio.sh) uses || true so this is safe there. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> --------- Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): use os.path.realpath() for semgrep-recognized taint sanitization Replace Path.resolve() with os.path.realpath() for path canonicalization. Both are functionally identical, but semgrep's hooks-path-traversal rule only recognizes os.path.realpath() as a taint sanitizer. This eliminates the CI semgrep finding without needing nosemgrep inline suppressions. Also replaces Path.is_symlink() with os.path.islink() for consistency. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com> * fix(deploy-on-aws): sync Codex plugin manifest with v1.2.0 diagram skill changes * Update plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com> Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> --------- Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com> Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com> Co-authored-by: Sam Castro <scoropeza@gmail.com> Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com> Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com> Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Follow-on security hardening for the architecture diagram skill (#103), addressing defense-in-depth gaps identified during security review:
validate_drawio.pyandpost_process_drawio.py(matching existingdrawio_url.py), plusiterparse-based XML depth (50 levels) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested filespost_process_drawio.pybefore write-back to prevent symlink-follow attacks that could overwrite arbitrary XML filestry/exceptthat emits generic errors instead of Python tracebacks (which would leak file paths and source code into the hooksystemMessagevia2>&1capture)timeout 10/timeout 5wrappers around all Python subprocess calls invalidate-drawio.shto prevent runaway processes from blocking the hook indefinitelyFiles changed
validate_drawio.py_check_xml_limits()iterparse pre-flight, top-level traceback suppressionpost_process_drawio.pyvalidate-drawio.shtimeoutwrappers on all 3 Python subprocess callsSecurity findings addressed
path.is_symlink()checktimeoutwrappersTest plan
.drawiovalidation still works:python3 plugins/deploy-on-aws/scripts/lib/validate_drawio.py <valid-file>.drawio.drawiofile and confirm it's rejected with a clear error.drawiowith >50 nested elements and confirm the pre-flight check catches itln -s /tmp/target.drawio test.drawioand confirm post-processor skips itmise run build— passes cleanly (lint, manifests, cross-refs, formatting, all security scans)Generated with Claude Code
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.