Skip to content

fix(deploy-on-aws): hardening diagram#119

Merged
scottschreckengaust merged 1 commit intofeat/diagram-skillfrom
fix/diagram-security-hardening
Apr 7, 2026
Merged

fix(deploy-on-aws): hardening diagram#119
scottschreckengaust merged 1 commit intofeat/diagram-skillfrom
fix/diagram-security-hardening

Conversation

@scottschreckengaust
Copy link
Copy Markdown
Member

Summary

Follow-on security hardening for the architecture diagram skill (#103), addressing defense-in-depth gaps identified during security review:

  • DoS prevention: Add 2MB file size guards before XML parsing in validate_drawio.py and post_process_drawio.py (matching existing drawio_url.py), plus iterparse-based XML depth (50 levels) and element count (50,000) pre-flight check to prevent C-stack overflow from deeply nested files
  • Symlink write protection: Reject symlinks in post_process_drawio.py before write-back to prevent symlink-follow attacks that could overwrite arbitrary XML files
  • Info leakage prevention: Wrap fixer pipelines in top-level try/except that emits generic errors instead of Python tracebacks (which would leak file paths and source code into the hook systemMessage via 2>&1 capture)
  • Timeout guards: Add timeout 10/timeout 5 wrappers around all Python subprocess calls in validate-drawio.sh to prevent runaway processes from blocking the hook indefinitely

Files changed

File Changes
validate_drawio.py File size check, _check_xml_limits() iterparse pre-flight, top-level traceback suppression
post_process_drawio.py File size check, symlink rejection, top-level traceback suppression
validate-drawio.sh timeout wrappers on all 3 Python subprocess calls

Security findings addressed

Category Severity Fix
Unbounded XML parse (DoS) Low 2MB size cap + depth/count limits
Symlink-follow write Low path.is_symlink() check
Traceback info leakage (CWE-209) Low Generic error messages in top-level except
Hook process hang (DoS) Low Shell timeout wrappers

Test plan

  • Verify normal .drawio validation still works: python3 plugins/deploy-on-aws/scripts/lib/validate_drawio.py <valid-file>.drawio
  • Verify oversized file rejection: create a >2MB .drawio file and confirm it's rejected with a clear error
  • Verify deep nesting rejection: create a .drawio with >50 nested elements and confirm the pre-flight check catches it
  • Verify symlink rejection: ln -s /tmp/target.drawio test.drawio and confirm post-processor skips it
  • Run mise run build — passes cleanly (lint, manifests, cross-refs, formatting, all security scans)

Generated with Claude Code


By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

…s, and info leakage

Follow-on security hardening for the architecture diagram skill (PR#103):

- Add 2MB file size checks before XML parsing in validate_drawio.py and
  post_process_drawio.py (matching existing drawio_url.py guard)
- Add iterparse-based XML depth (50) and element count (50,000) pre-flight
  check to prevent C-stack overflow from deeply nested files
- Reject symlinks in post_process_drawio.py before write-back to prevent
  symlink-follow write attacks
- Wrap fixer pipelines in top-level try/except to prevent unhandled
  exception tracebacks from leaking file paths and source code into the
  hook systemMessage (stderr captured via 2>&1)
- Add timeout wrappers (10s/5s) around all Python subprocess calls in
  validate-drawio.sh to prevent runaway processes from blocking the hook

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@scottschreckengaust scottschreckengaust requested review from a team as code owners April 7, 2026 07:04
@scottschreckengaust scottschreckengaust changed the title fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage fix(deploy-on-aws): hardening diagram Apr 7, 2026
@scottschreckengaust scottschreckengaust merged commit 575876c into feat/diagram-skill Apr 7, 2026
8 checks passed
@scottschreckengaust scottschreckengaust deleted the fix/diagram-security-hardening branch April 7, 2026 07:08
github-merge-queue bot pushed a commit that referenced this pull request Apr 7, 2026
* add diagram skill

* fix: resolve CI failures — use defusedxml for safe XML parsing, fix markdown lint

Replace xml.etree.ElementTree with defusedxml.ElementTree in all Python
scripts to address Bandit B314/B405 and Semgrep XXE findings. Add nosec
suppression to subprocess import in drawio_url.py (browser URL opener only).
Fix 106 markdownlint errors (MD022/MD031/MD032 blank lines, MD001 heading
levels) across 7 reference files and SKILL.md. Trim SKILL.md from 515 to
495 lines to satisfy the 500-line SKILL001 limit.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* Add architecture diagram skill

* fix: split SKILL.md into references to pass skill-length lint rule

Move detailed XML generation rules and layout guidelines from SKILL.md
(494 lines) into references/xml-generation-rules.md and
references/layout-guidelines.md. SKILL.md is now 219 lines, well under
the 300-line warning threshold enforced by SKILL001.

* fix: apply dprint formatting to pass fmt:check

Run dprint fmt on 7 files (JSON and Markdown) that were not
conforming to the project's formatting rules.

* fix(diagram-skill): remove defusedxml dependency, fix code quality issues

- Replace defusedxml with stdlib xml.etree.ElementTree in all 5 Python
  scripts and validate-drawio.sh (XML is self-generated, no XXE risk)
- Remove pip3 auto-install from PostToolUse hook
- Remove unused import sys in fix_step_badges.py
- Remove unused style variable in post_process_drawio.py fix_legend_size
- Add explanatory comment to except block in post_process_drawio.py

* refactor(diagram-skill): consolidate references, delete outdated examples

Delete 5 files (-2039 lines):
- example-diagrams.md (803 lines): outdated XML examples conflicting with .drawio refs
- aws-diagram-guidelines.md (536 lines): internal wiki copy, all rules in style-guide.md
- xml-structure.md (209 lines): merged unique parts into group-styles.md + layout-guidelines.md
- xml-generation-rules.md (186 lines): trimmed and renamed to xml-rules.md (89 lines)
- aws4-shapes.md (231 lines): split into aws4-shapes-services.md (69) + aws4-shapes-resources.md (100)

All reference files now under 100-line DESIGN_GUIDELINES limit.
All shape names preserved in the split files (zero shape deletions).
All cross-references updated to point to new file names.

* refactor(diagram-skill): restructure SKILL.md for progressive disclosure

- Remove eager "Study" paragraph that loaded ~135k tokens upfront
- Move reference loading into Step 3 where references are actually needed
- Add example selection table: agent picks 1-2 most relevant .drawio
  examples per diagram type (reduces ~95k to ~10-18k tokens)
- Update cross-references to renamed files (xml-rules.md,
  aws4-shapes-services.md)
- SKILL.md: 226 lines (under 300 limit)

* fix(diagram-skill): remove white background and edge label backgrounds for dark mode

- Remove background="#FFFFFF" from mxGraphModel XML template in SKILL.md
- Change rule from "ALWAYS set background=#FFFFFF" to "NEVER set background"
- Change edge label rule from labelBackgroundColor=#ffffff to
  labelBackgroundColor=none in layout-guidelines.md and xml-rules.md
- Both white backgrounds break dark mode adaptive contrast

* feat(diagram-skill): support non-AWS architecture diagrams

- New references/general-icons.md: maps 40+ non-AWS technologies
  (PostgreSQL, Docker, Kubernetes, CoreML, HuggingFace, GitHub, etc.)
  to draw.io shapes with category-appropriate tint colors
- Mode A codebase analysis now detects non-AWS tech when no AWS
  infrastructure files are found (Dockerfiles, DB configs, ML frameworks)
- Mixed architectures supported: AWS icons for AWS services, general
  icons for non-AWS, same layout rules for both
- Step 3 loads general-icons.md only when non-AWS services are present

* fix(diagram-skill): restore defusedxml, add README, fix markdown lint

- Restore defusedxml.ElementTree imports (CI scanners require it)
- Replace silent pip3 auto-install with informative error message
  when defusedxml is missing — no more modifying user's environment
- Add scripts/requirements.txt declaring defusedxml>=0.7.1
- Create plugin README.md per DESIGN_GUIDELINES (prerequisites,
  skills, MCP servers, installation, examples)
- Document defusedxml dependency in post-processing.md
- Fix 20 markdownlint errors (MD022, MD029, MD031, MD032, MD060)
- Apply dprint formatting to table separators

* fix(diagram-skill): apply dprint formatting to group-styles.md table

* fix(diagram-skill): align validator with skill rules, fix template and post-processor bugs

- Reject hardcoded background attribute on mxGraphModel (breaks dark mode)
- Reject bare <mxGraphModel> without <mxfile> wrapper per skill contract
- Lower font size threshold from 11px to 10px to match skill's font hierarchy
- Remove stale fontColor=#232F3E warning (redundant with style guide)
- Fix legend-panel parent ID typo in xml-templates-structure.md
- Fix .upper() bug in badge skip condition (dead code)
- Replace blunt "Do NOT read .drawio" rule with Reference Priority section
- Remove duplicate developer_tools entry in aws4-shapes.json that shadowed xray

* fix(diagram-skill): harden scripts against injection, error suppression, and resource exhaustion

- Replace JSON escaping fallback with safe static message (CWE-838)
- Use importlib.util for sibling imports instead of sys.path manipulation (CWE-426)
- Fix defusedxml ET.indent() bug by using stdlib for write-time formatting
- Replace bare except with specific exception types in hook JSON parser (CWE-396)
- Add 2MB file size guard to URL generator before reading into memory (CWE-400)

* fix(diagram-skill): add nosec/nosemgrep suppression for stdlib indent() import

The xml.etree.ElementTree import is only used for indent() (write-time
formatting), not XML parsing. All parsing uses defusedxml. Add inline
suppression for Bandit B405 and Semgrep XXE rules to match the existing
pattern in drawio_url.py.

* fix(diagram-skill): use bare nosemgrep suppression for stdlib indent() import

The specific rule ID nosemgrep didn't match because Semgrep uses a
composite rule ID. Bare # nosemgrep suppresses all rules on the line.
The justification is in the comment: only used for indent(), not parsing.

* fix(diagram-skill): use full semgrep rule IDs for nosemgrep suppression

* fix(diagram-skill): remove stdlib xml.etree.ElementTree import entirely

Drop the indent() call and StdET import that triggered Bandit B405 and
Semgrep XXE rules. XML indentation is cosmetic — draw.io and agents
handle non-indented XML fine. Added a comment noting that a custom
indent helper can be added later if human-readable XML is needed.

* fix(diagram-skill): sanitize XML attributes in validator error messages

Prevent prompt injection via crafted draw.io cell attributes that flow
through the PostToolUse hook into agent systemMessage context. Adds
_sanitize_attr() which strips control characters, collapses whitespace,
restricts to safe characters, and truncates at 80 chars. Applied to all
user-controlled attribute values in error/warning messages.

* fix(deploy-on-aws): harden diagram scripts against DoS, symlink writes, and info leakage (#119)

Follow-on security hardening for the architecture diagram skill (PR#103):

- Add 2MB file size checks before XML parsing in validate_drawio.py and
  post_process_drawio.py (matching existing drawio_url.py guard)
- Add iterparse-based XML depth (50) and element count (50,000) pre-flight
  check to prevent C-stack overflow from deeply nested files
- Reject symlinks in post_process_drawio.py before write-back to prevent
  symlink-follow write attacks
- Wrap fixer pipelines in top-level try/except to prevent unhandled
  exception tracebacks from leaking file paths and source code into the
  hook systemMessage (stderr captured via 2>&1)
- Add timeout wrappers (10s/5s) around all Python subprocess calls in
  validate-drawio.sh to prevent runaway processes from blocking the hook

Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts (#120)

* fix(deploy-on-aws): resolve file paths to prevent path traversal in hook scripts

Canonicalize file_path with Path.resolve() before any file operations
in post_process_drawio.py and validate_drawio.py. This prevents path
traversal sequences (e.g., "../../etc/passwd.drawio") in hook input
from escaping the expected directory.

Key ordering: symlink check runs on the original unresolved path
(before resolve(), which follows symlinks and would make is_symlink()
always False), then resolve() canonicalizes, then all subsequent file
I/O uses only the resolved `path` object — never the raw `file_path`.

Addresses Semgrep finding: ai.ai-best-practices.hooks-path-traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): resolve file paths and fix nosemgrep rule IDs

- Canonicalize file_path with Path.resolve() before file operations
  in post_process_drawio.py and validate_drawio.py to prevent path
  traversal (CWE-22) from hook input
- Move symlink check before resolve() (resolve follows symlinks, so
  is_symlink() on resolved path is always False)
- Use resolved `path` object for all post-resolve file I/O
- Fix nosemgrep inline comments to use full rule IDs (truncated IDs
  were silently ignored by semgrep)

Addresses Semgrep: ai.ai-best-practices.hooks-path-traversal

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): exit non-zero when refusing symlink drawio files

A symlink .drawio file is a suspicious/unusual condition (potential
attack), not an expected no-op like "not a drawio file." Exit 1 gives
CLI callers and CI an honest signal that the file was refused. The
hook harness (validate-drawio.sh) uses || true so this is safe there.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): use os.path.realpath() for semgrep-recognized taint sanitization

Replace Path.resolve() with os.path.realpath() for path canonicalization.
Both are functionally identical, but semgrep's hooks-path-traversal rule
only recognizes os.path.realpath() as a taint sanitizer. This eliminates
the CI semgrep finding without needing nosemgrep inline suppressions.

Also replaces Path.is_symlink() with os.path.islink() for consistency.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

* fix(deploy-on-aws): sync Codex plugin manifest with v1.2.0 diagram skill changes

* Update plugins/deploy-on-aws/skills/aws-architecture-diagram/SKILL.md

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com>

---------

Signed-off-by: mayakost <130111194+mayakost@users.noreply.github.com>
Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
Co-authored-by: Sam Castro <scoropeza@gmail.com>
Co-authored-by: Laith Al-Saadoon <9553966+theagenticguy@users.noreply.github.com>
Co-authored-by: Scott Schreckengaust <scottschreckengaust@users.noreply.github.com>
Co-authored-by: Scott Schreckengaust <345885+scottschreckengaust@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant