🐛 Fix code_generator.py: undefined variables cause identical boilerplate instead of unique tutorial-specific output #122

@anthropic-code-agent

Description

Problem Summary

The code_generator.py module produces identical boilerplate code for every video instead of unique, tutorial-specific output. The cause is a set of undefined variables in the _build_generation_context method, which either crash generation with a NameError or leave the output empty or generic.

Root Cause Analysis

Critical Bug Location

File: src/youtube_extension/backend/code_generator.py
Method: _build_generation_context (lines 1200-1290)

Undefined Variables (Lines 1275-1290)

# Line 1275-1282: Undefined variables used in dict update
extracted_info.update({
    "title": title,
    "technologies": technologies,
    "features": features,  # ❌ UNDEFINED - never assigned in this method!
    "tutorial_steps": tutorial_steps,
    "project_type": project_config.get("type", ...),
    "build_plan": build_plan,
})

# Line 1284-1290: Undefined variables in return dict
return {
    "extracted_info": extracted_info,
    "build_plan": build_plan,
    "summary": summary,  # ❌ UNDEFINED - never assigned in this method!
    "key_concepts": key_concepts,  # ❌ UNDEFINED - never assigned in this method!
    "build_plan": build_plan
}

Test Evidence

Running pytest tests/test_code_generator_unique_output.py produces:

ERROR: ❌ Project generation failed: name 'features' is not defined

All 4 generation tests fail with the same NameError.
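
The failure mode can be reproduced in isolation. The sketch below is a hypothetical stand-in, not the real method: build_context_sketch and its arguments are invented for illustration. It shows that Python resolves names only when a line executes, so the function defines and imports cleanly and raises NameError only once the dict literal is evaluated.

```python
def build_context_sketch(title, technologies):
    """Hypothetical stand-in for _build_generation_context (not the real code)."""
    extracted_info = {}
    # 'features' is never assigned in this function or at module level, so
    # evaluating the dict literal below raises NameError at call time.
    extracted_info.update({
        "title": title,
        "technologies": technologies,
        "features": features,  # noqa: F821 (deliberately undefined)
    })
    return extracted_info


try:
    build_context_sketch("Demo tutorial", ["python"])
    error_message = None
except NameError as exc:
    error_message = str(exc)

print(error_message)
```

A linter that reports undefined names (for example pyflakes, surfaced in flake8 as F821) would likely catch this class of bug before the tests run.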

Impact

  1. Broken video-to-code pipeline: All videos produce identical generic boilerplate or crash
  2. Lost customization: Video-specific features, summaries, and concepts are missing
  3. Failed tests: 4 out of 6 tests in test_code_generator_unique_output.py fail
  4. Poor user experience: Users expect tutorial-specific output but get generic templates

Additional Issues Found

1. Unreachable Code Block (Lines 1259-1273)

# Line 1257-1258
if not tutorial_steps:
    tutorial_steps = self._derive_tutorial_steps(ai_analysis, video_analysis)

else:  # Line 1259 - DEAD CODE!
    # This fallback block can never execute because:
    # - Lines 1253-1257 ensure tutorial_steps is always defined
    # - No execution path can reach this else clause

2. Incorrect BuildStep Field Access (Line 1244)

# Line 1243-1246: Accessing non-existent 'title' field
"tutorial_steps": [
    f"{step.get('title')}: {step.get('description')}"  # ❌ BuildStep has no 'title'
    for step in build_plan.get("steps", [])[:8]
],

BuildStep dataclass (from build_plan.py:27-68) has these fields:

  • order (int)
  • action (str)
  • target_file (str)
  • description (str)
  • code_content (str)
  • dependencies (list[str])

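BuildStep has no title field. If the steps reach this code as plain dicts (for example after serialization), step.get('title') always returns None, so every label renders as "None: <description>". If they arrive as BuildStep instances instead, the .get call itself raises AttributeError, because dataclasses do not define .get.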
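
Both behaviors of the missing field can be seen in a small sketch. BuildStepSketch is a hypothetical stand-in that copies only the field names listed above; the default values and the use of asdict are assumptions, since the real build_plan.py is not shown here.

```python
from dataclasses import asdict, dataclass, field


@dataclass
class BuildStepSketch:
    """Stand-in with the fields listed above; defaults are assumptions."""
    order: int
    action: str
    target_file: str = ""
    description: str = ""
    code_content: str = ""
    dependencies: list[str] = field(default_factory=list)


step = BuildStepSketch(order=1, action="create", target_file="main.js",
                       description="Set up the entry point")

# As a dict (e.g. via asdict), the missing key silently becomes None.
as_dict = asdict(step)
label = f"{as_dict.get('title')}: {as_dict.get('description')}"
print(label)  # None: Set up the entry point

# On the dataclass instance itself there is no .get() method at all.
try:
    step.get("title")
    raised = False
except AttributeError:
    raised = True
print(raised)  # True
```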

Proposed Fix

Fix 1: Define Missing Variables

Extract features, summary, and key_concepts from the video analysis data:

# Before line 1275, add:
features = extracted_info.get("features", [])
summary = video_analysis.get("summary") or ai_analysis.get("Content Summary", "")
key_concepts = video_analysis.get("key_concepts", []) or ai_analysis.get("Key Concepts", [])
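
As a rough sketch of how Fix 1 could be isolated and unit-tested, the three lookups can be pulled into a helper. The function name extract_context_fields is hypothetical, and the sketch assumes extracted_info, video_analysis, and ai_analysis are all plain dicts; the key names are the ones quoted in this issue.

```python
def extract_context_fields(extracted_info, video_analysis, ai_analysis):
    """Hypothetical helper mirroring Fix 1; key names come from the issue."""
    # 'or' treats None and empty values alike, so a present-but-empty
    # field still falls through to the next source.
    features = extracted_info.get("features") or []
    summary = video_analysis.get("summary") or ai_analysis.get("Content Summary", "")
    key_concepts = (
        video_analysis.get("key_concepts")
        or ai_analysis.get("Key Concepts")
        or []
    )
    return features, summary, key_concepts


features, summary, key_concepts = extract_context_fields(
    extracted_info={},
    video_analysis={"summary": "", "key_concepts": []},
    ai_analysis={"Content Summary": "Builds a todo app", "Key Concepts": ["state"]},
)
print(features, summary, key_concepts)
```

With every source empty, the helper returns ([], "", []), so downstream code always receives defined values of the expected types.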

Fix 2: Remove Dead Code

Delete the unreachable else block (lines 1259-1273).

Fix 3: Fix BuildStep Field Access

Replace step.get('title') with fields that exist on BuildStep, falling back to action when description is missing or empty:

"tutorial_steps": [
    f"Step {step.get('order', i+1)}: {step.get('description') or step.get('action', '')}"
    for i, step in enumerate(build_plan.get("steps", [])[:8])
],
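
The step labelling can also be sketched as a standalone function, which makes the fallback behavior easy to test. The name format_tutorial_steps and the sample plan are hypothetical, and the sketch assumes build_plan is a dict whose "steps" entries are dicts. Note that the default argument of dict.get applies only when a key is missing, so the sketch uses "or" to also cover an empty description.

```python
def format_tutorial_steps(build_plan, limit=8):
    """Hypothetical helper: label each step using fields BuildStep actually has."""
    labels = []
    for i, step in enumerate(build_plan.get("steps", [])[:limit]):
        order = step.get("order", i + 1)
        # Prefer the description; fall back to the action when it is
        # missing or empty.
        text = step.get("description") or step.get("action", "")
        labels.append(f"Step {order}: {text}")
    return labels


plan = {"steps": [
    {"order": 1, "action": "create", "description": "Scaffold index.html"},
    {"order": 2, "action": "modify", "description": ""},
    {"action": "style"},
]}
labels = format_tutorial_steps(plan)
print(labels)
```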

Testing

After the fixes, every test in test_code_generator_unique_output.py should pass, including:

  • test_different_videos_produce_different_js
  • test_different_videos_produce_different_css
  • test_same_video_produces_deterministic_output
  • test_title_appears_in_all_generated_files

Architecture Context

The video-to-code pipeline has 3 stages:

  1. Stage 1: Video Analysis (transcript extraction, AI analysis)
  2. Stage 2: Semantic Parsing (BuildPlan extraction) - works correctly ✅
  3. Stage 3: Code Generation - broken due to undefined variables ❌

The architecture is sound (fingerprinting, AI fallback, BuildPlan parsing all work). The issue is purely in the variable extraction and flow between BuildPlan and code generation.

Related Files

  • src/youtube_extension/backend/code_generator.py (main bug location)
  • src/youtube_extension/backend/build_plan.py (BuildStep dataclass)
  • src/youtube_extension/backend/ai_code_generator.py (AI generation path)
  • tests/test_code_generator_unique_output.py (failing tests)

Labels

  • bug - Critical functionality broken
  • code-generation - Affects video-to-code pipeline
  • high-priority - All generation tests fail

Acceptance Criteria

  • All 6 tests in test_code_generator_unique_output.py pass
  • Different videos produce different main.js content
  • Different videos produce different CSS (accent colors)
  • Video titles appear in all generated files
  • No NameError or AttributeError in generation logs
