Generate production-ready code in 6 programming languages with AI-powered self-consistency prompting
- Overview
- Architecture
- System Components
- AI Model & Methodology
- Installation & Setup
- Usage Guide
- Security & Guardrails
- Logging System
- API Documentation
- Technical Workflow
- Performance & Optimization
- Troubleshooting
Code Wizard is an intelligent code generation platform that leverages advanced AI models to generate production-ready code across multiple programming languages. It combines sophisticated prompting techniques with strict security guardrails to ensure both quality and safety.
- Multi-Language Support: Python, JavaScript, Java, C++, C, SQL
- Self-Consistency Prompting: Generates 9 candidate solutions and returns the highest-scoring one
- Real-time Progress Tracking: Visual progress bar during generation
- Timestamped Logging: Detailed logs for every execution
- Security Guardrails: Prevents malicious code patterns
- Beautiful UI: 3D animated interface with modern design
- One-Click Copy: Easy code copying to clipboard
- Language-Specific Bot Names: Unique personality for each language
┌─────────────────────────────────────────────────────────────┐
│                     FRONTEND (Browser)                      │
│               Beautiful UI with 3D Animations               │
│                         index.html                          │
└──────────────────────────┬──────────────────────────────────┘
                           │ HTTP/REST
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                  FASTAPI BACKEND (main.py)                  │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ API Routes                                          │    │
│  │ • /api/generate - Code generation                   │    │
│  │ • /health - Health check                            │    │
│  │ • /api/languages - Supported languages              │    │
│  │ • /api/guardrails - Security rules                  │    │
│  │ • /api/logs - Recent log files                      │    │
│  └─────────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ Middleware & Validation                             │    │
│  │ • CORS Configuration                                │    │
│  │ • Request Validation                                │    │
│  │ • Security Filtering                                │    │
│  │ • Error Handling                                    │    │
│  └─────────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ Logging System                                      │    │
│  │ • Timestamped log files (logs/)                     │    │
│  │ • Console output                                    │    │
│  │ • Request/Response tracking                         │    │
│  └─────────────────────────────────────────────────────┘    │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│              CODE GENERATION AGENT (agent.py)               │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ Language Configurations                             │    │
│  │ • Language-specific system prompts                  │    │
│  │ • Syntax validators                                 │    │
│  │ • Code quality scorers                              │    │
│  │ • Fallback templates                                │    │
│  └─────────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ Self-Consistency Engine                             │    │
│  │ • Generate 9 solutions with different temps         │    │
│  │ • Score each solution (0-10+)                       │    │
│  │ • Return best solution                              │    │
│  └─────────────────────────────────────────────────────┘    │
│  ┌─────────────────────────────────────────────────────┐    │
│  │ LLM Interface (Qwen2.5-Coder-7B)                    │    │
│  │ • Local GGUF model loading                          │    │
│  │ • Inference with llama-cpp-python                   │    │
│  │ • Multiple sampling strategies                      │    │
│  └─────────────────────────────────────────────────────┘    │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│             LOCAL LLM MODEL (Qwen2.5-Coder-7B)              │
│                Quantized GGUF Format (4-5GB)                │
└─────────────────────────────────────────────────────────────┘
Technology Stack:
- Pure HTML5, CSS3, JavaScript (No frameworks)
- WebGL-inspired 3D animated backgrounds
- Responsive design (Mobile, Tablet, Desktop)
Key Features:
- Language selection buttons (6 languages)
- Textarea for prompt input
- Real-time progress bar with gradient animation
- Code output with syntax highlighting
- One-click copy functionality
- Error message display
- Security guardrails display
Animations:
- Floating blob animations (3D effect)
- Button hover effects
- Smooth transitions
- Loading spinner
- Progress bar with glow effect
Framework: FastAPI (Async Python Web Framework)
Components:
main.py
├── Logging Setup
│   ├── Timestamped log files (logs/codewizard_YYYYMMDD_HHMMSS.log)
│   ├── File handler (DEBUG level)
│   └── Console handler (INFO level)
│
├── API Routes
│   ├── GET / - Serve index.html
│   ├── GET /health - Health check endpoint
│   ├── POST /api/generate - Code generation
│   ├── GET /api/languages - Language info
│   ├── GET /api/guardrails - Security rules
│   └── GET /api/logs - Log file listing
│
├── Validation Layer
│   ├── Prompt validation (max 1000 chars)
│   ├── Language validation
│   ├── Security pattern detection
│   └── Sanitization
│
├── Error Handling
│   ├── HTTP exception handlers
│   ├── Graceful error messages
│   └── Detailed logging
│
└── Middleware
    ├── CORS configuration
    ├── Request logging
    └── Response handling
Security Guardrails (16 patterns):
- SQL injection patterns (DROP TABLE, DELETE FROM, TRUNCATE)
- Code execution (eval, exec, system, os.system)
- Dangerous imports (__import__)
- Credential exposure (password, api_key, secret)
- System commands (rm -rf, chmod 777, sudo)
- Network exploitation (curl exec, wget exec)
Core Technology: Qwen2.5-Coder-7B (7 Billion Parameters)
Key Components:
Algorithm:
1. Generate 9 different code solutions
2. Use varying temperatures (0.1 to 0.9) for diversity
3. Score each solution based on quality metrics
4. Return the highest-scoring solution
Why 9 samples?
- Enough candidates to cover the low, medium, and high temperature ranges
- Computationally efficient (vs 16-25 samples)
- Good balance between quality and speed
Temperatures Used: [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.7, 0.9, 0.9]
- Low (0.1-0.3): Conservative, focused code
- Medium (0.4-0.5): Balanced exploration
- High (0.7-0.9): Creative variations
Benefit: Generates diverse solutions to select best
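The four algorithm steps above can be sketched as a small selection loop. This is a minimal illustration, not the code in agent.py: `select_best`, `fake_generate`, and `fake_score` are hypothetical names, and the two stand-in functions replace the real LLM call and quality scorer so the loop can run without a model.

```python
from typing import Callable, List, Tuple

# Temperature schedule as listed above.
TEMPERATURES = [0.1, 0.2, 0.3, 0.4, 0.5, 0.5, 0.7, 0.9, 0.9]


def select_best(
    prompt: str,
    generate: Callable[[str, float], str],
    score: Callable[[str, str], float],
    temperatures: List[float] = TEMPERATURES,
) -> Tuple[str, float]:
    """Sample one candidate per temperature and return the best (code, score)."""
    best_code, best_score = "", float("-inf")
    for temp in temperatures:
        candidate = generate(prompt, temp)
        if not candidate:
            continue  # a failed sample is skipped, not scored
        candidate_score = score(candidate, prompt)
        if candidate_score > best_score:
            best_code, best_score = candidate, candidate_score
    return best_code, best_score


# Hypothetical stand-ins for the model and the scorer.
def fake_generate(prompt: str, temperature: float) -> str:
    return "def f():\n    return 1" if temperature < 0.5 else "pass"


def fake_score(code: str, prompt: str) -> float:
    return float(len(code))
```

Passing the generator and scorer as arguments keeps the loop independent of the model backend, which also makes it easy to test without loading the GGUF file.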
Scoring Criteria:
┌─────────────────────────────────┬────────┐
│ Criteria                        │ Points │
├─────────────────────────────────┼────────┤
│ Optimal length (50-1000 chars)  │ 2.0    │
│ Function/Class definition       │ 3.0    │
│ Return statement                │ 2.0    │
│ Documentation/Comments          │ 1.0    │
│ Type hints                      │ 1.0    │
│ Logic keywords present          │ 2.0    │
│ Keyword matching (from prompt)  │ 0.5x   │
├─────────────────────────────────┼────────┤
│ Penalties:                      │        │
│ - TODO/FIXME patterns           │ -5.0   │
│ - Incomplete code               │ -5.0   │
└─────────────────────────────────┴────────┘
Max Score: ~15-20 points
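A scorer following the table above might look like the sketch below. The point values come from the table; everything else is an assumption made for illustration: the regular expressions, the `LOGIC_KEYWORDS` list, the rule that prompt words longer than three characters count as keywords, and the trailing-character check used as an "incomplete code" heuristic. The project's actual scorer may differ on all of these.

```python
import re

# Assumed list of "logic keywords"; the real list is not shown in this document.
LOGIC_KEYWORDS = ("if", "for", "while", "else", "try")


def score_code(code: str, prompt: str) -> float:
    """Heuristic quality score for a Python candidate, per the criteria table."""
    score = 0.0
    if 50 <= len(code) <= 1000:
        score += 2.0  # optimal length
    if re.search(r"^\s*(def|class)\s+\w+", code, re.MULTILINE):
        score += 3.0  # function or class definition
    if re.search(r"\breturn\b", code):
        score += 2.0  # return statement
    if '"""' in code or "#" in code:
        score += 1.0  # docstring or comment
    if re.search(r"\)\s*->\s*\w+|\w+\s*:\s*(int|str|float|bool|list|dict)\b", code):
        score += 1.0  # type hints
    if any(re.search(rf"\b{kw}\b", code) for kw in LOGIC_KEYWORDS):
        score += 2.0  # logic keywords
    # 0.5 points per prompt word (longer than 3 characters) found in the code.
    prompt_words = {w for w in re.findall(r"[a-z]+", prompt.lower()) if len(w) > 3}
    score += 0.5 * sum(1 for w in prompt_words if w in code.lower())
    if re.search(r"\b(TODO|FIXME)\b", code):
        score -= 5.0  # placeholder penalty
    if code.rstrip().endswith((":", ",", "(")):
        score -= 5.0  # assumed heuristic for truncated output
    return score
```

Because every check is a cheap string or regex test, scoring nine candidates adds little time compared with inference.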
Each language has a custom system prompt:
# Python Prompt includes:
- Type hints examples
- Python idioms
- No markdown requirement
- Standard library usage
# JavaScript Prompt includes:
- ES6+ syntax
- Async/await patterns
- Modern JavaScript conventions
# Java Prompt includes:
- Class structure
- Java naming conventions
- Proper OOP patterns
# SQL Prompt includes:
- Query optimization tips
- GROUP BY patterns
- JOIN examples
Raw LLM Output
    ↓
[1] Remove Markdown (```, ```python, etc.)
    ↓
[2] Extract Code Section
    ↓
[3] Remove Explanatory Text
    ↓
[4] Check Minimum Length (>15 chars)
    ↓
[5] Detect Bad Patterns (TODO, FIXME, pass)
    ↓
[6] Syntax Validation (Python: ast.parse)
    ↓
[7] Return Valid Code or Empty String
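The seven steps above could be condensed into one function along these lines. It is a simplified sketch with assumed names (`extract_code`, `FENCE`, `BAD_PATTERNS`): steps [1] to [3] are collapsed into "take the first fenced block if there is one", and unfenced output that still contains prose is simply rejected by the syntax check rather than cleaned.

```python
import ast
import re

BAD_PATTERNS = re.compile(r"\b(TODO|FIXME)\b")
FENCE = re.compile(r"```[\w+-]*\n(.*?)```", re.DOTALL)


def extract_code(raw: str, language: str = "python") -> str:
    """Return cleaned code, or an empty string if any check fails."""
    # [1]-[3] Keep only the fenced block, which drops the surrounding prose.
    match = FENCE.search(raw)
    code = (match.group(1) if match else raw).strip()
    # [4] Minimum length.
    if len(code) <= 15:
        return ""
    # [5] Placeholder patterns: TODO/FIXME markers or a bare `pass` line.
    if BAD_PATTERNS.search(code) or re.search(r"^\s*pass\s*$", code, re.MULTILINE):
        return ""
    # [6] Syntax validation (Python only in this sketch).
    if language == "python":
        try:
            ast.parse(code)
        except SyntaxError:
            return ""
    # [7] Valid code.
    return code
```

Returning an empty string on failure lets the caller treat a rejected sample the same way as a generation that produced nothing.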
When LLM fails (model not available):
- Pattern-matched fallback templates
- Language-specific boilerplate
- Proper structure and syntax
- Ready-to-run code
Model Name: Qwen2.5-Coder-7B-Instruct
- Parameters: 7 Billion
- Quantization: Q5_K_M (GGUF format)
- Size: ~4.7 GB
- Architecture: Transformer-based
- Training Data: Code + general knowledge
- Context Window: 4096 tokens
MODEL_PARAMS = {
    "model_path": "./models/qwen2.5-coder-7b-instruct-q5_k_m.gguf",
    "n_ctx": 4096,        # Context window size
    "n_threads": 8,       # CPU threads
    "n_gpu_layers": 0,    # 0 = CPU only
    "verbose": False      # Suppress debug output
}
INFERENCE_SETTINGS = {
    "max_tokens": 1500,       # Maximum output length
    "temperature": 0.1,       # Overridden per sample (0.1 to 0.9)
    "top_p": 0.9,             # Nucleus sampling
    "repeat_penalty": 1.15,   # Avoid repetition
    "stop": ["Prompt:", "\n\n\n\n", "Output:"]
}
Traditional Approach:
Prompt → Model → Single Output
                 (Deterministic)
Self-Consistency Approach:
Prompt → Model → Solution 1 (Score: 8.5)
               → Solution 2 (Score: 7.2)
               → Solution 3 (Score: 9.1) ← Selected
       → Model → Solution 4 (Score: 6.8)
       ...
Benefits:
- Higher quality through selection
- Diversity reduces errors
- Rule-based scoring makes selection repeatable
- Fallback for failed attempts
Input Prompt:     ~100-200 tokens
System Prompt:    ~400-500 tokens
Output Code:      ~200-600 tokens
─────────────────────────────
Per Sample:       ~700-1300 tokens
9 Samples:        ~6,300-11,700 tokens
Total Generation: ~9,000 tokens (midpoint), up to ~11,700
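The per-sample and nine-sample ranges above follow directly from the three component estimates. The short sketch below reproduces that arithmetic and checks the worst case against the 4096-token context window; `token_budget` and the dictionary layout are illustrative names, not part of the project.

```python
# (low, high) token estimates taken from the figures above.
BUDGET = {
    "input_prompt": (100, 200),
    "system_prompt": (400, 500),
    "output_code": (200, 600),
}
SAMPLES = 9
CONTEXT_WINDOW = 4096


def token_budget(budget=BUDGET, samples=SAMPLES):
    """Return per-sample and all-sample (low, high) token ranges."""
    low = sum(lo for lo, _ in budget.values())
    high = sum(hi for _, hi in budget.values())
    return {
        "per_sample": (low, high),
        "all_samples": (low * samples, high * samples),
        "fits_context": high <= CONTEXT_WINDOW,
    }
```

Each sample is an independent inference call, so only the per-sample figure has to fit inside the context window; the nine-sample total only affects run time.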
- Python 3.8+
- 5GB free disk space (for model)
- 8GB+ RAM recommended
- Modern web browser
cd code-wizard
# Create virtual environment
python -m venv venv
# Activate virtual environment
# On Windows:
venv\Scripts\activate
# On macOS/Linux:
source venv/bin/activate
# Install Python packages
pip install fastapi uvicorn pydantic llama-cpp-python
# Create models directory
mkdir models
# Download Qwen2.5-Coder-7B
# Option 1: Using huggingface-cli
pip install huggingface-hub
huggingface-cli download Qwen/Qwen2.5-Coder-7B-Instruct-GGUF \
    qwen2.5-coder-7b-instruct-q5_k_m.gguf \
    --local-dir ./models
# Option 2: Manual download from Hugging Face
# Visit: https://huggingface.co/Qwen/Qwen2.5-Coder-7B-Instruct-GGUF
# Download: qwen2.5-coder-7b-instruct-q5_k_m.gguf
# Place in: ./models/
# Check model file
ls -lh models/qwen2.5-coder-7b-instruct-q5_k_m.gguf
# Should show ~4.7GB file
# Start FastAPI server
python main.py
# Server will start at http://localhost:8000
# API Docs: http://localhost:8000/docs
Open browser and navigate to:
http://localhost:8000
1. Open Application
   - Navigate to http://localhost:8000
   - See beautiful dashboard with animations
2. Select Language
   - Click on one of 6 language buttons
   - Button will highlight with gradient
3. Describe Your Code
   - Type natural language description
   - Examples:
     - "Write a function to check if a number is prime"
     - "Create a user login system"
     - "Build a SQL query to get top 10 products"
4. Generate Code
   - Click "Generate Code" button
   - Watch progress bar animate
   - Code appears in right panel
5. Copy Code
   - Click "Copy Code" button
   - Code is copied to clipboard
   - Confirmation message appears
6. Clear & Repeat
   - Click "Clear" to reset
   - Start new code generation
curl -X POST http://localhost:8000/api/generate \
  -H "Content-Type: application/json" \
  -d '{
    "prompt": "Write a function to count vowels in a string",
    "language": "python"
  }'
{
  "code": "def count_vowels(text: str) -> int:\n    vowels = \"aeiouAEIOU\"\n    return sum(1 for c in text if c in vowels)",
  "language": "python",
  "prompt": "Write a function to count vowels in a string",
  "timestamp": "2024-01-15T10:30:45.123456",
  "bot_name": "PyWizard",
  "status": "success",
  "generation_time": 2.34
}
curl http://localhost:8000/health
curl http://localhost:8000/api/languages
User Input
    ↓
┌─────────────────────────────────┐
│ 1. Length Check                 │ Max 1000 chars
├─────────────────────────────────┤
│ 2. Pattern Matching             │ 16 regex patterns
├─────────────────────────────────┤
│ 3. Keyword Detection            │ Dangerous keywords
├─────────────────────────────────┤
│ 4. Validation Response          │ Pass/Reject
└─────────────────────────────────┘
    ↓
[Approved] → Code Generation
[Rejected] → Error Message to User
SQL Injection:
- DROP TABLE
- DELETE FROM
- TRUNCATE TABLE
Code Execution:
- eval()
- exec()
- system()
- os.system()
Dangerous Imports:
- __import__
Credential Exposure:
- password =
- api_key =
- secret =
System Commands:
- rm -rf
- chmod 777
- sudo
Network Exploitation:
- curl ... exec
- wget ... exec
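A prompt check built on the patterns listed above might be sketched as follows. The regular expressions are illustrative guesses written for this example; the project's actual 16 expressions are not shown in this document, and `validate_prompt` is a hypothetical name.

```python
import re
from typing import Optional

MAX_PROMPT_LENGTH = 1000

# Illustrative regexes covering the listed categories (assumed, not the
# project's real pattern list).
BLOCKED_PATTERNS = [
    r"\bDROP\s+TABLE\b",
    r"\bDELETE\s+FROM\b",
    r"\bTRUNCATE\s+TABLE\b",
    r"\b(eval|exec|system)\s*\(",
    r"__import__",
    r"\b(password|api_key|secret)\s*=",
    r"\brm\s+-rf\b",
    r"\bchmod\s+777\b",
    r"\bsudo\b",
    r"\b(curl|wget)\b.*\bexec\b",
]
_COMPILED = [re.compile(p, re.IGNORECASE) for p in BLOCKED_PATTERNS]


def validate_prompt(prompt: str) -> Optional[str]:
    """Return None if the prompt passes, else a short rejection reason."""
    if len(prompt) > MAX_PROMPT_LENGTH:
        return f"Prompt exceeds {MAX_PROMPT_LENGTH} characters"
    for pattern in _COMPILED:
        if pattern.search(prompt):
            return f"Restricted pattern detected: {pattern.pattern}"
    return None
```

Pattern matching of this kind is a coarse filter: it can reject harmless prompts that merely mention a blocked phrase, and it does not inspect the generated code itself.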
200 OK - Code generated successfully
400 Bad Request - Invalid language or failed validation
403 Forbidden - Security pattern detected
503 Service Unavailable - Agent not initialized
500 Internal Server Error - Unexpected error
logs/
├── codewizard_20240115_090000.log
├── codewizard_20240115_091530.log
├── codewizard_20240115_092245.log
└── codewizard_20240115_095010.log
2024-01-15 09:00:00 - [INFO] - __main__ - CODE WIZARD API - APPLICATION STARTUP
2024-01-15 09:00:01 - [INFO] - __main__ - FastAPI application initialized
2024-01-15 09:00:15 - [INFO] - __main__ - NEW CODE GENERATION REQUEST
2024-01-15 09:00:15 - [INFO] - __main__ - Language: python
2024-01-15 09:00:15 - [INFO] - __main__ - Prompt: Write a function to count vowels...
2024-01-15 09:00:15 - [INFO] - __main__ - Prompt validation passed
2024-01-15 09:00:15 - [INFO] - __main__ - Starting code generation for python...
2024-01-15 09:02:45 - [INFO] - agent.py - Code generated successfully in 2.34s
2024-01-15 09:02:45 - [INFO] - __main__ - Code generated successfully in 2.34s
2024-01-15 09:02:45 - [INFO] - __main__ - Generated code length: 234 characters
DEBUG - Detailed diagnostic information
INFO - General informational messages
WARNING - Warning messages for suspicious activity
ERROR - Error messages with stack traces
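The logging layout described above (one timestamped file per run at DEBUG level, console output at INFO level) can be set up with the standard `logging` module roughly as shown here. The function name `setup_logging` and its parameters are assumptions for this sketch; the format string is inferred from the sample log lines and may not match main.py exactly.

```python
import logging
from datetime import datetime
from pathlib import Path


def setup_logging(log_dir: str = "logs", name: str = "codewizard") -> Path:
    """Attach a DEBUG file handler and an INFO console handler to the root logger."""
    Path(log_dir).mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = Path(log_dir) / f"{name}_{stamp}.log"

    formatter = logging.Formatter(
        "%(asctime)s - [%(levelname)s] - %(name)s - %(message)s",
        datefmt="%Y-%m-%d %H:%M:%S",
    )

    file_handler = logging.FileHandler(log_path, encoding="utf-8")
    file_handler.setLevel(logging.DEBUG)  # full detail goes to the file
    file_handler.setFormatter(formatter)

    console_handler = logging.StreamHandler()
    console_handler.setLevel(logging.INFO)  # console stays less noisy
    console_handler.setFormatter(formatter)

    root = logging.getLogger()
    root.setLevel(logging.DEBUG)  # handlers apply their own thresholds
    root.addHandler(file_handler)
    root.addHandler(console_handler)
    return log_path
```

Setting the root logger to DEBUG and filtering at the handlers is what allows the file and the console to show different levels of detail for the same run.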
# View recent logs
tail -f logs/codewizard_*.log
# Get specific run logs
cat logs/codewizard_20240115_090000.log
# List all log files
ls -lh logs/
Base URL: http://localhost:8000
POST /api/generate
Content-Type: application/json
Request:
{
"prompt": string, // Max 1000 characters
"language": string // python|javascript|java|cpp|c|sql
}
Response:
{
"code": string,
"language": string,
"prompt": string,
"timestamp": string,
"bot_name": string,
"status": string,
"generation_time": float
}
Examples:
Prompt: "count vowels in string"
Language: "python"
Response Status:
200 - Success
400 - Invalid input
403 - Restricted pattern
500 - Server error
503 - Agent not initialized
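For callers who prefer Python over curl, the request schema above can be exercised with the standard library alone. This is a minimal client sketch: `build_generate_request` and `generate_code` are hypothetical helper names, the base URL assumes the default local server, and the 300-second timeout is an arbitrary choice.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # assumes the default local server
SUPPORTED_LANGUAGES = {"python", "javascript", "java", "cpp", "c", "sql"}
MAX_PROMPT_LENGTH = 1000


def build_generate_request(prompt: str, language: str) -> urllib.request.Request:
    """Build the POST /api/generate request without sending it."""
    if language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"Unsupported language: {language}")
    if len(prompt) > MAX_PROMPT_LENGTH:
        raise ValueError(f"Prompt exceeds {MAX_PROMPT_LENGTH} characters")
    body = json.dumps({"prompt": prompt, "language": language}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/generate",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def generate_code(prompt: str, language: str, timeout: float = 300.0) -> dict:
    """Send the request and return the decoded JSON response."""
    request = build_generate_request(prompt, language)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the server running, `generate_code("count vowels in string", "python")["code"]` would return the generated source. Non-2xx responses such as 400 or 403 surface as `urllib.error.HTTPError`, so callers should catch that exception to read the error message.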
GET /health
Response:
{
"status": "healthy",
"service": "Code Wizard API",
"timestamp": string,
"uptime": string
}
GET /api/languages
Response:
{
"languages": ["python", "javascript", "java", "cpp", "c", "sql"],
"bots": {
"python": "PyWizard",
"javascript": "ScriptMaster",
...
},
"count": 6
}
GET /api/guardrails
Response:
{
"guardrails": [string],
"max_prompt_length": 1000,
"security_patterns_count": 16
}
GET /api/logs
Response:
{
"logs": [string],
"total": number
}
User Interface (Browser)
    │
    └── [User Input]
        • Language selection
        • Code prompt description
        • Click "Generate"
    ↓
HTTP POST /api/generate
    ↓
FastAPI Route Handler (main.py)
    ├─ Log request details
    ├─ Validate language
    ├─ Validate prompt
    │   ├─ Check length (max 1000 chars)
    │   ├─ Check security patterns
    │   └─ Return error if invalid
    └─ Call CodeGeneratorAgent
    ↓
Agent Initialization (agent.py)
    ├─ Load Qwen2.5-Coder LLM
    └─ Setup language-specific prompt
    ↓
Self-Consistency Generation Loop
    ├─ Iteration 1 (temp=0.1)
    │   ├─ LLM inference
    │   ├─ Extract code
    │   └─ Score: 8.5
    ├─ Iteration 2 (temp=0.2)
    │   ├─ LLM inference
    │   ├─ Extract code
    │   └─ Score: 7.2
    ├─ ...
    └─ Iteration 9 (temp=0.9)
        ├─ LLM inference
        ├─ Extract code
        └─ Score: 6.5
    ↓
Select Best Solution
    ├─ Find max score (9.1)
    └─ Return associated code
    ↓
Return Response
    ├─ Serialize to JSON
    ├─ Include metadata
    ├─ Log generation time
    └─ Log code stats
    ↓
HTTP 200 Response
{
  "code": "...",
  "language": "python",
  "generation_time": 2.34,
  ...
}
    ↓
Browser Display
    ├─ Stop progress bar
    ├─ Show generated code
    ├─ Enable copy button
    └─ Hide loading spinner
    ↓
User Action
    ├─ Copy code (one-click)
    ├─ Clear and try again
    └─ Refine prompt
Total Time: ~2-4 seconds (average)
Breakdown:
├─ API Round-trip: 100-200ms
├─ Validation: 50-100ms
├─ Model Loading: 500-1000ms (first run only)
├─ 9x LLM Inference: 1000-2000ms (most time)
└─ Post-processing: 100-200ms
At Rest:
├─ Python processes: ~100-150MB
├─ LLM Model: ~4.7GB (loaded once)
└─ Total: ~4.8GB
During Generation:
├─ Base overhead: ~500MB
├─ Generation buffers: ~200MB
└─ Peak: ~5.5GB
Model Quantization:
- Q5_K_M format reduces model size by 60%
- Minimal quality loss vs FP32
- Faster inference speed
Temperature Diversity:
- Varying temperatures (0.1-0.9) keep the samples from converging on a single output
- Fewer near-duplicate solutions
- Better coverage of solution space
Async Processing:
- FastAPI handles multiple concurrent requests
- Non-blocking I/O operations
- Scalable to many users
Code Scoring:
- Avoids manual review
- Objective selection criteria
- Consistent quality
Error: Model not found at ./models/qwen2.5-coder-7b-instruct-q5_k_m.gguf
Solution:
1. Download model from Hugging Face
2. Place in ./models/ directory
3. Verify file size (~4.7GB)
4. Restart application
Error: RuntimeError: CUDA out of memory or RAM full
Solution:
1. Close other applications
2. Ensure 8GB+ available RAM
3. Set n_gpu_layers=0 (CPU only)
4. Reduce context window (n_ctx=2048)
5. Use GPU if available (n_gpu_layers > 0)
Typical: 2-4 seconds
Slow: >10 seconds
Causes:
- Low system RAM (swap usage)
- High CPU usage from other apps
- Slow disk I/O
Solutions:
1. Close background apps
2. Increase available RAM
3. Use SSD for better I/O
4. Monitor logs for errors
Error: Address already in use (port 8000)
Solution 1: Use different port
python main.py --port 8001
Solution 2: Kill existing process
# Windows:
netstat -ano | findstr :8000
taskkill /PID <PID> /F
# macOS/Linux:
lsof -i :8000
kill -9 <PID>
Error: JSON decode error or malformed response
Solution:
1. Check logs for error messages
2. Verify prompt validation (max 1000 chars)
3. Check for security pattern issues
4. Restart FastAPI server
5. Clear browser cache
Generated code seems incomplete or incorrect
Possible Causes:
1. Prompt is ambiguous
2. Language not well-suited for task
3. Model hallucinates (rare)
Solutions:
1. Rephrase prompt more specifically
2. Try different language
3. Check logs for generation score
4. Try again (different temperature sampling)
# In main.py, change logging level
logging.basicConfig(level=logging.DEBUG)
# Check log file for generation metrics
logs/codewizard_*.log
# Key metrics to watch:
- Generation time (goal: 2-4s)
- Code length (optimal: 100-500 chars)
- Solution scores (goal: >8.0)
# Monitor CPU/Memory during generation
top          # macOS/Linux
taskmgr      # Windows GUI
wsl-manager  # WSL
# Expected:
- CPU: 60-90%
- RAM: 70-90% (during generation)
Potential improvements:
1. Model Upgrades
   - Qwen2.5-Coder-32B (larger, better quality)
   - Specialized models for each language
   - Fine-tuned models for specific domains
2. Feature Additions
   - Code explanation/documentation
   - Automated code testing
   - Performance optimization suggestions
   - Integration with IDEs (VS Code plugin)
3. UI Improvements
   - Dark/Light theme toggle
   - Code syntax highlighting
   - Multiple tabs for different languages
   - Prompt history
4. API Enhancements
   - WebSocket for streaming responses
   - Batch code generation
   - Code comparison tool
   - Rating/feedback system
5. Security
   - Rate limiting per IP
   - API key authentication
   - Usage analytics
   - Abuse detection
This project uses open-source components:
- FastAPI: MIT License
- Qwen2.5-Coder-7B-Instruct: Apache License 2.0 (commercial use allowed)
- llama-cpp-python: MIT License
To contribute improvements:
- Test changes locally
- Verify logging works
- Check security guardrails
- Update documentation
- Submit improvements
For issues or questions:
- Check logs: logs/codewizard_*.log
- Review troubleshooting section
- Check API documentation: /docs
- Monitor health: /health endpoint
Understand the technology:
- Self-Consistency Prompting: Paper
- Qwen Models: GitHub
- FastAPI: Documentation
- llama.cpp: GitHub
Made with ❤️ by Code Wizard Team
Last Updated: January 2024
# Code-Generator-Agent