Automated testing and validation system for Freshworks marketplace apps with intelligent error learning.
Validates Platform 3.0 compliance, Crayons UI usage, and learns from validation failures to improve app generation over time.
Two Testing Modes:
- Generate & Test - Create new apps from prompts and validate
- Evaluate - Test existing apps without regeneration
Key Features:
- ✅ Quick Setup Script - Interactive criteria creation and app setup
- ✅ FDK validation with detailed error reporting
- ✅ Platform 3.0 compliance checking (5 criteria)
- ✅ Crayons UI component detection
- ✅ Automated error learning and pattern detection
- ✅ Custom requirements tracking
- ✅ 100-point scoring with letter grades (A-F)
benchmarking/
├── automate_test.py        # Main automation script
├── setup_test.py           # Quick setup for criteria & apps
├── convert_criteria.py     # Plain text to JSON converter (NEW!)
├── error_learner.py        # Error pattern detection & learning
├── requirements.txt        # Python dependencies
├── example-criteria.json   # Example criteria template
├── use-cases/              # Test case definitions
├── test-criteria/          # Validation criteria per app
├── results/                # Benchmarking scores & reports
├── test-apps/              # Sample test applications
└── .dev/                   # Error learning data
    ├── comparison/error_database.json
    └── planning/AUTO_SKILL_UPDATES.md
pip install -r requirements.txt
npm install -g @freshworks/fdk
cd benchmarking
# MODE 1: Generate & Test New Apps
python3 automate_test.py --app APP003
# MODE 2: Evaluate Existing Apps (Quick Setup)
# Step 1: Setup accepts BOTH plain text and JSON!
python3 setup_test.py APP001
# (paste plain text OR JSON, type 'END')
# Step 2: Run evaluation with saved criteria file
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# OR with comma-separated requirements
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements "OAuth,Webhooks"
# MODE 3: Evaluate Existing Apps (Direct)
python3 automate_test.py --evaluate test-apps/zapier --app-id APP005
python3 automate_test.py --evaluate test-apps/zapier --requirements "OAuth,Events,Sync"
# Error Learning & Statistics
python3 automate_test.py --show-stats
python3 automate_test.py --generate-skill-updates

Step 1: Choose or Create a Use Case
# View available use cases
cat use-cases/use_cases.json | grep '"id"'
# Available: APP001, APP002, APP003, APP004, APP005, APP006, APP007

Step 2: Run the Test Command
python3 automate_test.py --app APP003

Step 3: Generate the App
The script will:
- Display the prompt for the app
- Wait for you to generate the app in a separate Cursor window
- Prompt you to press ENTER when ready
Step 4: Press ENTER to Validate
The script will automatically:
- Run FDK validation
- Check Platform 3.0 compliance
- Detect Crayons usage
- Calculate score and grade
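Internally those four steps amount to a small pipeline. A minimal sketch, assuming the FDK CLI is on PATH (the function names are hypothetical; the weights match the scoring table later in this document):

```python
import subprocess

def run_fdk_validate(app_path: str):
    """Run `fdk validate` in the app directory; returns (success, output).
    Assumes the Freshworks FDK CLI is installed and on PATH."""
    proc = subprocess.run(
        ["fdk", "validate"], cwd=app_path,
        capture_output=True, text=True,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

def score_app(fdk_ok: bool, files_ok: bool, compliance_hits: int, crayons_ok: bool) -> int:
    """Combine the four categories into the 100-point total:
    FDK 20 + file structure 20 + (5 compliance checks x 8) + Crayons 20."""
    return (20 * fdk_ok) + (20 * files_ok) + (8 * compliance_hits) + (20 * crayons_ok)
```

For example, an app that passes validation and file checks, meets four of the five compliance criteria, and uses no Crayons would score 20 + 20 + 32 + 0 = 72.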
Step 5: Review Results
# View results
cat results/APP003_result.json
# Check for errors
python3 automate_test.py --show-stats

Step 1: Run Setup Script
python3 setup_test.py APP001

Step 2: Paste Your Criteria
You can paste either plain text OR JSON:
Plain Text (Easiest):
Requirements:
- OAuth 2.0
- Webhooks
- Platform 3.0
Features:
- Request templates
- Data methods
Description:
Ticket automation app
END
Or JSON:
{
"requirements": ["OAuth 2.0", "Webhooks", "Platform 3.0"],
"expected_files": ["manifest.json", "server/server.js"],
"description": "Ticket automation app"
}
END

Step 3: Copy Your App
# Copy your existing app to test-apps/
cp -r /path/to/your/app/* test-apps/APP001/

Step 4: Run Evaluation
# Use the command provided by setup script
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json

Step 5: Review Results
cat results/APP001_result.json

Step 1: Copy App to test-apps/
cp -r /path/to/my-freshdesk-app test-apps/my-app

Step 2: Run Evaluation
# Basic evaluation
python3 automate_test.py --evaluate test-apps/my-app
# With custom requirements
python3 automate_test.py --evaluate test-apps/my-app --requirements "OAuth 2.0,Webhooks,Crayons UI"
# With custom app ID
python3 automate_test.py --evaluate test-apps/my-app --app-id MY_APP_001

Step 3: Review Results
# Results saved as EVAL_MY-APP_result.json or MY_APP_001_result.json
cat results/EVAL_MY-APP_result.json

Step 1: Prepare Apps
# Copy multiple apps
cp -r /path/to/app1 test-apps/app1
cp -r /path/to/app2 test-apps/app2
cp -r /path/to/app3 test-apps/app3

Step 2: Test Each App
python3 automate_test.py --evaluate test-apps/app1 --app-id APP_001
python3 automate_test.py --evaluate test-apps/app2 --app-id APP_002
python3 automate_test.py --evaluate test-apps/app3 --app-id APP_003

Step 3: View Statistics
# View error patterns across all tests
python3 automate_test.py --show-stats
# Generate improvement suggestions
python3 automate_test.py --generate-skill-updates

Step 4: Compare Results
# View all results
ls -lh results/
# Compare scores
grep -A 3 '"score"' results/APP_00*.json

What Gets Checked:
- ✅ FDK Validation (20 points)
  - Runs fdk validate
  - Checks for syntax errors
  - Validates manifest.json structure
- ✅ File Structure (20 points)
  - Verifies required files exist
  - Checks manifest.json, server.js (if serverless), app files
- ✅ Platform 3.0 Compliance (40 points)
  - Platform version 3.0 (8 pts)
  - Uses 'modules' structure (8 pts)
  - No whitelisted-domains (8 pts)
  - Engines block present (8 pts)
  - Correct location placement (8 pts)
- ✅ Crayons Usage (20 points)
  - Detects fw-* components
  - Checks for Crayons imports
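Crayons detection can be as simple as scanning app sources for fw-* custom elements and Crayons imports. A rough sketch, with illustrative regexes rather than the tool's exact rules:

```python
import re

# Crayons web components render as custom elements such as <fw-button>.
FW_COMPONENT = re.compile(r"<fw-[a-z0-9-]+", re.IGNORECASE)
CRAYONS_IMPORT = re.compile(r"crayons", re.IGNORECASE)

def detect_crayons(source: str) -> dict:
    """Report which Crayons signals appear in one HTML/JS source string."""
    components = sorted({m.group(0).lstrip("<").lower()
                         for m in FW_COMPONENT.finditer(source)})
    return {"components": components,
            "has_import": bool(CRAYONS_IMPORT.search(source))}
```

Running this over every file under app/ approximates the Crayons Usage category: any component hit plus an import is strong evidence the library is actually in use.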
Result File Structure:
{
"app_id": "APP001",
"timestamp": "2026-02-26T10:30:00",
"score": {
"total_score": 85.0,
"percentage": 85.0,
"grade": "B",
"breakdown": {
"fdk_validation": 20,
"file_structure": 20,
"platform3_compliance": 32,
"crayons_usage": 13
}
},
"validation": {
"success": true,
"errors": [],
"warnings": []
},
"platform3_compliance": {
"platform_version_3_0": true,
"modules_structure": true,
"no_whitelisted_domains": true,
"engines_present": true,
"correct_location_placement": true
},
"requirements_met": {
"OAuth 2.0": true,
"Webhooks": true,
"Platform 3.0": true
}
}

Scenario 1: Test Before Deployment
# Copy production app
cp -r /path/to/prod-app test-apps/prod-app
# Validate
python3 automate_test.py --evaluate test-apps/prod-app --app-id PROD_V1
# Check score (should be A or B)
grep '"grade"' results/PROD_V1_result.json

Scenario 2: Compare Two Versions
# Test version 1
python3 automate_test.py --evaluate test-apps/app-v1 --app-id APP_V1
# Test version 2
python3 automate_test.py --evaluate test-apps/app-v2 --app-id APP_V2
# Compare scores
diff results/APP_V1_result.json results/APP_V2_result.json

Scenario 3: Track Improvements
# Test 1: Initial version
python3 automate_test.py --evaluate test-apps/my-app --app-id TEST_001
# Result: Grade C (75%)
# Fix issues based on results
# ... make changes ...
# Test 2: After fixes
python3 automate_test.py --evaluate test-apps/my-app --app-id TEST_002
# Result: Grade B (85%)
# Compare
diff results/TEST_001_result.json results/TEST_002_result.json

Scenario 4: Validate Specific Requirements
# Test with specific requirements
python3 automate_test.py --evaluate test-apps/oauth-app \
--requirements "OAuth 2.0,Token refresh,Error handling,Crayons UI,Platform 3.0"
# Check which requirements were met
grep -A 10 '"requirements_met"' results/EVAL_OAUTH-APP_result.json

Low Score? Check These:
- FDK Validation Failed (0/20 points)
  # Run FDK validate manually to see errors
  cd test-apps/my-app
  fdk validate
- Missing Files (0/20 points)
  # Check what files are expected
  cat test-criteria/APP001-criteria.json | grep expected_files
  # List what you have
  ls -la test-apps/APP001/
- Platform 3.0 Issues (0-32/40 points)
  # Check manifest.json
  cat test-apps/my-app/manifest.json | grep platform
  cat test-apps/my-app/manifest.json | grep modules
  cat test-apps/my-app/manifest.json | grep engines
- No Crayons (0/20 points)
  # Search for Crayons components
  grep -r "fw-" test-apps/my-app/app/
  grep -r "crayons" test-apps/my-app/
# Test predefined use case
python3 automate_test.py --app APP003
# Setup new test
python3 setup_test.py APP001
# Evaluate existing app
python3 automate_test.py --evaluate test-apps/my-app
# Evaluate with requirements
python3 automate_test.py --evaluate test-apps/my-app --requirements "OAuth,Webhooks"
# View statistics
python3 automate_test.py --show-stats
# Generate suggestions
python3 automate_test.py --generate-skill-updates
# View results
cat results/APP001_result.json
# List all results
ls -lh results/

Apps are scored on a 100-point scale with letter grades (A-F):
| Category | Weight | Points | Description |
|---|---|---|---|
| FDK Validation | 20% | 20 | Pass/fail FDK validation |
| File Structure | 20% | 20 | Required files present |
| Platform 3.0 Compliance | 40% | 40 | 5 checks × 8 pts each |
| Crayons Usage | 20% | 20 | UI component library usage |
Platform 3.0 Compliance Checks:
- Platform version 3.0
- Uses 'modules' structure
- No whitelisted-domains
- Engines block present
- Correct location placement (auto-pass for background/serverless apps)
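Four of the five checks can be read straight off a parsed manifest.json; location placement additionally needs the module definitions. A sketch, with the key names following the usual Platform 3.0 manifest layout (treat them as assumptions, not the tool's exact logic):

```python
def check_platform3(manifest: dict) -> dict:
    """The five compliance checks, 8 points each. Location placement is
    treated as an auto-pass in this sketch, as it is for
    background/serverless apps."""
    return {
        "platform_version_3_0": str(manifest.get("platform-version", "")).startswith("3."),
        "modules_structure": "modules" in manifest,
        "no_whitelisted_domains": "whitelisted-domains" not in manifest,
        "engines_present": "engines" in manifest,
        "correct_location_placement": True,  # assumed auto-pass here
    }

def compliance_points(checks: dict) -> int:
    """8 points per passed check, maximum 40."""
    return 8 * sum(bool(v) for v in checks.values())
```

A compliant manifest ({"platform-version": "3.0", "modules": {...}, "engines": {...}} with no whitelisted-domains) scores the full 40; each missing piece costs 8 points.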
Grade Scale: A (90-100) • B (80-89) • C (70-79) • D (60-69) • F (<60)
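The grade cutoffs above translate directly into code:

```python
def grade(percentage: float) -> str:
    """Map a 0-100 percentage to the letter scale:
    A >= 90, B >= 80, C >= 70, D >= 60, F below that."""
    for cutoff, letter in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if percentage >= cutoff:
            return letter
    return "F"
```

So the 85% example below lands squarely in B.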
Example Result:
{
"app_id": "APP003",
"score": {
"total_score": 85.0,
"percentage": 85.0,
"grade": "B"
},
"validation": {
"success": true,
"platform_errors": [],
"lint_errors": []
},
"platform3_compliance": {
"platform_version_3_0": true,
"modules_structure": true,
"no_whitelisted_domains": true,
"engines_present": true,
"correct_location_placement": true
}
}

What each grade means:
- A (90-100): Production-ready, follows all best practices
- B (80-89): Good quality, minor improvements needed
- C (70-79): Functional but needs attention to standards
- D (60-69): Multiple issues, requires fixes before deployment
- F (<60): Significant problems, major refactoring needed
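Since result files are plain JSON, a post-run summary takes only a few lines of Python (field names as in the example result above; `summarize` is a hypothetical helper, not part of the tool):

```python
import json

def summarize(result_json: str) -> str:
    """One-line summary of a results/*_result.json payload, listing any
    failed Platform 3.0 checks."""
    result = json.loads(result_json)
    failed = [name for name, ok in result.get("platform3_compliance", {}).items()
              if not ok]
    line = f'{result["app_id"]}: {result["score"]["grade"]} ({result["score"]["percentage"]}%)'
    return line + (" | failed: " + ", ".join(failed) if failed else "")
```

Fed the example above, it prints something like "APP003: B (85.0%)", with any failed compliance checks appended.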
Automatically tracks validation failures and generates improvement suggestions.
How it works:
- Records FDK validation errors with context
- Identifies patterns across multiple apps
- Generates actionable skill improvements
- Tracks which patterns have been resolved
Common patterns tracked:
- deprecated_request_api - Using old request methods
- async_no_await - Async functions without await
- request_schema_error - Incorrect request template structure
- invalid_location - Wrong location placement
- oauth_integrations - OAuth config structure issues
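Pattern detection boils down to keyword rules over the recorded error text. A sketch, where the trigger phrases are illustrative rather than the exact strings the FDK emits:

```python
import re

# Illustrative trigger phrases for a subset of the tracked patterns.
PATTERN_RULES = {
    "deprecated_request_api": re.compile(r"\$request\.|deprecated request", re.I),
    "async_no_await": re.compile(r"missing await|async .* without await", re.I),
    "request_schema_error": re.compile(r"requests\.json|request template", re.I),
    "invalid_location": re.compile(r"invalid location", re.I),
}

def classify(error_message: str) -> list:
    """Every pattern whose rule matches the error text; a single error
    can hit more than one pattern."""
    return [name for name, rule in PATTERN_RULES.items()
            if rule.search(error_message)]
```

Counting `classify` hits across runs is what surfaces the "2+ occurrences" patterns that feed the skill-update suggestions.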
Commands:
# View error statistics
python3 error_learner.py stats
# Generate skill improvement suggestions
python3 error_learner.py suggest
# View generated suggestions
cat .dev/planning/AUTO_SKILL_UPDATES.md

Example output:
Error Learning Statistics
Total errors recorded: 2
Unique patterns: 4
Fixed patterns: 0
Unfixed patterns: 4
Most common patterns:
- request_schema_error: 3 occurrences
- deprecated_request_api: 2 occurrences
- async_no_await: 2 occurrences
- product_field_deprecated: 1 occurrence
Data stored in:
- Error Database: .dev/comparison/error_database.json
- Skill Updates: .dev/planning/AUTO_SKILL_UPDATES.md
# Test APP003 (Freshdesk-GitHub Integration)
python3 automate_test.py --app APP003
# Opens prompt, you generate in separate Cursor window, then validates

# Step 1: Setup with interactive input
python3 setup_test.py APP001
# Option A: Paste plain text (EASIEST!)
# Requirements:
# - OAuth 2.0
# - Webhooks
# - Platform 3.0
#
# Features:
# - Request templates
# - Data methods
# - Custom iParams
#
# Description:
# Ticket automation app
#
# Type 'END' and press Enter
# OR Option B: Paste JSON
# {
# "requirements": ["OAuth 2.0", "Webhooks", "Platform 3.0"],
# "expected_files": ["manifest.json", "server/server.js"],
# "description": "Ticket automation app"
# }
# Type 'END' and press Enter
# Step 2: Copy your app
cp -r /path/to/your/app/* test-apps/APP001/
# Step 3: Run evaluation using saved criteria file
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json

# Setup and copy app in one command
python3 setup_test.py APP001 --app-path /path/to/your/app
# Then run evaluation
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001

# After running setup_test.py, use the saved criteria file
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# This loads:
# - All requirements from the criteria file
# - Expected files list
# - Automatically uses them for validation
# Result: APP001_result.json with all requirements tracked

# Copy your app to test-apps/
cp -r ~/my-freshdesk-app test-apps/my-app
# Evaluate it
python3 automate_test.py --evaluate test-apps/my-app
# Result: EVAL_MY-APP_result.json with score and grade

# Evaluate with specific requirements
python3 automate_test.py --evaluate test-apps/oauth-app \
--requirements "OAuth 2.0,Token refresh,Webhook support,Crayons UI"
# Requirements are tracked in the results file

# Evaluate version 1
python3 automate_test.py --evaluate test-apps/app-v1 --app-id APP_V1
# Evaluate version 2
python3 automate_test.py --evaluate test-apps/app-v2 --app-id APP_V2
# Compare results/APP_V1_result.json vs results/APP_V2_result.json

# Run multiple tests
python3 automate_test.py --app APP001
python3 automate_test.py --app APP002
python3 automate_test.py --app APP003
# View error statistics
python3 automate_test.py --show-stats
# Generate improvement suggestions
python3 automate_test.py --generate-skill-updates

7 predefined test cases covering various app types:
| ID | Name | Type | Product |
|---|---|---|---|
| APP001 | MS Teams Presence Checker | Frontend | Freshservice |
| APP002 | Freshservice-Asana Sync | Serverless | Freshservice |
| APP003 | Freshdesk-GitHub Integration | Frontend | Freshdesk |
| APP004 | Password Generator | Frontend | Freshservice |
| APP005 | Freshdesk-Zapier Contact Sync | Serverless | Freshdesk |
| APP006 | Jira-Freshdesk OAuth Sync | Serverless | Freshdesk |
| APP007 | Ticket Field Validation | Frontend | Freshdesk |
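Since use-cases/use_cases.json drives --app runs, a quick filter helps pick a target. A sketch assuming the file holds a top-level list of entries shaped like the rows above (the sample data here is inlined for illustration):

```python
import json

# Sample entries mirroring use-cases/use_cases.json.
CASES = json.loads("""
[
  {"id": "APP003", "name": "Freshdesk-GitHub Integration", "app_type": "Frontend", "product": "freshdesk"},
  {"id": "APP005", "name": "Freshdesk-Zapier Contact Sync", "app_type": "Serverless", "product": "freshdesk"},
  {"id": "APP001", "name": "MS Teams Presence Checker", "app_type": "Frontend", "product": "freshservice"}
]
""")

def ids_for(cases: list, product: str) -> list:
    """Use-case IDs for one product, e.g. to choose a --app target."""
    return sorted(c["id"] for c in cases if c["product"] == product)
```

For instance, filtering on "freshdesk" yields the Freshdesk test cases to feed into python3 automate_test.py --app.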
Interactive mode - paste your criteria in ANY format:
python3 setup_test.py APP001
# Option A: Plain Text (EASIEST!)
# TIP: Include "Frontend" or "Serverless" to auto-detect app type!
Requirements:
- Frontend
- OAuth 2.0
- Webhooks
- Platform 3.0
Features:
- Request templates
- Data methods
- Custom iParams
Description:
Ticket automation with external API
# Type 'END' and press Enter
# Auto-detects: Frontend app (no server/server.js expected)
# Option B: JSON format (also works!)
{
"requirements": ["OAuth 2.0", "Webhooks", "Platform 3.0"],
"expected_files": ["manifest.json", "server/server.js"],
"description": "Ticket automation with external API"
}
# Type 'END' and press Enter
# Script automatically:
# 1. Detects format (plain text or JSON)
# 2. Fixes common typos (erverless → Serverless, iparam → iParam, etc.)
# 3. Detects app type (Frontend/Serverless) from requirements
# 4. Generates appropriate expected files based on app type
# 5. Saves criteria to test-criteria/APP001-criteria.json
# 6. Creates test-apps/APP001/ directory
# 7. Shows evaluation command to run

With existing app (one command):
# Copy app automatically
python3 setup_test.py APP001 --app-path /path/to/your/app
# Or with criteria file
python3 setup_test.py APP001 --criteria-file my-criteria.json --app-path /path/to/app

Then evaluate:
# Use saved criteria file (RECOMMENDED)
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements test-criteria/APP001-criteria.json
# OR use comma-separated requirements
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements "OAuth 2.0,Webhooks,Platform 3.0"

Step 1: Add to use-cases/use_cases.json:
{
"id": "APP008",
"name": "My Custom App",
"app_type": "Frontend",
"product": "freshdesk",
"prompt": "Description of your app",
"expected_files": ["manifest.json", "app/index.html", "app/scripts/app.js"]
}

Step 2: Run python3 automate_test.py --app APP008
Step 1: Copy app to benchmarking/test-apps/YOUR-APP-NAME/
Step 2: Evaluate with requirements:
# Option A: Use existing use case ID
python3 automate_test.py --evaluate test-apps/YOUR-APP-NAME --app-id APP008
# Option B: Provide custom requirements
python3 automate_test.py --evaluate test-apps/YOUR-APP-NAME --requirements "OAuth,API integration,Crayons UI"
# Option C: Just validate (no requirements)
python3 automate_test.py --evaluate test-apps/YOUR-APP-NAME

Step 3: View results in results/ folder
┌─────────────────┐
│ Define Use Case │
│ (use_cases.json)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│ Run Test Script │─────▶│ Generate in      │
│ --app APP008    │      │ Separate Cursor  │
└────────┬────────┘      └────────┬─────────┘
         │                        │
         └───────────┬────────────┘
                     │ Press ENTER
                     ▼
            ┌─────────────────┐
            │ FDK Validation  │
            │ + Compliance    │
            │ + Scoring       │
            └────────┬────────┘
                     │
                     ▼
            ┌─────────────────┐
            │ Results JSON    │
            │ + Error Learning│
            └─────────────────┘
Commands:
# 1. Add use case to use-cases/use_cases.json
# 2. Run test
python3 automate_test.py --app APP008
# 3. Generate in separate Cursor window
# 4. Press ENTER to validate
# 5. Check results in results/APP008_result.json

┌─────────────────┐
│ Run setup_test  │
│ python3 setup_  │
│ test.py APP001  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Paste Criteria  │
│ JSON (or file)  │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────────┐
│ Criteria Saved  │      │ Directory Created│
│ test-criteria/  │      │ test-apps/APP001 │
└────────┬────────┘      └────────┬─────────┘
         │                        │
         └───────────┬────────────┘
                     │
                     ▼
┌─────────────────────────────────┐
│ Copy App or Generate            │
│ cp -r /path/to/app test-apps/   │
└────────┬────────────────────────┘
         │
         ▼
┌─────────────────┐
│ Run Evaluation  │
│ (command shown  │
│ by setup)       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FDK Validation  │
│ + Compliance    │
│ + Scoring       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Results JSON    │─────▶│ Fix Issues   │
│ + Error Learning│      │ Re-evaluate  │
└─────────────────┘      └──────────────┘
Commands:
# 1. Quick setup with interactive criteria
python3 setup_test.py APP001
# (paste criteria JSON)
# 2. Copy your app
cp -r /path/to/my-app test-apps/APP001/
# 3. Run evaluation (command provided by setup script)
python3 automate_test.py --evaluate test-apps/APP001 --app-id APP001 --requirements "..."
# 4. Check results
cat results/APP001_result.json

┌─────────────────┐
│ Existing App    │
│ (any source)    │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Copy to         │
│ test-apps/      │
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ Run Evaluation  │
│ --evaluate PATH │
│ (+ requirements)│
└────────┬────────┘
         │
         ▼
┌─────────────────┐
│ FDK Validation  │
│ + Compliance    │
│ + Scoring       │
└────────┬────────┘
         │
         ▼
┌─────────────────┐      ┌──────────────┐
│ Results JSON    │─────▶│ Fix Issues   │
│ + Error Learning│      │ Re-evaluate  │
└─────────────────┘      └──────────────┘
Commands:
# 1. Copy app to test-apps/
cp -r /path/to/my-app test-apps/my-app
# 2. Evaluate with requirements
python3 automate_test.py --evaluate test-apps/my-app --requirements "OAuth,Crayons UI,Platform 3.0"
# 3. Or use existing use case ID
python3 automate_test.py --evaluate test-apps/my-app --app-id APP005
# 4. Check results
cat results/EVAL_MY-APP_result.json

The --evaluate flag enables testing already-generated apps without regeneration:
Benefits:
- ✅ Test apps from any source (manual, AI-generated, production)
- ✅ No need to regenerate or modify existing apps
- ✅ Track custom requirements alongside standard checks
- ✅ Compare multiple versions of the same app
- ✅ Validate apps before deployment
What it checks:
- FDK validation (pass/fail)
- File structure (auto-detected or from use case)
- Platform 3.0 compliance (5 checks)
- Crayons UI usage
- Custom requirements (if provided)
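Custom requirement tracking can be approximated by keyword search over the app's source. A deliberately naive sketch of the idea (`requirements_met` is a hypothetical helper; the tool's actual matching may be smarter):

```python
def requirements_met(app_text: str, requirements: list) -> dict:
    """Mark a requirement as met if its first keyword appears anywhere in
    the concatenated app source (manifest, scripts, templates)."""
    lowered = app_text.lower()
    return {req: req.split()[0].lower() in lowered for req in requirements}
```

Feeding it the joined text of manifest.json and the app/ directory produces the requirements_met map seen in the result files.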
Example use cases:
# Validate a production app before update
python3 automate_test.py --evaluate test-apps/prod-app
# Check if app meets specific requirements
python3 automate_test.py --evaluate test-apps/oauth-app --requirements "OAuth 2.0,Token refresh,Error handling"
# Compare two versions
python3 automate_test.py --evaluate test-apps/v1 --app-id APP_V1
python3 automate_test.py --evaluate test-apps/v2 --app-id APP_V2

Best practices:
- Check error stats regularly after test runs
- Generate suggestions when patterns reach 2+ occurrences
- Review and apply skill updates to improve app generation
- Re-run tests to verify improvements
- Use evaluation mode to validate apps from any source
- Track requirements for better quality assurance
python3 automate_test.py [OPTIONS]
Options:
--app APP_ID Test a predefined use case (e.g., APP001)
--evaluate PATH Evaluate an existing app at PATH
--app-id ID Custom app ID for evaluation results
--requirements "R1,R2" Comma-separated requirements to track
--show-stats Display error learning statistics
--generate-skill-updates Generate improvement suggestions
--benchmark-dir PATH Custom benchmark directory (default: ~/benchmark-test)
-h, --help Show help message

Examples:
# Generate and test
python3 automate_test.py --app APP003
# Evaluate existing
python3 automate_test.py --evaluate test-apps/my-app
# Evaluate with requirements
python3 automate_test.py --evaluate test-apps/my-app --requirements "OAuth,Webhooks"
# Custom app ID
python3 automate_test.py --evaluate test-apps/my-app --app-id CUSTOM_001
# Statistics
python3 automate_test.py --show-stats
# Generate suggestions
python3 automate_test.py --generate-skill-updates

Issue: "FDK not found"
# Solution: Install FDK globally
npm install -g @freshworks/fdk

Issue: "Use case not found"
# Solution: Check use-cases/use_cases.json for valid IDs
cat use-cases/use_cases.json | grep '"id"'

Issue: "App path not found"
# Solution: Use relative path from benchmarking/ or absolute path
python3 automate_test.py --evaluate test-apps/my-app # relative
python3 automate_test.py --evaluate /full/path/to/app  # absolute

Issue: Validation passes but score is low
- Check Platform 3.0 compliance (40 points)
- Verify Crayons usage (20 points)
- Ensure all expected files are present (20 points)
Issue: Error learning not working
# Check if error_learner.py exists
ls error_learner.py
# Check error database
cat .dev/comparison/error_database.json

Last Updated: February 26, 2026 • Internal Freshworks tool for marketplace app quality assurance