
Add Modular Submission Evaluation Pipeline (Validation → Execution → Scoring)#152

Merged
aviralsaxena16 merged 2 commits into OpenLake:main from aviralsaxena16:Adding-evaluation-pipeline
Mar 22, 2026

Conversation

@aviralsaxena16
Member

@aviralsaxena16 aviralsaxena16 commented Mar 22, 2026

🚀 Add Modular Submission Evaluation Pipeline (Validation → Execution → Scoring)

🔍 Overview

This PR introduces a structured evaluation pipeline for Canonforces submissions, transforming the existing submission flow into a modular, extensible system.

The pipeline now processes each submission through clearly defined stages:

Validation → Sandbox Execution → Scoring → Result

This improves reliability and maintainability, and aligns the system with evaluation pipelines used in real-world research and benchmarking.


✨ Key Changes

1️⃣ Validation Layer

  • Added a dedicated validation module to verify:

    • code presence and length
    • language selection
  • Prevents invalid submissions from reaching execution

📁 src/lib/pipeline/validation.ts
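The PR description doesn't include the module body, but a minimal sketch of such a validation layer might look like the following — the field names, supported-language list, and length limit are all illustrative assumptions, not the PR's actual code:

```typescript
// Hypothetical sketch of src/lib/pipeline/validation.ts.
// The Submission shape, language list, and size limit are assumptions.
export interface Submission {
  code: string;
  language: string;
}

export interface ValidationResult {
  valid: boolean;
  error?: string;
}

const SUPPORTED_LANGUAGES = ["javascript", "python", "cpp", "java"];
const MAX_CODE_LENGTH = 100_000;

export function validateSubmission({ code, language }: Submission): ValidationResult {
  // Reject empty or whitespace-only code before it ever reaches the sandbox.
  if (!code || code.trim().length === 0) {
    return { valid: false, error: "Code cannot be empty." };
  }
  // Guard against oversized payloads.
  if (code.length > MAX_CODE_LENGTH) {
    return { valid: false, error: "Code exceeds maximum allowed length." };
  }
  // Only accept languages the execution layer knows how to run.
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    return { valid: false, error: `Unsupported language: ${language}` };
  }
  return { valid: true };
}
```

Returning a structured `{ valid, error? }` object (rather than throwing) lets the orchestrator branch on the result and report stage-level feedback.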


2️⃣ Sandbox Execution Layer

  • Introduced an abstraction over Judge0 execution
  • Encapsulates execution logic into a reusable function

📁 src/lib/pipeline/execution.ts

  • Ensures all code runs in a controlled sandbox environment
  • Decouples execution from UI logic
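A rough sketch of such an executor is below. The `/api/hello` route and the `{language, codeValue, input}` payload come from the review walkthrough further down; the response fields and the injectable `fetcher` parameter (used here so the sketch stays testable without a network) are assumptions:

```typescript
// Hypothetical sketch of src/lib/pipeline/execution.ts.
// In the app, `fetcher` would be the global fetch; it is a parameter here
// so the function can be exercised with a stub.
export interface ExecutionRequest {
  language: string;
  codeValue: string;
  input: string;
}

export interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; json(): Promise<any> }>;

export async function executeInSandbox(
  req: ExecutionRequest,
  fetcher: Fetcher
): Promise<ExecutionResult | null> {
  try {
    // POST the submission to the sandbox-backed API route.
    const res = await fetcher("/api/hello", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    });
    if (!res.ok) return null;
    const { data } = await res.json();
    // Normalize the sandbox response into a flat result object.
    return {
      stdout: data.run.stdout ?? "",
      stderr: data.run.stderr ?? "",
      exitCode: data.run.code ?? 0,
    };
  } catch {
    // Network or sandbox failures surface as null so the pipeline
    // can report an execution-stage error instead of crashing.
    return null;
  }
}
```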

3️⃣ Scoring Layer

  • Added structured scoring system:

    • evaluates outputs against test cases
    • computes pass/fail per test case
    • generates overall score (%)

📁 src/lib/pipeline/scoring.ts

Example output:

{
  "total": 5,
  "passed": 4,
  "score": 80,
  "results": [
    { "status": "Accepted" },
    { "status": "Wrong Answer" }
  ]
}
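A sketch consistent with the example output above — comparing outputs by string equality after trimming trailing whitespace is an assumption about how the comparison works:

```typescript
// Hypothetical sketch of src/lib/pipeline/scoring.ts.
export interface TestCase {
  input: string;
  expectedOutput: string;
}

export interface CaseResult {
  status: "Accepted" | "Wrong Answer";
}

export interface ScoreResult {
  total: number;
  passed: number;
  score: number; // percentage, 0-100
  results: CaseResult[];
}

export function scoreSubmission(outputs: string[], testCases: TestCase[]): ScoreResult {
  // Compare each captured output against the expected output, ignoring
  // leading/trailing whitespace (an assumption about the comparison rule).
  const results: CaseResult[] = testCases.map((tc, i) => ({
    status: (outputs[i] ?? "").trim() === tc.expectedOutput.trim() ? "Accepted" : "Wrong Answer",
  }));
  const passed = results.filter((r) => r.status === "Accepted").length;
  return {
    total: testCases.length,
    passed,
    score: testCases.length === 0 ? 0 : Math.round((passed / testCases.length) * 100),
    results,
  };
}
```

With 4 of 5 cases passing this yields `score: 80`, matching the example above.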

4️⃣ Unified Pipeline Orchestrator

  • Created a central pipeline controller:

📁 src/lib/pipeline/index.ts

  • Handles:

    • validation
    • execution loop
    • scoring
  • Returns structured responses with stage-level feedback
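The control flow above can be sketched as follows. The stage names and per-stage error shape mirror this description; the three helpers are inlined stand-ins for the real modules, so their bodies are placeholders, not the PR's code:

```typescript
// Hypothetical sketch of src/lib/pipeline/index.ts (runPipeline).
type Stage = "validation" | "execution" | "completed";

interface PipelineResult {
  stage: Stage;
  error?: string;
  data?: unknown;
}

interface PipelineInput {
  code: string;
  language: string;
  testCases: { input: string; expectedOutput: string }[];
}

// Stand-ins for the real validation/execution/scoring modules.
const validateSubmission = (s: { code: string; language: string }) =>
  s.code.trim() ? { valid: true } : { valid: false, error: "Code cannot be empty." };
const executeInSandbox = async (
  _req: { language: string; code: string; input: string }
): Promise<{ stdout: string } | null> => ({ stdout: "" }); // real module POSTs to the sandbox API
const scoreSubmission = (outputs: string[], tcs: PipelineInput["testCases"]) => {
  const passed = outputs.filter((o, i) => o.trim() === tcs[i].expectedOutput.trim()).length;
  return { total: tcs.length, passed, score: tcs.length ? Math.round((passed / tcs.length) * 100) : 0 };
};

export async function runPipeline(input: PipelineInput): Promise<PipelineResult> {
  // Stage 1: validation — reject bad input before touching the sandbox.
  const v = validateSubmission(input);
  if (!v.valid) return { stage: "validation", error: v.error };

  // Stage 2: execution loop — run the code once per test case.
  const outputs: string[] = [];
  for (const tc of input.testCases) {
    const result = await executeInSandbox({ language: input.language, code: input.code, input: tc.input });
    if (result === null) return { stage: "execution", error: "Sandbox execution failed." };
    outputs.push(result.stdout);
  }

  // Stage 3: scoring — compare outputs and report a structured result.
  return { stage: "completed", data: scoreSubmission(outputs, input.testCases) };
}
```

Because every early exit names the stage that failed, the UI can show exactly where a submission was rejected.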


5️⃣ UI Integration

  • Updated submission flow to reflect pipeline stages:

    • "Running Validation..."
    • "Running Execution..."
    • "Scoring submission..."
  • Displays structured results in existing Output component
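A sketch of how the refactored handler might consume the pipeline is below. `setStatus`/`setOutput` stand in for the component's real state setters, and the `PipelineResult` shape follows this description; the real handler updates the message as each stage runs, while this sketch only branches on the final result:

```typescript
// Hypothetical sketch of the refactored handleSubmit in the question page.
interface PipelineResult {
  stage: "validation" | "execution" | "completed";
  error?: string;
  data?: unknown;
}

async function handleSubmit(
  runPipeline: (input: unknown) => Promise<PipelineResult>,
  setStatus: (message: string) => void,
  setOutput: (result: PipelineResult) => void,
  input: unknown
): Promise<void> {
  setStatus("Running Validation...");
  const result = await runPipeline(input);
  if (result.stage === "validation" || result.stage === "execution") {
    // Stage-level feedback: tell the user which stage rejected the submission.
    setStatus(`${result.stage} failed: ${result.error}`);
  } else {
    setStatus("Scoring submission...");
    setOutput(result); // rendered by the existing Output component
  }
}
```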


🧠 Why This Matters

This refactor introduces pipeline-based architecture, which:

  • improves separation of concerns
  • enables easy extension (e.g., new evaluators, metrics)
  • ensures deterministic evaluation flow
  • mirrors real-world systems used in benchmarking and research pipelines

🔬 Alignment with Advanced Evaluation Systems

The new architecture aligns with systems that:

  • validate structured inputs
  • execute code in sandboxed environments
  • compute standardized metrics
  • generate reproducible results

This makes Canonforces closer to a general-purpose evaluation framework, not just a CP platform.


📦 New File Structure

src/lib/pipeline/
 ├ validation.ts
 ├ execution.ts
 ├ scoring.ts
 └ index.ts

🧪 Future Improvements

  • Add JSON/PDF report generation for submissions
  • Introduce configurable scoring weights
  • Add batch evaluation for contests
  • Integrate Docker-based execution for full isolation
  • Add performance metrics (runtime, memory)

✅ Result

Canonforces now includes a modular, extensible evaluation pipeline that:

  • validates inputs
  • executes code safely
  • scores submissions systematically
  • produces structured outputs

This significantly enhances the platform's engineering depth and scalability.

Summary by CodeRabbit

  • New Features
    • Added comprehensive code submission pipeline with validation, execution, and automated scoring of test cases.
    • Improved error handling for code submissions with clear validation and execution stage feedback.
    • Enhanced submission results display with per-test case status tracking and percentage scoring.

@vercel

vercel bot commented Mar 22, 2026

@aviralsaxena16 is attempting to deploy a commit to the "aviralsaxena16's projects" Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

🎉 Thanks for Your Contribution to CanonForces! ☺️

We'll review it as soon as possible. In the meantime, please:

  • ✅ Double-check the file changes.
  • ✅ Ensure that all commits are clean and meaningful.
  • ✅ Link the PR to its related issue (e.g., Closes #123).
  • ✅ Resolve any unaddressed review comments promptly.

💬 Need help or want faster feedback?
Join our Discord 👉 CanonForces Discord

Thanks again for contributing 🙌 – @aviralsaxena16!
cc: @aviralsaxena16

@coderabbitai

coderabbitai bot commented Mar 22, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This pull request introduces a modular pipeline system for code submission processing. Four new TypeScript modules are created to handle submission validation, sandbox execution, test case scoring, and orchestration. The Dockerfile adds a runtime environment variable, and the question submission handler is refactored to delegate logic to the new pipeline system instead of handling it inline.

Changes

Cohort / File(s) — Summary

Pipeline Infrastructure
src/lib/pipeline/validation.ts, src/lib/pipeline/execution.ts, src/lib/pipeline/scoring.ts, src/lib/pipeline/index.ts
Four new modules: validation checks code/language; execution POSTs to the sandbox API; scoring compares outputs and computes a percentage; the orchestrating pipeline coordinates the three-stage workflow with error branching.

Docker Configuration
Dockerfile
Added a GEMINI_API_KEY=test-key environment variable alongside the existing build-time API key variable.

Question Submission Handler
src/pages/questions/[id].tsx
Refactored handleSubmit to use runPipeline, replacing inline validation/execution/scoring logic with pipeline orchestration; updated error handling and result display; removed Firestore local history persistence.

Sequence Diagram

sequenceDiagram
    participant Client as Client<br/>(QuestionBar)
    participant Pipeline as runPipeline
    participant Validator as validateSubmission
    participant Executor as executeInSandbox
    participant Scorer as scoreSubmission
    participant API as /api/hello

    Client->>Pipeline: runPipeline({code, language, testCases})
    Pipeline->>Validator: validateSubmission({code, language})
    Validator-->>Pipeline: {valid, error?}
    
    alt Validation Failed
        Pipeline-->>Client: {stage: "validation", error}
    else Validation Passed
        loop For each testCase
            Pipeline->>Executor: executeInSandbox({language, code, input})
            Executor->>API: POST {language, codeValue, input}
            API-->>Executor: {data: {run}}
            Executor-->>Pipeline: ExecutionResult | null
        end
        
        alt Execution Failed
            Pipeline-->>Client: {stage: "execution", error}
        else Execution Passed
            Pipeline->>Scorer: scoreSubmission(executionOutputs, testCases)
            Scorer-->>Pipeline: ScoreResult {passed, total, score, results}
            Pipeline-->>Client: {stage: "completed", data: ScoreResult}
        end
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 From validation to sandbox we hop with delight,
Each test case executing shines oh so bright,
The scoring reveals which submissions are blessed,
A pipeline of magic—putting code to the test! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check — ❓ Inconclusive
    Explanation: The PR description comprehensively covers the pipeline architecture, new modules, rationale, and future improvements, but does not follow the repository's required template structure.
    Resolution: Consider restructuring the description to match the template: use the prescribed sections (Related Issue, Changes Introduced, Why This Change, Testing, Documentation Updates, Checklist, etc.) to ensure consistency with repository standards.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title accurately captures the main change: introducing a modular submission evaluation pipeline with clear stages (Validation → Execution → Scoring).
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.




@aviralsaxena16 aviralsaxena16 merged commit 596c584 into OpenLake:main Mar 22, 2026
6 of 8 checks passed