
Add Modular Submission Evaluation Pipeline (Validation → Execution → Scoring)#152

Merged
aviralsaxena16 merged 2 commits into OpenLake:main from aviralsaxena16:Adding-evaluation-pipeline
Mar 22, 2026

Conversation

@aviralsaxena16
Member

@aviralsaxena16 aviralsaxena16 commented Mar 22, 2026

🚀 Add Modular Submission Evaluation Pipeline (Validation → Execution → Scoring)

🔍 Overview

This PR introduces a structured evaluation pipeline for Canonforces submissions, transforming the existing submission flow into a modular, extensible system.

The pipeline now processes each submission through clearly defined stages:

Validation → Sandbox Execution → Scoring → Result

This improves reliability and maintainability, and aligns the system with evaluation pipelines used in real-world research and benchmarking.


✨ Key Changes

1️⃣ Validation Layer

  • Added a dedicated validation module to verify:

    • code presence and length
    • language selection
  • Prevents invalid submissions from reaching execution

📁 src/lib/pipeline/validation.ts
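The PR description doesn't include the module body, but a minimal sketch of such a validation layer might look like the following — the field names, supported-language list, and length limit are all illustrative assumptions, not the PR's actual code:

```typescript
// Hypothetical sketch of src/lib/pipeline/validation.ts.
// The Submission shape, language list, and size limit are assumptions.
export interface Submission {
  code: string;
  language: string;
}

export interface ValidationResult {
  valid: boolean;
  error?: string;
}

const SUPPORTED_LANGUAGES = ["javascript", "python", "cpp", "java"];
const MAX_CODE_LENGTH = 100_000;

export function validateSubmission({ code, language }: Submission): ValidationResult {
  // Reject empty or whitespace-only code before it ever reaches the sandbox.
  if (!code || code.trim().length === 0) {
    return { valid: false, error: "Code cannot be empty." };
  }
  // Guard against oversized payloads.
  if (code.length > MAX_CODE_LENGTH) {
    return { valid: false, error: "Code exceeds maximum allowed length." };
  }
  // Only accept languages the execution layer knows how to run.
  if (!SUPPORTED_LANGUAGES.includes(language)) {
    return { valid: false, error: `Unsupported language: ${language}` };
  }
  return { valid: true };
}
```

Returning a structured `{ valid, error? }` object (rather than throwing) lets the orchestrator branch on the result and report stage-level feedback.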


2️⃣ Sandbox Execution Layer

  • Introduced an abstraction over Judge0 execution
  • Encapsulates execution logic into a reusable function

📁 src/lib/pipeline/execution.ts

  • Ensures all code runs in a controlled sandbox environment
  • Decouples execution from UI logic
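A rough sketch of such an executor is below. The `/api/hello` route and the `{language, codeValue, input}` payload come from the review walkthrough further down; the response fields and the injectable `fetcher` parameter (used here so the sketch stays testable without a network) are assumptions:

```typescript
// Hypothetical sketch of src/lib/pipeline/execution.ts.
// In the app, `fetcher` would be the global fetch; it is a parameter here
// so the function can be exercised with a stub.
export interface ExecutionRequest {
  language: string;
  codeValue: string;
  input: string;
}

export interface ExecutionResult {
  stdout: string;
  stderr: string;
  exitCode: number;
}

type Fetcher = (
  url: string,
  init: { method: string; headers: Record<string, string>; body: string }
) => Promise<{ ok: boolean; json(): Promise<any> }>;

export async function executeInSandbox(
  req: ExecutionRequest,
  fetcher: Fetcher
): Promise<ExecutionResult | null> {
  try {
    // POST the submission to the sandbox-backed API route.
    const res = await fetcher("/api/hello", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify(req),
    });
    if (!res.ok) return null;
    const { data } = await res.json();
    // Normalize the sandbox response into a flat result object.
    return {
      stdout: data.run.stdout ?? "",
      stderr: data.run.stderr ?? "",
      exitCode: data.run.code ?? 0,
    };
  } catch {
    // Network or sandbox failures surface as null so the pipeline
    // can report an execution-stage error instead of crashing.
    return null;
  }
}
```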

3️⃣ Scoring Layer

  • Added structured scoring system:

    • evaluates outputs against test cases
    • computes pass/fail per test case
    • generates overall score (%)

📁 src/lib/pipeline/scoring.ts

Example output:

{
  "total": 5,
  "passed": 4,
  "score": 80,
  "results": [
    { "status": "Accepted" },
    { "status": "Wrong Answer" }
  ]
}
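A sketch consistent with the example output above — comparing outputs by string equality after trimming trailing whitespace is an assumption about how the comparison works:

```typescript
// Hypothetical sketch of src/lib/pipeline/scoring.ts.
export interface TestCase {
  input: string;
  expectedOutput: string;
}

export interface CaseResult {
  status: "Accepted" | "Wrong Answer";
}

export interface ScoreResult {
  total: number;
  passed: number;
  score: number; // percentage, 0-100
  results: CaseResult[];
}

export function scoreSubmission(outputs: string[], testCases: TestCase[]): ScoreResult {
  // Compare each captured output against the expected output, ignoring
  // leading/trailing whitespace (an assumption about the comparison rule).
  const results: CaseResult[] = testCases.map((tc, i) => ({
    status: (outputs[i] ?? "").trim() === tc.expectedOutput.trim() ? "Accepted" : "Wrong Answer",
  }));
  const passed = results.filter((r) => r.status === "Accepted").length;
  return {
    total: testCases.length,
    passed,
    score: testCases.length === 0 ? 0 : Math.round((passed / testCases.length) * 100),
    results,
  };
}
```

With 4 of 5 cases passing this yields `score: 80`, matching the example above.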

4️⃣ Unified Pipeline Orchestrator

  • Created a central pipeline controller:

📁 src/lib/pipeline/index.ts

  • Handles:

    • validation
    • execution loop
    • scoring
  • Returns structured responses with stage-level feedback
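The control flow above can be sketched as follows. The stage names and per-stage error shape mirror this description; the three helpers are inlined stand-ins for the real modules, so their bodies are placeholders, not the PR's code:

```typescript
// Hypothetical sketch of src/lib/pipeline/index.ts (runPipeline).
type Stage = "validation" | "execution" | "completed";

interface PipelineResult {
  stage: Stage;
  error?: string;
  data?: unknown;
}

interface PipelineInput {
  code: string;
  language: string;
  testCases: { input: string; expectedOutput: string }[];
}

// Stand-ins for the real validation/execution/scoring modules.
const validateSubmission = (s: { code: string; language: string }) =>
  s.code.trim() ? { valid: true } : { valid: false, error: "Code cannot be empty." };
const executeInSandbox = async (
  _req: { language: string; code: string; input: string }
): Promise<{ stdout: string } | null> => ({ stdout: "" }); // real module POSTs to the sandbox API
const scoreSubmission = (outputs: string[], tcs: PipelineInput["testCases"]) => {
  const passed = outputs.filter((o, i) => o.trim() === tcs[i].expectedOutput.trim()).length;
  return { total: tcs.length, passed, score: tcs.length ? Math.round((passed / tcs.length) * 100) : 0 };
};

export async function runPipeline(input: PipelineInput): Promise<PipelineResult> {
  // Stage 1: validation — reject bad input before touching the sandbox.
  const v = validateSubmission(input);
  if (!v.valid) return { stage: "validation", error: v.error };

  // Stage 2: execution loop — run the code once per test case.
  const outputs: string[] = [];
  for (const tc of input.testCases) {
    const result = await executeInSandbox({ language: input.language, code: input.code, input: tc.input });
    if (result === null) return { stage: "execution", error: "Sandbox execution failed." };
    outputs.push(result.stdout);
  }

  // Stage 3: scoring — compare outputs and report a structured result.
  return { stage: "completed", data: scoreSubmission(outputs, input.testCases) };
}
```

Because every early exit names the stage that failed, the UI can show exactly where a submission was rejected.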


5️⃣ UI Integration

  • Updated submission flow to reflect pipeline stages:

    • "Running Validation..."
    • "Running Execution..."
    • "Scoring submission..."
  • Displays structured results in existing Output component
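A sketch of how the refactored handler might consume the pipeline is below. `setStatus`/`setOutput` stand in for the component's real state setters, and the `PipelineResult` shape follows this description; the real handler updates the message as each stage runs, while this sketch only branches on the final result:

```typescript
// Hypothetical sketch of the refactored handleSubmit in the question page.
interface PipelineResult {
  stage: "validation" | "execution" | "completed";
  error?: string;
  data?: unknown;
}

async function handleSubmit(
  runPipeline: (input: unknown) => Promise<PipelineResult>,
  setStatus: (message: string) => void,
  setOutput: (result: PipelineResult) => void,
  input: unknown
): Promise<void> {
  setStatus("Running Validation...");
  const result = await runPipeline(input);
  if (result.stage === "validation" || result.stage === "execution") {
    // Stage-level feedback: tell the user which stage rejected the submission.
    setStatus(`${result.stage} failed: ${result.error}`);
  } else {
    setStatus("Scoring submission...");
    setOutput(result); // rendered by the existing Output component
  }
}
```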


🧠 Why This Matters

This refactor introduces pipeline-based architecture, which:

  • improves separation of concerns
  • enables easy extension (e.g., new evaluators, metrics)
  • ensures deterministic evaluation flow
  • mirrors real-world systems used in benchmarking and research pipelines

🔬 Alignment with Advanced Evaluation Systems

The new architecture aligns with systems that:

  • validate structured inputs
  • execute code in sandboxed environments
  • compute standardized metrics
  • generate reproducible results

This makes Canonforces closer to a general-purpose evaluation framework, not just a CP platform.


📦 New File Structure

src/lib/pipeline/
 ├ validation.ts
 ├ execution.ts
 ├ scoring.ts
 └ index.ts

🧪 Future Improvements

  • Add JSON/PDF report generation for submissions
  • Introduce configurable scoring weights
  • Add batch evaluation for contests
  • Integrate Docker-based execution for full isolation
  • Add performance metrics (runtime, memory)

✅ Result

Canonforces now includes a modular, extensible evaluation pipeline that:

  • validates inputs
  • executes code safely
  • scores submissions systematically
  • produces structured outputs

This significantly enhances the platform's engineering depth and scalability.

Summary by CodeRabbit

  • New Features
    • Added comprehensive code submission pipeline with validation, execution, and automated scoring of test cases.
    • Improved error handling for code submissions with clear validation and execution stage feedback.
    • Enhanced submission results display with per-test case status tracking and percentage scoring.

@vercel

vercel bot commented Mar 22, 2026

@aviralsaxena16 is attempting to deploy a commit to the "aviralsaxena16's projects" Team on Vercel.

A member of the Team first needs to authorize it.

@github-actions

🎉 Thanks for Your Contribution to CanonForces! ☺️

We'll review it as soon as possible. In the meantime, please:

  • ✅ Double-check the file changes.
  • ✅ Ensure that all commits are clean and meaningful.
  • ✅ Link the PR to its related issue (e.g., Closes #123).
  • ✅ Resolve any unaddressed review comments promptly.

💬 Need help or want faster feedback?
Join our Discord 👉 CanonForces Discord

Thanks again for contributing 🙌 – @aviralsaxena16!
cc: @aviralsaxena16

@coderabbitai

coderabbitai bot commented Mar 22, 2026

Caution

Review failed

Pull request was closed or merged during review

Walkthrough

This pull request introduces a modular pipeline system for code submission processing. Four new TypeScript modules are created to handle submission validation, sandbox execution, test case scoring, and orchestration. The Dockerfile adds a runtime environment variable, and the question submission handler is refactored to delegate logic to the new pipeline system instead of handling it inline.

Changes

Cohort / File(s) — Summary

Pipeline Infrastructure
src/lib/pipeline/validation.ts, src/lib/pipeline/execution.ts, src/lib/pipeline/scoring.ts, src/lib/pipeline/index.ts
Four new modules: validation checks code/language; execution POSTs to the sandbox API; scoring compares outputs and computes a percentage; the orchestrating pipeline coordinates the three-stage workflow with error branching.

Docker Configuration
Dockerfile
Added a GEMINI_API_KEY=test-key environment variable alongside the existing build-time API key variable.

Question Submission Handler
src/pages/questions/[id].tsx
Refactored handleSubmit to use runPipeline, replacing inline validation/execution/scoring logic with pipeline orchestration; updated error handling and result display; removed Firestore local history persistence.

Sequence Diagram

sequenceDiagram
    participant Client as Client<br/>(QuestionBar)
    participant Pipeline as runPipeline
    participant Validator as validateSubmission
    participant Executor as executeInSandbox
    participant Scorer as scoreSubmission
    participant API as /api/hello

    Client->>Pipeline: runPipeline({code, language, testCases})
    Pipeline->>Validator: validateSubmission({code, language})
    Validator-->>Pipeline: {valid, error?}
    
    alt Validation Failed
        Pipeline-->>Client: {stage: "validation", error}
    else Validation Passed
        loop For each testCase
            Pipeline->>Executor: executeInSandbox({language, code, input})
            Executor->>API: POST {language, codeValue, input}
            API-->>Executor: {data: {run}}
            Executor-->>Pipeline: ExecutionResult | null
        end
        
        alt Execution Failed
            Pipeline-->>Client: {stage: "execution", error}
        else Execution Passed
            Pipeline->>Scorer: scoreSubmission(executionOutputs, testCases)
            Scorer-->>Pipeline: ScoreResult {passed, total, score, results}
            Pipeline-->>Client: {stage: "completed", data: ScoreResult}
        end
    end

Estimated Code Review Effort

🎯 4 (Complex) | ⏱️ ~75 minutes

Poem

🐰 From validation to sandbox we hop with delight,
Each test case executing shines oh so bright,
The scoring reveals which submissions are blessed,
A pipeline of magic—putting code to the test! ✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 inconclusive)

  • Description check — ❓ Inconclusive
    Explanation: The PR description comprehensively covers the pipeline architecture, new modules, rationale, and future improvements, but does not follow the repository's required template structure.
    Resolution: Consider restructuring the description to match the template: use the prescribed sections (Related Issue, Changes Introduced, Why This Change, Testing, Documentation Updates, Checklist, etc.) to ensure consistency with repository standards.

✅ Passed checks (2 passed)

  • Title check — ✅ Passed: The title accurately captures the main change: introducing a modular submission evaluation pipeline with clear stages (Validation → Execution → Scoring).
  • Docstring Coverage — ✅ Passed: No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.




@aviralsaxena16 aviralsaxena16 merged commit 596c584 into OpenLake:main Mar 22, 2026
6 of 8 checks passed