LLM4UnitTests-SC is a platform that automates the generation of unit tests for Solidity smart contracts using Large Language Models (LLMs).
It was developed as part of an academic end-of-year project at the National Engineering School of Sfax (ENIS) to explore the potential of LLMs in automating blockchain testing workflows.
The system integrates Hardhat, Node.js, Spring Boot, and React into a unified environment that:
- Builds and optimizes prompts
- Interacts with local LLMs
- Validates generated code syntax
- Executes smart contract tests
- Calculates code coverage
- Presents results through an interactive web interface

The project's objectives are to:
- Automate the generation, validation, and execution of smart contract unit tests.
- Evaluate the performance of multiple LLMs.
- Compare prompt engineering strategies for improving code quality and coverage.
- Provide developers with a user-friendly platform for AI-assisted testing.
| Evaluated Models |
|---|
| Codestral |
| DeepSeek-R1 (7B / 14B) |
| Llama 3 (8B) |
| Phi-4 (14B) |
LLM4UnitTests-SC follows a 3-tier architecture:

Frontend (React):
- User interface for uploading smart contracts, configuring prompts, and visualizing test results.
- Displays coverage metrics, syntax errors, and execution outcomes.
- Enables downloading generated test reports (PDF).

Backend (Spring Boot):
- Handles prompt construction, LLM interaction, and process orchestration.
- Exposes REST APIs for live feedback and test progress.
- Integrates Spring AI to connect with local and API-based LLMs.

Test execution layer (Node.js + Hardhat):
- Uses Babel Parser for syntax validation and correction.
- Executes Solidity test suites through Hardhat.
- Generates coverage reports and sends structured results to the backend.
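
The syntax-validation step can be sketched without any dependencies. The platform itself uses Babel Parser; the sketch below swaps in Node's built-in `vm.Script`, which compiles source without running it and throws a `SyntaxError` on invalid input. The function name `checkSyntax`, the file name, and the result shape are assumptions made for illustration, not the project's actual code.

```javascript
const vm = require("node:vm");

// Hypothetical helper: compile (but never execute) a generated test file.
// NOTE: vm.Script is a stand-in for Babel Parser, chosen to avoid dependencies.
function checkSyntax(source) {
  try {
    new vm.Script(source, { filename: "generated.test.js" });
    return { valid: true, error: null };
  } catch (err) {
    if (err && err.name === "SyntaxError") {
      // Report the parser message so it can be shown to the user.
      return { valid: false, error: err.message };
    }
    throw err; // anything else is unexpected and should surface
  }
}

console.log(checkSyntax('describe("Token", function () {'));
```

Because the script is only compiled, names such as `describe` or `require` do not need to exist at validation time; they are resolved later, when Hardhat actually runs the test.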

The platform supports the following workflow:
- Upload Solidity smart contracts
- Generate prompts dynamically (5 types)
- Send prompts to selected LLM (API or local)
- Filter and clean raw LLM output
- Validate JavaScript syntax (Babel Parser)
- Execute tests (Hardhat) and calculate coverage
- Visualize metrics in real time
- Download PDF summary of results
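
As an illustration of the "filter and clean raw LLM output" step, here is a minimal sketch. The cleaning rules are assumptions: it drops `<think>...</think>` reasoning blocks (which reasoning models such as DeepSeek-R1 tend to emit) and keeps the first fenced code block if one exists. The name `extractTestCode` is hypothetical.

```javascript
// Three backticks, built programmatically so this sample stays fence-safe.
const FENCE = "`".repeat(3);

// Matches an opening fence with an optional language tag, captures the body,
// and stops at the next closing fence.
const FENCED_BLOCK = new RegExp(
  FENCE + "[a-zA-Z]*[ \\t]*\\r?\\n([\\s\\S]*?)" + FENCE
);

// Hypothetical cleaner; these rules are assumptions, not the project's filter.
function extractTestCode(raw) {
  // Remove reasoning traces wrapped in <think> tags.
  const withoutThoughts = raw.replace(/<think>[\s\S]*?<\/think>/g, "");
  // Prefer the first fenced block; otherwise fall back to the whole reply.
  const match = FENCED_BLOCK.exec(withoutThoughts);
  return (match ? match[1] : withoutThoughts).trim();
}

const reply =
  "<think>plan the tests</think>Here is the test:\n" +
  FENCE + "javascript\nconst a = 1;\n" + FENCE + "\nDone.";
console.log(extractTestCode(reply)); // prints: const a = 1;
```

Falling back to the whole reply keeps models that answer with bare code usable; the syntax check that follows will reject anything that is not valid JavaScript.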
Five prompt configurations were implemented and tested:
- Type 1: Zero-shot (no contract code)
- Type 2: Zero-shot with example test
- Type 3: Zero-shot with contract code
- Type 4: One-shot with contract code (best performing)
- Type 5: Zero-shot with ABI
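
The five configurations differ only in what context accompanies the instruction. The sketch below shows one way to assemble them. The instruction wording, the field names (`name`, `source`, `abi`, `exampleContract`, `exampleTest`), and the function `buildPrompt` are all assumptions; only the five type descriptions come from the list above.

```javascript
// Hypothetical base instruction; the project's real wording is not shown here.
const BASE =
  "Write Hardhat unit tests (Mocha + Chai) in JavaScript for the Solidity contract";

// Hypothetical builder: picks the context sections for a given prompt type.
function buildPrompt(type, { name, source, abi, exampleContract, exampleTest }) {
  const parts = [`${BASE} "${name}".`];
  switch (type) {
    case 1: // zero-shot, no contract code
      break;
    case 2: // zero-shot with an example test
      parts.push(`Example test:\n${exampleTest}`);
      break;
    case 3: // zero-shot with contract code
      parts.push(`Contract code:\n${source}`);
      break;
    case 4: // one-shot: one worked example plus the contract code
      parts.push(
        `Example contract:\n${exampleContract}`,
        `Example test:\n${exampleTest}`,
        `Contract code:\n${source}`
      );
      break;
    case 5: // zero-shot with ABI
      parts.push(`Contract ABI:\n${JSON.stringify(abi)}`);
      break;
    default:
      throw new RangeError(`Unknown prompt type: ${type}`);
  }
  return parts.join("\n\n");
}

console.log(buildPrompt(3, { name: "Token", source: "contract Token {}" }));
```

Keeping the instruction fixed and varying only the context makes results across prompt types easier to compare.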

Generated tests were evaluated on the following metrics:
- Syntax Error Rate (via Babel Parser)
- Coverage (Statements, Branches, Functions, Lines — via Hardhat Coverage)
- Human Intervention (amount of manual correction needed)
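
The first two metrics reduce to simple ratios. The sketch below assumes the syntax error rate is the share of generated test files that fail to parse and that coverage is reported as covered items over total items for each dimension. The function names and input shapes are hypothetical, and treating an empty dimension as 100% is an assumption as well.

```javascript
// Assumed definition: fraction of generated test files that fail to parse.
function syntaxErrorRate(results) {
  if (results.length === 0) return 0;
  const failed = results.filter((r) => !r.valid).length;
  return failed / results.length;
}

// Percentage rounded to two decimals; an empty dimension counts as 100%.
function coveragePercent(covered, total) {
  if (total === 0) return 100;
  return Math.round((covered / total) * 10000) / 100;
}

// Summarize the four coverage dimensions named in the metrics list.
function summarizeCoverage(counts) {
  const summary = {};
  for (const key of ["statements", "branches", "functions", "lines"]) {
    summary[key] = coveragePercent(counts[key].covered, counts[key].total);
  }
  return summary;
}

console.log(syntaxErrorRate([{ valid: true }, { valid: false }])); // prints 0.5
```

Human intervention is not modeled here because the source does not say how it was quantified.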
| Category | Tools / Frameworks |
|---|---|
| Smart Contracts | Solidity |
| Test Framework | Hardhat (Mocha + Chai) |
| Syntax Validation | Node.js (Babel Parser) |
| Backend | Spring Boot + Spring AI |
| Frontend | React |
| LLM Runtime | Ollama (local models) |
| Visualization | Hardhat Coverage + React Dashboard |

Prerequisites:
- Node.js ≥ 18
- Java ≥ 17
- npm
- Hardhat
- Maven
- Ollama (for local LLMs)