PentestAgent is a novel LLM-driven penetration testing framework that automates the intelligence gathering, vulnerability analysis, and exploitation stages of penetration testing, reducing manual intervention.
The framework is modular and consists of the following components:
- Reconnaissance Agent: Gathers intelligence about the target system.
- Planning Agent: Identifies and prioritizes vulnerabilities and potential exploits.
- Execution Agent: Attempts to execute selected exploits in a controlled environment.
For further information, please refer to our paper.
Note: We recommend deploying this project in a Kali Linux environment for better compatibility with penetration testing tools and workflows.
git clone https://github.com/nbshenxm/pentest-agent.git
cd pentest-agent
Several environment variables must be set. If you are not familiar with environment variables, you can define them in the .env file instead.
Required:
- PDCP_API_KEY: ProjectDiscovery API key for accessing CVE data and vulnerability information.
- GITLAB_TOKEN: GitLab token for ExploitDB access.
- GITHUB_KEY: GitHub token for searching repositories and issues.
- INDEX_STORAGE_DIR: Directory to store vector indexes for RAG.
- PLANNING_OUTPUT_DIR: Directory to save planning results.
- LOG_DIR: Directory to store logs.
Optional:
- http_proxy, https_proxy: Set these if you are using a proxy or VPN.
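Before running any agent, it can help to confirm that every required variable is actually set. The sketch below is not part of PentestAgent: the helper names `parse_dotenv` and `missing_required` are hypothetical, and the parser assumes the .env file uses simple KEY=value lines without quoting or escapes.

```python
import os
from collections.abc import Mapping
from pathlib import Path

# Required variables as listed above.
REQUIRED_VARS = (
    "PDCP_API_KEY",
    "GITLAB_TOKEN",
    "GITHUB_KEY",
    "INDEX_STORAGE_DIR",
    "PLANNING_OUTPUT_DIR",
    "LOG_DIR",
)


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines; skips blank lines, # comments, and lines without '='."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_required(env: Mapping[str, str]) -> list[str]:
    """Return the required variables that are absent or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


if __name__ == "__main__":
    # Hypothetical precedence: real environment variables override values from .env.
    merged: dict[str, str] = {}
    dotenv = Path(".env")
    if dotenv.exists():
        merged.update(parse_dotenv(dotenv.read_text()))
    merged.update(os.environ)
    missing = missing_required(merged)
    print("Missing:", ", ".join(missing) if missing else "none")
```

In practice a library such as python-dotenv handles quoting and other edge cases more robustly; the hand-written parser only keeps the example dependency-free.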
Python version: 3.12
Use a virtual environment:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install -r requirements.txt
or with Conda:
conda create -n pentest python=3.12
conda activate pentest
python -m pip install -r requirements.txt
CVEMAP is required to fetch CVE-related information. Follow its installation instructions.
Specify the LLM provider, model name, temperature, and API key.
Set the model used for parsing CVE entries and its generation temperature.
Scoring criteria for evaluating CVEs:
- Vulnerability type
- Exploit maturity
- Remote exploitability
- Attack complexity
- Source weighting (ExploitDB, GitHub, Google)
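The criteria above are listed without a formula for combining them. As an illustration only, the sketch below folds them into a single score in [0, 1] with a weighted sum. Every weight, every per-category value, and the `CveCandidate` type are hypothetical assumptions, not PentestAgent's actual scoring.

```python
from dataclasses import dataclass

# Hypothetical per-criterion weights (they sum to 1.0).
WEIGHTS = {
    "vuln_type": 0.25,
    "exploit_maturity": 0.25,
    "remote": 0.20,
    "complexity": 0.15,
    "source": 0.15,
}

# Hypothetical scores in [0, 1] for categorical values.
VULN_TYPE_SCORE = {"rce": 1.0, "sqli": 0.8, "auth_bypass": 0.7, "xss": 0.4}
MATURITY_SCORE = {"weaponized": 1.0, "poc": 0.6, "none": 0.0}
SOURCE_SCORE = {"exploitdb": 1.0, "github": 0.8, "google": 0.5}


@dataclass
class CveCandidate:
    cve_id: str
    vuln_type: str
    exploit_maturity: str
    remote: bool
    low_complexity: bool
    sources: list[str]


def score(c: CveCandidate) -> float:
    """Weighted sum of per-criterion scores."""
    # Use the highest-weighted source in which an exploit was found.
    source = max((SOURCE_SCORE.get(s, 0.0) for s in c.sources), default=0.0)
    parts = {
        "vuln_type": VULN_TYPE_SCORE.get(c.vuln_type, 0.2),
        "exploit_maturity": MATURITY_SCORE.get(c.exploit_maturity, 0.0),
        "remote": 1.0 if c.remote else 0.0,
        "complexity": 1.0 if c.low_complexity else 0.3,
        "source": source,
    }
    return sum(WEIGHTS[k] * v for k, v in parts.items())
```

Candidates can then be ranked with `sorted(candidates, key=score, reverse=True)`. A weighted sum keeps each criterion's influence explicit and easy to tune, at the cost of ignoring interactions between criteria.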
Reconnaissance Agent:
- current_topic: Topic identifier for current CVE task.
- target_ip: IP address of the target host.
Planning Agent:
- model: LLM used for searching exploits and analyzing vulnerability data.
- keyword, app, version: Target application details.
- vuln_type: Type of vulnerability to focus on.
- cvemap_fuzzy_search: Enables fuzzy search for CVE matching.
- output_dir: Directory to save analysis results.
Execution Agent:
- current_topic: Task/topic identifier.
- doc_dir: Directory containing exploit scripts or documents.
- target_ip, target_port: IP address and port of the target host.
- attacker_ip: IP address of the attacker's machine.
- command_to_execute: Payload used to validate exploitation.
- model: LLM used for exploit execution guidance.
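The parameter lists above can be captured as typed configuration objects so that malformed values fail early instead of partway through a run. This is a hypothetical sketch: the field names mirror the README, but the class names, the default for `cvemap_fuzzy_search`, and the validation rules are assumptions rather than PentestAgent's own configuration code.

```python
import ipaddress
from dataclasses import dataclass


def _check_ip(value: str) -> None:
    # ipaddress.ip_address raises ValueError for anything that is not IPv4/IPv6.
    ipaddress.ip_address(value)


@dataclass
class ReconConfig:
    current_topic: str
    target_ip: str

    def __post_init__(self) -> None:
        _check_ip(self.target_ip)


@dataclass
class PlanningConfig:
    model: str
    keyword: str
    app: str
    version: str
    vuln_type: str
    output_dir: str
    cvemap_fuzzy_search: bool = False  # hypothetical default


@dataclass
class ExecutionConfig:
    current_topic: str
    doc_dir: str
    target_ip: str
    target_port: int
    attacker_ip: str
    command_to_execute: str
    model: str

    def __post_init__(self) -> None:
        _check_ip(self.target_ip)
        _check_ip(self.attacker_ip)
        if not 1 <= self.target_port <= 65535:
            raise ValueError(f"invalid port: {self.target_port}")
```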
- File: pentest_agent/agents/recon_agent.py
- Function: Given a target IP, gathers system and service information.
- Usage: Set the topic, LLM model, and IP, then run the script.
python pentest_agent/agents/recon_agent.py
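The README does not say which tools the Reconnaissance Agent drives. Port scanners such as nmap are a common source of this kind of service information, so, as an illustration only and not PentestAgent code, the sketch below parses nmap's XML output (for example from `nmap -sV -oX scan.xml <target>`) into per-port service records. The product and version fields correspond to the application details the Planning Agent takes as `app` and `version`. The `ServiceInfo` type and `parse_nmap_xml` function are hypothetical.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class ServiceInfo:
    port: int
    protocol: str
    name: str
    product: str
    version: str


def parse_nmap_xml(xml_text: str) -> list[ServiceInfo]:
    """Extract open ports and detected services from nmap -oX output.

    Intended for scan output you produced yourself; for untrusted XML,
    consider defusedxml instead of the standard-library parser.
    """
    root = ET.fromstring(xml_text)
    services = []
    for port in root.iter("port"):
        state = port.find("state")
        if state is None or state.get("state") != "open":
            continue
        svc = port.find("service")
        # <service> may be missing, or may lack product/version attributes.
        attrs = svc.attrib if svc is not None else {}
        services.append(ServiceInfo(
            port=int(port.get("portid")),
            protocol=port.get("protocol", ""),
            name=attrs.get("name", ""),
            product=attrs.get("product", ""),
            version=attrs.get("version", ""),
        ))
    return services
```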
- File: pentest_agent/agents/planning_agent.py
- Function: Identifies relevant CVEs and associated exploits from multiple sources.
- Sources:
- GitHub repositories and issues
- ExploitDB entries
- Google search results
- Features: Multi-source intelligence aggregation with configurable LLM backends
- Usage: Set the model and application information.
python pentest_agent/agents/planning_agent.py
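Because the same CVE often turns up in several sources, multi-source aggregation typically needs a deduplication step. The sketch below groups search hits by the CVE IDs mentioned in their titles and records which sources reported each one. The `Hit` type, the `group_by_cve` function, and the CVE-ID regex are assumptions for illustration, not the Planning Agent's actual data model.

```python
import re
from collections import defaultdict
from dataclasses import dataclass

# CVE IDs: "CVE-", a four-digit year, then four or more digits.
CVE_RE = re.compile(r"CVE-\d{4}-\d{4,}", re.IGNORECASE)


@dataclass
class Hit:
    source: str  # e.g. "github", "exploitdb", "google"
    title: str
    url: str


def group_by_cve(hits: list[Hit]) -> dict[str, dict[str, list[str]]]:
    """Map each CVE ID found in hit titles to {source: [urls]}."""
    grouped: defaultdict = defaultdict(lambda: defaultdict(list))
    for hit in hits:
        # Normalize case; the set avoids recording one hit twice for the same CVE.
        for cve in {m.upper() for m in CVE_RE.findall(hit.title)}:
            grouped[cve][hit.source].append(hit.url)
    # Convert nested defaultdicts to plain dicts for printing or serialization.
    return {cve: dict(srcs) for cve, srcs in grouped.items()}
```

The number of distinct sources per CVE is then a natural input to the source-weighting criterion described earlier.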
- File: pentest_agent/agents/execution_agent.py
- Function: Executes selected exploits based on previous analysis and collected context.
- Usage: Set the topic, exploit document path, and target info.
python pentest_agent/agents/execution_agent.py
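Since each agent is a standalone script and they are meant to run in order (reconnaissance, then planning, then execution), a small driver can chain them and stop at the first failure. This is a hypothetical convenience wrapper, not part of the repository; it assumes you have already set each agent's parameters as described above and that it is run from the repository root.

```python
import subprocess
import sys

AGENT_SCRIPTS = [
    "pentest_agent/agents/recon_agent.py",
    "pentest_agent/agents/planning_agent.py",
    "pentest_agent/agents/execution_agent.py",
]


def run_pipeline(scripts: list[str], python: str = sys.executable) -> list[int]:
    """Run each script in order and return the exit codes collected so far.

    Stops at the first script that exits with a non-zero code.
    """
    codes = []
    for script in scripts:
        result = subprocess.run([python, script])
        codes.append(result.returncode)
        if result.returncode != 0:
            print(f"{script} failed with exit code {result.returncode}", file=sys.stderr)
            break
    return codes
```

Usage: `run_pipeline(AGENT_SCRIPTS)` returns `[0, 0, 0]` when all three agents succeed. Using the current interpreter (`sys.executable`) keeps the agents inside the same virtual environment or Conda environment as the driver.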
PentestAgent provides Docker support for isolated execution of each agent.
Configure all agent parameters under the models, cve, cve_scoring, and runtime sections.
Example .env content:
GITHUB_KEY=your_github_token
OPENAI_API_KEY=your_openai_key
HUGGING_FACE_TOKEN=your_hf_token
INDEX_STORAGE_DIR=/path/to/indexes
PLANNING_OUTPUT_DIR=/path/to/output
LOG_DIR=/path/to/logs
cd pentest_agent/docker
docker-compose up --build -d recon
cd pentest_agent/docker
docker-compose up --build -d planning
cd pentest_agent/docker
docker-compose up --build -d execution
We adopt Vulhub for evaluating the system. Vulhub provides Docker-based vulnerable environments with real-world CVEs.
We select vulnerabilities based on the following criteria:
- Must have a valid CVE ID
- Must include a CVSS v3.x score
- Additional labels include:
- CWE ID
- Exploitability sub-score
- Difficulty levels derived from the CVSS vector
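The selection criteria above can be expressed as a filter over candidate entries. In the sketch below, an entry is eligible if it has a well-formed CVE ID and a CVSS v3.x base score, and a difficulty level is derived from the vector's Attack Complexity (AC), Privileges Required (PR), and User Interaction (UI) metrics. The entry keys (`cve_id`, `cvss3_score`) and the three-level mapping are illustrative assumptions; the README does not specify how its difficulty levels are computed.

```python
import re

CVE_ID_RE = re.compile(r"^CVE-\d{4}-\d{4,}$")


def parse_cvss3_vector(vector: str) -> dict[str, str]:
    """Parse e.g. 'CVSS:3.1/AV:N/AC:L/...' into {'AV': 'N', 'AC': 'L', ...}."""
    prefix, _, rest = vector.partition("/")
    if prefix not in ("CVSS:3.0", "CVSS:3.1"):
        raise ValueError(f"not a CVSS v3.x vector: {vector!r}")
    return dict(part.split(":", 1) for part in rest.split("/") if part)


def difficulty(metrics: dict[str, str]) -> str:
    """Hypothetical mapping: count barriers to exploitation."""
    barriers = sum([
        metrics.get("AC") == "H",         # high attack complexity
        metrics.get("PR") in ("L", "H"),  # some privileges required
        metrics.get("UI") == "R",         # user interaction required
    ])
    if barriers == 0:
        return "easy"
    return "medium" if barriers == 1 else "hard"


def is_eligible(entry: dict) -> bool:
    """Keep entries with a valid CVE ID and a CVSS v3.x score in [0, 10]."""
    if not CVE_ID_RE.match(entry.get("cve_id", "")):
        return False
    score = entry.get("cvss3_score")
    return isinstance(score, (int, float)) and 0.0 <= score <= 10.0
```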
Our evaluation was performed some time ago. We are working on adding new scenarios to the benchmark beyond Vulhub, as well as evaluating PentestAgent on a variety of more advanced LLM backbones. We will publish the benchmark results once this work is finished.
Feel free to open an issue if you:
- Encounter any bugs
- Have suggestions for improvement
- Would like to contribute features or benchmarks
We welcome community contributions!