Intelligent Multi-Source Paper Digest Generator
Fetch, classify, and summarize papers from multiple sources with AI-powered digests
Paper Claw serves two types of users:
- I want to set up daily paper digests for my research field
- I want to integrate Paper Claw into my agent workflow
# 1. Clone repository
git clone https://github.com/PigeonDan1/paper_claw.git
cd paper_claw
# 2. Install dependencies
pip install -r requirements.txt
# 3. Configure environment
cp .env.example .env
cp config/recipients.example.json config/recipients.json
# 4. Edit config/default.json to select your research categories
# (See "ArXiv Categories" section below)
# 5. Run
python scripts/main.py --day 2026-03-11
Paper Claw provides 170+ arXiv categories in config/arxiv_categories.json. The default configuration is set for Speech & Audio research, but you can easily customize it for your field.
How to customize:
- Open config/arxiv_categories.json to browse available categories
- Find your category codes (e.g., cs.CL for NLP, cs.CV for Computer Vision)
- Edit config/default.json → sources.arxiv.categories
Example configurations:
For NLP research:
{
"sources": {
"arxiv": {
"enabled": true,
"categories": [
{"id": "cs.CL", "name": "Computation and Language", "url": "https://arxiv.org/list/cs.CL/recent"},
{"id": "cs.LG", "name": "Machine Learning", "url": "https://arxiv.org/list/cs.LG/recent"}
]
}
}
}
For Computer Vision:
{
"sources": {
"arxiv": {
"enabled": true,
"categories": [
{"id": "cs.CV", "name": "Computer Vision", "url": "https://arxiv.org/list/cs.CV/recent"},
{"id": "cs.MM", "name": "Multimedia", "url": "https://arxiv.org/list/cs.MM/recent"}
]
}
}
}
Popular combinations:
| Field | Categories |
|---|---|
| Speech & Audio (Default) | cs.SD, eess.AS |
| AI/ML | cs.AI, cs.LG, cs.CL, cs.CV |
| NLP | cs.CL, cs.LG |
| Computer Vision | cs.CV, cs.MM |
| Computational Biology | q-bio.BM, q-bio.GN, q-bio.NC |
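The category entries in the examples above all follow the same shape, so they can be generated from a list of codes. The following is a minimal sketch, not part of Paper Claw itself: the helper names `build_categories` and `set_arxiv_categories` are hypothetical, and the URL pattern is inferred from the example entries.

```python
import json

# URL pattern inferred from the example entries (an assumption, not a documented constant).
ARXIV_LIST_URL = "https://arxiv.org/list/{code}/recent"


def build_categories(codes_to_names):
    """Turn {code: display name} into entries shaped like the examples above."""
    return [
        {"id": code, "name": name, "url": ARXIV_LIST_URL.format(code=code)}
        for code, name in codes_to_names.items()
    ]


def set_arxiv_categories(config, codes_to_names):
    """Return a copy of config with sources.arxiv.categories replaced."""
    updated = json.loads(json.dumps(config))  # deep copy for JSON-like data
    arxiv = updated.setdefault("sources", {}).setdefault("arxiv", {})
    arxiv["enabled"] = True
    arxiv["categories"] = build_categories(codes_to_names)
    return updated


# In-memory stand-in for the contents of config/default.json.
config = {"sources": {"arxiv": {"enabled": True, "categories": []}}}
nlp_config = set_arxiv_categories(
    config,
    {"cs.CL": "Computation and Language", "cs.LG": "Machine Learning"},
)
print(json.dumps(nlp_config, indent=2))
```

To apply this to a real installation you would load config/default.json with `json.load`, pass the result through `set_arxiv_categories`, and write it back; keep a backup, since the rest of the file's schema is not shown here.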
Edit .env:
# QQ Mail Example
SMTP_HOST=smtp.qq.com
SMTP_PORT=465
SMTP_USER=your@qq.com
SMTP_PASS=your-auth-code # Not your password!
# Gmail Example
# SMTP_HOST=smtp.gmail.com
# SMTP_PORT=465
# SMTP_USER=your@gmail.com
# SMTP_PASS=your-app-password
Edit config/recipients.json:
{
"recipients": [
{"email": "you@example.com", "name": "Your Name", "enabled": true},
{"email": "colleague@lab.edu", "name": "Colleague", "enabled": true}
]
}
Personalized Greeting: Each recipient will see a personalized greeting in their email:
👋 Your Name, hello! This is PJ's paper assistant bringing you today's academic digest~
The name field is used for this greeting and is displayed in the email header.
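To make the recipient fields concrete, here is a rough sketch of how the list might be filtered and turned into greetings. It is an illustration only: the function names are hypothetical, the greeting wording is simplified, and two behaviors are assumptions rather than documented facts (that `"enabled": false` skips a recipient, and that a missing name falls back to the email address).

```python
import json

# Inline stand-in for config/recipients.json; the second recipient is disabled.
RECIPIENTS_JSON = """
{
  "recipients": [
    {"email": "you@example.com", "name": "Your Name", "enabled": true},
    {"email": "colleague@lab.edu", "name": "Colleague", "enabled": false}
  ]
}
"""


def enabled_recipients(raw_json):
    """Return only recipients whose 'enabled' flag is true (assumed semantics)."""
    data = json.loads(raw_json)
    return [r for r in data.get("recipients", []) if r.get("enabled", False)]


def greeting_for(recipient):
    """Build a greeting from the 'name' field, falling back to the email address."""
    name = recipient.get("name") or recipient["email"]
    return f"{name}, hello! Here is today's academic digest."


active = enabled_recipients(RECIPIENTS_JSON)
greetings = {r["email"]: greeting_for(r) for r in active}
for email, text in greetings.items():
    print(email, "->", text)
```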
Edit .env with at least one API key:
# Option 1: DeepSeek (Recommended for Chinese)
DEEPSEEK_API_KEY=sk-xxx
DEEPSEEK_API_BASE=https://models.sjtu.edu.cn/api/v1 # Optional: custom endpoint
# Option 2: Kimi (Moonshot)
MOONSHOT_API_KEY=sk-xxx
# Option 3: OpenAI
OPENAI_API_KEY=sk-xxx
# Option 4: Claude
ANTHROPIC_API_KEY=sk-xxx
# Option 5: Gemini
GOOGLE_API_KEY=xxx
# Option 6: DashScope (Aliyun)
DASHSCOPE_API_KEY=sk-xxx
Auto-fallback chain: DeepSeek → Kimi → OpenAI → Claude → Gemini → Rule-based
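The fallback chain can be pictured as a loop over providers that have a key configured, ending in a rule-based step. The sketch below is an assumption about the general pattern, not the code in scripts/llm_client.py: the provider identifiers other than "deepseek", the `call_provider` callback, and the truncation used as the rule-based step are all placeholders.

```python
# Order follows the documented chain; env var names come from the .env example.
FALLBACK_CHAIN = [
    ("deepseek", "DEEPSEEK_API_KEY"),
    ("kimi", "MOONSHOT_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
    ("claude", "ANTHROPIC_API_KEY"),
    ("gemini", "GOOGLE_API_KEY"),
]


def available_providers(env):
    """Providers whose API key is set and non-empty, in fallback order."""
    return [name for name, key in FALLBACK_CHAIN if env.get(key, "").strip()]


def summarize_with_fallback(text, env, call_provider):
    """Try each configured provider in order; fall back to a rule-based result."""
    for name in available_providers(env):
        try:
            return name, call_provider(name, text)
        except Exception as exc:  # any provider failure moves to the next one
            print(f"{name} failed ({exc}); trying next provider")
    # Placeholder rule-based step: truncate the text instead of calling an LLM.
    return "rule-based", text[:200]


def fake_call(name, text):
    """Stand-in for a real API call; DeepSeek is made to fail for the demo."""
    if name == "deepseek":
        raise RuntimeError("simulated timeout")
    return f"[{name}] summary of: {text}"


env = {"DEEPSEEK_API_KEY": "sk-xxx", "OPENAI_API_KEY": "sk-xxx"}
used, summary = summarize_with_fallback("A paper abstract.", env, fake_call)
print(used, summary)
```

With these inputs DeepSeek fails, Kimi is skipped because no key is set, and OpenAI handles the request. With no keys at all, the function goes straight to the rule-based step.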
Edit config/default.json to set the default provider:
{
"llm": {
"default_provider": "deepseek",
"providers": {
"deepseek": {
"model": "deepseek-v3"
}
}
}
}
Define how papers are categorized in your digest. The default is set for Speech & Audio research:
{
"classification": {
"categories": [
{
"name": "ASR",
"labels": {"zh": "语音识别", "en": "Speech Recognition"},
"keywords": ["asr", "speech recognition", "automatic speech"]
},
{
"name": "TTS",
"labels": {"zh": "语音合成", "en": "Speech Synthesis"},
"keywords": ["tts", "text-to-speech", "speech synthesis"]
}
]
}
}
Structure explained:
- name: Category ID (used internally)
- labels: Display names in different languages (zh, en, ja, ko, etc.)
- keywords: Keywords for automatic classification (case-insensitive matching)
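As an illustration of how these three fields could work together, the sketch below matches keywords against a paper's title and abstract and returns the display label for the requested language. This is one plausible reading, not the logic in scripts/process_papers.py: the `classify` function is hypothetical, and whole-word matching is a choice made here so that a short keyword such as "asr" does not match inside "laser".

```python
import re

CATEGORIES = [
    {
        "name": "ASR",
        "labels": {"zh": "语音识别", "en": "Speech Recognition"},
        "keywords": ["asr", "speech recognition", "automatic speech"],
    },
    {
        "name": "TTS",
        "labels": {"zh": "语音合成", "en": "Speech Synthesis"},
        "keywords": ["tts", "text-to-speech", "speech synthesis"],
    },
]


def classify(title, abstract, categories, language="en"):
    """Return display labels of every category with a keyword in the text."""
    text = f"{title} {abstract}"
    matches = []
    for category in categories:
        for keyword in category["keywords"]:
            # Case-insensitive, whole-word match on the escaped keyword.
            pattern = r"\b" + re.escape(keyword) + r"\b"
            if re.search(pattern, text, flags=re.IGNORECASE):
                # Fall back to the internal name if the language has no label.
                matches.append(category["labels"].get(language, category["name"]))
                break
    return matches


print(classify("Streaming ASR with Transformers",
               "We study automatic speech recognition.", CATEGORIES))
print(classify("A new text-to-speech model", "", CATEGORIES, language="zh"))
print(classify("Laser physics", "", CATEGORIES))
```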
Example for NLP research:
{
"classification": {
"categories": [
{
"name": "LLM",
"labels": {"zh": "大语言模型", "en": "Large Language Models"},
"keywords": ["llm", "large language model", "gpt", "transformer"]
},
{
"name": "RAG",
"labels": {"zh": "检索增强", "en": "Retrieval-Augmented Generation"},
"keywords": ["rag", "retrieval", "knowledge base", "embedding"]
}
]
}
}
# Today's papers (default language from config)
python scripts/main.py
# Specific date with language
python scripts/main.py --day 2026-03-11 --language zh
# Date range
python scripts/main.py --start-date 2026-03-01 --end-date 2026-03-11
# Generate digest without sending email
python scripts/main.py --day 2026-03-11 --no-email
# Preview recipients before sending
python scripts/main.py --day 2026-03-11 --preview
GitHub Actions (Recommended):
- Fork this repository
- Go to Settings → Secrets → Actions
- Add secrets: SMTP_HOST, SMTP_USER, SMTP_PASS, DEEPSEEK_API_KEY, etc.
- Workflow runs daily at 01:00 UTC (09:00 CST)
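GitHub Actions evaluates schedule expressions in UTC, which is why a 09:00 run in China Standard Time (UTC+8) is written as 01:00 UTC. If you fork the repository and want a different local time, the arithmetic can be sketched as follows. The helper name is hypothetical, and it only handles whole-hour UTC offsets.

```python
def utc_cron_for_local_time(local_hour, utc_offset_hours, minute=0):
    """Build a daily cron expression in UTC for a given local wall-clock time."""
    utc_hour = (local_hour - utc_offset_hours) % 24  # wrap around midnight
    return f"{minute} {utc_hour} * * *"


# 09:00 in China Standard Time (UTC+8) is 01:00 UTC.
print(utc_cron_for_local_time(9, 8))    # 0 1 * * *
# 09:00 at UTC-5 is 14:00 UTC.
print(utc_cron_for_local_time(9, -5))   # 0 14 * * *
```

A local cron job, by contrast, normally follows the machine's own time zone, so no conversion is needed there.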
Local Cron (Linux/Mac):
# Edit crontab
crontab -e
# Add line for daily 9 AM run
0 9 * * * cd /path/to/paper_claw && python scripts/main.py
Windows Task Scheduler:
# Run from the repository root so the relative script path resolves
$Action = New-ScheduledTaskAction -Execute "python.exe" -Argument "scripts/main.py" -WorkingDirectory "C:\path\to\paper_claw"
$Trigger = New-ScheduledTaskTrigger -Daily -At "09:00"
Register-ScheduledTask -TaskName "PaperClaw" -Action $Action -Trigger $Trigger
Paper Claw provides a standardized Skill interface for AI agents (OpenClaw, Kimi, etc.).
No manual configuration needed! Agents can instantly configure Paper Claw for any research field:
from skill.example import list_presets, apply_preset
# Step 1: Browse available presets
presets = list_presets()
# → [{"id": "nlp", "name": "NLP & LLM"},
# {"id": "computer_vision", "name": "Computer Vision"}, ...]
# Step 2: Apply with one line
apply_preset("nlp") # Automatically configures arXiv + classification
Available Presets:
| Preset | Research Field | ArXiv Categories | Paper Categories |
|---|---|---|---|
| 🎙️ speech_audio | Speech & Audio | cs.SD, eess.AS | Speech LLM, ASR, TTS... |
| 📝 nlp | NLP & LLM | cs.CL, cs.LG, cs.AI | LLM, RAG, Agents... |
| 👁️ computer_vision | Computer Vision | cs.CV, cs.MM | Image Gen, Detection... |
| 🧠 general_ai | General AI/ML | cs.AI, cs.LG... | Deep Learning, RL... |
from skill.example import fetch_papers, get_digest_content
# Fetch and summarize papers
result = fetch_papers(day="2026-03-11", language="zh")
content = get_digest_content("2026-03-11", format="summary")
| Tool | Purpose | Parameters |
|---|---|---|
| list_presets | List available presets | - |
| apply_preset | Apply preset configuration | preset_id |
| preview_preset | Preview preset without applying | preset_id |
| fetch_papers | Fetch from configured sources | day, language |
| configure_sources | Update arXiv categories | sources |
| configure_categories | Update classification | categories |
| configure_recipients | Update email list | recipients |
| configure_language | Set output language | language |
| get_digest_content | Retrieve generated digest | date, format |
| send_digest | Send email digest | date |
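An agent can use the parameter names in this table to sanity-check a tool call before dispatching it. The sketch below is a hypothetical pre-flight check built only from the table; it is not part of the Skill interface, and skill/tools.json remains the authoritative schema. Since the table does not say which parameters are required, the check only flags unknown tools and unexpected parameter names.

```python
# Parameter names copied from the tool table; "-" becomes an empty list.
TOOL_PARAMETERS = {
    "list_presets": [],
    "apply_preset": ["preset_id"],
    "preview_preset": ["preset_id"],
    "fetch_papers": ["day", "language"],
    "configure_sources": ["sources"],
    "configure_categories": ["categories"],
    "configure_recipients": ["recipients"],
    "configure_language": ["language"],
    "get_digest_content": ["date", "format"],
    "send_digest": ["date"],
}


def check_tool_call(tool, arguments):
    """Return a list of problems; an empty list means the call matches the table."""
    if tool not in TOOL_PARAMETERS:
        return [f"unknown tool: {tool}"]
    allowed = set(TOOL_PARAMETERS[tool])
    unexpected = sorted(set(arguments) - allowed)
    return [f"unexpected parameter: {name}" for name in unexpected]


print(check_tool_call("fetch_papers", {"day": "2026-03-11", "language": "zh"}))
print(check_tool_call("send_digest", {"day": "2026-03-11"}))
print(check_tool_call("delete_all", {}))
```

The second call shows an easy mix-up: fetch_papers takes `day`, while get_digest_content and send_digest take `date`.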
{
"skill": "paper_claw",
"version": "2.0.0",
"config": {
"preset": "nlp",
"language": "zh",
"llm": "deepseek"
}
}
- 📖 skill/SKILL.md: Full integration guide
- 🔧 skill/tools.json: Tool schema definitions
- 💡 skill/example.py: Python usage examples
- 📋 skill/_meta.json: Agent metadata
We provide 170+ arXiv subject categories in config/arxiv_categories.json.
| Code | Name | Description |
|---|---|---|
| cs | Computer Science | AI, ML, NLP, CV, etc. |
| eess | Electrical Engineering | Signal Processing, Audio |
| physics | Physics | Optics, etc. |
| q-bio | Quantitative Biology | Genomics, etc. |
| q-fin | Quantitative Finance | Risk, Portfolio |
| stat | Statistics | ML, Methodology |
| math | Mathematics | Theory |
- Open config/arxiv_categories.json
- Find your category code (e.g., cs.CL)
- Add to config/default.json:
{
"sources": {
"arxiv": {
"categories": [
{"id": "cs.CL", "name": "Computation and Language", "url": "https://arxiv.org/list/cs.CL/recent"}
]
}
}
}
Supported: 🇨🇳 中文 · 🇺🇸 English · 🇯🇵 日本語 · 🇰🇷 한국어 · 🇩🇪 Deutsch · 🇫🇷 Français · 🇪🇸 Español
# Command line
python scripts/main.py --language ja # Japanese
# Or set default in config/default.json
{"language": {"default": "zh"}}
paper_claw/
├── config/
│ ├── default.json # Main configuration
│ ├── arxiv_categories.json # 170+ available categories
│ └── recipients.json # Email recipients (git-ignored)
├── skill/ # 🤖 Agent Skill interface
│ ├── SKILL.md # Agent integration guide
│ ├── tools.json # Tool schema
│ └── example.py # Usage examples
├── scripts/
│ ├── main.py # Entry point
│ ├── llm_client.py # Multi-LLM client
│ └── process_papers.py # Classification & summarization
├── templates/
│ └── email_template.html.j2 # Email HTML template
└── content/posts/ # Generated digests
New Features:
- --no-email flag: Generate digest locally without sending emails
- --preview flag: Preview recipient list before sending
- Personalized email greetings using recipient names
Bug Fixes:
- Fixed LLM batch processing return value bug that could cause empty results
- Improved API error handling and logging
- arXiv integration (170+ categories)
- Multi-LLM support (DeepSeek, Kimi, OpenAI, Claude, Gemini)
- Multi-language support (7 languages)
- Email delivery with HTML + Markdown
- Agent Skill interface (OpenClaw compatible)
- CNKI (知网) integration
- Web UI
- RSS feed export
MIT License © 2026 Paper Claw Contributors
⭐ Star this repo if you find it helpful!


