Important
GPUSentry version 1.0.1 has been released!
This version of GPUSentry adds more powerful features:
- Auto Scripting
- Log analysis (daily and monthly)
- Automatic Monitoring and Alerting
- Scheduled Reporting
- Feishu Webhook Integration
GPUSentry is a command-line tool for monitoring GPU status in real-time. It provides a continuously updating display of GPU utilization, memory usage, temperature, and other relevant metrics by leveraging the gpustat utility.
For researchers in the field of AI, CUDA out of memory is likely the most unwelcome error they can encounter. Instead of repeatedly typing nvidia-smi into the terminal to check GPU memory usage, why not set up a simple and user-friendly monitoring tool to keep an eye on GPU usage?
We want it to be simple and fast, like a loyal sentry!
- Real-time GPU monitoring dashboard using nvitop
- Continuous data collection and local database storage
- Configurable monitoring intervals
- Logging system with file and console output
- Data retrieval and analysis capabilities
- Scheduled reporting (daily/weekly/monthly)
- Feishu Webhook integration with text and chart support
- Custom time range reports (minute-level granularity)
- LLM-powered intelligent analysis
- Database reset and statistics functionality
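The storage and custom time range features above can be illustrated with a small sketch. It assumes a SQLite table named `gpu_samples` and the helper names `open_store`, `record_sample`, and `average_utilization`; all of these are hypothetical and chosen for illustration, since the actual schema used by GPUSentry is not shown here.

```python
import sqlite3
import time
from typing import Optional


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a local SQLite store. The schema is an assumption."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gpu_samples ("
        " ts REAL NOT NULL,"              # Unix timestamp in seconds
        " gpu_index INTEGER NOT NULL,"
        " utilization REAL NOT NULL,"     # percent, 0-100
        " memory_used_mb REAL NOT NULL)"
    )
    return conn


def record_sample(
    conn: sqlite3.Connection,
    gpu_index: int,
    utilization: float,
    memory_used_mb: float,
    ts: Optional[float] = None,
) -> None:
    """Insert one sample; the timestamp defaults to the current time."""
    conn.execute(
        "INSERT INTO gpu_samples VALUES (?, ?, ?, ?)",
        (time.time() if ts is None else ts, gpu_index, utilization, memory_used_mb),
    )
    conn.commit()


def average_utilization(
    conn: sqlite3.Connection, minutes: int, now: Optional[float] = None
) -> Optional[float]:
    """Average utilization over the last `minutes` minutes, or None if no data."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT AVG(utilization) FROM gpu_samples WHERE ts >= ? AND ts <= ?",
        (now - minutes * 60, now),
    ).fetchone()
    return row[0]
```

A background collector would presumably call something like `record_sample` once per monitoring interval, and a report for the last N minutes would run a windowed query of this kind.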
git clone https://github.com/xiyuanyang-code/GPUSentry.git
cd GPUSentry
# we recommend using uv
uv sync
source .venv/bin/activate
uv pip install -e .
# if you do not have uv, you can also install directly with pip
pip install -e .

Copy config.example.yaml to config.yaml.
# GPUSentry Configuration File
# Feishu Webhook Configuration
feishu:
  keyword: "GPUSentry"
  webhook_url: "https://open.feishu.cn/open-apis/bot/v2/hook/your-hook"
# Monitoring Settings
monitoring:
  interval: 5 # Monitoring interval (seconds)
  enable_logging: true # Whether to enable logging
LLM:
  model_name: deepseek-chat
  OPENAI_API_KEY: sk-your-api-key
  BASE_URL: https://api.deepseek.com
# Reporting Settings
# todo to be done in the future
# alert settings
# todo to be done in the future

- To send alerts and notifications, you need to create a Feishu bot and obtain its webhook URL.
- Configure your LLM API key in the OpenAI SDK format (model_name, OPENAI_API_KEY, BASE_URL).
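To make the role of the `keyword` and `webhook_url` settings more concrete, here is a minimal sketch of posting a text message to a Feishu custom bot. It assumes the bot uses keyword verification, so the message must contain the configured keyword. The function names `build_text_payload` and `send_to_feishu` are hypothetical and are not taken from the GPUSentry code base.

```python
import json
import urllib.request


def build_text_payload(text: str, keyword: str = "GPUSentry") -> dict:
    """Build a text message body for a Feishu custom bot.

    If the keyword is missing from the text, it is added as a prefix so that
    a bot configured with keyword verification accepts the message.
    """
    if keyword and keyword not in text:
        text = f"[{keyword}] {text}"
    return {"msg_type": "text", "content": {"text": text}}


def send_to_feishu(webhook_url: str, payload: dict, timeout: float = 10.0) -> int:
    """POST the payload as JSON to the webhook and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Building the payload separately from sending it keeps the message format easy to check without network access.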
- gpusentry or gpusentry board: Launch GPU monitoring dashboard
- gpusentry backend: Start background monitoring service
- gpusentry backend --interval 10: Start with custom collection interval (in seconds)
- gpusentry reset: Reset database and generate statistics
- gpusentry reset --force: Force reset database without confirmation
- gpusentry send N: Send report for the last N minutes to Feishu Webhook
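The command layout above maps naturally onto subcommands. The following is only a sketch of how such a command line could be parsed with `argparse`. It is not the project's actual entry point, and the default interval of 5 seconds is an assumption borrowed from the example configuration.

```python
import argparse
from typing import List


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented gpusentry commands."""
    parser = argparse.ArgumentParser(prog="gpusentry")
    sub = parser.add_subparsers(dest="command")

    sub.add_parser("board", help="Launch GPU monitoring dashboard")

    backend = sub.add_parser("backend", help="Start background monitoring service")
    backend.add_argument(
        "--interval", type=int, default=5,  # assumed default, in seconds
        help="Collection interval in seconds",
    )

    reset = sub.add_parser("reset", help="Reset database and generate statistics")
    reset.add_argument("--force", action="store_true", help="Skip confirmation")

    send = sub.add_parser("send", help="Send report for the last N minutes")
    send.add_argument("minutes", type=int, help="Length of the report window")

    return parser


def parse(argv: List[str]) -> argparse.Namespace:
    """Parse arguments; a bare `gpusentry` falls back to the dashboard."""
    args = build_parser().parse_args(argv)
    if args.command is None:
        args.command = "board"
    return args
```

For example, `parse(["send", "30"])` yields a namespace with `command` set to `"send"` and `minutes` set to `30`.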
All the code in this project was written by an LLM, following the specifications laid out in spec.
The project integrates the OpenAI API for intelligent analysis of GPU usage patterns. To enable this feature, configure your API key in the config.yaml file or set it as an environment variable.
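One way to combine the two key sources is sketched below. It assumes the environment variable is named `OPENAI_API_KEY` (matching the config field) and that it takes precedence over config.yaml; the precedence actually used by GPUSentry is not stated here. The helper name `resolve_api_key` is hypothetical.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    config: Mapping, env: Optional[Mapping[str, str]] = None
) -> Optional[str]:
    """Return the API key from the environment if set, else from the config.

    `config` is the parsed config.yaml as a dictionary. Precedence of the
    environment variable over the file is an assumption of this sketch.
    """
    env = os.environ if env is None else env
    key = env.get("OPENAI_API_KEY")
    if key:
        return key
    return (config.get("LLM") or {}).get("OPENAI_API_KEY")
```

Returning `None` when neither source provides a key lets the caller decide whether to disable LLM analysis or report an error.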