GPUSentry

Important

GPUSentry version 1.0.1 has been released!

This version adds more powerful features:

Auto Scripting
Daily and monthly log analysis
Automatic Monitoring and Alerting
Scheduled Reporting
Feishu Webhook Integration

Introduction

GPUSentry is a command-line tool for monitoring GPU status in real-time. It provides a continuously updating display of GPU utilization, memory usage, temperature, and other relevant metrics by leveraging the gpustat utility.

For researchers in the field of AI, CUDA out of memory is likely the most unwelcome error they can encounter. Instead of repeatedly typing nvidia-smi into the terminal to check GPU memory usage, why not set up a simple and user-friendly monitoring tool to keep an eye on GPU usage?

We want it to be simple and fast, like a loyal sentry!

Features

Real-time GPU monitoring dashboard using nvitop
Continuous data collection and local database storage
Configurable monitoring intervals
Logging system with file and console output
Data retrieval and analysis capabilities
Scheduled reporting (daily/weekly/monthly)
Feishu Webhook integration with text and chart support
Custom time range reports (minute-level granularity)
LLM-powered intelligent analysis
Database reset and statistics functionality
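The collection, storage, and custom time range features above can be pictured with a small sketch. It is not GPUSentry's actual implementation: the table name gpu_samples, its columns, and the helper functions are hypothetical, and SQLite is used only because it ships with Python. Reading real values from gpustat or nvitop is deliberately left out; samples are passed in as plain numbers.

```python
import sqlite3
import time


def open_store(path=":memory:"):
    """Open (or create) a local sample store. The schema is a hypothetical example."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS gpu_samples ("
        "ts REAL NOT NULL, gpu_index INTEGER NOT NULL, "
        "util_percent REAL, mem_used_mb REAL, temp_c REAL)"
    )
    return conn


def record_sample(conn, gpu_index, util_percent, mem_used_mb, temp_c, ts=None):
    """Store one reading; ts defaults to the current Unix time."""
    conn.execute(
        "INSERT INTO gpu_samples VALUES (?, ?, ?, ?, ?)",
        (time.time() if ts is None else ts, gpu_index,
         util_percent, mem_used_mb, temp_c),
    )
    conn.commit()


def summarize_last_minutes(conn, minutes, now=None):
    """Aggregate all samples recorded in the last `minutes` minutes."""
    now = time.time() if now is None else now
    row = conn.execute(
        "SELECT COUNT(*), AVG(util_percent), MAX(mem_used_mb), MAX(temp_c) "
        "FROM gpu_samples WHERE ts >= ? AND ts <= ?",
        (now - minutes * 60, now),
    ).fetchone()
    return {
        "samples": row[0],
        "avg_util_percent": row[1],
        "peak_mem_used_mb": row[2],
        "peak_temp_c": row[3],
    }
```

A background loop would call record_sample once per monitoring interval, and a report for the last N minutes would call summarize_last_minutes(conn, N).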

Usage

Installation

git clone https://github.com/xiyuanyang-code/GPUSentry.git
cd GPUSentry

# we recommend using uv
uv sync
source .venv/bin/activate
uv pip install -e .

# if you do not have uv, you can install directly with pip
pip install -e .

Configurations

Copy config.example.yaml to config.yaml.

# GPUSentry Configuration File

# Feishu Webhook Configuration
feishu:
  keyword: "GPUSentry"
  webhook_url: "https://open.feishu.cn/open-apis/bot/v2/hook/your-hook"

# Monitoring Settings
monitoring:
  interval: 5  # Monitoring interval (seconds)
  enable_logging: true  # Whether to enable logging

LLM:
  model_name: deepseek-chat
  OPENAI_API_KEY: sk-your-api-key
  BASE_URL: https://api.deepseek.com

# Reporting Settings
# todo to be done in the future

# alert settings
# todo to be done in the future

To send alerts and notifications, create a Feishu bot and obtain its webhook URL.
For LLM analysis, configure the API key and base URL of a provider that is compatible with the OpenAI SDK.
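To check that a webhook URL works before starting the service, a text message can be posted to it by hand. The sketch below uses only the Python standard library and the JSON body that Feishu custom bots accept for text messages. The function names are hypothetical and this is not GPUSentry's own sender. It assumes the keyword setting in config.yaml corresponds to the bot's keyword security check, which rejects messages that do not contain the keyword.

```python
import json
import urllib.request


def build_text_payload(text, keyword):
    """Build a Feishu text message, prefixing the keyword if it is missing."""
    if keyword and keyword not in text:
        text = f"[{keyword}] {text}"
    return {"msg_type": "text", "content": {"text": text}}


def send_text(webhook_url, text, keyword="GPUSentry", timeout=10):
    """POST the message to the webhook and return the decoded JSON reply."""
    body = json.dumps(build_text_payload(text, keyword)).encode("utf-8")
    request = urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

Calling send_text with the webhook_url value from config.yaml and a short message should make the message appear in the bot's chat.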

Basic Commands

gpusentry or gpusentry board: Launch GPU monitoring dashboard
gpusentry backend: Start background monitoring service
gpusentry backend --interval 10: Start with custom collection interval (in seconds)
gpusentry reset: Reset database and generate statistics
gpusentry reset --force: Force reset database without confirmation
gpusentry send N: Send report for the last N minutes to Feishu Webhook
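The shape of this command set can be mirrored with argparse, which may help when scripting around the tool. This is a sketch of the interface as documented above, not the project's actual entry point. The function names are hypothetical, and the default interval of 5 seconds is an assumption taken from the example configuration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the commands listed in this README."""
    parser = argparse.ArgumentParser(prog="gpusentry")
    sub = parser.add_subparsers(dest="command")

    sub.add_parser("board", help="launch the monitoring dashboard")

    backend = sub.add_parser("backend", help="start background monitoring")
    # Assumed default, taken from monitoring.interval in the example config.
    backend.add_argument("--interval", type=int, default=5,
                         help="collection interval in seconds")

    reset = sub.add_parser("reset", help="reset the database")
    reset.add_argument("--force", action="store_true",
                       help="skip the confirmation prompt")

    send = sub.add_parser("send", help="send a report to Feishu")
    send.add_argument("minutes", type=int,
                      help="report on the last N minutes")
    return parser


def parse_cli(argv):
    """Parse argv; with no subcommand, fall back to the dashboard."""
    args = build_parser().parse_args(argv)
    if args.command is None:
        args.command = "board"
    return args
```

For example, parse_cli(["send", "30"]) yields command "send" with minutes set to 30, and parse_cli([]) falls back to "board", matching the documented behavior of running gpusentry with no arguments.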

LLM Usage

All the code in this project was written by an LLM, following the specifications in the spec directory.

The project integrates the OpenAI API for intelligent analysis of GPU usage patterns. To enable this feature, configure your API key in the config.yaml file or set it as an environment variable.
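One way the two sources for the key could be combined is sketched below. The function name is hypothetical. The environment variable name OPENAI_API_KEY and the precedence (config file first, environment second) are both assumptions, since this README does not specify either. The config argument is the parsed content of config.yaml as a dictionary.

```python
import os


def resolve_llm_settings(config, environ=None):
    """Pick LLM settings from a parsed config dict, falling back to the environment.

    Assumptions: the variable name OPENAI_API_KEY and the precedence
    (config before environment) are illustrative, not confirmed behavior.
    """
    environ = os.environ if environ is None else environ
    llm = config.get("LLM") or {}
    api_key = llm.get("OPENAI_API_KEY") or environ.get("OPENAI_API_KEY")
    if not api_key:
        raise ValueError("No LLM API key found in config or environment")
    return {
        "api_key": api_key,
        "base_url": llm.get("BASE_URL"),
        "model": llm.get("model_name"),
    }
```

The returned values correspond to what an OpenAI-compatible client typically needs: an API key, a base URL, and a model name.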
