Description
Operating System
- macOS
- Linux
- Windows
Deployment Method
- Docker
- non-Docker
CUDA Usage
- Yes
- No
Training Process Details (if applicable)
No response
Second-Me version
latest master
Describe the bug
This is as far as I've been able to bring my video drivers and CUDA:
```
+-----------------------------------------+------------------------+----------------------+
| NVIDIA-SMI 555.42.02         Driver Version: 555.42.02         CUDA Version: 12.5      |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 3090        Off |   00000000:24:00.0  On |                  N/A |
| 31%   60C    P0            106W / 370W  |   1977MiB / 24576MiB   |      6%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
```
I've had to fall back to Ubuntu 22.04 for compatibility. Despite answering yes to the CUDA question, I end up with the following output at the end of the build (there are no errors in the build itself, as I've made a number of changes to the Dockerfile as workarounds):
```
[+] Building 2/2
✔ backend Built 0.0s
✔ frontend Built 0.0s
[+] Running 3/3
✔ Network second-me_second-me-network Created 0.1s
✔ Container second-me-backend Started 0.5s
✔ Container second-me-frontend Started 0.6s
Container startup complete
Check CUDA support with: make docker-check-cuda
/r/P/Second-Me on 🌱 master [!?] via 🐍 v3.12.3 took 16m0s
→ make docker-check-cuda
Checking CUDA support in Docker containers...
Running CUDA support check in backend container:
=== GPU Support Check ===
llama-server binary found, checking for CUDA linkage...
❌ llama-server is not linked with CUDA libraries
Container was built without CUDA support
🔍 NVIDIA GPU is available at runtime, but llama-server doesn't support CUDA
To enable GPU support, rebuild using: make docker-up (and select CUDA support when prompted)
No GPU support detected in backend container
```
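For what it's worth, the linkage test that `make docker-check-cuda` performs can be reproduced by hand inside the backend container with `ldd`. A minimal sketch — the helper name is my own, assuming the repo's `check_gpu_support.sh` essentially greps `ldd` output for CUDA libraries:

```shell
# check_cuda_linkage is a hypothetical helper, not a function from the
# repo; it mirrors what check_gpu_support.sh most likely does.
check_cuda_linkage() {
    # A CUDA-enabled llama-server links against libcuda/libcublas;
    # a CPU-only build does not.
    if ldd "$1" 2>/dev/null | grep -qE 'libcuda|libcublas|libggml-cuda'; then
        echo "CUDA-linked"
    else
        echo "not CUDA-linked"
        return 1
    fi
}

# Inside the container you would point this at the real binary:
#   check_cuda_linkage /app/llama.cpp/build/bin/llama-server
# Demonstrated here on /bin/ls, which is of course not CUDA-linked:
check_cuda_linkage /bin/ls || true
```

If this prints "not CUDA-linked" against the container's `llama-server`, the binary was compiled without `-DGGML_CUDA=ON`, regardless of what the GPU looks like at runtime.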
Current Behavior
The container starts without CUDA support
Expected Behavior
The container starts properly with CUDA support
Reproduction Steps
Use the following Dockerfile.backend.cuda:
```dockerfile
FROM nvidia/cuda:12.4.1-base-ubuntu22.04
# Set working directory
WORKDIR /app
# Add build argument to conditionally skip llama.cpp build
ARG SKIP_LLAMA_BUILD=false
# Install system dependencies with noninteractive mode to avoid prompts
#RUN apt-get update && DEBIAN_FRONTEND=noninteractive apt-get install -y \
#    build-essential cmake git curl wget lsof vim unzip sqlite3 \
#    python3-pip python3-venv python3-full python3-poetry pipx \
#    && apt-get clean \
#    && rm -rf /var/lib/apt/lists/* \
#    && ln -sf /usr/bin/python3 /usr/bin/python
ENV DEBIAN_FRONTEND=noninteractive
# Install Python 3.12.2 from source
RUN apt-get update && \
    apt-get install -y \
    build-essential libssl-dev zlib1g-dev libbz2-dev \
    libreadline-dev libsqlite3-dev wget curl llvm \
    libncursesw5-dev xz-utils tk-dev libxml2-dev libxmlsec1-dev \
    libffi-dev liblzma-dev git unzip cmake sqlite3 \
    cuda-toolkit-12-4 \
    && cd /usr/src && \
    wget https://www.python.org/ftp/python/3.12.2/Python-3.12.2.tgz && \
    tar xzf Python-3.12.2.tgz && cd Python-3.12.2 && \
    ./configure --enable-optimizations && \
    make -j"$(nproc)" && make altinstall && \
    ln -sf /usr/local/bin/python3.12 /usr/bin/python && \
    curl -sS https://bootstrap.pypa.io/get-pip.py | python && \
    rm -rf /var/lib/apt/lists/*
# Create a virtual environment to avoid PEP 668 restrictions
RUN python -m venv /app/venv
ENV PATH="/app/venv/bin:$PATH"
ENV VIRTUAL_ENV="/app/venv"
# Use the virtual environment's pip to install packages
RUN pip install --upgrade pip \
    && pip install poetry \
    && poetry config virtualenvs.create false
# Create directories
RUN mkdir -p /app/dependencies /app/data/sqlite /app/data/chroma_db /app/logs /app/run /app/resources
# Copy dependency files - files that rarely change
COPY dependencies/graphrag-1.2.1.dev27.tar.gz /app/dependencies/
COPY dependencies/llama.cpp.zip /app/dependencies/
# Copy GPU checker scripts
COPY docker/app/check_gpu_support.sh /app/
COPY docker/app/check_torch_cuda.py /app/
RUN chmod +x /app/check_gpu_support.sh
# Unpack llama.cpp and build with CUDA support (conditionally, based on SKIP_LLAMA_BUILD)
RUN if [ "$SKIP_LLAMA_BUILD" = "false" ]; then \
        echo "=====================================================================" && \
        echo "STARTING LLAMA.CPP BUILD WITH CUDA SUPPORT - THIS WILL TAKE SOME TIME" && \
        echo "=====================================================================" && \
        LLAMA_LOCAL_ZIP="dependencies/llama.cpp.zip" && \
        echo "Using local llama.cpp archive..." && \
        unzip -q "$LLAMA_LOCAL_ZIP" && \
        cd llama.cpp && \
        mkdir -p build && \
        cd build && \
        echo "Starting CMake configuration with CUDA support..." && \
        cmake -DGGML_CUDA=ON \
              -DCMAKE_BUILD_TYPE=Release \
              -DBUILD_SHARED_LIBS=OFF \
              -DLLAMA_NATIVE=ON \
              .. && \
        echo "Starting build process (this will take several minutes)..." && \
        cmake --build . --config Release -j$(nproc) --verbose && \
        echo "Build completed successfully" && \
        chmod +x /app/llama.cpp/build/bin/llama-server /app/llama.cpp/build/bin/llama-cli && \
        echo "====================================================================" && \
        echo "CUDA BUILD COMPLETED SUCCESSFULLY! GPU ACCELERATION IS NOW AVAILABLE" && \
        echo "===================================================================="; \
    else \
        echo "=====================================================================" && \
        echo "SKIPPING LLAMA.CPP BUILD (SKIP_LLAMA_BUILD=$SKIP_LLAMA_BUILD)" && \
        echo "Using existing llama.cpp build from Docker volume" && \
        echo "=====================================================================" && \
        LLAMA_LOCAL_ZIP="dependencies/llama.cpp.zip" && \
        echo "Just unpacking llama.cpp archive (no build)..." && \
        unzip -q "$LLAMA_LOCAL_ZIP" && \
        cd llama.cpp && \
        mkdir -p build; \
    fi
# Mark as GPU-optimized build for runtime reference
RUN mkdir -p /app/data && \
    echo "{ \"gpu_optimized\": true, \"optimized_on\": \"$(date -u +"%Y-%m-%dT%H:%M:%SZ")\" }" > /app/data/gpu_optimized.json && \
    echo "Created GPU-optimized marker file"
# Copy project configuration - files that occasionally change
COPY pyproject.toml README.md /app/
# Fix for potential package installation issues with Poetry
RUN pip install --upgrade setuptools wheel
RUN poetry install --no-interaction --no-root || poetry install --no-interaction --no-root --without dev
RUN pip install --force-reinstall dependencies/graphrag-1.2.1.dev27.tar.gz
# Copy source code - files that frequently change
COPY docker/ /app/docker/
COPY lpm_kernel/ /app/lpm_kernel/
# Check module import
RUN python -c "import lpm_kernel; print('Module import check passed')"
# Set environment variables
ENV PYTHONUNBUFFERED=1 \
    PYTHONPATH=/app \
    BASE_DIR=/app/data \
    LOCAL_LOG_DIR=/app/logs \
    RUN_DIR=/app/run \
    RESOURCES_DIR=/app/resources \
    APP_ROOT=/app \
    FLASK_APP=lpm_kernel.app \
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# Expose ports
EXPOSE 8002 8080
# Set the startup command
CMD ["bash", "-c", "echo 'Checking SQLite database...' && if [ ! -s /app/data/sqlite/lpm.db ]; then echo 'SQLite database not found or empty, initializing...' && mkdir -p /app/data/sqlite && sqlite3 /app/data/sqlite/lpm.db '.read /app/docker/sqlite/init.sql' && echo 'SQLite database initialized successfully' && echo 'Tables created:' && sqlite3 /app/data/sqlite/lpm.db '.tables'; else echo 'SQLite database already exists, skipping initialization'; fi && echo 'Checking ChromaDB...' && if [ ! -d /app/data/chroma_db/documents ] || [ ! -d /app/data/chroma_db/document_chunks ]; then echo 'ChromaDB collections not found, initializing...' && python /app/docker/app/init_chroma.py && echo 'ChromaDB initialized successfully'; else echo 'ChromaDB already exists, skipping initialization'; fi && echo 'Starting application at '
```
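One thing worth double-checking alongside the build flags: even a CUDA-linked `llama-server` needs the GPU passed through at runtime. A sketch of the relevant compose fragment, assuming a service name of `backend` (Second-Me's actual docker-compose.yml may be organized differently):

```yaml
services:
  backend:
    build:
      context: .
      dockerfile: Dockerfile.backend.cuda
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: all
              capabilities: [gpu]
```

Without a reservation like this (or `--gpus all` on a plain `docker run`), `nvidia-smi` can still work inside the container via the NVIDIA runtime while llama.cpp stays CPU-only, which matches the "GPU is available at runtime, but llama-server doesn't support CUDA" message above.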
Possible Workaround
No response
Additional Information
No response
Link to related Github discussion or issue
No response