
Force CLIP text encoder onto CUDA if available to avoid CPU NaN outputs on Jetson Xavier #125

Open
mqcmd196 wants to merge 1 commit into facebookresearch:main from mqcmd196:jetson-avoid-nan

Conversation


@mqcmd196 mqcmd196 commented Jul 17, 2025

Cc: @HiroIshida

On Jetson Xavier (perhaps under high CPU load?), the function

def get_clip_embeddings(vocabulary, prompt='a ', clip_download_root=None):
    from detic.modeling.text.text_encoder import build_text_encoder
    text_encoder = build_text_encoder(pretrain=True,
                                      clip_download_root=clip_download_root)
    text_encoder.eval()
    texts = [prompt + x for x in vocabulary]
    emb = text_encoder(texts).detach().permute(1, 0).contiguous().cpu()
    print(f"EMB {emb}") # FOR DEBUG!
    return emb

prints

EMB tensor([[nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan],
        ...,
        [nan, nan, nan],
        [nan, nan, nan],
        [nan, nan, nan]])

It seems that the CLIP embedding is sometimes broken for some reason (it sometimes works...).
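Since the breakage is intermittent, a fail-fast guard could catch the broken embeddings before they propagate into the detector. This is a hypothetical sketch, not part of the PR; `assert_finite` is a name I made up for illustration.

```python
import torch

def assert_finite(emb):
    # Raise immediately instead of silently returning NaN embeddings.
    n_nan = torch.isnan(emb).sum().item()
    n_inf = torch.isinf(emb).sum().item()
    if n_nan or n_inf:
        raise RuntimeError(
            f"CLIP embedding is broken: {n_nan} NaNs, {n_inf} Infs")
    return emb

ok = torch.zeros(3, 512)
assert_finite(ok)  # passes

bad = torch.full((3, 512), float('nan'))
# assert_finite(bad)  # would raise RuntimeError
```

Calling something like this at the end of `get_clip_embeddings` would have surfaced the Xavier problem as a hard error rather than a silently broken model.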

Using the Python script below,

import torch
from detic.modeling.text.text_encoder import build_text_encoder

def forward_stats(x):
    n_nan = torch.isnan(x).sum().item()
    n_inf = torch.isinf(x).sum().item()
    return n_nan, n_inf, tuple(x.shape), x.dtype, x.device

def test_cpu_cuda(vocab=None, prompt='a ', clip_download_root="/catkin_ws/src/detic_ros/models"):
    if vocab is None:
        vocab = ["dog", "cat", "robot"]
    print("PyTorch:", torch.__version__)
    print("CUDA available:", torch.cuda.is_available())
    if torch.cuda.is_available():
        print("CUDA device:", torch.cuda.get_device_name(0))

    enc = build_text_encoder(pretrain=True, clip_download_root=clip_download_root)
    enc.eval()

    texts = [prompt + x for x in vocab]

    # CPU
    with torch.no_grad():
        emb_cpu = enc(texts)  # enc is still on CPU
    print("[CPU] stats:", forward_stats(emb_cpu))

    # move to GPU (if available)
    if torch.cuda.is_available():
        enc = enc.to('cuda')
        with torch.no_grad():
            emb_gpu = enc(texts)
        print("[CUDA] stats:", forward_stats(emb_gpu))
        return emb_cpu, emb_gpu
    else:
        return emb_cpu, None

if __name__ == "__main__":
    test_cpu_cuda()

I checked on my Jetson Xavier and on an x86/RTX 3090 Ti machine.

Jetson Xavier prints

root@core-io:/catkin_ws/src/detic_ros/node_script# python3 test.py
PyTorch: 1.11.0
CUDA available: True
CUDA device: Xavier
Loading pretrained CLIP
[CPU] stats: (1536, 0, (3, 512), torch.float32, device(type='cpu'))
[CUDA] stats: (0, 0, (3, 512), torch.float32, device(type='cuda', index=0))

The x86/RTX 3090 Ti machine prints

❯ ~/ros/catkin_ws/devel/.private/detic_ros/share/detic_ros/venv/bin/python ./test.py
PyTorch: 1.9.0+cu111
CUDA available: True
CUDA device: NVIDIA GeForce RTX 3090 Ti
Loading pretrained CLIP
[CPU] stats: (0, 0, (3, 512), torch.float32, device(type='cpu'))
[CUDA] stats: (0, 0, (3, 512), torch.float32, device(type='cuda', index=0))
This result suggests that the embeddings are broken when the encoder runs on the CPU on Xavier.

This PR explicitly places the encoder on CUDA when available. I confirmed that the patch makes the function stable on Jetson Xavier, and I don't think it breaks other features.
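The fix can be sketched as follows. Note this is a minimal illustration of the device-placement change, not the actual diff: `DummyTextEncoder` is a stand-in I invented so the snippet runs without Detic, whereas the real patch moves the encoder built by `build_text_encoder` in `get_clip_embeddings`.

```python
import torch

class DummyTextEncoder(torch.nn.Module):
    """Stand-in for Detic's CLIP text encoder (hypothetical)."""
    def __init__(self, dim=512):
        super().__init__()
        self.proj = torch.nn.Linear(8, dim)

    def forward(self, texts):
        # Fake tokenization: one fixed-length vector per prompt,
        # placed on whatever device the module lives on.
        x = torch.ones(len(texts), 8, device=self.proj.weight.device)
        return self.proj(x)

def get_clip_embeddings(vocabulary, prompt='a '):
    encoder = DummyTextEncoder()
    encoder.eval()
    # The fix: run the encoder on CUDA when available, since the
    # CPU path intermittently produced NaNs on Jetson Xavier.
    device = 'cuda' if torch.cuda.is_available() else 'cpu'
    encoder = encoder.to(device)
    texts = [prompt + x for x in vocabulary]
    with torch.no_grad():
        emb = encoder(texts).permute(1, 0).contiguous().cpu()
    return emb

emb = get_clip_embeddings(["dog", "cat", "robot"])
print(tuple(emb.shape), torch.isnan(emb).any().item())
```

The embeddings are still moved back to the CPU at the end, so callers see the same (dim, num_classes) tensor as before; only the forward pass changes device.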
