Why do the results I get from running FG-CLIP/eval/test.py differ from those in your paper? Running your code unmodified with your .pt weight file, I get R@1, R@5, and R@10 values that are generally higher than those reported in Table 1 of the paper.