A prototype for multimodal image/text applications built on OpenAI's CLIP. The application uses FAISS approximate inner-product nearest-neighbour (NN) search to retrieve images from text queries (text-to-image) or classify images against a vocabulary (image-to-text).
To run this application you'll need wget, Python 3.6+, and the following Python dependencies, installed from PyPI using conda or pip (as appropriate):

- `faiss-cpu` / `faiss-gpu`
- `flask` + `flask-cors` (for deployment)
- `ftfy`
- `regex`
- `torch`, `torchvision` (preferably with CUDA)
- `tqdm`
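Assuming a plain pip environment, the dependencies above can be installed in one command; this exact invocation is illustrative, not from the repo (swap in `faiss-gpu` or conda as appropriate):

```shell
# Illustrative install command; choose faiss-cpu or faiss-gpu for your machine.
pip install faiss-cpu flask flask-cors ftfy regex torch torchvision tqdm
```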
A GPU is also preferred.

You can install CLIP globally:

```
pip install git+https://github.com/openai/CLIP.git
```

Or install it locally from the submodule:

```
git submodule update --init
```

Before running any scripts, review `lib/hparams.py` and change any hyperparameters as necessary.
- This is where the path to the dataset should be defined. By default, it uses the `tiny-imagenet-200` dataset.
- Run `scripts/init.sh` to prepare the workspace. This creates folders for data, generated indexes, and encodings.
- If you want to use the default dataset (`tiny-imagenet-200`), run `scripts/tiny-imagenet-200.sh`.
- Make sure that any datasets (AKA repositories) you want to use are accessible via a relative/absolute Unix path from the base of the dataset/repository.
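As a rough sketch, the workspace layout that `scripts/init.sh` prepares might look like the following. The folder names here are assumptions inferred from the file trees shown later, not a copy of the actual script:

```shell
# Hypothetical stand-in for scripts/init.sh: create workspace folders
# for data, generated indexes, and encodings in a scratch directory.
workdir=$(mktemp -d)
mkdir -p "$workdir/data" "$workdir/indexes_images" "$workdir/indexes_text" "$workdir/encodings"
ls "$workdir"
```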
The indexes will be stored in `indexes_*/` with filenames indicating the dataset and the number of features (which may vary with the CLIP model and FAISS compression method used).
To generate an image index, run `python build_image_index.py <dataset_name> <dataset_path> <n_components>`, where:

- `dataset_name` is the name of the dataset/repository you would like to index.
- `dataset_path` is the relative/absolute filepath to the dataset. The dataloader will recursively include all images under this directory.
- `n_components` is the number of components the feature vectors will contain. PCA compression is coming soon.
- These arguments have default values in `lib/hparams.py`.
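For intuition, exact inner-product nearest-neighbour search — the operation a FAISS IP index approximates at scale — can be sketched in a few lines of NumPy. The arrays and helper below are illustrative only, not part of the repo:

```python
import numpy as np

# Illustrative only: exact inner-product nearest-neighbour search,
# the operation that a FAISS inner-product index approximates.
rng = np.random.default_rng(0)
index_vectors = rng.standard_normal((100, 8)).astype("float32")
# L2-normalise so inner product equals cosine similarity
index_vectors /= np.linalg.norm(index_vectors, axis=1, keepdims=True)

def search(query, k=5):
    query = query / np.linalg.norm(query)
    scores = index_vectors @ query        # inner product with every stored vector
    top = np.argsort(-scores)[:k]         # indices of the k highest scores
    return top, scores[top]

ids, scores = search(index_vectors[3])   # a stored vector's top hit is itself
```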
After an index is generated for a dataset, its `(dataset_name, dataset_path)` pair is added to `collection_<datatype>.txt` if it wasn't already there. This provides an easy reference for reconstructing an ordered compound index and dataloader.
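A minimal sketch of that bookkeeping, assuming a simple tab-separated line format (the real file format may differ):

```python
import os
import tempfile

def register_repo(collection_path, name, path):
    """Append a (name, path) pair to the collection file unless it is
    already present. Hypothetical helper mirroring the behaviour above."""
    entry = f"{name}\t{path}"
    if os.path.exists(collection_path):
        with open(collection_path) as f:
            if entry in f.read().splitlines():
                return False  # already registered
    with open(collection_path, "a") as f:
        f.write(entry + "\n")
    return True

collection = os.path.join(tempfile.mkdtemp(), "collection_images.txt")
first = register_repo(collection, "tiny-imagenet-200-train", "data/tiny-imagenet-200/train")
second = register_repo(collection, "tiny-imagenet-200-train", "data/tiny-imagenet-200/train")
```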
The file tree should look something like this:
```
indexes_images/
| tiny-imagenet-200-train_1024.index
| tiny-imagenet-200-train_512.index
| tiny-imagenet-200-test_1024.index
| tiny-imagenet-200-test_512.index
| tiny-imagenet-200-val_1024.index
| tiny-imagenet-200-val_512.index
```
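Based on the `<dataset>_<n_features>.index` naming convention above, a small helper to recover the dataset name and feature count from a filename might look like this (hypothetical, not in the repo):

```python
import re

def parse_index_filename(filename):
    """Split a name like 'tiny-imagenet-200-train_1024.index' into
    ('tiny-imagenet-200-train', 1024). Hypothetical helper based on
    the naming convention shown above."""
    m = re.fullmatch(r"(.+)_(\d+)\.index", filename)
    if m is None:
        raise ValueError(f"unrecognised index filename: {filename}")
    return m.group(1), int(m.group(2))

name, n_features = parse_index_filename("tiny-imagenet-200-train_1024.index")
```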
- Review the hyperparameters in the `vocabulary` section of `lib/hparams.py`. Give the vocabulary a name and define the URL from which it can be retrieved.
- Run `python build_text_index.py` and review the indexes in `indexes_text/`. The default configuration should create the following file subtree.
- Text indexes might take a while because they're not partitioned yet.
```
indexes_text/
| text_aidemos_1024.index
| text_aidemos_512.index
```
Run `sh scripts/retrieval.sh` to start the public Retrieval API. Run it on its own port and GPU, for example:

```
CUDA_VISIBLE_DEVICES=0 FLASK_RUN_PORT=5020 sh scripts/public.sh
```

The API hasn't been fully documented yet, so please see `retrieval.py` for now.

Run `BLOCKING= sh scripts/indexing.sh` to start the private Indexing API. Make sure this API is not publicly exposed, as it can be blocked by indexing function calls. Run it on a separate port and GPU, for example:

```
BLOCKING= CUDA_VISIBLE_DEVICES=1 FLASK_RUN_PORT=5021 sh scripts/index.sh
```

Example in Python:
```python
import requests

url = "http://0.0.0.0:5021/api/add-text-repo"
payload = {
    "modelName": "clip",
    "name": "tiny-imagenet-200-train",
    "path": "data/tiny-imagenet-200/train",
}
r = requests.post(url, json=payload)
```

Another example in Python:
```python
import requests

url = "http://0.0.0.0:5021/api/add-text-repo"
payload = {
    "modelName": "clip",
    "name": "cities",
    "vocab": {
        "london": {},
        "st. petersburg": {},
        "paris": {},
    },
}
r = requests.post(url, json=payload)
```
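For illustration, the `vocab` mapping above ultimately becomes a list of label strings for the text encoder; a common CLIP pattern is to wrap each label in a prompt template. The flattening and template below are assumptions, not the repo's actual code:

```python
# Hypothetical flattening of a vocab payload into text-encoder prompts.
payload = {
    "modelName": "clip",
    "name": "cities",
    "vocab": {"london": {}, "st. petersburg": {}, "paris": {}},
}
labels = sorted(payload["vocab"])                      # deterministic label order
prompts = [f"a photo of {label}" for label in labels]  # common CLIP prompt template
```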