ArXiv ChatGuru is a Streamlit app that turns an arXiv topic into a topic-scoped Redis vector index. It fetches papers on that topic, chunks them, stores their embeddings in Redis, and lets you ask grounded questions about the papers you loaded.
This app is a learning project for academic RAG. It is intentionally simple and is meant to show how Redis fits into a paper Q&A workflow, not to act as a production-ready research assistant.
In this app, Redis:
- Stores topic-specific paper chunks and embeddings
- Powers vector search for retrieval
- Lets you inspect the active index from the built-in stats page
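The first two points above come down to keeping each chunk's text next to its embedding so Redis can run vector search over it. The sketch below shows one common layout: a Redis hash whose vector field holds the embedding as raw FLOAT32 bytes. It is an illustration, not the app's code. The key format (arxiv:<topic>:<paper_id>:<chunk_no>), the field names content, paper_id, and embedding, and the example topic and paper ID are all assumptions.

```python
import struct

# Hypothetical key prefix; the app's real key naming may differ.
KEY_PREFIX = "arxiv"


def pack_embedding(vector: list[float]) -> bytes:
    """Encode an embedding as FLOAT32 bytes (little-endian, as on common x86/ARM hosts)."""
    return struct.pack(f"<{len(vector)}f", *vector)


def unpack_embedding(blob: bytes) -> list[float]:
    """Decode FLOAT32 bytes back into a list of floats (4 bytes per value)."""
    return list(struct.unpack(f"<{len(blob) // 4}f", blob))


def chunk_to_hash(topic: str, paper_id: str, chunk_no: int,
                  text: str, vector: list[float]) -> tuple[str, dict]:
    """Build a (key, mapping) pair that could be written to Redis with HSET."""
    key = f"{KEY_PREFIX}:{topic}:{paper_id}:{chunk_no}"
    mapping = {
        "content": text,                        # chunk text returned to the LLM
        "paper_id": paper_id,                   # lets answers point back to a paper
        "embedding": pack_embedding(vector),    # field a vector index would search
    }
    return key, mapping


if __name__ == "__main__":
    key, mapping = chunk_to_hash("llm-agents", "2401.00001", 0,
                                 "Agents plan with tools.", [0.1, 0.2, 0.3])
    print(key)                        # → arxiv:llm-agents:2401.00001:0
    print(len(mapping["embedding"]))  # → 12 (3 floats x 4 bytes)
```

With redis-py installed, such a pair could be written with r.hset(key, mapping=mapping). A vector index whose schema declares embedding as a FLOAT32 vector field of the matching dimension, and whose key prefix matches, would then pick it up.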
How it works:
- Enter a topic and choose how many papers to load.
- The app pulls papers from arXiv and splits them into chunks.
- OpenAI generates embeddings for those chunks.
- Redis stores the chunks and embeddings in a topic-scoped index.
- LangChain retrieves the closest chunks for each user question and sends that context to the chat model.
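The steps above form a standard retrieval-augmented generation loop. To make the moving parts concrete without an API key or a running Redis server, here is a toy, standard-library-only sketch of the same flow. It uses fixed-size overlapping character chunks, a hashed bag-of-words vector in place of OpenAI embeddings, a Python list in place of the Redis index, and cosine similarity for retrieval. None of these choices come from the app; the chunk size, overlap, example abstracts, and prompt wording are illustrative assumptions.

```python
import math
import re
import zlib

DIM = 256  # toy vector size; real OpenAI embeddings are much larger


def chunk_text(text: str, size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into fixed-size character windows that overlap by `overlap`."""
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    step = size - overlap
    return [text[i:i + size] for i in range(0, max(len(text) - overlap, 1), step)]


def toy_embed(text: str) -> list[float]:
    """Stand-in for an embedding model: hashed bag of words, scaled to unit length."""
    vec = [0.0] * DIM
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        vec[zlib.crc32(word.encode()) % DIM] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity; a plain dot product because both vectors are unit length."""
    return sum(x * y for x, y in zip(a, b))


def build_index(papers: dict[str, str]) -> list[tuple[str, str, list[float]]]:
    """Chunk every paper and keep (paper_id, chunk, vector) rows, like a tiny vector store."""
    rows = []
    for paper_id, text in papers.items():
        for chunk in chunk_text(text):
            rows.append((paper_id, chunk, toy_embed(chunk)))
    return rows


def retrieve(rows, question: str, k: int = 3):
    """Return the k rows whose vectors are most similar to the question."""
    q = toy_embed(question)
    return sorted(rows, key=lambda row: cosine(q, row[2]), reverse=True)[:k]


def build_prompt(question: str, hits) -> str:
    """Ground the chat model by placing the retrieved chunks ahead of the question."""
    context = "\n\n".join(f"[{paper_id}] {chunk}" for paper_id, chunk, _ in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"


if __name__ == "__main__":
    # Hypothetical abstracts standing in for papers fetched from arXiv.
    papers = {
        "paper-a": "Transformers use self-attention to model long sequences.",
        "paper-b": "Graph neural networks pass messages between connected nodes.",
    }
    rows = build_index(papers)
    question = "How does self-attention work?"
    print(build_prompt(question, retrieve(rows, question, k=1)))
```

In the real app, the embedding call, the vector store, and the final chat completion are handled by OpenAI, Redis, and LangChain. The sketch only mirrors the order of operations.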
Prerequisites:
- Python 3.13 for local development
- Docker Desktop if you want the Docker-first flow
- An OpenAI API key
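If you want to confirm the first two prerequisites before going further, a small standard-library check like the following will do it. It is a hypothetical helper, not part of the repository. It assumes the Docker flow needs the docker command on your PATH, which Docker Desktop normally provides.

```python
import shutil
import sys


def preflight(version: tuple[int, int] = sys.version_info[:2], which=shutil.which) -> list[str]:
    """Return a list of setup problems; an empty list means the checks passed."""
    problems = []
    if tuple(version[:2]) != (3, 13):
        problems.append(f"Python 3.13 expected, found {version[0]}.{version[1]}")
    if which("docker") is None:
        problems.append("docker not found on PATH (needed only for the Docker flow)")
    return problems


if __name__ == "__main__":
    for problem in preflight():
        print("WARNING:", problem)
```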
Create a .env file from the template:
cp .env.template .env
Then set at least:
OPENAI_API_KEY=your_key_here
The default template uses:
OPENAI_CHAT_MODEL=gpt-4.1-mini
OPENAI_EMBEDDING_MODEL=text-embedding-3-small
REDIS_INDEX_BASENAME=arxiv
REDIS_URL=redis://arxivchatguru-redis:6379
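How the app reads these values is not covered here. The following is a hedged sketch that assumes the variables are already in the process environment, for example through an env_file entry in Docker Compose or python-dotenv's load_dotenv() for local runs. It shows a loader that falls back to the template defaults above. The Settings class, load_settings, and the index_name_for naming scheme for topic-scoped indexes are all hypothetical.

```python
import os
import re
from collections.abc import Mapping
from dataclasses import dataclass

# Defaults mirror .env.template; OPENAI_API_KEY has no default on purpose.
DEFAULTS = {
    "OPENAI_CHAT_MODEL": "gpt-4.1-mini",
    "OPENAI_EMBEDDING_MODEL": "text-embedding-3-small",
    "REDIS_INDEX_BASENAME": "arxiv",
    "REDIS_URL": "redis://arxivchatguru-redis:6379",
}


@dataclass(frozen=True)
class Settings:
    openai_api_key: str
    chat_model: str
    embedding_model: str
    index_basename: str
    redis_url: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, failing fast if the API key is missing."""
    api_key = env.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is required; set it in .env")
    # Unset or empty variables fall back to the template defaults.
    values = {name: env.get(name) or default for name, default in DEFAULTS.items()}
    return Settings(
        openai_api_key=api_key,
        chat_model=values["OPENAI_CHAT_MODEL"],
        embedding_model=values["OPENAI_EMBEDDING_MODEL"],
        index_basename=values["REDIS_INDEX_BASENAME"],
        redis_url=values["REDIS_URL"],
    )


def index_name_for(basename: str, topic: str) -> str:
    """Hypothetical naming scheme for a topic-scoped index: basename plus a topic slug."""
    slug = re.sub(r"[^a-z0-9]+", "-", topic.lower()).strip("-")
    return f"{basename}:{slug}"


if __name__ == "__main__":
    print(index_name_for(DEFAULTS["REDIS_INDEX_BASENAME"], "Graph Neural Networks"))
    # → arxiv:graph-neural-networks
```

Failing fast on a missing key surfaces the most common setup mistake at startup instead of on the first question.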
Docker is the primary way to run the app locally. Start the stack with:
make docker-up
Then open:
http://localhost:8501
To stop the stack:
make docker-down
To run the app without Docker, install Poetry if you do not already have it:
python3 -m pip install --user poetry
Use Python 3.13 for the project environment, install dependencies, and start the app:
python3 -m poetry env use python3.13
make install
make dev
Then open:
http://localhost:8501
If you run locally outside Docker, make sure REDIS_URL points at a reachable Redis instance such as redis://localhost:6379.
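The template's default host, arxivchatguru-redis, is presumably the Redis service name inside the Docker Compose network, so it is unlikely to resolve from your host machine. One quick way to catch this before starting Streamlit is a plain TCP probe of whatever REDIS_URL names. The helper names below are hypothetical. A TCP connection only shows that something is listening, so if redis-py is installed, redis.Redis.from_url(url).ping() is the stronger check.

```python
import os
import socket
from urllib.parse import urlparse


def redis_host_port(url: str) -> tuple[str, int]:
    """Pull host and port out of a redis:// or rediss:// URL (port defaults to 6379)."""
    parsed = urlparse(url)
    if parsed.scheme not in ("redis", "rediss"):
        raise ValueError(f"unexpected scheme in REDIS_URL: {parsed.scheme!r}")
    return parsed.hostname or "localhost", parsed.port or 6379


def redis_is_reachable(url: str, timeout: float = 1.0) -> bool:
    """True if a TCP connection to the URL's host and port succeeds within `timeout` seconds."""
    host, port = redis_host_port(url)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers DNS failures, refused connections, and timeouts
        return False


if __name__ == "__main__":
    url = os.environ.get("REDIS_URL", "redis://localhost:6379")
    print(url, "reachable" if redis_is_reachable(url) else "NOT reachable")
```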
Make targets:
- make format: formats the app and tests
- make test: runs the test suite
- make build: builds the Docker image
- make dev: starts Streamlit locally
- make docker-up: starts the app with Docker Compose
After you load a topic from the main page, open the Streamlit stats page to inspect the active Redis index. It shows:
- Index metadata
- Indexed fields
- Query Engine stats for the active topic
Possible next steps:
- Add better metadata filters, such as year or author
- Improve the chunking strategy for long papers
- Add chat history or memory features only if the tutorial needs them

