VedaLens

A Retrieval-Augmented Generation (RAG) system for exploring the four Vedas — combining a local LLM, the Google Gemini API, and classical NLP techniques to deliver contextual answers and deep textual analysis from one of humanity's oldest bodies of knowledge.

Features

Feature	Description
Automated NLTK Setup	Checks for and downloads required NLTK data (`punkt`, `wordnet`, `stopwords`) on first run
Text Preprocessing	Lowercasing, tokenization, non-alphabetic removal, stop word filtering (including archaic and domain-specific terms such as thou, hymn, and veda), and lemmatization
Text Statistics	Reports total word count, unique word count, and top frequent terms after preprocessing
Topic Modeling (LDA)	Identifies underlying themes across hymns using Latent Dirichlet Allocation
TF-IDF Keyword Extraction	Surfaces important, document-specific keywords for individual hymns
Collocation Analysis	Discovers significant bigrams and trigrams — frequently co-occurring word pairs and triplets
Contextual AI Explanations	Uses the Gemini API to generate cultural, religious, and ritualistic explanations for identified collocations
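The preprocessing, statistics, and TF-IDF features above can be sketched with the standard library alone. This is not the notebook's code. The regex tokenizer stands in for NLTK's tokenizer, lemmatization is omitted (NLTK's `WordNetLemmatizer` would normally handle it), `STOP_WORDS` is a hypothetical sample rather than the script's `custom_stopwords`, and the hymn lines are made-up inputs.

```python
import math
import re
from collections import Counter

# Hypothetical sample stop-word set; the real list is configured in the notebook.
STOP_WORDS = {"the", "and", "of", "to", "a", "in", "is", "thou", "thee", "thy", "hymn", "veda"}


def preprocess(text, stop_words=STOP_WORDS):
    """Lowercase, keep alphabetic runs only, and drop stop words (no lemmatization)."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in stop_words]


def text_statistics(tokens, top_n=10):
    """Total words, unique words, and the most frequent terms after preprocessing."""
    counts = Counter(tokens)
    return {
        "total_words": len(tokens),
        "unique_words": len(counts),
        "top_terms": counts.most_common(top_n),
    }


def tfidf_keywords(docs, doc_index, top_n=5):
    """Rank terms of docs[doc_index] by tf * ln(N / df) over a list of token lists."""
    doc = docs[doc_index]
    if not doc:
        return []
    df = Counter(term for d in docs for term in set(d))  # number of docs containing each term
    tf = Counter(doc)
    scores = {t: (c / len(doc)) * math.log(len(docs) / df[t]) for t, c in tf.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


hymns = [
    "Agni, the priest of the sacrifice!",
    "Thou, Indra, drinkest the Soma.",
    "Soma flows; Agni shines.",
]
docs = [preprocess(h) for h in hymns]
print(docs[0])                    # ['agni', 'priest', 'sacrifice']
print(text_statistics(docs[2]))
print(tfidf_keywords(docs, 0))    # ['priest', 'sacrifice', 'agni']
```

"agni" ranks last for the first hymn because it also appears in the third, which lowers its inverse document frequency. scikit-learn's `TfidfVectorizer` applies a smoothed IDF by default, so its absolute scores will differ from this plain formula.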

Prerequisites

Python 3.x
The Vedic text file Four-Vedas-English-Translation.txt placed in the same directory as the script
A Google Gemini API key — obtain one from Google AI Studio

Install Dependencies

```bash
pip install nltk scikit-learn gensim google-generativeai pandas numpy
```
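Installing `nltk` with pip does not download its data packages. A minimal sketch of a first-run check in the spirit of the Automated NLTK Setup feature, assuming the standard resource paths for `punkt`, `wordnet`, and `stopwords` (`REQUIRED_NLTK_DATA` and `ensure_nltk_data` are hypothetical names, not the notebook's):

```python
# Resource name -> path that nltk.data.find() looks up (assumed standard layout).
REQUIRED_NLTK_DATA = {
    "punkt": "tokenizers/punkt",
    "wordnet": "corpora/wordnet",
    "stopwords": "corpora/stopwords",
}


def ensure_nltk_data(required=REQUIRED_NLTK_DATA):
    """Download any missing NLTK resources and return the names that were fetched."""
    import nltk  # imported lazily so this module loads even before nltk is installed

    downloaded = []
    for name, path in required.items():
        try:
            nltk.data.find(path)  # raises LookupError when the resource is absent
        except LookupError:
            nltk.download(name, quiet=True)
            downloaded.append(name)
    return downloaded
```

Calling `ensure_nltk_data()` once at startup returns an empty list on later runs, since every resource is then found locally.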

Configuration

The following parameters can be adjusted directly in the script:

Parameter	Default	Description
`file_path`	`Four-Vedas-English-Translation.txt`	Path to the input text file
`num_topics`	`5`	Number of themes for LDA to discover
`custom_stopwords`	(set in script)	Words to exclude from analysis
Bigram frequency filter	`5`	Minimum occurrences for a bigram to be considered
Trigram frequency filter	`3`	Minimum occurrences for a trigram to be considered
Gemini model	`gemini-1.5-flash`	Gemini model used for contextual explanations
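To illustrate how the bigram (5) and trigram (3) frequency filters interact with collocation scoring, here is a standard-library sketch that drops rare n-grams and ranks the rest by pointwise mutual information (PMI). NLTK provides the same idea through `BigramCollocationFinder` / `TrigramCollocationFinder` and `apply_freq_filter`; whether the notebook ranks by PMI is an assumption, and `top_collocations` is a hypothetical helper.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """All contiguous n-token tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def top_collocations(tokens, n=2, min_freq=5, top_k=10):
    """Keep n-grams seen at least min_freq times, ranked by pointwise mutual information."""
    total = len(tokens)
    unigram_counts = Counter(tokens)
    ngram_counts = Counter(ngrams(tokens, n))
    num_ngrams = max(total - n + 1, 1)

    def pmi(gram):
        # log2( P(w1..wn) / (P(w1) * ... * P(wn)) )
        p_gram = ngram_counts[gram] / num_ngrams
        p_independent = 1.0
        for word in gram:
            p_independent *= unigram_counts[word] / total
        return math.log2(p_gram / p_independent)

    candidates = [g for g, c in ngram_counts.items() if c >= min_freq]
    return sorted(candidates, key=pmi, reverse=True)[:top_k]


tokens = ("soma juice flows " * 5).split()  # toy corpus
print(top_collocations(tokens, n=2, min_freq=5))  # bigram filter
print(top_collocations(tokens, n=3, min_freq=3))  # trigram filter
```

The frequency filter matters because PMI alone favors pairs of rare words that happen to co-occur once or twice.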

Troubleshooting

FileNotFoundError: Verify that Four-Vedas-English-Translation.txt (or your custom file_path) exists in the same directory as the script.

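MemoryError during LDA: Reduce num_topics, shrink the vocabulary (for example with gensim's Dictionary.filter_extremes), or lower the chunksize parameter of LdaModel. The model's topic-word matrices grow with num_topics times the vocabulary size; increasing passes only adds training iterations over the corpus and does not reduce memory pressure.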
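A rough way to reason about LDA memory: each topic-word matrix has num_topics rows and one column per vocabulary word. The sketch below estimates one such matrix, assuming 4-byte float32 values, and shows a gensim training call on a trimmed vocabulary. The helper names and the `no_below`, `no_above`, `chunksize`, and `passes` values are illustrative assumptions, not the notebook's settings.

```python
def topic_matrix_bytes(num_topics, vocab_size, bytes_per_value=4):
    """Size of one num_topics x vocab_size matrix, assuming 4-byte float32 values."""
    return num_topics * vocab_size * bytes_per_value


def train_lda(tokenized_docs, num_topics=5, no_below=5, no_above=0.5, chunksize=500, passes=5):
    """Train gensim LDA on a trimmed vocabulary (parameter values are illustrative)."""
    from gensim.corpora import Dictionary  # lazy import so the estimate above runs without gensim
    from gensim.models import LdaModel

    dictionary = Dictionary(tokenized_docs)
    # Drop words in fewer than no_below docs or in more than no_above of all docs.
    dictionary.filter_extremes(no_below=no_below, no_above=no_above)
    corpus = [dictionary.doc2bow(doc) for doc in tokenized_docs]
    # A smaller chunksize means fewer documents are processed per training update.
    return LdaModel(
        corpus=corpus,
        id2word=dictionary,
        num_topics=num_topics,
        chunksize=chunksize,
        passes=passes,
    )


print(topic_matrix_bytes(5, 20_000))     # 400000 bytes, about 0.4 MB
print(topic_matrix_bytes(50, 200_000))   # 40000000 bytes, about 40 MB
```

Multiplying topics by ten and vocabulary by ten grows each matrix a hundredfold, which is why trimming the vocabulary is often as effective as cutting num_topics.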

Gemini API errors (404, etc.): Check that your API key is valid and has the necessary permissions. Also confirm that gemini-1.5-flash is still a supported model name in the current API version.
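One way to confirm the model name before generating is to ask the API which models support `generateContent`. The sketch below uses the `google-generativeai` package from the install line (`genai.configure`, `genai.list_models`, `genai.GenerativeModel`, `generate_content`). The helper names and the prompt wording are hypothetical, not the notebook's actual prompt.

```python
DEFAULT_MODEL = "gemini-1.5-flash"


def build_prompt(collocations):
    """Build an explanation prompt for a list of n-gram tuples (wording is illustrative)."""
    phrases = ", ".join(" ".join(gram) for gram in collocations)
    return (
        "Explain the cultural, religious, and ritualistic significance of these "
        f"phrases from an English translation of the Vedas: {phrases}"
    )


def explain_collocations(collocations, api_key, model_name=DEFAULT_MODEL):
    """Check that model_name is available to this key, then request an explanation."""
    import google.generativeai as genai  # lazy import: only needed when calling the API

    genai.configure(api_key=api_key)
    # list_models() reports names like "models/gemini-1.5-flash"; strip the prefix.
    available = {
        m.name.split("/")[-1]
        for m in genai.list_models()
        if "generateContent" in m.supported_generation_methods
    }
    if model_name not in available:
        raise ValueError(f"{model_name!r} is not available; try one of {sorted(available)}")
    model = genai.GenerativeModel(model_name)
    return model.generate_content(build_prompt(collocations)).text
```

Failing early with the list of available names turns an opaque 404 into an actionable message when a model is retired.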
