
Jaegerbawmb/VedaLens


VedaLens

A Retrieval-Augmented Generation (RAG) system for exploring the four Vedas — combining a local LLM, the Google Gemini API, and classical NLP techniques to deliver contextual answers and deep textual analysis from one of humanity's oldest bodies of knowledge.


Features

| Feature | Description |
| --- | --- |
| Automated NLTK Setup | Checks for and downloads required NLTK data (punkt, wordnet, stopwords) on first run |
| Text Preprocessing | Lowercasing, tokenization, removal of non-alphabetic tokens, stop word filtering (including custom corpus-specific terms such as thou, hymn, veda), and lemmatization |
| Text Statistics | Reports total word count, unique word count, and the most frequent terms after preprocessing |
| Topic Modeling (LDA) | Identifies underlying themes across hymns using Latent Dirichlet Allocation |
| TF-IDF Keyword Extraction | Surfaces important, document-specific keywords for individual hymns |
| Collocation Analysis | Discovers significant bigrams and trigrams (frequently co-occurring word pairs and triplets) |
| Contextual AI Explanations | Uses the Gemini API to generate cultural, religious, and ritualistic explanations for identified collocations |
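
As a rough illustration of the preprocessing and statistics steps listed above, the sketch below uses only the Python standard library. It is not the project's code: the regular expression stands in for NLTK tokenization, the stop word list is a tiny hypothetical sample, lemmatization is omitted, and the function names are made up for this example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop word list. The real script is described
# as filtering standard stop words plus custom terms such as thou, hymn, veda.
STOPWORDS = {"the", "of", "and", "to", "a", "is", "o", "thou", "hymn", "veda"}


def preprocess(text):
    """Lowercase the text, keep alphabetic tokens only, and drop stop words."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]


def text_statistics(tokens, top_n=3):
    """Return total word count, unique word count, and the most frequent terms."""
    counts = Counter(tokens)
    return {
        "total_words": len(tokens),
        "unique_words": len(counts),
        "top_terms": counts.most_common(top_n),
    }


sample = (
    "O Agni, thou art the priest of the sacrifice. "
    "Agni brings the gods to the sacrifice."
)
tokens = preprocess(sample)
stats = text_statistics(tokens, top_n=2)
print(tokens)
print(stats)
```

Statistics are computed after filtering, so stop words such as "thou" do not inflate the word counts or crowd out meaningful terms in the frequency list.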

Prerequisites

  • Python 3.x
  • The Vedic text file Four-Vedas-English-Translation.txt placed in the same directory as the script
  • A Google Gemini API key — obtain one from Google AI Studio

Install Dependencies

pip install nltk scikit-learn gensim google-generativeai pandas numpy

Configuration

The following parameters can be adjusted directly in the script:

| Parameter | Default | Description |
| --- | --- | --- |
| file_path | Four-Vedas-English-Translation.txt | Path to the input text file |
| num_topics | 5 | Number of themes for LDA to discover |
| custom_stopwords | (set in script) | Words to exclude from analysis |
| Bigram frequency filter | 5 | Minimum occurrences for a bigram to be considered |
| Trigram frequency filter | 3 | Minimum occurrences for a trigram to be considered |
| Gemini model | gemini-1.5-flash | Gemini model used for contextual explanations |
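
One possible way to keep these settings together is a small configuration object. The sketch below is an assumption about structure, not how the script is actually organized: the class name, the `validate` method, and the field names `bigram_min_freq`, `trigram_min_freq`, and `gemini_model` are hypothetical. The `custom_stopwords` default contains only the three example terms named under Features.

```python
from dataclasses import dataclass, field


@dataclass
class AnalysisConfig:
    """Hypothetical container for the tunable parameters listed above."""

    file_path: str = "Four-Vedas-English-Translation.txt"
    num_topics: int = 5
    # Only the example terms mentioned in this README; the script's full set
    # is not reproduced here.
    custom_stopwords: set = field(default_factory=lambda: {"thou", "hymn", "veda"})
    bigram_min_freq: int = 5   # illustrative name for the bigram frequency filter
    trigram_min_freq: int = 3  # illustrative name for the trigram frequency filter
    gemini_model: str = "gemini-1.5-flash"

    def validate(self):
        """Raise ValueError if a numeric setting is out of range."""
        if self.num_topics < 1:
            raise ValueError("num_topics must be at least 1")
        if self.bigram_min_freq < 1 or self.trigram_min_freq < 1:
            raise ValueError("frequency filters must be at least 1")
        return self


# Override one default and keep the rest.
config = AnalysisConfig(num_topics=8).validate()
print(config.num_topics, config.bigram_min_freq, config.trigram_min_freq)
```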

Troubleshooting

FileNotFoundError: Verify that Four-Vedas-English-Translation.txt is in the same directory as the script, or that your custom file_path points to an existing file.
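
A common cause of this error is that Python resolves relative paths against the current working directory rather than the script's directory. The helper below is a hypothetical sketch (the function name and the `base_dir` parameter are not part of the project) that resolves the path explicitly and fails with a more descriptive message.

```python
from pathlib import Path


def resolve_input_file(file_path, base_dir=None):
    """Return an absolute Path to the input file, or raise FileNotFoundError.

    Relative paths are resolved against base_dir, which defaults to the
    current working directory.
    """
    base = Path(base_dir) if base_dir is not None else Path.cwd()
    candidate = Path(file_path)
    if not candidate.is_absolute():
        candidate = base / candidate
    if not candidate.is_file():
        raise FileNotFoundError(
            f"Input text not found at {candidate}. "
            "Place the file there or update file_path."
        )
    return candidate.resolve()
```

Inside the script, passing `base_dir=Path(__file__).parent` would make the lookup independent of the directory the script is launched from.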

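MemoryError during LDA: Reduce num_topics, or shrink the vocabulary by adding stop words or filtering out rare and very common terms. Lowering the chunksize parameter of LdaModel may also help on large corpora. Increasing the passes parameter does not ease memory pressure; it only lengthens training.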
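
The topic-word matrix in LDA has one row per topic and one column per vocabulary term, so pruning the vocabulary is one way to reduce memory use. The plain-Python sketch below shows the idea of document-frequency filtering; it is an illustration, not the project's code, and `prune_vocabulary` is a hypothetical helper. gensim's `Dictionary.filter_extremes` offers comparable `no_below` and `no_above` parameters.

```python
from collections import Counter


def prune_vocabulary(docs, no_below=2, no_above=0.5):
    """Drop tokens that occur in fewer than no_below documents or in more
    than a no_above fraction of all documents; return the filtered documents."""
    # Count each token once per document (document frequency).
    doc_freq = Counter(token for doc in docs for token in set(doc))
    max_docs = no_above * len(docs)
    keep = {t for t, df in doc_freq.items() if no_below <= df <= max_docs}
    return [[t for t in doc if t in keep] for doc in docs]


# Toy corpus of four tokenized "hymns" (made-up data).
docs = [
    ["agni", "fire", "priest", "god"],
    ["indra", "thunder", "soma", "god"],
    ["soma", "juice", "pressed", "god"],
    ["agni", "flame", "altar", "god"],
]
pruned = prune_vocabulary(docs, no_below=2, no_above=0.75)
vocab_before = {t for d in docs for t in d}
vocab_after = {t for d in pruned for t in d}
print(len(vocab_before), "->", len(vocab_after))
```

Terms that appear in a single hymn or in nearly every hymn carry little topical signal, so removing them usually costs little in topic quality.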

Gemini API errors (404, etc.): Check that your API key is valid and has the necessary permissions. A 404 often means the requested model name is not recognized, so also confirm that gemini-1.5-flash is still a supported model name in the current API version.
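
To check which model names an API key can use, the google-generativeai package provides `list_models()`. The sketch below wraps that call and adds an offline comparison helper. The helper names, the `GEMINI_API_KEY` environment variable, and the model names in the demonstration are assumptions of this example and are not taken from the project.

```python
import os


def normalize_model_name(name):
    """Strip the 'models/' prefix that listed model names carry."""
    prefix = "models/"
    return name[len(prefix):] if name.startswith(prefix) else name


def is_model_available(wanted, listed_names):
    """Return True if wanted matches one of the listed model names."""
    available = {normalize_model_name(n) for n in listed_names}
    return normalize_model_name(wanted) in available


def fetch_generation_models():
    """Ask the API which models support generateContent.

    Needs the google-generativeai package, network access, and an API key in
    the GEMINI_API_KEY environment variable (variable name assumed here).
    """
    import google.generativeai as genai

    genai.configure(api_key=os.environ["GEMINI_API_KEY"])
    return [
        m.name
        for m in genai.list_models()
        if "generateContent" in m.supported_generation_methods
    ]


# Offline demonstration with a hypothetical listing; in practice the list
# would come from fetch_generation_models().
listed = ["models/gemini-example-a", "models/gemini-example-b"]
print(is_model_available("gemini-example-a", listed))
print(is_model_available("gemini-1.5-flash", listed))
```

If the configured model is missing from the listing, updating the Gemini model setting to one of the returned names is the likely fix.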
