TOVA: Topic Visualization & Analysis

Functionalities

  1. Train traditional models (Tomotopy LDA, CTM) or LLM-based models (TopicGPT, OpenTopicRAG) from the CLI or the web UI through a common interface, with a plug-in architecture that lets you extend TOVA with new topic-model classes (see the class diagram and the plug-in sketch below).
  2. Explore trained models via the dashboard: topic lists, top documents, coherence metrics, and similar-topic suggestions.
  3. Run inference on new corpora and download topic assignments/theta matrices.
(Class diagram: tova_class_diagram_basic)
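
The base class that plug-in models extend is shown only in the class diagram, so the snippet below is a minimal sketch of the shape such a class could take; the class name, method names, and the placeholder output are assumptions, not TOVA's actual interface.

# Hypothetical sketch only: TOVA's real base class and method names are not
# spelled out in this README, so everything below is an assumed interface.
from typing import Any, Dict, List


class MyCustomTopicModel:
    """Shape of a plug-in topic-model class under the common train/infer pattern."""

    def __init__(self, num_topics: int = 10, **tr_params: Any) -> None:
        self.num_topics = num_topics
        self.tr_params = tr_params

    def train(self, documents: List[str]) -> None:
        # A real plug-in would fit its model here; this stub only records the corpus size.
        self._n_docs = len(documents)

    def infer(self, documents: List[str]) -> List[Dict[int, float]]:
        # Return one topic distribution (theta row) per document; uniform here as a placeholder.
        weight = 1.0 / self.num_topics
        return [{k: weight for k in range(self.num_topics)} for _ in documents]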

Deployment options

1. CLI / local scripts

Use the CLI entry points directly from your workstation:

python3 -m venv .venv && source .venv/bin/activate
pip install --upgrade pip
pip install -e .            # installs the project defined in pyproject.toml
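
A quick way to confirm the editable install resolved correctly is to import the package; this assumes it is importable as tova, as noted in the library-usage section below.

# Sanity check after `pip install -e .`: the package should be importable as tova
import tova

print("TOVA imported from:", tova.__file__)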

Example CLI invocation

python -m src.tova.cli.main train run \
  --model tomotopyLDA \
  --data data_test/bills_sample_100.csv \
  --text-col tokenized_text \
  --output data/models/tomotopy \
  --tr-params '{"num_topics": 10, "num_iters": 50}'

This editable install uses the package metadata declared in pyproject.toml ([tool.setuptools], dynamic dependencies, and optional extras), so everything required for CLI usage is installed automatically.

2. Python package (library) usage

To embed TOVA inside another project, add it as a dependency and install via pip:

pip install git+https://github.com/daniel-stephens/TOVA.git@master

Alternatively, build a wheel with python -m build --wheel. The installed package exposes its modules under tova.*, so you can import and orchestrate training/inference programmatically, as sketched below. Extras defined in pyproject.toml (for example pip install .[ui]) pull in the UI or Solr-specific requirements.
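
The exact tova.* import surface is not documented in this README, so rather than guess at class or function names, the sketch below orchestrates a training run from another Python project by calling the documented CLI; swap it for direct imports once you know the library API.

# Drive the documented CLI from Python (same flags as the example above);
# replace with direct tova.* imports once the library API is known.
import json
import subprocess
import sys

tr_params = {"num_topics": 10, "num_iters": 50}
subprocess.run(
    [
        sys.executable, "-m", "src.tova.cli.main", "train", "run",
        "--model", "tomotopyLDA",
        "--data", "data_test/bills_sample_100.csv",
        "--text-col", "tokenized_text",
        "--output", "data/models/tomotopy",
        "--tr-params", json.dumps(tr_params),
    ],
    check=True,
)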

3. Docker / web service deployment (Makefile-driven)

Everything for containerized deployment happens through the Makefile that wraps docker compose. The Makefile targets are not needed for CLI/package usage; they exclusively orchestrate the Docker-based stack.

Quick start

  1. Create .env:

VERSION=0.1.0
ASSETS_DATE=20240601

  2. Kick off the stack:

make up        # build and start api, web, postgres, solr, solr-api, zoo
make down      # stop and remove all containers
make logs-api  # follow API logs

Building images

  • make build (use make rebuild-all for no-cache) – build builder, assets, api, web, solr-api images
  • make build-api (make rebuild-api) – build only the API image
  • make build-web (make rebuild-web) – build only the web UI image
  • make build-solr-api (make rebuild-solr-api) – build only the Solr API image
  • make rebuild-run – rebuild runtime services (api, web, solr-api) and start them

Running services

  • make up – build (if needed) and start everything
  • make down – stop and remove containers

Monitoring

  • make logs-api – stream API logs
  • make logs-web – stream web UI logs
  • make logs-solr-api – stream Solr API logs

Services

The application consists of:

  • API (port 8000) - Main FastAPI application
  • Web (port 8080) - Web UI
  • Solr API (port 8001) - Solr search interface
  • Solr (port 8983) - Apache Solr search engine
  • Postgres (port 5432) - Persistent metadata storage for user information
  • Zookeeper (ports 2180, 2181) - Coordination service for Solr
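
After make up, a quick way to confirm the API service is reachable from your machine is to request FastAPI's auto-generated OpenAPI document; this assumes the default FastAPI endpoints are enabled and port 8000 is published on localhost.

# Minimal reachability check for the API service on port 8000. Assumes the
# default FastAPI /openapi.json endpoint is enabled and the port is published.
import urllib.request

with urllib.request.urlopen("http://localhost:8000/openapi.json", timeout=5) as resp:
    print("API is up, HTTP status:", resp.status)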

Configuration

The main configuration file is static/config/config.yaml. It aggregates:

  • LLM connectivity (llm): providers, API keys, reachable hosts, and model allowlists.
  • Topic modeling defaults (topic_modeling.general): shared options like llm_provider, prompts, and topic counts.
  • Per-model blocks (topic_modeling.traditional, topic_modeling.llm_based, opentopicrag, topicgpt, etc.) that override or extend the defaults.
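
Scripts that need the same settings can read the file with any YAML parser; the sketch below assumes only the keys described above (llm and topic_modeling.general) and that PyYAML is installed.

# Load the aggregated configuration and inspect the defaults; assumes PyYAML
# and the key structure described in this section.
import yaml

with open("static/config/config.yaml") as fh:
    config = yaml.safe_load(fh)

general = config["topic_modeling"]["general"]
print("Default LLM provider:", general["llm_provider"])
print("Configured LLM providers:", list(config["llm"].keys()))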

LLM configuration

Large Language Models are configured through the llm section. Example:

# Global LLM connectivity: define API keys, hosts and model names for every provider the system may call.
llm:
  gpt: # OpenAI / Azure OpenAI settings
    path_api_key: OPENAI_API_KEY  # env var name read at runtime
    available_models: {...} # available models for this deployment type
  ollama: # Local Ollama endpoint exposed to containers
    host: http://0.0.0.0:11434 # bind Ollama externally so Docker can reach it
    available_models: {...}

# LLM provider/model references used during training (LLM-based models) or optional label/summarization helpers for traditional models
topic_modeling:
  general:
    llm_provider: "ollama"
    llm_model_type: "gemma3:4b"
    llm_server: "http://kumo01.tsc.uc3m.es:11434"
  • OpenAI / Azure OpenAI – save OPENAI_API_KEY (or whichever environment variable path_api_key names) in your .env file before invoking the CLI or starting Docker.
  • Ollama – when running inside Docker, the Ollama HTTP endpoint must be reachable from the containers. Start Ollama listening on all interfaces (for example OLLAMA_HOST=0.0.0.0:11434 ollama serve) and set llm.ollama.host to an address the API container can actually reach, such as the Docker host's hostname or IP.
  • Update the opentopicrag / topicgpt sections to point to custom prompts, samples, or iteration counts before starting jobs.
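
Note that path_api_key stores the name of an environment variable rather than the key itself, so the secret is resolved from the environment at runtime; a minimal sketch of that lookup, assuming only the example configuration above:

# llm.gpt.path_api_key holds the *name* of an environment variable, not the
# secret itself, so the key is read from the environment at runtime.
import os

env_var_name = "OPENAI_API_KEY"  # value of llm.gpt.path_api_key in the example above
api_key = os.environ.get(env_var_name)
if api_key is None:
    raise RuntimeError(f"Set {env_var_name} in your .env file or environment")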
