File exclusion parameters ignored when database cache exists #494

@MuLeiSY2021

Bug Description

When regenerating a wiki with custom file exclusion/inclusion parameters (via the "Refresh Wiki" advanced options), the parameters are silently ignored if a cached database (.pkl file) already exists for the repository.

Root Cause

In api/data_pipeline.py, the prepare_db_index() method checks for an existing .pkl database file and returns cached documents immediately without considering the excluded_dirs, excluded_files, included_dirs, or included_files parameters:

# check the database
if self.repo_paths and os.path.exists(self.repo_paths["save_db_file"]):
    logger.info("Loading existing database...")
    try:
        self.db = LocalDB.load_state(self.repo_paths["save_db_file"])
        documents = self.db.get_transformed_data(key="split_and_embed")
        if documents:
            # ... validation ...
            return documents  # ← Returns cached DB, exclusion params never applied

The exclusion parameters are only applied when creating a new database (the read_all_documents() call further down), but this code path is never reached when a cache exists.
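The control flow described above can be illustrated with a toy model. This is a minimal sketch, not the actual deepwiki-open code: `prepare_db_index_sketch`, `read_all_documents_sketch`, and the in-memory `cache` dict are hypothetical stand-ins for the real method, the document reader, and the .pkl file.

```python
# Toy model of the early return described in this issue.
# All names here are hypothetical; the real logic lives in api/data_pipeline.py.

def read_all_documents_sketch(files, excluded_files=None):
    """Apply the exclusion list while reading (the only place filters are used)."""
    excluded = set(excluded_files or [])
    return [f for f in files if f not in excluded]


def prepare_db_index_sketch(files, cache, excluded_files=None):
    """Return cached documents if present, mirroring the early return."""
    if "documents" in cache:
        # Early return: excluded_files is never consulted on this path.
        return cache["documents"]
    documents = read_all_documents_sketch(files, excluded_files)
    cache["documents"] = documents
    return documents


repo_files = ["README.md", "api/data_pipeline.py"]
cache = {}

# First run: no filters, the cache is populated with every file.
first = prepare_db_index_sketch(repo_files, cache)

# Second run: README.md is excluded, but the cached result is returned as-is.
second = prepare_db_index_sketch(repo_files, cache, excluded_files=["README.md"])
print(second)  # still contains README.md
```

In this sketch the second call receives the exclusion list but never reads it, which matches the behavior reported here.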

Steps to Reproduce

  1. Index a repository (e.g. via "Refresh Wiki")
  2. Go to the wiki page and click "Refresh Wiki" again
  3. In advanced options, add files to the exclusion list (e.g. README.md)
  4. Generate the wiki

Result: The generated wiki still references the excluded files.

Expected Behavior

When custom file filters are provided, the database should be rebuilt with those filters applied, instead of returning stale cached data.
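One possible way to get this behavior is to record which filters a cache was built with and reuse it only when the current request matches. The sketch below is an assumption, not the project's fix: `filter_fingerprint` and `cache_is_usable` are hypothetical helpers, and storing the fingerprint next to the .pkl file is left out. A simpler alternative would be to skip the cache whenever any filter list is non-empty.

```python
import hashlib
import json


def filter_fingerprint(excluded_dirs=None, excluded_files=None,
                       included_dirs=None, included_files=None):
    """Return a stable hash of the filter settings (order-insensitive)."""
    payload = {
        "excluded_dirs": sorted(excluded_dirs or []),
        "excluded_files": sorted(excluded_files or []),
        "included_dirs": sorted(included_dirs or []),
        "included_files": sorted(included_files or []),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def cache_is_usable(cached_fingerprint, **filters):
    """Reuse the cache only if it was built with the same filters."""
    return cached_fingerprint == filter_fingerprint(**filters)


# Fingerprint recorded when the cache was first built (no filters).
built_with = filter_fingerprint()

same_request = cache_is_usable(built_with)
new_request = cache_is_usable(built_with, excluded_files=["README.md"])
print(same_request, new_request)
```

With a check like this, an unchanged request still benefits from the cache, while adding README.md to the exclusion list would force a rebuild.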

Environment

  • deepwiki-open (latest main branch)
  • Self-hosted Docker deployment
