Bug Description
When regenerating a wiki with custom file exclusion/inclusion parameters (via the "Refresh Wiki" advanced options), the parameters are silently ignored if a cached database (.pkl file) already exists for the repository.
Root Cause
In api/data_pipeline.py, the prepare_db_index() method checks for an existing .pkl database file and returns cached documents immediately without considering the excluded_dirs, excluded_files, included_dirs, or included_files parameters:

    # check the database
    if self.repo_paths and os.path.exists(self.repo_paths["save_db_file"]):
        logger.info("Loading existing database...")
        try:
            self.db = LocalDB.load_state(self.repo_paths["save_db_file"])
            documents = self.db.get_transformed_data(key="split_and_embed")
            if documents:
                # ... validation ...
                return documents  # ← Returns cached DB, exclusion params never applied

The exclusion parameters are only applied when creating a new database (the read_all_documents() call further down), but this code path is never reached when a cache exists.
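The early return can be modeled in isolation. The following is a simplified stand-in, not the project's code: `CACHE`, `prepare_index`, and the file list are hypothetical names chosen for illustration, and a plain dict takes the place of the .pkl database.

```python
# Hypothetical stand-in for the cached .pkl database, keyed by repository name.
CACHE = {}


def prepare_index(repo, files, excluded_files=None):
    """Simplified model of the cache check described in this issue."""
    if repo in CACHE:
        # Cache hit: excluded_files is never consulted on this path.
        return CACHE[repo]

    # Cache miss: filters are applied only while building a new index.
    excluded = set(excluded_files or [])
    documents = [name for name in files if name not in excluded]
    CACHE[repo] = documents
    return documents


files = ["README.md", "api/data_pipeline.py"]

# First run builds the index without any filters.
first = prepare_index("demo", files)

# Second run passes an exclusion, but the cached result is returned unchanged.
second = prepare_index("demo", files, excluded_files=["README.md"])
```

In this model, `second` still contains `README.md`, which mirrors the stale result seen in the generated wiki.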
Steps to Reproduce
- Index a repository (e.g. via "Refresh Wiki")
- Go to the wiki page and click "Refresh Wiki" again
- In advanced options, add files to the exclusion list (e.g. README.md)
- Generate the wiki
- Result: The generated wiki still references the excluded files
Expected Behavior
When custom file filters are provided, the database should be rebuilt with those filters applied, instead of returning stale cached data.
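One possible way to get this behavior, sketched under assumptions: record a fingerprint of the filter settings when the database is built, and rebuild whenever the current request's fingerprint differs. The helper names `filter_fingerprint` and `should_rebuild` are hypothetical and do not exist in the project; where the fingerprint would be stored (for example next to the .pkl file) is left open.

```python
import hashlib
import json


def filter_fingerprint(excluded_dirs=None, excluded_files=None,
                       included_dirs=None, included_files=None):
    """Return a stable hash of the filter settings, independent of list order."""
    payload = {
        "excluded_dirs": sorted(excluded_dirs or []),
        "excluded_files": sorted(excluded_files or []),
        "included_dirs": sorted(included_dirs or []),
        "included_files": sorted(included_files or []),
    }
    encoded = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def should_rebuild(cached_fingerprint, **filters):
    """True when the cache was built with different filters than requested."""
    return cached_fingerprint != filter_fingerprint(**filters)


# Fingerprint recorded when the database was first built without filters.
built_with = filter_fingerprint()

same_request = should_rebuild(built_with)
filtered_request = should_rebuild(built_with, excluded_files=["README.md"])
```

Here `same_request` is `False` and `filtered_request` is `True`, so an unchanged request can keep using the cache while a request with new filters triggers a rebuild. A simpler alternative would be to skip the cache whenever any filter is supplied, at the cost of rebuilding on every filtered request.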
Environment
- deepwiki-open (latest main branch)
- Self-hosted Docker deployment