
Multi-Index Support for Third-Party Libraries #24

@lexasub

Summary

Investigate whether support for manually indexing third-party libraries by explicit path is worth implementing in ast-rag, or whether it belongs in a separate tool.
                                                                           
---                                                                        
                                                                           
Proposed Feature
                                                                           
Manual library indexing:                                                   
                                                                           
    ast-rag index-lib --name requests --path /path/to/site-packages/requests
    ast-rag search "session timeout" --lib requests
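The two commands could map onto a subcommand parser along these lines. This is a hypothetical sketch using Python's standard `argparse`, not ast-rag's actual CLI code: the subcommand and flag names come from the proposal, while `build_parser` is an illustrative name. Leaving `--lib` unset (`None`) is what keeps a search project-only.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical CLI layout for the proposed commands (not ast-rag's real parser)."""
    parser = argparse.ArgumentParser(prog="ast-rag")
    sub = parser.add_subparsers(dest="command", required=True)

    # index-lib: the user names the library and points at its source explicitly.
    index_lib = sub.add_parser("index-lib", help="index a third-party library by path")
    index_lib.add_argument("--name", required=True, help="library name, e.g. requests")
    index_lib.add_argument("--path", required=True, help="directory containing the library source")

    # search: --lib is optional; when omitted (None) the search stays project-only.
    search = sub.add_parser("search", help="search the index")
    search.add_argument("query")
    search.add_argument("--lib", default=None, help="library index to include in the search")
    return parser
```

For example, `build_parser().parse_args(["search", "session timeout", "--lib", "requests"])` yields a namespace with `command="search"`, `query="session timeout"` and `lib="requests"`.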
                                                                                                                                                          
Key characteristics:                                                                                                                                      
 - The user explicitly chooses which libraries to index (no auto-scanning)
 - Library indexes are kept separate from the project index
 - Search is scoped with the --lib flag (project-only by default)
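One way to keep library indexes apart from the project index and honor the default scope is a small registry that decides which indexes a query touches. A minimal sketch, assuming `--lib NAME` searches the named library *in addition to* the project (the proposal does not say whether it replaces it); `IndexRegistry` and its methods are hypothetical, and indexes are stand-in strings rather than real IndexManager objects.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class IndexRegistry:
    """Hypothetical registry that keeps the project index apart from library indexes.

    Indexes are plain strings here; a real version would hold whatever
    ast-rag's IndexManager manages.
    """

    project: str = "project"
    libraries: Dict[str, str] = field(default_factory=dict)

    def register_library(self, name: str, index: str) -> None:
        # Libraries are added only on explicit request (no auto-scanning).
        self.libraries[name] = index

    def indexes_for(self, lib: Optional[str] = None) -> List[str]:
        """Return the indexes a search should consult.

        Without --lib the search is project-only. With --lib, the named
        library index is consulted alongside the project (an assumption).
        """
        if lib is None:
            return [self.project]
        if lib not in self.libraries:
            raise KeyError(f"library {lib!r} has not been indexed; run index-lib first")
        return [self.project, self.libraries[lib]]
```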
                                                                                                                                                          
---                                                                                                                                                       
                                                                                                                                                          
Research Questions                                                                                                                                        
                                                                                                                                                          
                                                                                                                                                          
┌───────────────────────────┬───────────────────────────────────────────────────┐                                                        
│ Question                  │ Why It Matters                                    │                                                                         
├───────────────────────────┼───────────────────────────────────────────────────┤                    
│ Quality improvement       │ How much does library context help LLM responses? │                                                                         
│ Code complexity           │ How many LOC changes to support multiple indexes? │                                                                                                                                                                                                                                       
│ Storage overhead          │ How much disk space per library index?            │                                                                         
│ Search performance        │ Does multi-index slow down queries?               │                                                                         
│ In-scope or separate tool │ Should ast-rag handle this, or a companion tool?  │
└───────────────────────────┴───────────────────────────────────────────────────┘
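For the search-performance and code-complexity questions, a common pattern is to query each index separately and merge the ranked hits, so latency grows with the number of indexes consulted. The sketch below assumes hits are `(score, symbol)` pairs whose scores are comparable across indexes (for example, produced by the same embedding model); `merge_results` is a hypothetical helper, not part of ast-rag.

```python
import heapq
from typing import Dict, List, Tuple

# A hit is (score, symbol); higher scores are better. Scores are assumed
# to be comparable across indexes.
Hit = Tuple[float, str]


def merge_results(per_index: Dict[str, List[Hit]], k: int) -> List[Tuple[float, str, str]]:
    """Merge per-index hits into one global top-k, tagging each hit with its source index."""
    tagged = (
        (score, symbol, index_name)
        for index_name, hits in per_index.items()
        for score, symbol in hits
    )
    # nlargest keeps at most k items in a heap: O(n log k) over all n hits.
    return heapq.nlargest(k, tagged, key=lambda hit: hit[0])
```

Tagging each hit with its index name lets the CLI show whether a result came from project code or from a library.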

                                                                           
---
                                                                           
Evaluation Criteria                                                        
                                                                           
Implement in ast-rag if:                                                                                                                                  
 - ✅ < 2,000 LOC changed in core
 - ✅ < 50 MB per typical library index                                    
 - ✅ No measurable search slowdown                                                                                                                       
 - ✅ Clear user value (better code completion / search results)
                                                                           
Separate tool if:                                          
 - ❌ Requires major refactoring of IndexManager 
 - ❌ Indexes > 100 MB for common libraries (requests, django)             
 - ❌ Duplicates what ripgrep / Sourcegraph already do
 - ❌ Better suited to an IDE extension (VS Code, JetBrains)
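The measurable criteria can be written down as an explicit check so the go / no-go call is reproducible. In this sketch, field names are illustrative, any "separate tool" condition is assumed to take precedence, and results that meet neither list (say, a 70 MB index) fall through to "needs discussion", since the thresholds above leave the 50 to 100 MB range undecided.

```python
from dataclasses import dataclass


@dataclass
class SpikeResults:
    """Measurements and judgments from the spike (field names are illustrative)."""
    core_loc_changed: int
    index_mb: float                   # size of a typical library index
    common_lib_index_mb: float        # largest index among common libs (requests, django)
    search_slowdown_measurable: bool
    clear_user_value: bool
    major_indexmanager_refactor: bool
    duplicates_existing_tools: bool   # e.g. ripgrep / Sourcegraph
    better_as_ide_extension: bool


def recommend(r: SpikeResults) -> str:
    # Assumption: any "separate tool" criterion wins outright.
    if (r.major_indexmanager_refactor
            or r.common_lib_index_mb > 100
            or r.duplicates_existing_tools
            or r.better_as_ide_extension):
        return "separate tool"
    # Every "implement in ast-rag" criterion must hold.
    if (r.core_loc_changed < 2000
            and r.index_mb < 50
            and not r.search_slowdown_measurable
            and r.clear_user_value):
        return "implement in ast-rag"
    # Meets neither list; the criteria leave this to judgment.
    return "needs discussion"
```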
                                                                           
---
                                                                           
Suggested Approach                   
                                                                           
Phase 1: POC                              
 1. Pick one library (e.g., requests)                 
 2. Index it manually
 3. Add a --lib flag to the search CLI
 4. Measure index size, indexing time, and search latency
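The Phase 1 measurements need only the standard library. A minimal sketch: `dir_size_mb` sums file sizes under an index directory, and the timing helpers wrap whatever indexing and search entry points the POC exposes. These are passed in as callables because ast-rag's internal API is not specified here. All three function names are hypothetical.

```python
import os
import statistics
import time
from typing import Any, Callable, List


def dir_size_mb(path: str) -> float:
    """Total size of files under `path` in MB (10**6 bytes); symlinks are skipped."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            full = os.path.join(root, name)
            if not os.path.islink(full):
                total += os.path.getsize(full)
    return total / 1_000_000


def timed_seconds(fn: Callable[..., Any], *args: Any) -> float:
    """Wall-clock seconds for a single call, e.g. one indexing run."""
    start = time.perf_counter()
    fn(*args)
    return time.perf_counter() - start


def median_latency_ms(search: Callable[[str], Any], queries: List[str], repeats: int = 5) -> float:
    """Median per-query latency in milliseconds over `repeats` passes through `queries`."""
    samples = [
        timed_seconds(search, q) * 1000
        for _ in range(repeats)
        for q in queries
    ]
    return statistics.median(samples)
```

Running `median_latency_ms` on the same queries with and without a library index loaded gives the before/after comparison that the "no measurable search slowdown" criterion needs.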
                                                                           
Phase 2: Evaluate     
 - Does search quality improve for API-related queries?
 - Is the complexity acceptable?                                           
 - Go / no-go decision                                                     
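To answer the search-quality question with a number rather than impressions, one option is a hit rate over a small hand-labelled set of API-related queries. It would be computed once for project-only search and once with --lib. A sketch, assuming each query is labelled with one expected symbol and `search` returns ranked symbol names. The metric (hit rate at k) and the name `hit_rate_at_k` are assumptions, not part of the issue.

```python
from typing import Callable, Dict, List


def hit_rate_at_k(
    search: Callable[[str], List[str]],
    expected: Dict[str, str],
    k: int = 5,
) -> float:
    """Fraction of queries whose expected symbol appears in the top-k results."""
    if not expected:
        raise ValueError("need at least one labelled query")
    hits = sum(1 for query, symbol in expected.items() if symbol in search(query)[:k])
    return hits / len(expected)
```

A higher score with --lib than without it on the same labelled queries would be evidence for "Clear user value".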
                                                                           
---                                                                        
                                                                           
Alternatives to Consider                                                                                                                                  
                                                                                                                                                          
                                                                                                                                                          
┌──────────────────────────────┬───────────────────────────────────────────┬───────────────────────────────────┐
│ Alternative                  │ Pros                                      │ Cons                              │
├──────────────────────────────┼───────────────────────────────────────────┼───────────────────────────────────┤
│ Global library cache         │ One index per lib, shared across projects │ More complex management           │
│ Lazy on-import indexing      │ Automatic when code imports lib           │ Magic behavior, harder to control │
│ Separate `ast-rag-libs` tool │ Keeps core simple                         │ Another tool to maintain          │
│ IDE integration              │ Better UX (click-to-navigate)             │ Requires IDE plugin work          │
└──────────────────────────────┴───────────────────────────────────────────┴───────────────────────────────────┘
                                                                           
---                                                                        

Deliverables

 - [ ] POC branch with single-library support
 - [ ] Benchmarks: index size, search latency
 - [ ] Writeup: recommended approach (in-core vs. separate)
 - [ ] If go: follow-up issue for full implementation

---

Metadata

 - Assignees: No one assigned
 - Projects: Status: Spike/Need Analytics; Status: Backlog
 - Milestone: No milestone
 - Relationships: None yet
 - Development: No branches or pull requests