Draft
10 changes: 10 additions & 0 deletions .gitignore
@@ -234,3 +234,13 @@ tests/data

# Local working directory (personal scripts, docs, tools)
local/
nitin_docs/
nitin_scripts/

# Local notebooks (kept for development, not committed)
docs/user_guide/13_index_migrations.ipynb

# Migration temp files (generated by rvl migrate commands)
migration_plan.yaml
migration_report.yaml
schema_patch.yaml
165 changes: 165 additions & 0 deletions AGENTS.md
@@ -0,0 +1,165 @@
# AGENTS.md - RedisVL Project Context

## Frequently Used Commands

```bash
# Development workflow
make install # Install dependencies
make format # Format code (black + isort)
make check-types # Run mypy type checking
make lint # Run all linting (format + types)
make test # Run tests (no external APIs)
make test-all # Run all tests (includes API tests)
make check # Full check (lint + test)

# Redis setup
make redis-start # Start Redis container
make redis-stop # Stop Redis container

# Documentation
make docs-build # Build documentation
make docs-serve # Serve docs locally
```

Pre-commit hooks are also configured. Run them before you commit:
```bash
pre-commit run --all-files
```

## Important Architectural Patterns

### Async/Sync Dual Interfaces
- Most core classes have both sync and async versions (e.g., `SearchIndex` / `AsyncSearchIndex`)
- Follow existing patterns when adding new functionality
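
As a rough illustration of the dual-interface pattern, the sketch below keeps shared, I/O-free logic in a base class and exposes matching sync and async methods. The classes `_BaseToyIndex`, `ToyIndex`, and `AsyncToyIndex` are hypothetical stand-ins written for this example. They are not RedisVL classes and do not reflect how `SearchIndex` / `AsyncSearchIndex` are implemented.

```python
import asyncio


class _BaseToyIndex:
    """Hypothetical base class: logic that needs no I/O is written once."""

    def __init__(self, name):
        self.name = name
        self._store = {}  # in-memory stand-in for a Redis connection

    def _build_key(self, doc_id):
        return f"{self.name}:{doc_id}"


class ToyIndex(_BaseToyIndex):
    """Sync variant: the call blocks until the write is done."""

    def load(self, doc_id, doc):
        key = self._build_key(doc_id)
        self._store[key] = doc
        return key


class AsyncToyIndex(_BaseToyIndex):
    """Async variant: same method name and arguments, but awaitable."""

    async def load(self, doc_id, doc):
        key = self._build_key(doc_id)
        await asyncio.sleep(0)  # stand-in for awaiting network I/O
        self._store[key] = doc
        return key


sync_key = ToyIndex("docs").load("1", {"title": "hello"})
async_key = asyncio.run(AsyncToyIndex("docs").load("1", {"title": "hello"}))
```

When adding a method to one variant, the same name and signature would normally be added to the other so callers can switch between them with only `await` changes.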

### Schema-Driven Design
```python
# Index schemas define structure
schema = IndexSchema.from_yaml("schema.yaml")
index = SearchIndex(schema, redis_url="redis://localhost:6379")
```

## Critical Rules

### Do Not Modify
- **CRITICAL**: Do not change this line unless explicitly asked:
```python
token.strip().strip(",").replace("“", "").replace("”", "").lower()
```
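
For context only, here is a minimal sketch of what the expression does to a single token. It assumes the two replaced characters are the left and right curly double quotes (U+201C and U+201D), written as escapes so they survive copy and paste. The wrapper name `normalize_token` is hypothetical and does not exist in the codebase.

```python
def normalize_token(token: str) -> str:
    # Assumption: the replaced characters are the curly quotes U+201C and U+201D.
    # Order matters: whitespace is stripped first, then leading/trailing commas,
    # then curly quotes anywhere in the token, and finally the case is lowered.
    return (
        token.strip()
        .strip(",")
        .replace("\u201c", "")
        .replace("\u201d", "")
        .lower()
    )


cleaned = normalize_token(" \u201cHello\u201d, ")
```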

### Git Operations
**CRITICAL**: NEVER use `git push` or attempt to push to remote repositories. The user will handle all git push operations.

### Branch and Commit Policy
**IMPORTANT**: Use conventional branch names and conventional commits.

Branch naming:
- Human-created branches should use `<type>/<short-kebab-description>`
- Automation-created branches may use `codex/<type>/<short-kebab-description>`
- Preferred branch types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
- Examples:
  - `feat/index-migrator`
  - `fix/async-sentinel-pool`
  - `docs/index-migrator-benchmarking`
  - `codex/feat/index-migrator`

Commit messages:
- Use Conventional Commits: `<type>(optional-scope): <summary>`
- Preferred commit types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
- Examples:
  - `feat(migrate): add drop recreate planning docs`
  - `docs(index-migrator): add benchmarking guidance`
  - `fix(cli): validate migrate plan inputs`
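
The naming rules above can be checked mechanically. The sketch below is one possible validator, not a tool that ships with the project. It treats the preferred types as the only allowed types and restricts descriptions and scopes to lowercase letters, digits, and hyphens. Both restrictions are assumptions that are stricter than the policy text.

```python
import re

TYPES = ("feat", "fix", "docs", "refactor", "test", "chore", "perf", "build", "ci")
_TYPE_ALT = "|".join(TYPES)

# <type>/<short-kebab-description>, with an optional codex/ prefix for automation
BRANCH_RE = re.compile(rf"(?:codex/)?(?:{_TYPE_ALT})/[a-z0-9]+(?:-[a-z0-9]+)*")

# <type>(optional-scope): <summary>
COMMIT_RE = re.compile(rf"(?:{_TYPE_ALT})(?:\([a-z0-9-]+\))?: \S.*")


def is_valid_branch(name: str) -> bool:
    """Return True if the whole branch name matches the convention."""
    return BRANCH_RE.fullmatch(name) is not None


def is_valid_commit(subject: str) -> bool:
    """Return True if the commit subject line matches the convention."""
    return COMMIT_RE.fullmatch(subject) is not None
```

For example, `is_valid_branch("feat/index-migrator")` passes, while an unprefixed name such as `fix-search-bug` does not.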

### Code Quality
**IMPORTANT**: Always run `make format` before committing code to ensure proper formatting and linting compliance.

### README.md Maintenance
**IMPORTANT**: DO NOT modify README.md unless explicitly requested.

**If you need to document something, use these alternatives:**
- Development info → CONTRIBUTING.md
- API details → docs/ directory
- Examples → docs/examples/
- Project memory (explicit preferences, directives, etc.) → AGENTS.md

## Code Style Preferences

### Import Organization
- **Prefer module-level imports** by default for clarity and standard Python conventions
- **Use local/inline imports only when necessary** for specific reasons:
  - Avoiding circular import dependencies
  - Improving startup time for heavy/optional dependencies
  - Lazy loading for performance-critical paths
- When using local imports, add a brief comment explaining why (e.g., `# Local import to avoid circular dependency`)
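
A small sketch of the two import styles side by side. It uses only standard-library modules, and `statistics` stands in for a heavy or optional dependency. The function names are hypothetical.

```python
import math  # Module-level import: the default choice


def vector_norm(values):
    """Euclidean norm of a sequence of numbers."""
    return math.sqrt(sum(v * v for v in values))


def describe(values):
    # Local import to defer loading a module that only this code path needs
    import statistics

    return {"mean": statistics.mean(values), "norm": vector_norm(values)}


summary = describe([3, 4])
```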

### Comments and Output
- **No emojis in code comments or print statements**
- Keep comments professional and focused on technical clarity
- Use emojis sparingly only in user-facing documentation (markdown files), not in Python code

### General Guidelines
- Follow existing patterns in the RedisVL codebase
- Maintain consistency with the project's established conventions
- Run `make format` before committing to ensure code quality standards

## Testing Notes
RedisVL uses `pytest` with `testcontainers` for testing.

- `make test` - unit tests only (no external APIs)
- `make test-all` - run the full suite, including tests that call external APIs
- `pytest --run-api-tests` - explicitly run API-dependent tests (e.g., LangCache,
external vectorizer/reranker providers). These require the appropriate API
keys and environment variables to be set.

## Project Structure

```
redisvl/
├── cli/ # Command-line interface (rvl command)
├── extensions/ # AI extensions (cache, memory, routing)
│ ├── cache/ # Semantic caching for LLMs
│ ├── llmcache/ # LLM-specific caching
│ ├── message_history/ # Chat history management
│ ├── router/ # Semantic routing
│ └── session_manager/ # Session management
├── index/ # SearchIndex classes (sync/async)
├── query/ # Query builders (Vector, Range, Filter, Count)
├── redis/ # Redis client utilities
├── schema/ # Index schema definitions
└── utils/ # Utilities (vectorizers, rerankers, optimization)
    ├── rerank/ # Result reranking
    └── vectorize/ # Embedding providers integration
```

## Core Components

### 1. Index Management
- `SearchIndex` / `AsyncSearchIndex` - Main interface for Redis vector indices
- `IndexSchema` - Define index structure with fields (text, tags, vectors, etc.)
- Support for JSON and Hash storage types

### 2. Query System
- `VectorQuery` - Semantic similarity search
- `RangeQuery` - Vector search within distance range
- `FilterQuery` - Metadata filtering and full-text search
- `CountQuery` - Count matching records
- Etc.

### 3. AI Extensions
- `SemanticCache` - LLM response caching with semantic similarity
- `EmbeddingsCache` - Cache for vector embeddings
- `MessageHistory` - Chat history with recency/relevancy retrieval
- `SemanticRouter` - Route queries to topics/intents

### 4. Vectorizers (Optional Dependencies)
- OpenAI, Azure OpenAI, Cohere, HuggingFace, Mistral, VoyageAI
- Custom vectorizer support
- Batch processing capabilities

## Documentation
- Main docs: https://docs.redisvl.com
- Built with Sphinx from `docs/` directory
- Includes API reference and user guides
- Example notebooks live under `docs/user_guide/...`
23 changes: 22 additions & 1 deletion CONTRIBUTING.md
@@ -251,12 +251,33 @@ Before suggesting a new feature:

## Pull Request Process

1. **Fork and create a branch**: Create a descriptive branch name (e.g., `fix-search-bug` or `add-vector-similarity`)
1. **Fork and create a branch**: Use a conventional branch name such as `feat/index-migrator`, `fix/search-bug`, or `docs/vectorizer-guide`
2. **Make your changes**: Follow our coding standards and include tests
3. **Test thoroughly**: Ensure your changes work and don't break existing functionality
4. **Update documentation**: Add or update documentation as needed
5. **Submit your PR**: Include a clear description of what your changes do

### Branch Naming and Commit Messages

We use conventional branch names and Conventional Commits to keep history easy to scan and automate.

Branch naming:

- Use `<type>/<short-kebab-description>`
- Recommended types: `feat`, `fix`, `docs`, `refactor`, `test`, `chore`, `perf`, `build`, `ci`
- Examples:
  - `feat/index-migrator`
  - `fix/async-sentinel-pool`
  - `docs/migration-benchmarking`

Commit messages:

- Use `<type>(optional-scope): <summary>`
- Examples:
  - `feat(migrate): add drop recreate plan generation`
  - `docs(index-migrator): add benchmark guidance`
  - `fix(cli): reject unsupported migration diffs`

### Review Process

- The core team reviews Pull Requests regularly
91 changes: 90 additions & 1 deletion docs/concepts/field-attributes.md
@@ -267,7 +267,7 @@ Key vector attributes:
- `dims`: Vector dimensionality (required)
- `algorithm`: `flat`, `hnsw`, or `svs-vamana`
- `distance_metric`: `COSINE`, `L2`, or `IP`
- `datatype`: `float16`, `float32`, `float64`, or `bfloat16`
- `datatype`: Vector precision (see table below)
- `index_missing`: Allow searching for documents without vectors

```yaml
@@ -281,6 +281,48 @@ Key vector attributes:
index_missing: true # Handle documents without embeddings
```

### Vector Datatypes

The `datatype` attribute controls how vector components are stored. Smaller datatypes reduce memory usage but may affect precision.

| Datatype | Bits | Memory (768 dims) | Use Case |
|----------|------|-------------------|----------|
| `float32` | 32 | 3 KB | Default. Best precision for most applications. |
| `float16` | 16 | 1.5 KB | Good balance of memory and precision. Recommended for large-scale deployments. |
| `bfloat16` | 16 | 1.5 KB | Better dynamic range than float16. Useful when embeddings have large value ranges. |
| `float64` | 64 | 6 KB | Maximum precision. Rarely needed. |
| `int8` | 8 | 768 B | Integer quantization. Significant memory savings with some precision loss. |
| `uint8` | 8 | 768 B | Unsigned integer quantization. For embeddings with non-negative values. |
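
The memory column follows from `dims * bits / 8`. The sketch below reproduces that arithmetic and scales it to a collection size. It counts raw vector payload only and ignores index structures and per-key overhead, so real memory usage will be higher. The function names are illustrative.

```python
BITS_PER_COMPONENT = {
    "float64": 64,
    "float32": 32,
    "float16": 16,
    "bfloat16": 16,
    "int8": 8,
    "uint8": 8,
}


def vector_bytes(dims: int, datatype: str) -> int:
    """Bytes needed to store one vector's components."""
    return dims * BITS_PER_COMPONENT[datatype] // 8


def payload_gib(num_vectors: int, dims: int, datatype: str) -> float:
    """Raw vector payload in GiB, excluding index and key overhead."""
    return num_vectors * vector_bytes(dims, datatype) / 2**30


per_vector_f32 = vector_bytes(768, "float32")  # 3072 bytes, i.e. 3 KB
per_vector_f16 = vector_bytes(768, "float16")  # 1536 bytes, i.e. 1.5 KB
```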

**Algorithm Compatibility:**

| Datatype | FLAT | HNSW | SVS-VAMANA |
|----------|------|------|------------|
| `float32` | Yes | Yes | Yes |
| `float16` | Yes | Yes | Yes |
| `bfloat16` | Yes | Yes | No |
| `float64` | Yes | Yes | No |
| `int8` | Yes | Yes | No |
| `uint8` | Yes | Yes | No |
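
If you want to catch an unsupported combination before creating an index, the compatibility table can be encoded as a lookup. This is a sketch that mirrors the table above. The helper `check_datatype` is hypothetical and is not a RedisVL function.

```python
ALL_DATATYPES = {"float32", "float16", "bfloat16", "float64", "int8", "uint8"}

# Mirrors the compatibility table: SVS-VAMANA accepts only float16 and float32
SUPPORTED_DATATYPES = {
    "flat": ALL_DATATYPES,
    "hnsw": ALL_DATATYPES,
    "svs-vamana": {"float16", "float32"},
}


def check_datatype(algorithm: str, datatype: str) -> None:
    """Raise ValueError if the table lists the combination as unsupported."""
    allowed = SUPPORTED_DATATYPES[algorithm.lower()]
    if datatype.lower() not in allowed:
        raise ValueError(
            f"{datatype} is not supported with {algorithm}; "
            f"choose one of {sorted(allowed)}"
        )


check_datatype("hnsw", "int8")  # passes silently
```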

**Choosing a Datatype:**

- **Start with `float32`** unless you have memory constraints
- **Use `float16`** for production systems with millions of vectors (50% memory savings, minimal precision loss)
- **Use `int8`/`uint8`** only after benchmarking recall on your specific dataset
- **SVS-VAMANA users**: Must use `float16` or `float32`

**Quantization with the Migrator:**

You can change vector datatypes on existing indexes using the migration wizard:

```bash
rvl migrate wizard --index my_index --url redis://localhost:6379
# Select "Update field" > choose vector field > change datatype
```

The migrator automatically re-encodes stored vectors to the new precision. See {doc}`/user_guide/how_to_guides/migrate-indexes` for details.
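
To make "re-encoding" concrete, the sketch below converts one vector's bytes from `float32` to `float16` using only the standard library. It illustrates the idea and is not the migrator's implementation. It assumes little-endian byte order, and the function name is hypothetical.

```python
import struct


def float32_to_float16_bytes(buf: bytes) -> bytes:
    """Re-encode a little-endian float32 vector as little-endian float16."""
    if len(buf) % 4 != 0:
        raise ValueError("buffer length is not a multiple of 4 bytes")
    n = len(buf) // 4
    values = struct.unpack(f"<{n}f", buf)
    # struct raises OverflowError for values outside the float16 range
    return struct.pack(f"<{n}e", *values)


original = struct.pack("<3f", 0.5, -1.0, 2.0)  # 12 bytes
converted = float32_to_float16_bytes(original)  # 6 bytes
```

Values that are not exactly representable in 16 bits are rounded, which is where the precision loss mentioned above comes from.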

## Redis-Specific Subtleties

### Modifier Ordering
@@ -304,6 +346,53 @@ Not all attributes work with all field types:
| `unf` | ✓ | ✗ | ✓ | ✗ | ✗ |
| `withsuffixtrie` | ✓ | ✓ | ✗ | ✗ | ✗ |

### Migration Support

The migration wizard (`rvl migrate wizard`) supports updating field attributes on existing indexes. The tables below show which attributes the wizard prompts for and which require editing a schema patch by hand.

**Wizard Prompts:**

| Attribute | Text | Tag | Numeric | Geo | Vector |
|-----------|------|-----|---------|-----|--------|
| `sortable` | Wizard | Wizard | Wizard | Wizard | N/A |
| `index_missing` | Wizard | Wizard | Wizard | Wizard | N/A |
| `index_empty` | Wizard | Wizard | N/A | N/A | N/A |
| `no_index` | Wizard | Wizard | Wizard | Wizard | N/A |
| `unf` | Wizard* | N/A | Wizard* | N/A | N/A |
| `separator` | N/A | Wizard | N/A | N/A | N/A |
| `case_sensitive` | N/A | Wizard | N/A | N/A | N/A |
| `no_stem` | Wizard | N/A | N/A | N/A | N/A |
| `weight` | Wizard | N/A | N/A | N/A | N/A |
| `algorithm` | N/A | N/A | N/A | N/A | Wizard |
| `datatype` | N/A | N/A | N/A | N/A | Wizard |
| `distance_metric` | N/A | N/A | N/A | N/A | Wizard |
| `m`, `ef_construction` | N/A | N/A | N/A | N/A | Wizard |

*\* `unf` is only prompted when `sortable` is enabled.*

**Manual Schema Patch Required:**

| Attribute | Notes |
|-----------|-------|
| `phonetic_matcher` | Enable phonetic search |
| `withsuffixtrie` | Suffix/contains search optimization |

**Example manual patch** for adding `index_missing` to a field:

```yaml
# schema_patch.yaml
version: 1
changes:
  update_fields:
    - name: category
      attrs:
        index_missing: true
```

```bash
rvl migrate plan --index my_index --schema-patch schema_patch.yaml
```
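
If you script this step, one option is to write the patch file and assemble the command from Python. The sketch below only builds the argument list. The `--index` and `--schema-patch` flags are the ones shown above, the nested layout of the patch text is an assumption about how the YAML is indented, and the helper name is hypothetical.

```python
import tempfile
from pathlib import Path

# Assumed layout of the patch shown above
PATCH_TEXT = """\
version: 1
changes:
  update_fields:
    - name: category
      attrs:
        index_missing: true
"""


def write_patch_and_build_command(index_name: str, directory: str) -> list:
    """Write schema_patch.yaml into `directory` and return the plan command."""
    patch_path = Path(directory) / "schema_patch.yaml"
    patch_path.write_text(PATCH_TEXT, encoding="utf-8")
    return [
        "rvl", "migrate", "plan",
        "--index", index_name,
        "--schema-patch", str(patch_path),
    ]


with tempfile.TemporaryDirectory() as tmp:
    command = write_patch_and_build_command("my_index", tmp)
    patch_written = (Path(tmp) / "schema_patch.yaml").exists()
    # subprocess.run(command, check=True) would execute it; that call is left
    # out here because it needs the rvl CLI and a reachable Redis instance.
```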

### JSON Path for Nested Fields

When using JSON storage, use the `path` attribute to index nested fields: