# Long Context Chunking
## Overview
The long context chunking system automatically handles documents that exceed embedding model context limits by splitting them into manageable chunks and computing averaged embeddings.
## Problem Solved
When embedding very long documents or messages, you might encounter errors like:
```
Input length exceeds context length: 12453 tokens. Maximum length: 8192 tokens.
```
This plugin now handles such cases gracefully by:
1. Detecting context length errors from the embedding provider before they surface as failures
2. Automatically splitting the document into overlapping chunks
3. Embedding each chunk separately
4. Computing an averaged embedding that preserves semantic meaning
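
As an illustration of the detection step, the sketch below parses the error message shown above into its two token counts. The helper name `parseContextOverflow` and the `ContextOverflow` type are hypothetical, and the pattern only covers this one message format; other providers may word their errors differently.

```typescript
// Hypothetical shape for the numbers carried by a context-length error.
interface ContextOverflow {
  inputTokens: number;
  maxTokens: number;
}

// Returns the token counts if the message looks like the context-length
// error quoted in this document, otherwise null.
function parseContextOverflow(message: string): ContextOverflow | null {
  const pattern =
    /context length:\s*(\d+)\s*tokens.*?Maximum length:\s*(\d+)\s*tokens/i;
  const match = pattern.exec(message);
  if (match === null) return null;
  return { inputTokens: Number(match[1]), maxTokens: Number(match[2]) };
}
```

A caller could use a non-null result as the signal to fall back to chunking and treat every other error as a genuine failure.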
## How It Works
### Chunking Strategy
The chunker uses a **semantic-aware** approach:
- **Splits at sentence boundaries** when possible (better for preserving meaning)
- **Configurable overlap** (default: 200 characters) to maintain context across chunks
- **Adapts to model context limits** based on the embedding model
- **Forced splits** at hard limits if sentence boundaries are not found
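
The strategy above can be sketched as a small character-based splitter. This is a minimal illustration, not the plugin's actual chunker: the function name `splitIntoChunks`, the `ChunkOptions` type, and the simple "punctuation followed by a space" boundary rule are all assumptions.

```typescript
// Hypothetical options; sizes are measured in characters.
interface ChunkOptions {
  maxChunkSize: number; // upper bound on chunk length
  overlapSize: number; // characters repeated from the previous chunk
}

function splitIntoChunks(text: string, opts: ChunkOptions): string[] {
  const { maxChunkSize, overlapSize } = opts;
  if (overlapSize >= maxChunkSize) {
    throw new Error("overlapSize must be smaller than maxChunkSize");
  }
  const chunks: string[] = [];
  let start = 0;
  while (start < text.length) {
    let end = Math.min(start + maxChunkSize, text.length);
    if (end < text.length) {
      // Look for the last sentence end inside the candidate chunk.
      const candidate = text.slice(start, end);
      const boundary = Math.max(
        candidate.lastIndexOf(". "),
        candidate.lastIndexOf("! "),
        candidate.lastIndexOf("? ")
      );
      // Use it only if the chunk stays longer than the overlap, so the
      // next start position always moves forward.
      if (boundary + 1 > overlapSize) {
        end = start + boundary + 1; // keep the punctuation mark
      }
      // Otherwise fall through to a forced split at the hard limit.
    }
    chunks.push(text.slice(start, end));
    if (end === text.length) break;
    start = end - overlapSize; // step back to create the overlap
  }
  return chunks;
}
```

Each chunk after the first begins with the last `overlapSize` characters of the previous one, and text without any sentence boundary is still split at `maxChunkSize`.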
### Chunking Flow
```
Long Document
├── 8192+ characters ─────┐
                          │
                          ▼
                 ┌─────────────────┐
                 │ Detect Overflow │
                 └────────┬────────┘
                          │
                          ▼
                 ┌─────────────────┐
                 │   Split into    │
                 │   Overlapping   │
                 │     Chunks      │
                 └────────┬────────┘
                          │
        ┌─────────────────┼─────────────────┐
        │                 │                 │
        ▼                 ▼                 ▼
 ┌─────────────┐   ┌─────────────┐   ┌─────────────┐
 │   Chunk 1   │   │   Chunk 2   │   │   Chunk 3   │
 │   [1-2k]    │   │ [1.8k-3.8k] │   │ [3.6k-5.6k] │
 └──────┬──────┘   └──────┬──────┘   └──────┬──────┘
        │                 │                 │
        ▼                 ▼                 ▼
    Embedding         Embedding         Embedding
        │                 │                 │
        └─────────────────┼─────────────────┘
                          │
                          ▼
                   Compute Average
                          │
                          ▼
                   Final Embedding
```
## Configuration
### Default Settings
The chunker automatically adapts to your embedding model:
- **maxChunkSize**: 70% of model context limit (e.g., 5734 for 8192-token model)
- **overlapSize**: 5% of model context limit
- **minChunkSize**: 10% of model context limit
- **semanticSplit**: true (prefer sentence boundaries)
- **maxLinesPerChunk**: 50 lines
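
The percentages above can be turned into concrete numbers as follows. This is a sketch under one assumption: fractional values are rounded down, which reproduces the 5734 figure quoted for an 8192 limit. The names `ChunkerDefaults` and `defaultsForContextLimit` are hypothetical.

```typescript
// Hypothetical container for the derived settings.
interface ChunkerDefaults {
  maxChunkSize: number;
  overlapSize: number;
  minChunkSize: number;
  semanticSplit: boolean;
  maxLinesPerChunk: number;
}

// Derives the defaults listed above from a model's context limit.
// Rounding down with Math.floor is an assumption.
function defaultsForContextLimit(contextLimit: number): ChunkerDefaults {
  return {
    maxChunkSize: Math.floor(contextLimit * 0.7), // 70% of the limit
    overlapSize: Math.floor(contextLimit * 0.05), // 5% of the limit
    minChunkSize: Math.floor(contextLimit * 0.1), // 10% of the limit
    semanticSplit: true,
    maxLinesPerChunk: 50,
  };
}
```

For a limit of 8192 this gives a chunk size of 5734 and an overlap of 409; for 2048 it gives 1433 and 102.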
### Disabling Auto-Chunking
If you prefer to handle chunking manually or want the model to fail on long documents:
```json
{
  "plugins": {
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "embedding": {
            "apiKey": "${JINA_API_KEY}",
            "model": "jina-embeddings-v5-text-small",
            "chunking": false // Disable auto-chunking
          }
        }
      }
    }
  }
}
```
### Custom Chunking Parameters
For advanced users who want to tune chunking behavior:
```json
{
  "plugins": {
    "entries": {
      "memory-lancedb-pro": {
        "enabled": true,
        "config": {
          "embedding": {
            "autoChunk": {
              "maxChunkSize": 2000, // Characters per chunk
              "overlapSize": 500, // Overlap between chunks
              "minChunkSize": 500, // Minimum acceptable chunk size
              "semanticSplit": true, // Prefer sentence boundaries
              "maxLinesPerChunk": 100 // Max lines before forced split
            }
          }
        }
      }
    }
  }
}
```
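
When tuning these values by hand, it can help to sanity-check them before use. The sketch below is a hypothetical check, not something the plugin is known to perform: the `AutoChunkConfig` type mirrors the keys in the example above, and the rules themselves are assumptions about which combinations make sense.

```typescript
// Mirrors the keys of the "autoChunk" block in the example above.
interface AutoChunkConfig {
  maxChunkSize: number;
  overlapSize: number;
  minChunkSize: number;
  semanticSplit: boolean;
  maxLinesPerChunk: number;
}

// Returns a list of human-readable problems; an empty list means the
// values are mutually consistent under the assumed rules.
function validateAutoChunk(cfg: AutoChunkConfig): string[] {
  const problems: string[] = [];
  if (cfg.maxChunkSize <= 0) problems.push("maxChunkSize must be positive");
  if (cfg.overlapSize < 0) problems.push("overlapSize must not be negative");
  if (cfg.overlapSize >= cfg.maxChunkSize) {
    // An overlap as large as the chunk would never advance through the text.
    problems.push("overlapSize must be smaller than maxChunkSize");
  }
  if (cfg.minChunkSize > cfg.maxChunkSize) {
    problems.push("minChunkSize must not exceed maxChunkSize");
  }
  if (cfg.maxLinesPerChunk <= 0) problems.push("maxLinesPerChunk must be positive");
  return problems;
}
```

The values from the example configuration pass all of these checks.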
## Supported Models
The chunker automatically adapts to these embedding models:
| Model | Context Limit | Chunk Size | Overlap |
|-------|---------------|------------|----------|
| Jina jina-embeddings-v5-text-small | 8192 | 5734 | 409 |
| OpenAI text-embedding-3-small | 8192 | 5734 | 409 |
| OpenAI text-embedding-3-large | 8192 | 5734 | 409 |
| Gemini gemini-embedding-001 | 2048 | 1433 | 102 |
## Performance Considerations
### Token Savings
- **Without chunking**: 1 failed embedding request (retries required)
- **With chunking**: 3-4 chunk embedding requests, averaged into 1 result
- **Net cost increase**: ~3x for long documents (>8k tokens)
- **Trade-off**: Higher embedding cost for long documents in exchange for graceful handling; documents within the limit cost the same as before
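
The number of embedding calls for a long document can be estimated from its length. The sketch below assumes hard splits, where every chunk after the first contributes `maxChunkSize - overlapSize` units of new text; the function name `estimateChunkCount` is hypothetical, and all three arguments must use the same unit.

```typescript
// Estimates how many chunks (and therefore embedding calls) a document
// needs, assuming every chunk is filled to maxChunkSize.
function estimateChunkCount(
  docLength: number,
  maxChunkSize: number,
  overlapSize: number
): number {
  if (docLength <= maxChunkSize) return 1; // fits in one request
  const stride = maxChunkSize - overlapSize; // new text per extra chunk
  if (stride <= 0) {
    throw new Error("overlapSize must be smaller than maxChunkSize");
  }
  return 1 + Math.ceil((docLength - maxChunkSize) / stride);
}
```

With the 12453-unit input from the error message earlier and the 8192 defaults (5734 and 409), the estimate is 3 chunks. Splitting at sentence boundaries produces shorter chunks, so the real count can be higher than this estimate.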
### Caching
Chunked embeddings are cached by their original document hash, so:
- Subsequent requests for the same document get the cached averaged embedding
- Cache hit rate improves as long documents are processed repeatedly
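
A cache keyed by a hash of the original document could look like the sketch below. Everything here is illustrative: `EmbeddingMemo` and `fnv1aStyleHash` are hypothetical names, the 32-bit FNV-1a-style hash is a stand-in for whatever digest the plugin actually uses (a cryptographic hash such as SHA-256 would make collisions far less likely), and the embedding callback is synchronous only to keep the example short.

```typescript
// Stand-in hash: FNV-1a-style mixing over UTF-16 code units.
function fnv1aStyleHash(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash.toString(16);
}

// Stores one averaged embedding per original document.
class EmbeddingMemo {
  private readonly entries = new Map<string, number[]>();
  hits = 0;

  getOrCompute(text: string, compute: (text: string) => number[]): number[] {
    const key = fnv1aStyleHash(text);
    const cached = this.entries.get(key);
    if (cached !== undefined) {
      this.hits++; // same document seen before: skip chunking and embedding
      return cached;
    }
    const embedding = compute(text);
    this.entries.set(key, embedding);
    return embedding;
  }
}
```

Because the key is derived from the whole document rather than from individual chunks, a repeated document costs a single lookup regardless of how many chunks it was split into.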
### Processing Time
- **Small documents (<4k chars)**: No chunking, same as before
- **Medium documents (4k-8k chars)**: No chunking, same as before
- **Long documents (>8k chars)**: ~100-200ms additional chunking overhead
## Logging & Debugging
### Enable Debug Logging
To see chunking in action, check the logs for entries like:
```
Document exceeded context limit (...), attempting chunking...
Split document into 3 chunks for embedding
Successfully embedded long document as 3 averaged chunks
```
### Common Scenarios
**Scenario 1: Long memory text**
- When a user's message or system prompt is very long
- Automatically chunked before embedding
- No error thrown, memory is still stored and retrievable

**Scenario 2: Batch embedding long documents**
- Some documents in a batch exceed the context limit
- Only the long ones are chunked
- Documents within the limit are processed normally
## Troubleshooting
### Chunking Still Fails
If you still see context length errors:
1. **Verify model**: Check which embedding model you're using
2. **Reduce maxChunkSize**: Some models may need smaller chunks
3. **Disable auto-chunking**: Handle chunking manually by splitting documents explicitly before embedding
### Too Many Small Chunks
If chunking creates many tiny fragments:
1. **Increase minChunkSize**: Raises the smallest chunk size the chunker will accept
2. **Reduce overlap**: With less overlap, each chunk covers more new text, so fewer chunks are needed
### Embedding Quality Degradation
If chunked embeddings seem less accurate:
1. **Increase overlap**: More context between chunks preserves relationships
2. **Use smaller maxChunkSize**: Split into more, smaller overlapping pieces
3. **Consider hierarchical approach**: Use a two-pass retrieval (chunk → document → full text)
## Future Enhancements
Planned improvements:
- [ ] **Hierarchical chunking**: Chunk → document-level embedding
- [ ] **Sliding window**: Different overlap strategies per document complexity
- [ ] **Smart summarization**: Summarize chunks before averaging for better quality
- [ ] **Context-aware overlap**: Dynamic overlap based on document complexity
- [ ] **Async chunking**: Process chunks in parallel for batch operations
## Technical Details
### Algorithm
1. **Detect overflow**: Check if document exceeds maxChunkSize
2. **Split semantically**: Find sentence boundaries within target range
3. **Create overlap**: Include overlap with previous chunk's end
4. **Embed in parallel**: Process all chunks simultaneously
5. **Average the result**: Compute mean embedding across all chunks
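
The final averaging step is an element-wise mean over the chunk embeddings. The sketch below shows that calculation on its own; the function name `averageEmbeddings` is hypothetical, and returning an empty array for empty input is an assumption chosen to match the "empty document" edge case.

```typescript
// Computes the element-wise mean of several equal-length vectors.
function averageEmbeddings(vectors: number[][]): number[] {
  if (vectors.length === 0) return [];
  const dim = (vectors[0] ?? []).length;
  const sums = new Array<number>(dim).fill(0);
  for (const vec of vectors) {
    if (vec.length !== dim) {
      throw new Error("all embeddings must have the same dimension");
    }
    for (let i = 0; i < dim; i++) {
      sums[i] = (sums[i] ?? 0) + (vec[i] ?? 0);
    }
  }
  return sums.map((total) => total / vectors.length);
}
```

Note that the mean of unit-length vectors is generally not unit length. That does not affect cosine similarity, but a store that ranks by dot product or Euclidean distance may need the result re-normalized; this document does not state whether the plugin does so.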
### Complexity
- **Time**: O(n × k) where n = number of chunks, k = average chunk processing time
- **Space**: O(n × d) where d = embedding dimension
### Edge Cases
| Case | Handling |
|------|----------|
| Empty document | Returns empty embedding immediately |
| Very small documents | No chunking, normal processing |
| Perfect boundaries | Split at sentence ends, no truncation |
| No boundaries found | Hard split at max position |
| Single oversized chunk | Process as-is, let provider error |
| All chunks too small | Last chunk takes remaining text |
## References
- [LanceDB Documentation](https://lancedb.com)
- [OpenAI Embedding Context Limits](https://platform.openai.com/docs/guides/embeddings)
- [Semantic Chunking Research](https://arxiv.org/abs/2310.05970)
---
*This feature was added to handle long-context documents gracefully without losing memory quality.*