# Long Context Chunking ## Overview The long context chunking system automatically handles documents that exceed embedding model context limits by splitting them into manageable chunks and computing averaged embeddings. ## Problem Solved When embedding very long documents or messages, you might encounter errors like: ``` Input length exceeds context length: 12453 tokens. Maximum length: 8192 tokens. ``` This plugin now handles such cases gracefully by: 1. Detecting context length errors before they cause failures 2. Automatically splitting the document into overlapping chunks 3. Embedding each chunk separately 4. Computing an averaged embedding that preserves semantic meaning ## How It Works ### Chunking Strategy The chunker uses a **semantic-aware** approach: - **Splits at sentence boundaries** when possible (better for preserving meaning) - **Configurable overlap** (default: 200 characters) to maintain context across chunks - **Adapts to model context limits** based on the embedding model - **Forced splits** at hard limits if sentence boundaries are not found ### Chunking Flow ``` Long Document │ ├── 8192+ characters ──┐ │ ▼ ┌─────────────────┐ │ Detect Overflow │ └────────┬────────┘ │ ▼ ┌─────────────────┐ │ Split into │ │ Overlapping │ │ Chunks │ └────────┬────────┘ │ ┌────────────────────┼────────────────────┐ │ │ │ ▼ ▼ ▼ ┌────────┐ ┌────────┐ ┌────────┐ │ Chunk 1│ │ Chunk 2│ │ Chunk 3│ │ [1-2k]│ │[1.8k-3.8k]│ │[3.6k-5.6k]│ └───┬────┘ └───┬────┘ └───┬────┘ │ │ │ ▼ ▼ ▼ Embedding Embedding Embedding │ │ │ └──────────────────┼──────────────────┘ │ ▼ Compute Average │ ▼ Final Embedding ``` ## Configuration ### Default Settings The chunker automatically adapts to your embedding model: - **maxChunkSize**: 70% of model context limit (e.g., 5734 for 8192-token model) - **overlapSize**: 5% of model context limit - **minChunkSize**: 10% of model context limit - **semanticSplit**: true (prefer sentence boundaries) - **maxLinesPerChunk**: 50 lines ### Disabling Auto-Chunking If you prefer to handle chunking manually or want the model to fail on long documents: ```json { "plugins": { "entries": { "memory-lancedb-pro": { "enabled": true, "config": { "embedding": { "apiKey": "${JINA_API_KEY}", "model": "jina-embeddings-v5-text-small", "chunking": false // Disable auto-chunking } } } } } } ``` ### Custom Chunking Parameters For advanced users who want to tune chunking behavior: ```json { "plugins": { "entries": { "memory-lancedb-pro": { "enabled": true, "config": { "embedding": { "autoChunk": { "maxChunkSize": 2000, // Characters per chunk "overlapSize": 500, // Overlap between chunks "minChunkSize": 500, // Minimum acceptable chunk size "semanticSplit": true, // Prefer sentence boundaries "maxLinesPerChunk": 100 // Max lines before forced split } } } } } } } ``` ## Supported Models The chunker automatically adapts to these embedding models: | Model | Context Limit | Chunk Size | Overlap | |-------|---------------|------------|----------| | Jina jina-embeddings-v5-text-small | 8192 | 5734 | 409 | | OpenAI text-embedding-3-small | 8192 | 5734 | 409 | | OpenAI text-embedding-3-large | 8192 | 5734 | 409 | | Gemini gemini-embedding-001 | 2048 | 1433 | 102 | ## Performance Considerations ### Token Savings - **Without chunking**: 1 failed embedding (retries required) - **With chunking**: 3-4 chunk embeddings (1 avg result) - **Net cost increase**: ~3x for long documents (>8k tokens) - **Trade-off**: Gracefully handling vs. processing smaller documents ### Caching Chunked embeddings are cached by their original document hash, so: - Subsequent requests for the same document get the cached averaged embedding - Cache hit rate improves as long documents are processed repeatedly ### Processing Time - **Small documents (<4k chars)**: No chunking, same as before - **Medium documents (4k-8k chars)**: No chunking, same as before - **Long documents (>8k chars)**: ~100-200ms additional chunking overhead ## Logging & Debugging ### Enable Debug Logging To see chunking in action, you can check the logs: ``` Document exceeded context limit (...), attempting chunking... Split document into 3 chunks for embedding Successfully embedded long document as 3 averaged chunks ``` ### Common Scenarios **Scenario 1: Long memory text** - When a user's message or system prompt is very long - Automatically chunked before embedding - No error thrown, memory is still stored and retrievable **Scenario 2: Batch embedding long documents** - If some documents in a batch exceed limits - Only the long ones are chunked - Successful documents processed normally ## Troubleshooting ### Chunking Still Fails If you still see context length errors: 1. **Verify model**: Check which embedding model you're using 2. **Increase minChunkSize**: May need smaller chunks for some models 3. **Disable autoChunk**: Handle chunking manually with explicit split ### Too Many Small Chunks If chunking creates many tiny fragments: 1. **Increase minChunkSize**: Larger minimum chunk size 2. **Reduce overlap**: Less overlap between chunks means more efficient chunks ### Embedding Quality Degradation If chunked embeddings seem less accurate: 1. **Increase overlap**: More context between chunks preserves relationships 2. **Use smaller maxChunkSize**: Split into more, smaller overlapping pieces 3. **Consider hierarchical approach**: Use a two-pass retrieval (chunk → document → full text) ## Future Enhancements Planned improvements: - [ ] **Hierarchical chunking**: Chunk → document-level embedding - [ ] **Sliding window**: Different overlap strategies per document complexity - [ ] **Smart summarization**: Summarize chunks before averaging for better quality - [ ] **Context-aware overlap**: Dynamic overlap based on document complexity - [ ] **Async chunking**: Process chunks in parallel for batch operations ## Technical Details ### Algorithm 1. **Detect overflow**: Check if document exceeds maxChunkSize 2. **Split semantically**: Find sentence boundaries within target range 3. **Create overlap**: Include overlap with previous chunk's end 4. **Embed in parallel**: Process all chunks simultaneously 5. **Average the result**: Compute mean embedding across all chunks ### Complexity - **Time**: O(n × k) where n = number of chunks, k = average chunk processing time - **Space**: O(n × d) where d = embedding dimension ### Edge Cases | Case | Handling | |------|----------| | Empty document | Returns empty embedding immediately | | Very small documents | No chunking, normal processing | | Perfect boundaries | Split at sentence ends, no truncation | | No boundaries found | Hard split at max position | | Single oversized chunk | Process as-is, let provider error | | All chunks too small | Last chunk takes remaining text | ## References - [LanceDB Documentation](https://lancedb.com) - [OpenAI Embedding Context Limits](https://platform.openai.com/docs/guides/embeddings) - [Semantic Chunking Research](https://arxiv.org/abs/2310.05970) --- *This feature was added to handle long-context documents gracefully without losing memory quality.*