HAND-TAGGED >>> 991 SKILLS LIVE <<<* OPEN SOURCE *NO LOGIN, NO TRACKING FRESH DROPS WEEKLY
Research

vector-index-tuning

Optimize vector search performance fast

Optimize vector index performance for latency, recall, and memory. Use when tuning HNSW parameters, selecting quantization strategies, or scaling vector search infrastructure.

↗ github · ★ 37k · src: wshobson/agents

the manual

Vector Index Tuning

Guide to optimizing vector indexes for production performance.

When to Use This Skill

  • Tuning HNSW parameters
  • Implementing quantization
  • Optimizing memory usage
  • Reducing search latency
  • Balancing recall vs speed
  • Scaling to billions of vectors

Core Concepts

1. Index Type Selection

Data Size           Recommended Index
────────────────────────────────────────
< 10K vectors  →    Flat (exact search)
10K - 1M       →    HNSW
1M - 100M      →    HNSW + Quantization
> 100M         →    IVF + PQ or DiskANN
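
The size thresholds above can be expressed as a small lookup. This is a minimal sketch: the function name `recommend_index` is hypothetical, and the handling of the exact boundary values (10K, 1M, 100M) is an assumption, since the table does not say which side they fall on.

```python
def recommend_index(num_vectors: int) -> str:
    """Map a collection size to the index family suggested in the table above.

    Assumption: a lower bound belongs to the larger bucket (10K -> HNSW),
    and exactly 100M still counts as "1M - 100M".
    """
    if num_vectors < 0:
        raise ValueError("num_vectors must be non-negative")
    if num_vectors < 10_000:
        return "Flat (exact search)"
    if num_vectors < 1_000_000:
        return "HNSW"
    if num_vectors <= 100_000_000:
        return "HNSW + Quantization"
    return "IVF + PQ or DiskANN"


if __name__ == "__main__":
    for n in (5_000, 250_000, 20_000_000, 2_000_000_000):
        print(f"{n:>13,} vectors -> {recommend_index(n)}")
```

Treat the result as a starting point only. Dimensionality, memory budget, and recall targets can all justify a different choice.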

2. HNSW Parameters

Parameter         Default    Effect
────────────────────────────────────────────────────────────────────────
M                 16         Connections per node, ↑ = better recall, more memory
efConstruction    100        Build quality, ↑ = better index, slower build
efSearch          50         Search quality, ↑ = better recall, slower search
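
One way to keep these three parameters together is a small config object holding the defaults listed above. This is a sketch, not any library's API: the class `HnswParams` and its methods are hypothetical. The check that `ef_search` is at least `k` reflects a constraint some implementations enforce (hnswlib, for instance). The link-memory figure is a rough base-layer estimate that assumes 2 × M neighbour slots with 4-byte ids, and it ignores upper layers and per-node overhead.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HnswParams:
    """Hypothetical container for the HNSW parameters described above."""
    m: int = 16                  # connections per node
    ef_construction: int = 100   # candidate list size while building
    ef_search: int = 50          # candidate list size while querying

    def validate(self, k: int) -> None:
        """Reject settings that cannot return k sensible neighbours."""
        if self.m < 2:
            raise ValueError("m must be at least 2")
        if self.ef_construction < 1 or self.ef_search < 1:
            raise ValueError("ef values must be positive")
        if self.ef_search < k:
            raise ValueError("ef_search should be at least k")

    def approx_link_bytes_per_vector(self) -> int:
        """Rough base-layer graph overhead (assumption: 2*M slots, 4-byte ids)."""
        return 2 * self.m * 4


if __name__ == "__main__":
    params = HnswParams()
    params.validate(k=10)
    print(params, params.approx_link_bytes_per_vector(), "bytes of links per vector")
```

Doubling `m` doubles this link overhead, which is the "more memory" cost noted in the table.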

3. Quantization Types

Full Precision (FP32): 4 bytes × dimensions
Half Precision (FP16): 2 bytes × dimensions
INT8 Scalar:           1 byte × dimensions
Product Quantization:  ~32-64 bytes per vector
Binary:                dimensions/8 bytes
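
To make these sizes concrete, the arithmetic can be sketched as below, reading each figure as storage per vector. The function names and the `pq_code_bytes` parameter are illustrative assumptions. The estimate covers raw vector storage only and excludes index structures such as HNSW links or PQ codebooks.

```python
def bytes_per_vector(dims: int, pq_code_bytes: int = 64) -> dict[str, int]:
    """Per-vector storage for each representation listed above."""
    if dims <= 0:
        raise ValueError("dims must be positive")
    return {
        "fp32": 4 * dims,
        "fp16": 2 * dims,
        "int8": dims,
        "pq": pq_code_bytes,        # fixed code size, independent of dims
        "binary": (dims + 7) // 8,  # one bit per dimension, rounded up to bytes
    }


def total_gib(num_vectors: int, per_vector_bytes: int) -> float:
    """Raw storage in GiB for a collection of num_vectors."""
    return num_vectors * per_vector_bytes / 2**30


if __name__ == "__main__":
    sizes = bytes_per_vector(dims=768)
    for name, size in sizes.items():
        print(f"{name:>6}: {size:>5} B/vector, "
              f"{total_gib(10_000_000, size):6.2f} GiB for 10M vectors")
```

For 768-dimensional vectors, the per-vector footprint drops from 3072 bytes at FP32 to 96 bytes for binary codes. The price is recall, so it needs to be measured on your own data.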

Templates and detailed worked examples

Full template library and detailed worked examples live in references/details.md. Read that file when you need the concrete templates.

Best Practices

Do's

  • Benchmark with real queries - Synthetic queries may not represent production traffic
  • Monitor recall continuously - Recall can degrade as data drifts
  • Start with defaults - Tune only when needed
  • Use quantization - Significant memory savings
  • Consider tiered storage - Hot/cold data separation
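
Monitoring recall needs a concrete metric. A common choice is recall@k: compare the ids returned by the approximate index against those from an exact (flat) search over the same queries. The sketch below assumes both result sets are plain lists of ids per query. The function name `recall_at_k` is illustrative and not taken from any particular library.

```python
from typing import Hashable, Sequence


def recall_at_k(
    approx_ids: Sequence[Sequence[Hashable]],
    exact_ids: Sequence[Sequence[Hashable]],
    k: int,
) -> float:
    """Fraction of true top-k neighbours that the approximate search returned."""
    if k <= 0:
        raise ValueError("k must be positive")
    if len(approx_ids) != len(exact_ids):
        raise ValueError("need one approximate and one exact result per query")
    hits = 0
    total = 0
    for approx, exact in zip(approx_ids, exact_ids):
        truth = set(exact[:k])
        hits += len(truth & set(approx[:k]))
        total += len(truth)
    if total == 0:
        raise ValueError("no ground-truth neighbours supplied")
    return hits / total


if __name__ == "__main__":
    approx = [[1, 2, 3], [4, 5, 6]]
    exact = [[1, 2, 9], [4, 5, 6]]
    print(f"recall@3 = {recall_at_k(approx, exact, k=3):.3f}")
```

Running this periodically over a fixed sample of real queries gives a trend line that makes drift visible.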

Don'ts

  • Don't over-optimize early - Profile first
  • Don't ignore build time - Index updates have cost
  • Don't forget reindexing - Plan for maintenance
  • Don't skip warming - Cold indexes are slow
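
The advice on profiling and warming can be combined in a small latency harness: run a few untimed queries first, then time the rest and report percentiles. This is a minimal sketch using only the standard library. `search_fn` is a hypothetical stand-in for whatever query call your index exposes.

```python
import time
from statistics import quantiles
from typing import Any, Callable, Sequence


def benchmark_latency(
    search_fn: Callable[[Any], Any],
    queries: Sequence[Any],
    warmup: int = 10,
) -> dict[str, float]:
    """Return p50/p95/p99 search latency in milliseconds after a warm-up pass."""
    if len(queries) < 2:
        raise ValueError("need at least two queries to compute percentiles")
    # Warm-up: touch the index so caches and lazily loaded pages are populated.
    for query in queries[:warmup]:
        search_fn(query)
    latencies_ms = []
    for query in queries:
        start = time.perf_counter()
        search_fn(query)
        latencies_ms.append((time.perf_counter() - start) * 1000.0)
    cuts = quantiles(latencies_ms, n=100)  # 99 cut points: cuts[i] is percentile i+1
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    # Stand-in workload; replace the lambda with a real index query.
    stats = benchmark_latency(lambda q: sum(range(1000)), list(range(200)))
    print({name: round(value, 4) for name, value in stats.items()})
```

Use queries sampled from production traffic where possible, and compare the timed pass with and without the warm-up to see how much a cold index costs.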
