The problem: vector search isn’t “solved” anymore
If you built a RAG app in 2023, you probably treated the vector database as a black box: embed, upsert, query, done. That workflow still works, but it’s no longer competitive on its own.
Over the last 18–24 months, vector databases have evolved into retrieval engines that blend dense vectors, sparse retrieval (BM25-style), and metadata-aware filtering while pushing hard on lower latency, lower cost, and simpler operations.
This post breaks down the latest advancements that matter in real systems and how to choose a direction without getting trapped in hype.
1) Hybrid retrieval goes mainstream (dense + sparse)
Pure dense similarity search is great at semantic matching, but it can miss exact terms, numbers, product codes, and “must include” keywords. The modern answer is hybrid search: combine dense embeddings with sparse signals (BM25 or learned sparse models) to get better recall and precision.
What changed recently
- Built-in sparse indexing and scoring is now a first-class feature in several engines, not just an external pipeline.
- Single-query hybrid is becoming standard: one request returns a fused ranking instead of forcing you to stitch result lists together in application code.
- Learned sparse vectors (for example, SPLADE-style approaches) are increasingly treated as peers to BM25 rather than research curiosities.
Why it matters for RAG quality
- Fewer “I can’t find it” failures when the answer contains exact tokens (IDs, error codes, SKUs, legal clauses).
- Better controllability for enterprise search where terms must match.
- More stable relevance across document types (FAQs, tickets, specs, long PDFs).
Practical guidance
If you do Q&A over technical docs, policies, contracts, logs, or support content, assume you’ll want hybrid retrieval. Treat “dense-only” as a baseline, not an end state.
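If your engine doesn’t fuse rankings natively, reciprocal rank fusion (RRF) is a common, tuning-light way to combine dense and sparse result lists yourself. Here’s a minimal sketch, assuming each input list is already sorted best-first and contains document IDs:

```python
from collections import defaultdict

def reciprocal_rank_fusion(result_lists, k=60):
    """Fuse several ranked lists of doc IDs into a single ranking.

    Each input list is assumed sorted best-first. k=60 is the constant
    commonly used in the RRF literature; it damps the influence of any
    single list's top ranks.
    """
    scores = defaultdict(float)
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Example: dense and BM25 retrieval each returned their top document IDs.
dense_hits = ["doc_7", "doc_2", "doc_9"]
sparse_hits = ["doc_2", "doc_5", "doc_7"]
print(reciprocal_rank_fusion([dense_hits, sparse_hits]))
# doc_2 and doc_7 rank highest because both lists agree on them
```

RRF’s appeal is that it needs no score normalization across retrieval methods, which is exactly the problem that makes naive score mixing fragile.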
2) Filtering + ANN finally works the way developers expect
In real products, retrieval isn’t “search everything.” It’s “search the right subset” by tenant, permission, region, time window, product line, or workflow stage. Historically, metadata filtering plus approximate nearest neighbor (ANN) caused surprises: slow queries, empty results, or messy tuning.
What’s improving
- Smarter planning for filtered queries: engines increasingly choose between pre-filtering, post-filtering, and filter-aware ANN traversal based on how selective the filter is, instead of forcing one strategy.
- Iterative/expanding scans: if a strict filter leaves too few candidates, the engine widens the scan until it can return k results, without you rewriting query logic.
- More expressive filtering: richer boolean logic and improved performance under heavy multi-tenant workloads.
How to design your schema for today’s engines
- Model tenancy and permissions explicitly (tenant_id, org_id, visibility, ACL group IDs).
- Keep filters selective but not overly fragmented (avoid a unique filter value per document if you can).
- Index what you filter on (or use an engine that handles filter-aware ANN efficiently).
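As one concrete example of filtered querying, here’s a sketch using the qdrant-client library; the collection name, field names, and query vector are placeholders, and most engines expose an equivalent filter object:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import FieldCondition, Filter, MatchValue

client = QdrantClient(url="http://localhost:6333")
query_embedding = [0.1] * 768  # placeholder: your query's dense embedding

# Restrict ANN search to one tenant's internally visible documents.
tenant_filter = Filter(
    must=[
        FieldCondition(key="tenant_id", match=MatchValue(value="acme")),
        FieldCondition(key="visibility", match=MatchValue(value="internal")),
    ]
)

hits = client.search(
    collection_name="docs",
    query_vector=query_embedding,
    query_filter=tenant_filter,
    limit=10,
)
```

In Qdrant specifically, creating a payload index on tenant_id (and any other hot filter field) is what keeps these queries fast as the collection grows; other engines have analogous requirements.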
3) “Vector database” now includes text search and doc storage
A major trend: vector systems are absorbing adjacent features so teams can ship faster with fewer components.
What’s being pulled into the database layer
- Full-text search so you can run BM25-style retrieval next to semantic retrieval.
- Doc-in, doc-out flows: store raw text or documents, not just embeddings.
- Tokenization and sparse vector generation integrated into ingestion pipelines.
When this is a win
- Your team wants one retrieval service rather than a separate keyword search stack.
- You need to iterate relevance quickly without rebuilding multiple systems.
- You prefer operational simplicity over assembling best-of-breed components.
When you should still keep components separate
- You already have a mature search stack (or strict compliance constraints) and only need vector similarity as an add-on.
- Your retrieval needs are extreme (very high QPS, complex ranking, heavy analytics) and you want specialized tooling.
4) Multi-vector and multimodal support becomes a default expectation
The early pattern was one embedding per chunk. Now, many serious use cases require multiple vectors per entity:
- Dense + sparse representations for the same document
- Multiple dense embeddings (general semantic + domain-tuned + reranker embeddings)
- Multimodal retrieval across text, images, audio, or video features
What’s changing in data modeling
Vector stores increasingly support richer structures: multiple vector fields, weighted scoring strategies, and query-time fusion. That shifts the mindset from “store vectors” to “store representations.”
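To make “store representations” concrete, here’s a sketch of a collection with two named dense vector fields per point, again using qdrant-client as one example; the collection name, dimensions, and embedding values are placeholders:

```python
from qdrant_client import QdrantClient
from qdrant_client.models import Distance, PointStruct, VectorParams

client = QdrantClient(url="http://localhost:6333")

# Two named dense vector fields per point: a general-purpose embedding
# and a smaller domain-tuned one. Sizes here are placeholders.
client.create_collection(
    collection_name="docs_multivec",
    vectors_config={
        "general": VectorParams(size=768, distance=Distance.COSINE),
        "domain": VectorParams(size=384, distance=Distance.COSINE),
    },
)

client.upsert(
    collection_name="docs_multivec",
    points=[
        PointStruct(
            id=1,
            vector={"general": [0.1] * 768, "domain": [0.2] * 384},
            payload={"doc_id": "spec_17", "tenant_id": "acme"},
        )
    ],
)
```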
What to do in your RAG pipeline
- Start with one dense embedding to ship quickly.
- Add sparse as soon as you see misses on exact terms.
- Add a reranking step before you add your third embedding model. It often delivers the biggest relevance lift per unit of complexity.
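A reranker is typically a cross-encoder that scores each (query, passage) pair jointly: slower than a vector lookup, but much more precise over a small candidate set. A minimal sketch with the sentence-transformers library; the checkpoint named here is a commonly used public MS MARCO model, not a domain-specific recommendation:

```python
from sentence_transformers import CrossEncoder

# A small public MS MARCO cross-encoder; swap in whatever fits your latency budget.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")

query = "What is the default retry limit for error code E4012?"
candidates = [
    "E4012 indicates a transient network failure; clients retry up to 3 times.",
    "Our SLA covers 99.9% uptime across all regions.",
    "Error codes in the E4xxx range relate to connectivity issues.",
]

# Score every (query, passage) pair jointly, then sort best-first.
scores = reranker.predict([(query, passage) for passage in candidates])
reranked = [p for _, p in sorted(zip(scores, candidates),
                                 key=lambda pair: pair[0], reverse=True)]
```

The usual pattern is to retrieve 50–100 candidates cheaply, then rerank down to the handful you actually put in the prompt.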
5) Postgres keeps closing the gap (and that changes architecture)
A quiet but important advancement is the continued improvement of vector search inside Postgres via extensions such as pgvector. This matters because it enables a powerful default: one database for relational data + vectors.
Why teams like the Postgres path
- Fewer moving parts: transactions, metadata, and embeddings live together.
- Existing tooling: backups, observability, ORMs, migrations, and security are already solved.
- Good-enough performance for many production RAG workloads when indexes are tuned.
Where specialized vector databases still win
- Large-scale ANN with tight latency targets
- High-ingest streaming workloads and massive collections
- Advanced retrieval features (built-in hybrid ranking, tiering, specialized indexing options)
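For the Postgres path, a filtered similarity query is plain SQL. Here’s a sketch using psycopg with pgvector’s cosine-distance operator; the table, columns, and connection string are placeholders:

```python
import psycopg  # psycopg 3

query_embedding = [0.1] * 768  # placeholder: your query's dense embedding
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"

with psycopg.connect("dbname=app") as conn:
    rows = conn.execute(
        """
        SELECT doc_id, content
        FROM chunks
        WHERE tenant_id = %s
        ORDER BY embedding <=> %s::vector  -- pgvector cosine distance
        LIMIT 10
        """,
        ("acme", vector_literal),
    ).fetchall()
```

With an HNSW or IVFFlat index on the embedding column this stays fast; without one, Postgres falls back to an exact sequential scan, which is correct but slow at scale.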
6) Serverless and cost-aware scaling mature
Vector workloads are spiky: ingestion jobs, bursty chat traffic, and periodic re-embedding can swing compute needs dramatically. That pushed vendors to invest heavily in serverless, consumption-based pricing, and elastic scaling.
What “serverless vector search” typically implies
- Automatic scaling of compute and storage
- Reduced capacity planning
- Faster experimentation (spin up indexes without cluster design)
What to watch out for
- Performance variability under cold starts or unpredictable scaling behavior
- Cost opacity if you don’t instrument query volume, payload sizes, and top-k usage
- Operational limits (quotas, region availability, feature differences vs. dedicated deployments)
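Cost opacity is largely solvable with a thin wrapper that logs the variables your bill actually depends on. A minimal sketch; the search function signature and log destination are placeholders for whatever client and telemetry stack you use:

```python
import json
import time

def instrumented_query(search_fn, query_vector, top_k, **kwargs):
    """Wrap any search call and emit the fields that drive cost and latency."""
    start = time.perf_counter()
    results = search_fn(query_vector=query_vector, limit=top_k, **kwargs)
    elapsed_ms = (time.perf_counter() - start) * 1000

    # Replace print with your logger/metrics pipeline of choice.
    print(json.dumps({
        "event": "vector_query",
        "top_k": top_k,
        "vector_dims": len(query_vector),
        "results_returned": len(results),
        "latency_ms": round(elapsed_ms, 1),
    }))
    return results
```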
7) Better integration with graph and structured queries
RAG doesn’t live in a vacuum. Users ask questions that implicitly reference relationships: org charts, ownership, dependencies, supply chain, citations, and time-based sequences. That’s why we’re seeing deeper integration between vector search and graph/relational querying.
What this unlocks
- Context you can explain: “this answer comes from these related entities”
- Policy-safe retrieval: graph edges can encode permissions and data lineage
- Higher precision: similarity search provides candidates; graph constraints validate and refine
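The recurring pattern is that similarity search proposes candidates and graph constraints validate them. A schematic sketch, with a hypothetical precomputed reachability map standing in for a real graph query:

```python
# Hypothetical adjacency result: which documents each principal can reach
# via permission edges (user -> group -> doc), resolved ahead of time.
REACHABLE_DOCS = {
    "user_42": {"doc_1", "doc_7", "doc_9"},
}

def graph_constrained_search(user_id, candidates):
    """Keep only similarity candidates the user can reach via graph edges.

    `candidates` is a list of (doc_id, score) pairs from vector search.
    """
    allowed = REACHABLE_DOCS.get(user_id, set())
    return [(doc_id, score) for doc_id, score in candidates if doc_id in allowed]

hits = [("doc_7", 0.91), ("doc_3", 0.88), ("doc_9", 0.85)]
print(graph_constrained_search("user_42", hits))  # doc_3 is dropped
```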
8) Interoperability becomes a real concern (API fragmentation)
The ecosystem grew fast, and every engine exposes different APIs, query semantics, and filtering behavior. As vector search becomes core infrastructure, teams are feeling the pain of vendor lock-in and portability.
How to future-proof your application
- Define a retrieval interface in your codebase (upsert, delete, query, hybrid query, fetch-by-id); a sketch follows this list.
- Keep your chunking + embedding pipeline independent of your database choice.
- Log retrieval inputs/outputs so you can regression-test relevance when switching engines or models.
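A minimal version of that interface as a Python Protocol; the method names and result shape are illustrative, not taken from any particular client library:

```python
from dataclasses import dataclass, field
from typing import Any, Protocol, Sequence

@dataclass
class SearchResult:
    id: str
    score: float
    payload: dict[str, Any] = field(default_factory=dict)

class Retriever(Protocol):
    """Engine-agnostic retrieval interface; each engine gets a small adapter."""

    def upsert(self, ids: Sequence[str], vectors: Sequence[Sequence[float]],
               payloads: Sequence[dict[str, Any]]) -> None: ...

    def delete(self, ids: Sequence[str]) -> None: ...

    def query(self, vector: Sequence[float], top_k: int,
              filters: dict[str, Any] | None = None) -> list[SearchResult]: ...

    def hybrid_query(self, vector: Sequence[float], text: str, top_k: int,
                     filters: dict[str, Any] | None = None) -> list[SearchResult]: ...

    def fetch_by_id(self, ids: Sequence[str]) -> list[SearchResult]: ...
```

Swapping engines then means writing one new adapter instead of touching every call site, and the retrieval logs from the last bullet give you a relevance regression suite to run against the new adapter.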
Implementation checklist: shipping modern retrieval without overengineering
Baseline (week 1)
- Chunking with stable IDs (a sketch of stable ID generation follows this list)
- One dense embedding per chunk
- Metadata fields: tenant_id, doc_id, source, updated_at
- Top-k semantic search + basic filters
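“Stable IDs” means re-ingesting the same content yields the same ID, so repeated runs upsert instead of duplicating. One simple convention, sketched below, hashes the source document and chunk position:

```python
import hashlib

def chunk_id(doc_id: str, chunk_index: int, text: str) -> str:
    """Deterministic chunk ID: same doc + position + content -> same ID."""
    digest = hashlib.sha256(f"{doc_id}:{chunk_index}:{text}".encode("utf-8"))
    return digest.hexdigest()[:32]

print(chunk_id("handbook_v3", 0, "Employees accrue 1.5 vacation days per month."))
```

Including the text in the hash means an edited chunk gets a new ID (and the stale one must be cleaned up); hashing only doc_id and chunk_index gives overwrite-on-edit semantics instead. Pick one deliberately.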
Production-hardening (weeks 2–4)
- Add hybrid (BM25 or learned sparse) if you see misses on exact terms
- Add a reranker for better ordering
- Adopt iterative scans or filter-aware strategies to prevent empty result sets
- Instrument latency, recall proxies, and cost per query
Scale and quality (month 2+)
- Multi-vector strategies (dense + sparse + domain-tuned)
- Tiering/hot-cold storage if supported
- Graph/relational constraints for higher precision and safer retrieval
Summary
The latest vector database advancements aren’t about a single new index. The bigger shift is that retrieval is becoming a complete system: hybrid search, filter-aware ANN, multi-vector modeling, serverless operations, and tighter integration with text and structured queries.
If you’re planning your next iteration, prioritize hybrid retrieval and robust filtering first. Those two upgrades tend to produce the biggest real-world relevance gains.
Call to action
If you’re building or improving RAG, you’ll move faster with a workspace that can test models, retrieval strategies, and prompts side-by-side. Projectchat.ai gives you multimodal chat from all providers, image generation models, and Agentic/Hybrid RAG over your own data so you can create dedicated workspaces and projects for each use case. Start a trial here: https://projectchat.ai/trial/


