Hybrid BM25 + Vector Retrieval for SAP HANA

The gap

If you're doing RAG on SAP HANA Cloud, you've got vector search through langchain-hanadb. But no BM25. No keyword ranking. No hybrid retrieval.

That matters more than you think. When someone asks about "article 12" or "HR-LIQUI-2024/0003", semantic search returns vaguely related content. It understands the general topic but misses the exact reference. You need keyword matching for precision and vector search for understanding.

langchain-hana-retriever

I built langchain-hana-retriever to close this gap. It provides two LangChain-compatible retrievers that work directly on your HANA Cloud instance:

HANABm25Retriever

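Uses LOCATE on NCLOB columns to fetch keyword candidates, then scores them with BM25Okapi in Python. This is pure keyword search with real BM25 ranking (term frequency, inverse document frequency, and document-length normalization), not just substring matching.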
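To make the scoring step concrete, here is a minimal, self-contained sketch of BM25 (Okapi) over a handful of candidate texts. It is not the library's actual code: the function names, the tokenizer, the `k1` and `b` defaults, and the non-negative IDF variant are all assumptions chosen for illustration, and the candidates are assumed to have been fetched from HANA already.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Hypothetical tokenizer: lowercase, and keep "-" and "/" inside tokens
    # so reference codes such as HR-LIQUI-2024/0003 stay in one piece.
    return re.findall(r"[a-z0-9][a-z0-9\-/]*", text.lower())


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Return one BM25 (Okapi) score per document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    if n_docs == 0:
        return []
    avgdl = sum(len(toks) for toks in tokenized) / n_docs

    # Document frequency: in how many candidates does each term appear?
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))

    query_terms = tokenize(query)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (an assumption; other variants exist).
            idf = math.log((n_docs - df[term] + 0.5) / (df[term] + 0.5) + 1.0)
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] + k1 * (1 - b + b * len(toks) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Illustrative candidates, standing in for rows returned by the LOCATE step.
candidates = [
    "Settlement HR-LIQUI-2024/0003 was approved in March.",
    "General guidance on HR settlements and payroll.",
    "Article 12 covers termination notice periods.",
]
scores = bm25_scores("HR-LIQUI-2024/0003", candidates)
best = max(range(len(candidates)), key=scores.__getitem__)
```

In this sketch only the first candidate contains the exact reference code, so it is the only one with a non-zero score. The second candidate mentions "HR" but not the code itself, which is the kind of near miss keyword scoring is meant to filter out.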

HANAHybridRetriever

Combines vector and BM25 search, merging results via Reciprocal Rank Fusion. Best of both worlds — semantic understanding plus exact matches.

Installation

    pip install langchain-hana-retriever
    # For hybrid search
    pip install "langchain-hana-retriever[hybrid]"

Why hybrid matters

Vector search finds meaning; BM25 finds exact matches. Hybrid gives you both, merged into a single ranking.

In practice, this makes a huge difference for enterprise documents. Financial regulations reference specific article numbers. HR documents have internal codes. Contract clauses need exact matching. Pure vector search will get you "in the neighborhood" but miss the specific reference.

The hybrid retriever runs both searches in parallel, then uses Reciprocal Rank Fusion to merge the ranked results. Documents that score well on both semantic similarity and keyword relevance bubble to the top.
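The fusion step can be sketched in a few lines. Reciprocal Rank Fusion gives each document a score of 1 / (k + rank) in every ranked list it appears in, then sums those scores. The function name, the document IDs, and the constant `k = 60` (a commonly used default) are assumptions for illustration, not values taken from the library.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of document IDs into one ranked list.

    Each list is assumed to be ordered best-first and free of duplicates.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two searches, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
bm25_hits = ["doc_b", "doc_d", "doc_a"]

fused = reciprocal_rank_fusion([vector_hits, bm25_hits])
# fused == ["doc_b", "doc_a", "doc_d", "doc_c"]
```

Because RRF works on ranks rather than raw scores, the cosine similarities from vector search and the BM25 scores never have to be put on a common scale. Documents that appear in both lists, such as doc_a and doc_b here, collect two contributions and move ahead of documents found by only one search.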

Key points

Drop-in LangChain retriever — works with any LangChain chain or agent
No extra infrastructure — runs on your existing HANA Cloud instance
Reciprocal Rank Fusion — intelligently merges vector + keyword results
BM25Okapi scoring — proper term frequency ranking, not just substring matching
MIT licensed — PRs welcome

