
Semantic Caching for SAP HANA Cloud

How langchain-hana-cache cuts LLM latency by 96% and eliminates redundant API calls — using the vector engine you already have.

The problem

If you're building AI agents, you've probably noticed something: users ask the same questions over and over, just worded differently. "How do I reset my password?", "I need help with my password", "password reset process" — they all want the same answer.

But every single one of those triggers a full LLM API call. That's tokens spent, latency added, and money burned for something you've already answered.

This is one of those problems that seems small until you look at the numbers in production. A customer service agent handling hundreds of conversations a day, a document Q&A system where people keep asking about the same regulations, an internal help desk where the top 20 questions cover 80% of the traffic. Every repeated question is another expensive, unnecessary API call.

Semantic caching changes this

Instead of matching prompts word by word, semantic caching understands that "how do I reset my password" and "I forgot my password, help" mean the same thing. When a similar enough question comes in, it returns the answer you already have, instantly. No API call, no tokens, no waiting.

Store the response once, serve it forever — or until you decide it should expire.

langchain-hana-cache

I built and released langchain-hana-cache, an open-source package that brings semantic caching to SAP HANA Cloud. It uses the same vector similarity engine that already powers RAG retrieval in HANA, but instead of matching documents, it matches prompts.

Installation

pip install langchain-hana-cache

It integrates with LangChain in two lines of code. The cache table lives in the same HANA instance where your business data already sits — no additional infrastructure to manage, no Redis to set up, no separate vector database. If you're already on BTP with HANA Cloud, you're ready to go.

The results

  • 21x faster on cache hits
  • 96% latency reduction
  • 0 tokens consumed on a cache hit

That's going from 8 seconds down to under half a second. And since the LLM never gets called on a cache hit, you consume zero tokens. For high-traffic agents on expensive models like Claude Opus or GPT-5, the savings add up fast.

Key points
  • Uses HANA's vector similarity engine — the same one powering your RAG retrieval
  • Two lines of LangChain integration — drop-in replacement for existing cache
  • No extra infrastructure — lives in your existing HANA instance
  • Configurable similarity threshold and TTL expiration
  • MIT licensed, PRs welcome
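The last two configurable knobs, the similarity threshold and TTL, act as a gate on every candidate hit: too dissimilar and you fall through to a real LLM call, too old and the entry is treated as stale. A minimal sketch of that decision (the `CachedAnswer` and `accept` names are illustrative, not the package's API):

```python
import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class CachedAnswer:
    answer: str       # the stored LLM response
    score: float      # similarity of the cached prompt to the incoming one, in [0, 1]
    stored_at: float  # unix timestamp when the entry was written

def accept(hit: CachedAnswer, threshold: float,
           ttl_seconds: Optional[float],
           now: Optional[float] = None) -> Optional[str]:
    """Return the cached answer only if it is similar enough and not expired."""
    now = time.time() if now is None else now
    if hit.score < threshold:
        return None  # too dissimilar: fall through to a real LLM call
    if ttl_seconds is not None and now - hit.stored_at >= ttl_seconds:
        return None  # stale: force a refresh so answers can change over time
    return hit.answer
```

A higher threshold means fewer (but safer) hits; a shorter TTL trades hit rate for freshness, which matters when the underlying answers, like policies or prices, can change.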