The problem
If you're building AI agents, you've probably noticed something: users ask the same questions over and over, just worded differently. "How do I reset my password?", "I need help with my password", "password reset process" — they all want the same answer.
But every single one of those triggers a full LLM API call. That's tokens spent, latency added, and money burned for something you've already answered.
This is one of those problems that seems small until you look at the numbers in production. A customer service agent handling hundreds of conversations a day, a document Q&A system where people keep asking about the same regulations, an internal help desk where the top 20 questions cover 80% of the traffic. Every repeated question is a wasted and expensive API call.
Semantic caching changes this
Instead of matching prompts word by word, semantic caching understands that "how do I reset my password" and "I forgot my password, help" mean the same thing. When a similar enough question comes in, it returns the answer you already have, instantly. No API call, no tokens, no waiting.
Store the response once, serve it forever — or until you decide it should expire.
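The core idea can be sketched in a few lines of plain Python: embed each prompt, compare new prompts against stored ones by cosine similarity, and return the cached answer when similarity clears a threshold and the entry hasn't expired. This is a toy illustration, not the package's implementation — a real deployment would use a proper embedding model and keep the vectors in a vector store (such as a HANA table) rather than an in-memory list:

```python
import math
import time

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

class SemanticCache:
    """Toy in-memory semantic cache for illustration only."""

    def __init__(self, embed, threshold=0.85, ttl=None):
        self.embed = embed          # callable: text -> vector
        self.threshold = threshold  # minimum similarity for a hit
        self.ttl = ttl              # seconds until an entry expires (None = never)
        self.entries = []           # list of (vector, response, stored_at)

    def set(self, prompt, response):
        self.entries.append((self.embed(prompt), response, time.time()))

    def get(self, prompt):
        vec = self.embed(prompt)
        now = time.time()
        best, best_sim = None, self.threshold
        for stored_vec, response, stored_at in self.entries:
            if self.ttl is not None and now - stored_at > self.ttl:
                continue  # entry expired, skip it
            sim = cosine(vec, stored_vec)
            if sim >= best_sim:
                best, best_sim = response, sim
        return best  # None means cache miss -> call the LLM

# Stand-in embedding: bag of words over a tiny vocabulary. Real systems
# would use a sentence-embedding model here.
VOCAB = ["reset", "password", "forgot", "help", "billing"]
def toy_embed(text):
    words = text.lower().split()
    return [float(words.count(w)) for w in VOCAB]

cache = SemanticCache(toy_embed, threshold=0.5)
cache.set("how do I reset my password", "Go to Settings > Security > Reset.")
print(cache.get("password reset help"))  # similar enough -> cached answer
print(cache.get("billing question"))     # dissimilar -> None (cache miss)
```

The threshold is the knob that matters: set it too low and unrelated questions get the wrong cached answer; too high and rephrasings miss the cache.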
langchain-hana-cache
I built and released langchain-hana-cache, an open-source package that brings semantic caching to SAP HANA Cloud. It uses the same vector similarity engine that already powers RAG retrieval in HANA, but instead of matching documents, it matches prompts.
It integrates with LangChain in two lines of code. The cache table lives in the same HANA instance where your business data already sits — no additional infrastructure to manage, no Redis to set up, no separate vector database. If you're already on BTP with HANA Cloud, you're ready to go.
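The two-line integration uses LangChain's standard global cache hook. In the sketch below, `set_llm_cache` is the real LangChain entry point, but the class name, import path, and constructor parameters are my assumptions — check the package README for the actual API:

```python
# Sketch only: HanaSemanticCache and its parameters are assumed names,
# not confirmed API. set_llm_cache is LangChain's standard cache hook.
from langchain_core.globals import set_llm_cache
from langchain_hana_cache import HanaSemanticCache  # hypothetical import path

set_llm_cache(HanaSemanticCache(
    connection=conn,   # your existing hdbcli connection to HANA Cloud
    threshold=0.9,     # minimum similarity for a cache hit
    ttl=86_400,        # expire entries after one day
))
```

Once the global cache is set, every LLM call made through LangChain checks it automatically — no changes to the rest of your chain or agent code.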
The results
A cache hit returns in under half a second, compared to roughly 8 seconds for a full LLM round trip. And since the LLM never gets called on a cache hit, you consume zero tokens. For high-traffic agents on expensive models like Claude Opus or GPT-5, the savings add up fast.
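To get a feel for how this scales, here is a rough back-of-envelope calculation. Every number below is hypothetical — plug in your own traffic, hit rate, and per-call cost:

```python
# Hypothetical traffic profile for a support agent — illustrative only.
requests_per_day = 500      # prompts hitting the agent
cache_hit_rate = 0.40       # fraction served from the semantic cache
cost_per_llm_call = 0.03    # USD per avoided call (model-dependent)

avoided_calls = requests_per_day * cache_hit_rate
daily_savings = avoided_calls * cost_per_llm_call
print(f"{avoided_calls:.0f} avoided calls/day -> "
      f"${daily_savings:.2f}/day (~${daily_savings * 30:.0f}/month)")
```

With these sample numbers, 200 calls a day never reach the model — and that's before counting the latency users no longer wait through.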
- Uses HANA's vector similarity engine — the same one powering your RAG retrieval
- Two lines of LangChain integration — drop-in replacement for existing cache
- No extra infrastructure — lives in your existing HANA instance
- Configurable similarity threshold and TTL expiration
- MIT licensed, PRs welcome