The Problem: Context Loss in AI Conversations
Traditional chatbots treat each message as isolated. When a user asks "What's your pricing?" followed by "Tell me more about the Pro plan," the AI doesn't remember the first question. It responds generically or asks the user to repeat themselves.
Result: Frustrated customers, lower conversion rates, and missed sales opportunities.
The Architecture: Hierarchical Memory + Semantic Search
eKTextAi's AI engine uses a two-layer memory system:
1. Conversation Memory (Redis)
Stores the actual conversation history—what the user asked and how the AI responded. This enables follow-up questions and context continuity.
Memory Structure:
tenant:{tenantId}:user:{userId}:global:{browserSessionId}:chat:{chatSessionId}:history
This hierarchical key structure enables:
- Tenant isolation (data security)
- Multiple chat sessions per browser session
- Efficient retrieval of recent conversation history
- Automatic cleanup via TTL (24-hour default)
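To make the structure concrete, here is a minimal TypeScript sketch of how a service like RedisMemoryService might build this key and persist messages. It assumes the node-redis client; the helper names (historyKey, appendMessage, recentHistory) and the JSON message shape are hypothetical, not eKTextAi's actual internals:

```typescript
import { createClient } from "redis";

// Hypothetical helpers; RedisMemoryService's real internals are not public.
const redis = createClient();
await redis.connect();

const HISTORY_TTL_SECONDS = 24 * 60 * 60; // 24-hour default TTL

function historyKey(tenantId: string, userId: string, browserSessionId: string, chatSessionId: string): string {
  return `tenant:${tenantId}:user:${userId}:global:${browserSessionId}:chat:${chatSessionId}:history`;
}

async function appendMessage(key: string, role: "user" | "assistant", content: string): Promise<void> {
  await redis.rPush(key, JSON.stringify({ role, content, ts: Date.now() }));
  await redis.expire(key, HISTORY_TTL_SECONDS); // refresh TTL so active chats stay warm
}

async function recentHistory(key: string, limit = 10): Promise<{ role: string; content: string }[]> {
  const raw = await redis.lRange(key, -limit, -1); // only the most recent messages
  return raw.map((entry) => JSON.parse(entry));
}
```

Because every key carries the tenant ID as its first segment, a single Redis instance can serve many tenants without any risk of one tenant's lookups touching another's history.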
2. Knowledge Base Memory (ChromaDB + Vector Embeddings)
Stores your business content—product docs, FAQs, website content, Google Drive files—as vector embeddings. When a user asks a question, the AI searches this knowledge base for relevant information.
Search Process:
- User query is converted to a vector embedding (OpenAI embeddings API)
- Vector search finds top 5 most relevant documents from ChromaDB
- Retrieved documents are injected into the AI's system prompt
- AI generates response using both conversation history and knowledge base
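A minimal sketch of this search step, assuming the official openai and chromadb JavaScript clients; the collection name knowledge_base and the text-embedding-3-small model are assumptions (the post only says "OpenAI embeddings API"):

```typescript
import OpenAI from "openai";
import { ChromaClient } from "chromadb";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const chroma = new ChromaClient();

async function searchKnowledgeBase(query: string): Promise<string[]> {
  // Step 1: convert the user query to a vector embedding
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumed model; the post only names the API
    input: query,
  });

  // Steps 2-3: vector search, top 5 documents by semantic similarity
  const collection = await chroma.getCollection({ name: "knowledge_base" });
  const results = await collection.query({
    queryEmbeddings: [embedding.data[0].embedding],
    nResults: 5,
  });

  // documents[0] holds the matches for the first (only) query vector
  return results.documents[0].filter((d): d is string => d !== null);
}
```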
How It Works: The Complete Flow
Here's what happens when a user sends a message:
Request Processing Flow
1. Request arrives with session IDs (browserSessionId, chatSessionId). The system identifies the user via email hash or userId.
2. RedisMemoryService retrieves conversation history for this chat session, typically the last 5 message pairs (user + assistant).
3. The user query is embedded and searched against the ChromaDB knowledge base; the top 5 relevant documents are retrieved.
4. The system prompt is built from: (1) instructions, (2) retrieved knowledge base documents, (3) conversation history, (4) the current user query.
5. OpenAI GPT-4o-mini generates a response using the assembled context, streamed back to the user.
6. Both the user message and the AI response are stored in Redis for future context retrieval.
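Steps 4 and 5 might look roughly like the following, assuming the openai Node SDK; the prompt wording and the answer helper are illustrative, not the production implementation:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative prompt assembly for steps 4-5; names and wording are assumptions.
async function* answer(
  instructions: string,
  docs: string[],
  history: { role: "user" | "assistant"; content: string }[],
  userQuery: string,
): AsyncGenerator<string> {
  const systemPrompt =
    `${instructions}\n\nRelevant knowledge base documents:\n${docs.join("\n---\n")}`;

  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    stream: true, // step 5: tokens are streamed back to the user
    messages: [
      { role: "system", content: systemPrompt },
      ...history, // step 2: recent turns loaded from Redis
      { role: "user", content: userQuery },
    ],
  });

  for await (const chunk of stream) {
    yield chunk.choices[0]?.delta?.content ?? "";
  }
}
```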
Real-World Example: How Contextual Memory Improves Conversations
Let's see how this works in practice:
Conversation Example:
User: "What's your pricing?"
AI: "We offer three plans: Starter ($49/mo), Pro ($149/mo), and Enterprise (custom). Which plan are you interested in?"
User: "Tell me more about the Pro plan."
AI (with memory): "The Pro plan includes 10,000 messages/month, multi-channel support (WhatsApp, email, web chat), AI-powered email campaigns, and priority support. It's ideal for growing businesses with 50-200 employees."
User: "Does it include email campaigns?"
AI (with memory): "Yes! The Pro plan includes AI-powered email campaigns with engagement analytics, follow-up automation, and a template builder. You can create campaigns based on opens, clicks, and engagement segments."
Without contextual memory: The AI would ask "What plan are you asking about?" or respond generically to "Does it include email campaigns?"
With contextual memory: The AI remembers the entire conversation context and provides relevant, personalized responses.
Technical Implementation Details
Session Management
eKTextAi uses a dual-session architecture:
- Browser Session (globalSessionId): Long-term session that persists across page reloads. Enables cross-conversation context.
- Chat Session (chatSessionId): Individual conversation thread. Users can have multiple chat sessions within one browser session.
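A browser-side sketch of this dual-session scheme; the use of localStorage, the storage key, and the helper names are assumptions rather than eKTextAi's actual implementation:

```typescript
// Hypothetical client-side session handling.
function getBrowserSessionId(): string {
  let id = localStorage.getItem("globalSessionId");
  if (!id) {
    id = crypto.randomUUID();
    localStorage.setItem("globalSessionId", id); // survives page reloads
  }
  return id;
}

function newChatSessionId(): string {
  // Fresh ID per conversation thread; many threads can share one browser session
  return crypto.randomUUID();
}
```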
Memory Limits and Optimization
To prevent token overflow and maintain performance:
- Conversation History: Limited to last 10 messages (5 exchanges) by default. Older messages are truncated.
- Knowledge Base Retrieval: Top 5 documents per query. Documents are ranked by semantic similarity (cosine distance).
- TTL Policy: Redis keys expire after 24 hours. This balances context retention with memory efficiency.
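These limits map naturally onto two Redis commands. A sketch, again assuming node-redis, with the key built as shown earlier; enforceLimits is a hypothetical helper name:

```typescript
import type { RedisClientType } from "redis";

// Sketch of the limits described above.
async function enforceLimits(redis: RedisClientType, key: string): Promise<void> {
  await redis.lTrim(key, -10, -1);       // keep only the last 10 messages (5 exchanges)
  await redis.expire(key, 24 * 60 * 60); // 24-hour TTL
}
```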
Vector Search Strategy
The semantic search uses a two-tier approach:
- Primary Search: Searches user-specific knowledge base (filtered by userEmailHash or userId). Ensures personalized results.
- Fallback Search: If primary search yields insufficient results, searches broader knowledge base. Ensures comprehensive coverage.
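One way to express the two-tier strategy with the chromadb client's metadata filters; the userEmailHash field name comes from the post, while the helper name and filter shape are assumptions:

```typescript
import type { Collection } from "chromadb";

// Hypothetical two-tier retrieval helper.
async function tieredSearch(
  collection: Collection,
  queryEmbedding: number[],
  userEmailHash: string,
): Promise<string[]> {
  // Primary: restrict to this user's documents via a metadata filter
  const primary = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5,
    where: { userEmailHash },
  });
  const userDocs = primary.documents[0].filter((d): d is string => d !== null);
  if (userDocs.length >= 5) return userDocs;

  // Fallback: broaden to the full knowledge base to fill the remaining slots
  // (naive sketch: duplicates of primary hits are not removed)
  const fallback = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5 - userDocs.length,
  });
  const broadDocs = fallback.documents[0].filter((d): d is string => d !== null);
  return [...userDocs, ...broadDocs];
}
```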
What This Means for Your Business
Contextual memory isn't just a technical feature—it directly impacts customer experience and revenue:
1. Higher Conversion Rates
When AI remembers context, customers don't have to repeat themselves. They can ask follow-up questions naturally, leading to faster decision-making and higher conversion rates.
2. Reduced Support Costs
AI can handle complex, multi-turn conversations without human intervention. This reduces support ticket volume and frees your team for high-value interactions.
3. Personalized Experiences
By combining conversation history with knowledge base retrieval, AI can provide personalized responses based on both what the user asked and what your business knows about them.
4. Multi-Channel Consistency
The same memory architecture works across WhatsApp, email, web chat, and voice. Context follows the customer, regardless of channel.
Limitations and Realistic Expectations
It's important to set realistic expectations:
- Memory Duration: Conversation history is stored for 24 hours by default. For longer-term memory, you'd need to implement custom solutions or upgrade to enterprise plans.
- Knowledge Base Quality: AI responses are only as good as your knowledge base. If your content is outdated or incomplete, responses will reflect that.
- Token Limits: Very long conversations may be truncated to prevent token overflow. This is a limitation of the underlying LLM, not the memory system.
- Semantic Search Accuracy: Vector search is probabilistic, not deterministic. It finds "similar" content, not exact matches. Results depend on embedding quality and knowledge base structure.
Best Practices for Maximizing Contextual Memory
To get the most out of eKTextAi's contextual memory:
- Maintain a Comprehensive Knowledge Base: Regularly update your knowledge base with product docs, FAQs, and business content. The more relevant content you have, the better the AI responses.
- Structure Your Content: Use clear headings, bullet points, and structured formats. This helps vector embeddings capture semantic meaning more accurately.
- Monitor Conversation Quality: Review AI responses regularly. If context is being lost, check your knowledge base coverage and conversation history limits.
- Leverage Multi-Channel Context: Ensure your knowledge base includes content relevant to all channels (WhatsApp, email, web chat, voice). This enables consistent experiences across touchpoints.
Conclusion
Contextual memory is what separates intelligent AI assistants from simple chatbots. By combining Redis-based conversation history with semantic knowledge base retrieval, eKTextAi enables AI to remember context, understand follow-ups, and deliver personalized responses.
For businesses, this means higher conversion rates, reduced support costs, and better customer experiences. But it's not magic—it requires a well-maintained knowledge base and realistic expectations about memory duration and search accuracy.
If you're evaluating AI platforms, ask about their memory architecture. Do they store conversation history? How do they retrieve knowledge base content? How long does context persist? These technical details directly impact customer experience and business outcomes.
Ready to See Contextual Memory in Action?
Experience how eKTextAi's AI engine remembers context and delivers personalized responses across WhatsApp, email, web chat, and voice.
Start Free Trial →