The Problem: Context Loss in AI Conversations
Traditional chatbots treat each message as isolated. When a user asks "What's your pricing?" followed by "Tell me more about the Pro plan," the AI doesn't remember the first question. It responds generically or asks the user to repeat themselves.
Result: Frustrated customers, lower conversion rates, and missed sales opportunities.
The Architecture: Hierarchical Memory + Semantic Search
eKTextAi's AI engine uses a two-layer memory system:
1. Conversation Memory (Redis)
Stores the actual conversation history—what the user asked and how the AI responded. This enables follow-up questions and context continuity.
Memory Structure:
tenant:{tenantId}:user:{userId}:global:{browserSessionId}:chat:{chatSessionId}:history
This hierarchical key structure enables:
- Tenant isolation (data security)
- Multiple chat sessions per browser session
- Efficient retrieval of recent conversation history
- Automatic cleanup via TTL (24-hour default)
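To make the structure concrete, here is a minimal TypeScript sketch of how a service like RedisMemoryService might build this key and persist messages. It assumes the node-redis client; the helper names (historyKey, appendMessage, recentHistory) and the JSON message shape are hypothetical, not eKTextAi's actual internals:

```typescript
import { createClient } from "redis";

// Hypothetical helpers; RedisMemoryService's real internals are not public.
const redis = createClient();
await redis.connect();

const HISTORY_TTL_SECONDS = 24 * 60 * 60; // 24-hour default TTL

function historyKey(tenantId: string, userId: string, browserSessionId: string, chatSessionId: string): string {
  return `tenant:${tenantId}:user:${userId}:global:${browserSessionId}:chat:${chatSessionId}:history`;
}

async function appendMessage(key: string, role: "user" | "assistant", content: string): Promise<void> {
  await redis.rPush(key, JSON.stringify({ role, content, ts: Date.now() }));
  await redis.expire(key, HISTORY_TTL_SECONDS); // refresh TTL so active chats stay warm
}

async function recentHistory(key: string, limit = 10): Promise<{ role: string; content: string }[]> {
  const raw = await redis.lRange(key, -limit, -1); // only the most recent messages
  return raw.map((entry) => JSON.parse(entry));
}
```

Because every key carries the tenant ID as its first segment, a single Redis instance can serve many tenants without any risk of one tenant's lookups touching another's history.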
2. Knowledge Base Memory (ChromaDB + Vector Embeddings)
Stores your business content—product docs, FAQs, website content, Google Drive files—as vector embeddings. When a user asks a question, the AI searches this knowledge base for relevant information.
Search Process:
- User query is converted to a vector embedding (OpenAI embeddings API)
- Vector search finds top 5 most relevant documents from ChromaDB
- Retrieved documents are injected into the AI's system prompt
- AI generates response using both conversation history and knowledge base
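A minimal sketch of this search step, assuming the official openai and chromadb JavaScript clients; the collection name knowledge_base and the text-embedding-3-small model are assumptions (the post only says "OpenAI embeddings API"):

```typescript
import OpenAI from "openai";
import { ChromaClient } from "chromadb";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment
const chroma = new ChromaClient();

async function searchKnowledgeBase(query: string): Promise<string[]> {
  // Step 1: convert the user query to a vector embedding
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small", // assumed model; the post only names the API
    input: query,
  });

  // Steps 2-3: vector search, top 5 documents by semantic similarity
  const collection = await chroma.getCollection({ name: "knowledge_base" });
  const results = await collection.query({
    queryEmbeddings: [embedding.data[0].embedding],
    nResults: 5,
  });

  // documents[0] holds the matches for the first (only) query vector
  return results.documents[0].filter((d): d is string => d !== null);
}
```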
How It Works: The Complete Flow
Here's what happens when a user sends a message:
Request Processing Flow
1. Request arrives with session IDs (browserSessionId, chatSessionId). The system identifies the user via email hash or userId.
2. RedisMemoryService retrieves conversation history for this chat session, typically the last 5 message pairs (user + assistant).
3. The user query is embedded and searched against the ChromaDB knowledge base; the top 5 relevant documents are retrieved.
4. The system prompt is built from: (1) instructions, (2) retrieved knowledge base documents, (3) conversation history, (4) the current user query.
5. OpenAI GPT-4o-mini generates a response using the assembled context, streamed back to the user.
6. Both the user message and the AI response are stored in Redis for future context retrieval.
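Steps 4 and 5 might look roughly like the following, assuming the openai Node SDK; the prompt wording and the answer helper are illustrative, not the production implementation:

```typescript
import OpenAI from "openai";

const openai = new OpenAI();

// Illustrative prompt assembly for steps 4-5; names and wording are assumptions.
async function* answer(
  instructions: string,
  docs: string[],
  history: { role: "user" | "assistant"; content: string }[],
  userQuery: string,
): AsyncGenerator<string> {
  const systemPrompt =
    `${instructions}\n\nRelevant knowledge base documents:\n${docs.join("\n---\n")}`;

  const stream = await openai.chat.completions.create({
    model: "gpt-4o-mini",
    stream: true, // step 5: tokens are streamed back to the user
    messages: [
      { role: "system", content: systemPrompt },
      ...history, // step 2: recent turns loaded from Redis
      { role: "user", content: userQuery },
    ],
  });

  for await (const chunk of stream) {
    yield chunk.choices[0]?.delta?.content ?? "";
  }
}
```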
Real-World Example: How Contextual Memory Improves Conversations
Let's see how this works in practice:
Conversation Example:
User: "What's your pricing?"
AI: "We offer three plans: Starter ($49/mo), Pro ($149/mo), and Enterprise (custom). Which plan are you interested in?"
User: "Tell me more about the Pro plan."
AI (with memory): "The Pro plan includes 10,000 messages/month, multi-channel support (WhatsApp, email, web chat), AI-powered email campaigns, and priority support. It's ideal for growing businesses with 50-200 employees."
User: "Does it include email campaigns?"
AI (with memory): "Yes! The Pro plan includes AI-powered email campaigns with engagement analytics, follow-up automation, and a template builder. You can create campaigns based on opens, clicks, and engagement segments."
Without contextual memory: The AI would ask "What plan are you asking about?" or respond generically to "Does it include email campaigns?"
With contextual memory: The AI remembers the entire conversation context and provides relevant, personalized responses.
Technical Implementation Details
Session Management
eKTextAi uses a dual-session architecture:
- Browser Session (globalSessionId): Long-term session that persists across page reloads. Enables cross-conversation context.
- Chat Session (chatSessionId): Individual conversation thread. Users can have multiple chat sessions within one browser session.
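A browser-side sketch of this dual-session scheme; the use of localStorage, the storage key, and the helper names are assumptions rather than eKTextAi's actual implementation:

```typescript
// Hypothetical client-side session handling.
function getBrowserSessionId(): string {
  let id = localStorage.getItem("globalSessionId");
  if (!id) {
    id = crypto.randomUUID();
    localStorage.setItem("globalSessionId", id); // survives page reloads
  }
  return id;
}

function newChatSessionId(): string {
  // Fresh ID per conversation thread; many threads can share one browser session
  return crypto.randomUUID();
}
```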
Memory Limits and Optimization
To prevent token overflow and maintain performance:
- Conversation History: Limited to last 10 messages (5 exchanges) by default. Older messages are truncated.
- Knowledge Base Retrieval: Top 5 documents per query. Documents are ranked by semantic similarity (cosine distance).
- TTL Policy: Redis keys expire after 24 hours. This balances context retention with memory efficiency.
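These limits map naturally onto two Redis commands. A sketch, again assuming node-redis, with the key built as shown earlier; enforceLimits is a hypothetical helper name:

```typescript
import type { RedisClientType } from "redis";

// Sketch of the limits described above.
async function enforceLimits(redis: RedisClientType, key: string): Promise<void> {
  await redis.lTrim(key, -10, -1);       // keep only the last 10 messages (5 exchanges)
  await redis.expire(key, 24 * 60 * 60); // 24-hour TTL
}
```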
Vector Search Strategy
The semantic search uses a two-tier approach:
- Primary Search: Searches user-specific knowledge base (filtered by userEmailHash or userId). Ensures personalized results.
- Fallback Search: If primary search yields insufficient results, searches broader knowledge base. Ensures comprehensive coverage.
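One way to express the two-tier strategy with the chromadb client's metadata filters; the userEmailHash field name comes from the post, while the helper name and filter shape are assumptions:

```typescript
import type { Collection } from "chromadb";

// Hypothetical two-tier retrieval helper.
async function tieredSearch(
  collection: Collection,
  queryEmbedding: number[],
  userEmailHash: string,
): Promise<string[]> {
  // Primary: restrict to this user's documents via a metadata filter
  const primary = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5,
    where: { userEmailHash },
  });
  const userDocs = primary.documents[0].filter((d): d is string => d !== null);
  if (userDocs.length >= 5) return userDocs;

  // Fallback: broaden to the full knowledge base to fill the remaining slots
  // (naive sketch: duplicates of primary hits are not removed)
  const fallback = await collection.query({
    queryEmbeddings: [queryEmbedding],
    nResults: 5 - userDocs.length,
  });
  const broadDocs = fallback.documents[0].filter((d): d is string => d !== null);
  return [...userDocs, ...broadDocs];
}
```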
What This Means for Your Business
Contextual memory isn't just a technical feature—it directly impacts customer experience and revenue:
1. Higher Conversion Rates
When AI remembers context, customers don't have to repeat themselves. They can ask follow-up questions naturally, leading to faster decision-making and higher conversion rates.
2. Reduced Support Costs
AI can handle complex, multi-turn conversations without human intervention. This reduces support ticket volume and frees your team for high-value interactions.
3. Personalized Experiences
By combining conversation history with knowledge base retrieval, AI can provide personalized responses based on both what the user asked and what your business knows about them.
4. Multi-Channel Consistency
The same memory architecture works across WhatsApp, email, web chat, and voice. Context follows the customer, regardless of channel.
Limitations and Realistic Expectations
It's important to set realistic expectations:
- Memory Duration: Conversation history is stored for 24 hours by default. For longer-term memory, you'd need to implement custom solutions or upgrade to enterprise plans.
- Knowledge Base Quality: AI responses are only as good as your knowledge base. If your content is outdated or incomplete, responses will reflect that.
- Token Limits: Very long conversations may be truncated to prevent token overflow. This is a limitation of the underlying LLM, not the memory system.
- Semantic Search Accuracy: Vector search is probabilistic, not deterministic. It finds "similar" content, not exact matches. Results depend on embedding quality and knowledge base structure.
Best Practices for Maximizing Contextual Memory
To get the most out of eKTextAi's contextual memory:
- Maintain a Comprehensive Knowledge Base: Regularly update your knowledge base with product docs, FAQs, and business content. The more relevant content you have, the better the AI responses.
- Structure Your Content: Use clear headings, bullet points, and structured formats. This helps vector embeddings capture semantic meaning more accurately.
- Monitor Conversation Quality: Review AI responses regularly. If context is being lost, check your knowledge base coverage and conversation history limits.
- Leverage Multi-Channel Context: Ensure your knowledge base includes content relevant to all channels (WhatsApp, email, web chat, voice). This enables consistent experiences across touchpoints.
Conclusion
Contextual memory is what separates intelligent AI assistants from simple chatbots. By combining Redis-based conversation history with semantic knowledge base retrieval, eKTextAi enables AI to remember context, understand follow-ups, and deliver personalized responses.
For businesses, this means higher conversion rates, reduced support costs, and better customer experiences. But it's not magic—it requires a well-maintained knowledge base and realistic expectations about memory duration and search accuracy.
If you're evaluating AI platforms, ask about their memory architecture. Do they store conversation history? How do they retrieve knowledge base content? How long does context persist? These technical details directly impact customer experience and business outcomes.
Ready to See Contextual Memory in Action?
Experience how eKTextAi's AI engine remembers context and delivers personalized responses across WhatsApp, email, web chat, and voice.
Start Free Trial →