Integrated Model Memory (IMM)
Overview
Integrated Model Memory (IMM) is our implementation of Cache Augmented Generation (CAG), designed to revolutionize how AI applications handle conversation context. Unlike traditional CAG systems, which are typically tied to specific models, IMM provides a model-independent memory solution that works seamlessly across all supported AI models. Your conversation history and context persist within a session regardless of which model you use: you can start a conversation with GPT-4, continue with Claude, and switch to Mistral while maintaining complete context.
This approach eliminates the complexity of manual memory management while providing intelligent context retention across conversations.
As our enhanced implementation of CAG, IMM simplifies AI development by automating critical memory-related tasks:
- Automatic context management
- Seamless conversation history tracking
- Intelligent memory expiration handling
- Multi-session support for different users or use cases
Key Features and Benefits
Effortless Implementation
- Enable memory with a simple "memory": 1 parameter
- No need for complex vector database management
- Automatic context retention and retrieval
Advanced Session Management
- Create independent memory instances with mem_session
- Perfect for multi-user applications
- Isolated conversation contexts for different use cases
Smart Memory Controls
- Configure memory expiration with mem_expire
- Automatic cleanup of outdated conversations
- Efficient resource management
Developer-Friendly Design
- Simple API integration
- Flexible configuration options
- Model-independent memory management
- Persistent session memory across different models
- Intelligent caching and retrieval mechanisms
Implementation Guide
Enabling Memory Management
Activate IMM by including the memory parameter in your API request:
curl https://apipie.ai/v1/chat/completions \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"stream": true,
"memory": 1,
"mem_expire": 120,
"mem_session": "test123",
"provider": "openai",
"model": "gpt-4o",
"max_tokens": 100,
"messages": [
{
"role": "user",
"content": "where did I put my car keys? I already forgot."
}
]
}'
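The same request body can be assembled programmatically. Below is a minimal Python sketch; the endpoint and parameter names are taken from the curl example above, and the helper function name is our own invention:

```python
import json

# Endpoint as shown in the curl example above.
API_URL = "https://apipie.ai/v1/chat/completions"

def build_memory_request(message: str, session: str, expire_minutes: int,
                         provider: str = "openai", model: str = "gpt-4o") -> dict:
    """Assemble the JSON body for a memory-enabled completion request."""
    return {
        "memory": 1,                   # enable Integrated Model Memory
        "mem_expire": expire_minutes,  # minutes before the session's memory expires
        "mem_session": session,        # names the isolated conversation context
        "provider": provider,
        "model": model,
        "max_tokens": 100,
        "messages": [{"role": "user", "content": message}],
    }

body = build_memory_request("where did I put my car keys? I already forgot.",
                            session="test123", expire_minutes=120)
print(json.dumps(body, indent=2))
```

Post the resulting body to the endpoint with your preferred HTTP client, sending the same Content-Type and Authorization headers shown in the curl example.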
Verifying Memory Functionality
To see Integrated Model Memory in action, follow this sequence:
Step 1: Save Information to Memory
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_expire": 60,
"mem_session": "test123",
"provider": "openai",
"model": "gpt-4o",
"max_tokens": 100,
"messages": [
{
"role": "user",
"content": "I put my car keys in the coffee can by the swing on the front porch."
},
{
"role": "assistant",
"content": "Okay, I will keep your secret safe."
},
{
"role": "user",
"content": "Why is the sky blue?"
},
{
"role": "assistant",
"content": "The sky appears blue due to a phenomenon called Rayleigh scattering..."
},
{
"role": "user",
"content": "What did I just ask you?"
}
]
}'
Step 2: Ask a Follow-up Question
Now, ask the AI where you put your car keys and note how you do not need to include the message history:
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_session": "test123",
"provider": "openai",
"model": "gpt-4o",
"max_tokens": 50,
"messages": [
{
"role": "user",
"content": "Where did I put my car keys?"
}
]
}'
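The two-step flow above is easy to wrap in a small client-side helper. This hypothetical `MemorySession` class is only a sketch: it pins a mem_session ID so that each call carries just the newest message, relying on IMM for the history:

```python
class MemorySession:
    """Sketch of a client-side wrapper around a single IMM session.

    Hypothetical helper: only the mem_session ID is stored locally;
    IMM holds the conversation history server-side.
    """

    def __init__(self, session_id: str, provider: str = "openai",
                 model: str = "gpt-4o", expire_minutes: int = 60):
        self.session_id = session_id
        self.provider = provider
        self.model = model
        self.expire_minutes = expire_minutes

    def request_body(self, user_message: str) -> dict:
        # Only the new message is sent; IMM supplies the earlier turns.
        return {
            "memory": 1,
            "mem_expire": self.expire_minutes,
            "mem_session": self.session_id,
            "provider": self.provider,
            "model": self.model,
            "messages": [{"role": "user", "content": user_message}],
        }

chat = MemorySession("test123")
first = chat.request_body("I put my car keys in the coffee can.")
followup = chat.request_body("Where did I put my car keys?")
```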
Expected Response
{
"id": "chatcmpl-6008a7442651dc5b433b58db12fa88ee",
"object": "chat.completion",
"created": 1737469664,
"provider": "openai",
"model": "gpt-4o",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "You put your car keys in the coffee can by the swing on the front porch."
},
"logprobs": null,
"finish_reason": "stop"
}
]
}
This confirms that the AI correctly remembers past conversations and can recall stored information.
Cross-Model Memory Example
One of IMM's unique features is its ability to maintain context across different AI models within the same session. Here's how to switch models while preserving conversation context:
Start with GPT-4
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_session": "cross_model_test",
"provider": "openai",
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "My favorite color is blue."
}
]
}'
Continue with Claude, maintaining context
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_session": "cross_model_test",
"provider": "anthropic",
"model": "claude-2",
"messages": [
{
"role": "user",
"content": "What's my favorite color?"
}
]
}'
Switch to Mistral, still maintaining context
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_session": "cross_model_test",
"provider": "mistral",
"model": "mistral-large",
"messages": [
{
"role": "user",
"content": "Can you confirm my favorite color?"
}
]
}'
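The three calls above differ only in their provider and model fields, so the pattern can be expressed as a loop. A sketch, with the provider/model pairs taken from the examples above:

```python
# Reusing one mem_session across providers means each model sees the
# same conversation history.
MODELS = [
    ("openai", "gpt-4o"),
    ("anthropic", "claude-2"),
    ("mistral", "mistral-large"),
]

def cross_model_bodies(session: str, question: str) -> list:
    """Build one request body per provider, all sharing a single session."""
    return [
        {
            "memory": 1,
            "mem_session": session,   # identical session -> shared context
            "provider": provider,
            "model": model,
            "messages": [{"role": "user", "content": question}],
        }
        for provider, model in MODELS
    ]

bodies = cross_model_bodies("cross_model_test", "What's my favorite color?")
```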
Each model will have access to the full conversation history, demonstrating IMM's unique ability to maintain context across different AI models.
Managing Multiple Sessions
Create separate memory contexts for different users or applications by giving each one a distinct mem_session string.
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_expire": 60,
"mem_session": "user456",
"provider": "openai",
"model": "gpt-4o",
"max_tokens": 100,
"messages": [
{
"role": "user",
"content": "Where is the library?"
}
]
}'
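One simple convention (our own suggestion, not an API requirement) is to derive mem_session from your application's user or tenant IDs so contexts can never collide:

```python
def session_for_user(user_id: str, app: str = "support-bot") -> str:
    """Derive a per-user session name; any unique string works as mem_session."""
    return f"{app}:{user_id}"

# Each user gets an isolated conversation context.
alice_session = session_for_user("user123")
bob_session = session_for_user("user456")
```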
Memory Expiration Control
Set a custom expiration time for a conversation context, up to a maximum of 1440 minutes (24 hours); the default is 15 minutes:
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_expire": 120,
"mem_session": "timed_session",
"provider": "openai",
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "This conversation will expire in 120 minutes."
}
]
}'
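Client-side, it can be useful to validate mem_expire before sending, given the documented bounds (default 15 minutes, maximum 1440). A sketch of one way to do this:

```python
DEFAULT_EXPIRE = 15    # minutes, the documented default
MAX_EXPIRE = 1440      # 24 hours, the documented maximum

def normalize_expire(minutes=None) -> int:
    """Fall back to the default and clamp to the documented maximum."""
    if minutes is None:
        return DEFAULT_EXPIRE
    return min(int(minutes), MAX_EXPIRE)
```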
Clearing Memory
Clear a specific session's memory when needed. If you do not specify a session, the default unnamed session is cleared:
curl -X POST "https://apipie.ai/v1/chat/completions" \
-H "Content-Type: application/json" \
-H "Authorization: Bearer <Your APIpie API Key>" \
--data-raw '{
"memory": 1,
"mem_clear": 1,
"mem_session": "session_to_clear",
"provider": "openai",
"model": "gpt-4o",
"messages": [
{
"role": "user",
"content": "Clear this session's memory."
}
]
}'
Expected response:
{
"queryResults": "Memory session deleted successfully",
"queryDetails": {
"provider": "openai",
"route": "gpt-4o",
"promptTokens": 0,
"responseTokens": 0,
"promptChar": 0,
"responseChar": 31,
"cost": 0,
"latencyMs": 0
}
}
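Clearing a session fits the same payload-building approach. A hypothetical helper, with the mem_clear parameter taken from the example above:

```python
def build_clear_request(session: str, provider: str = "openai",
                        model: str = "gpt-4o") -> dict:
    """Body that clears an IMM session (mem_clear: 1, per the example above)."""
    return {
        "memory": 1,
        "mem_clear": 1,           # instructs IMM to delete this session's memory
        "mem_session": session,
        "provider": provider,
        "model": model,
        "messages": [{"role": "user", "content": "Clear this session's memory."}],
    }

body = build_clear_request("session_to_clear")
```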
Performance Considerations
System Impact
- Minimal latency increase for enhanced context awareness
- Efficient memory management powered by Pinecone's vector database
- Optimized resource utilization
Best Practices
- Use unique session IDs for different conversation contexts
- Implement appropriate memory expiration times
- Monitor and manage active sessions
Resource Management
- Automatic cleanup of expired sessions
- Efficient handling of memory resources
- Cost-effective implementation
Advanced Features
Memory Optimization
- Automatic context prioritization
- Intelligent memory cleanup
- Resource-efficient storage
Session Control
- Fine-grained session management
- Custom expiration settings
- Independent memory contexts
Integration Support
- Compatible with all supported AI models
- Seamless API integration
- Flexible implementation options
By implementing Integrated Model Memory, developers can create more intelligent, context-aware AI applications while maintaining efficient resource usage and optimal performance. Learn more about enhancing your AI applications with our Completions API and RAG tuning capabilities.
Additional Resources
- Vector Similarity Search Fundamentals - Pinecone's guide to vector search
- Anthropic's prompt caching
- OpenAI's prompt caching