Introduction

Remina

Production-grade memory infrastructure for AI systems.

Remina provides battle-tested algorithms for memory management with full infrastructure flexibility. Think of it as SQLAlchemy for AI memory — the core logic is handled for you, while storage backends, vector databases, and embedding models remain fully pluggable.

The Problem

AI applications without a persistent memory layer face compounding problems:

  • Lost context — Conversations reset, users repeat themselves, personalization degrades
  • Vendor lock-in — Existing solutions force specific infrastructure choices
  • Scaling limitations — Naive approaches fail under production memory volumes
  • Memory quality — No intelligent consolidation, deduplication, or relevance scoring

Remina addresses these at the infrastructure layer with a pluggable architecture and sophisticated memory algorithms.

Architecture

┌─────────────────────────────────────────────────────────────┐
│                     Remina Memory                            │
├─────────────────────────────────────────────────────────────┤
│  ┌─────────────────────────────────────────────────────────┐│
│  │                  Core Algorithms                         ││
│  │  • Importance Scoring    • Fact Extraction              ││
│  │  • Hybrid Retrieval      • Deduplication                ││
│  └─────────────────────────────────────────────────────────┘│
│                                                              │
│  ┌───────────┐  ┌───────────────┐  ┌───────────────────┐   │
│  │ L1 Cache  │  │  L2 Storage   │  │   Vector Store    │   │
│  │  (Redis)  │  │  (Pluggable)  │  │   (Pluggable)     │   │
│  │  FIXED    │  │               │  │                   │   │
│  └───────────┘  └───────────────┘  └───────────────────┘   │
│                                                              │
│  ┌─────────────────────────────────────────────────────────┐│
│  │              Embedding + LLM Providers                   ││
│  │                    (Pluggable)                           ││
│  └─────────────────────────────────────────────────────────┘│
└─────────────────────────────────────────────────────────────┘
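
The two-tier layout above implies a read-through access pattern: check the Redis L1 first, fall back to the pluggable L2 store on a miss, and repopulate the cache on the way out. This README does not document Remina's internals, so the following is only a minimal sketch of that pattern; the L2Store interface, key scheme, and TTL are assumptions.

import json
import redis

# Hypothetical L2 interface; Remina's actual storage API is not shown here.
class L2Store:
    def fetch(self, memory_id: str) -> dict | None: ...

def get_memory(r: redis.Redis, l2: L2Store, memory_id: str, ttl: int = 300):
    key = f"memory:{memory_id}"  # assumed key scheme
    cached = r.get(key)
    if cached is not None:       # L1 hit: serve straight from Redis
        return json.loads(cached)
    record = l2.fetch(memory_id)  # L1 miss: fall back to the L2 store
    if record is not None:
        r.setex(key, ttl, json.dumps(record))  # repopulate L1 with a TTL
    return record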

Capabilities

  • Pluggable Architecture — Swap storage, vector DB, embeddings, and LLM providers independently
  • Two-Tier Caching — Redis L1 for hot-path access, pluggable L2 for persistence
  • Hybrid Retrieval — Combines vector similarity, temporal decay, and importance weighting (see the scoring sketch after this list)
  • Automatic Extraction — LLM-powered fact extraction from conversations
  • Deduplication — Similarity-based duplicate prevention (also sketched below)
  • Importance Scoring — Relevance ranking based on recency, frequency, and base importance
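
How these signals combine is not specified in this README. A common formulation, and a reasonable mental model, is a weighted sum of cosine similarity, an exponential recency decay, and a stored importance value, with deduplication as a similarity threshold at write time. The weights, decay rate, and cutoff below are illustrative assumptions, not Remina's actual parameters.

import math
import time

W_SIM, W_RECENCY, W_IMPORTANCE = 0.6, 0.2, 0.2  # assumed weights
DECAY_RATE = 0.01                               # assumed per-hour decay
DUP_THRESHOLD = 0.95                            # assumed similarity cutoff

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return dot / norm if norm else 0.0

def hybrid_score(query_emb, mem_emb, last_accessed: float, importance: float) -> float:
    # Recency decays exponentially with hours since last access
    age_hours = (time.time() - last_accessed) / 3600
    recency = math.exp(-DECAY_RATE * age_hours)
    return (W_SIM * cosine(query_emb, mem_emb)
            + W_RECENCY * recency
            + W_IMPORTANCE * importance)

def is_duplicate(new_emb, existing_embs) -> bool:
    # Reject a new fact whose embedding is near-identical to a stored one
    return any(cosine(new_emb, e) >= DUP_THRESHOLD for e in existing_embs)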

Usage

from remina import Memory
 
# Initialize with defaults (SQLite + Chroma + OpenAI)
memory = Memory()
 
# Extract and store facts from text
result = memory.add(
    messages="I'm John. I work as a software engineer at Google.",
    user_id="user_123"
)
# Extracts: ["Name is John", "Works as a software engineer at Google"]
 
# Semantic search
results = memory.search(
    query="What is the user's profession?",
    user_id="user_123"
)
 
# Retrieve all memories
all_memories = memory.get_all(user_id="user_123")
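
Beyond the defaults, the pluggable layers can presumably be selected at construction time. The README does not show this API, so the keyword arguments below (storage, vector_store, embedder, llm) are hypothetical placeholders for whatever configuration mechanism Remina actually exposes:

from remina import Memory

# Hypothetical provider selection; argument names are illustrative only.
memory = Memory(
    storage="postgresql",    # L2 store (see Supported Providers)
    vector_store="qdrant",
    embedder="cohere",
    llm="anthropic",
)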

Supported Providers

Category        Providers
--------------  -------------------------------------------
Storage (L2)    SQLite, PostgreSQL, MongoDB
Vector Stores   Chroma, Pinecone, Qdrant, pgvector
Embeddings      OpenAI, Gemini, Cohere, Ollama, HuggingFace
LLMs            OpenAI, Gemini, Anthropic, Ollama

License

Apache 2.0 — Free for commercial and personal use.