Core Concepts

Technical overview of Remina's architecture and algorithms.

Architecture

Remina implements a layered architecture with provider abstraction:

┌──────────────────────────────────────────────────────────────┐
│                     Remina Engine                             │
│                                                               │
│  ┌─────────────────────────────────────────────────────────┐ │
│  │                  Core Algorithms                         │ │
│  │  • Importance Scoring    • Memory Consolidation          │ │
│  │  • Hybrid Retrieval      • Contradiction Detection       │ │
│  │  • Graph Linking         • Adaptive Decay                │ │
│  │  • Deduplication         • Fact Extraction               │ │
│  └─────────────────────────────────────────────────────────┘ │
│                                                               │
│  ┌───────────┐  ┌───────────────┐  ┌───────────────────┐    │
│  │ L1 Cache  │  │  L2 Storage   │  │   Vector Store    │    │
│  │  (Redis)  │  │  (Pluggable)  │  │   (Pluggable)     │    │
│  │  FIXED    │  │               │  │                   │    │
│  └───────────┘  └───────────────┘  └───────────────────┘    │
└──────────────────────────────────────────────────────────────┘

Memory Model

Each memory entity contains:

from dataclasses import dataclass
from datetime import datetime

@dataclass
class Memory:
    id: str                      # Unique identifier (UUID)
    user_id: str                 # Owner/namespace
    content: str                 # Memory content
    embedding: list[float]       # Vector representation
    
    # Metadata
    metadata: dict               # User-defined key-value pairs
    tags: list[str]              # Categorization tags
    source: str                  # Origin (conversation, manual, extraction)
    
    # Scoring factors
    importance: float            # Base importance (0-1)
    decay_rate: float            # Decay coefficient (default: 0.01)
    access_count: int            # Retrieval count
    
    # Timestamps
    created_at: datetime
    updated_at: datetime
    last_accessed_at: datetime
    
    # Graph
    links: list[str]             # Related memory IDs
    
    # State
    is_consolidated: bool        # Result of a merge operation
    consolidated_from: list[str] # Source memory IDs if merged

Two-Tier Caching

Remina uses a two-tier caching strategy optimized for AI workloads:

L1 Cache (Redis) — Fixed

  • Purpose: Hot-path access for recent/frequent memories
  • Target latency: < 5ms
  • Behavior:
    • Automatic promotion on access
    • TTL-based eviction (default: 1 hour)
    • Per-user memory limits (default: 100)
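
For illustration, the promotion path might look like this with redis-py; the key scheme and helper name are assumptions, not Remina's actual internals:

import json
import redis

r = redis.Redis()

def promote_to_l1(user_id: str, memory_id: str, payload: dict, ttl: int = 3600) -> None:
    # Hot-path cache write with TTL-based eviction (default 1 hour,
    # matching the behavior above); key scheme is hypothetical.
    key = f"remina:{user_id}:{memory_id}"
    r.setex(key, ttl, json.dumps(payload))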

L2 Storage (Pluggable)

  • Purpose: Persistent, durable storage
  • Options: PostgreSQL, MongoDB, SQLite
  • Behavior:
    • Full memory persistence
    • Queried on L1 cache miss

Cache Flow

┌─────────────────────────────────────────────────────────────┐
│                        Search Request                        │
└─────────────────────────────────────────────────────────────┘
                              │
                              ▼
                    ┌─────────────────┐
                    │   L1 Cache Hit? │
                    └─────────────────┘
                      │           │
                     Yes          No
                      │           │
                      ▼           ▼
              ┌───────────┐  ┌───────────────┐
              │  Return   │  │ Query Vector  │
              │  Cached   │  │    Store      │
              └───────────┘  └───────────────┘
                                     │
                                     ▼
                            ┌───────────────┐
                            │ Query L2      │
                            │ Storage       │
                            └───────────────┘
                                     │
                                     ▼
                            ┌───────────────┐
                            │ Promote to L1 │
                            └───────────────┘
                                     │
                                     ▼
                            ┌───────────────┐
                            │    Return     │
                            └───────────────┘

Fact Extraction

The memory.add() operation uses an LLM to extract discrete facts:

# Input
memory.add(
    messages="I'm John. I'm a software engineer at Google. I prefer Python.",
    user_id="john_123"
)
 
# LLM extracts:
# - "Name is John"
# - "Is a software engineer"
# - "Works at Google"
# - "Prefers Python"

Extraction Pipeline

  1. Format Input — Normalize messages to prompt format
  2. LLM Call — Send to configured LLM with extraction prompt
  3. Parse Response — Extract JSON array of facts
  4. Deduplicate — Compare against existing memories
  5. Embed — Generate vector embeddings
  6. Store — Persist to vector store and L2 storage
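
A sketch of the pipeline end to end; the extraction prompt, the JSON response shape, and the is_duplicate/store_memory helpers are assumptions for illustration:

import json

async def extract_and_store(messages: str, user_id: str) -> list[str]:
    prompt = [{"role": "system", "content": EXTRACTION_PROMPT},   # 1. format input
              {"role": "user", "content": messages}]
    response = llm.generate_response(prompt)                      # 2. LLM call
    facts = json.loads(response["content"])                       # 3. parse JSON array
    facts = [f for f in facts
             if not await is_duplicate(f, user_id)]               # 4. deduplicate
    embeddings = embedder.embed_batch(facts)                      # 5. embed
    for fact, emb in zip(facts, embeddings):                      # 6. store
        await store_memory(user_id, fact, emb)
    return facts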

Hybrid Retrieval

Remina combines multiple signals for relevance ranking:

final_score = (
    0.5 * semantic_score +      # Vector similarity
    0.3 * importance_score +    # Recency + frequency + base importance
    0.2 * keyword_score         # Direct term overlap
)

Semantic Score

Cosine similarity between query embedding and memory embedding.
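
For reference, the standard definition, shown here with numpy:

import numpy as np

def cosine_similarity(a: list[float], b: list[float]) -> float:
    a, b = np.asarray(a), np.asarray(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))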

Importance Score

importance_score = (
    weight_recency * recency_factor +
    weight_frequency * frequency_factor +
    weight_importance * base_importance
)

Components:

  • Recency factor: Temporal decay based on last_accessed_at
  • Frequency factor: Derived from access_count
  • Base importance: User-defined or extraction-derived importance
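
One plausible realization of these factors is sketched below; the exponential decay curve, log-scaled frequency, and weight values are illustrative assumptions, not Remina's documented formulas:

import math
from datetime import datetime, timezone

def importance_score(memory, w_recency=0.4, w_frequency=0.3, w_importance=0.3):
    hours_idle = (datetime.now(timezone.utc)
                  - memory.last_accessed_at).total_seconds() / 3600
    recency_factor = math.exp(-memory.decay_rate * hours_idle)        # decays from 1.0
    frequency_factor = min(1.0, math.log1p(memory.access_count) / 5)  # capped at 1.0
    return (w_recency * recency_factor
            + w_frequency * frequency_factor
            + w_importance * memory.importance)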

Keyword Score

Term overlap between query and memory content.
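
Putting the three signals together, reusing the cosine_similarity and importance_score sketches above; the whitespace tokenizer is a simplification:

def keyword_score(query: str, content: str) -> float:
    q, c = set(query.lower().split()), set(content.lower().split())
    return len(q & c) / len(q) if q else 0.0

def final_score(query: str, query_emb: list[float], memory) -> float:
    return (0.5 * cosine_similarity(query_emb, memory.embedding)
            + 0.3 * importance_score(memory)
            + 0.2 * keyword_score(query, memory.content))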

Deduplication

Before storing, Remina checks for semantic duplicates:

# Default threshold: 0.9 (90% similarity)
if cosine_similarity(new_embedding, existing_embedding) > dedup_threshold:
    pass  # Skip storage: duplicate detected

This prevents redundant entries like:

  • "Prefers Python" and "Loves Python programming"
  • "Works at Google" and "Employed at Google"

Memory Consolidation (Planned)

Automatic consolidation of related memories:

# Before consolidation:
# - "Drinks coffee"
# - "Prefers dark roast"
# - "Has 3 cups daily"
 
# After consolidation:
# - "Coffee preference: dark roast, 3 cups daily"

Provider Abstraction

All providers implement standardized interfaces:

Storage Provider

from abc import ABC
from typing import Dict, List, Tuple

class StorageBase(ABC):
    async def save(self, memories: List[Memory]) -> None: ...
    async def get(self, ids: List[str]) -> List[Memory]: ...
    async def delete(self, ids: List[str]) -> None: ...
    async def query(self, user_id: str, filters: Dict = None, limit: int = 100) -> List[Memory]: ...
    async def update(self, memory: Memory) -> None: ...
    async def count(self, user_id: str) -> int: ...
    async def close(self) -> None: ...
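
A minimal in-memory implementation, just to show the contract; real deployments use one of the pluggable backends above:

class InMemoryStorage(StorageBase):
    def __init__(self):
        self._memories: dict[str, Memory] = {}

    async def save(self, memories):
        for m in memories:
            self._memories[m.id] = m

    async def get(self, ids):
        return [self._memories[i] for i in ids if i in self._memories]

    async def delete(self, ids):
        for i in ids:
            self._memories.pop(i, None)

    async def query(self, user_id, filters=None, limit=100):
        rows = [m for m in self._memories.values() if m.user_id == user_id]
        return rows[:limit]

    async def update(self, memory):
        self._memories[memory.id] = memory

    async def count(self, user_id):
        return sum(1 for m in self._memories.values() if m.user_id == user_id)

    async def close(self):
        self._memories.clear()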

Vector Store Provider

class VectorStoreBase(ABC):
    async def upsert(self, id: str, embedding: List[float], metadata: Dict) -> None: ...
    async def upsert_batch(self, items: List[Tuple]) -> None: ...
    async def search(self, embedding: List[float], limit: int = 10, filters: Dict = None) -> List[VectorSearchResult]: ...
    async def delete(self, ids: List[str]) -> None: ...
    async def close(self) -> None: ...
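
Likewise, a brute-force sketch that scores every stored vector with the cosine_similarity helper from earlier; adequate for tests, while real providers delegate to an ANN index:

class BruteForceVectorStore(VectorStoreBase):
    def __init__(self):
        self._index: dict[str, tuple[list[float], dict]] = {}

    async def upsert(self, id, embedding, metadata):
        self._index[id] = (embedding, metadata)

    async def upsert_batch(self, items):
        for id, embedding, metadata in items:
            self._index[id] = (embedding, metadata)

    async def search(self, embedding, limit=10, filters=None):
        hits = []
        for id, (emb, meta) in self._index.items():
            if filters and any(meta.get(k) != v for k, v in filters.items()):
                continue
            hits.append((cosine_similarity(embedding, emb), id, meta))
        hits.sort(key=lambda h: h[0], reverse=True)
        # Wrapping results into VectorSearchResult is omitted; its exact
        # fields are not shown in this document.
        return hits[:limit]

    async def delete(self, ids):
        for i in ids:
            self._index.pop(i, None)

    async def close(self):
        self._index.clear()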

Embedding Provider

class EmbeddingBase(ABC):
    def embed(self, text: str) -> List[float]: ...
    def embed_batch(self, texts: List[str]) -> List[List[float]]: ...
    @property
    def dimensions(self) -> int: ...
    @property
    def model_name(self) -> str: ...

LLM Provider

class LLMBase(ABC):
    def generate_response(self, messages: List[Dict], tools: List[Dict] = None) -> Dict: ...
    @property
    def model_name(self) -> str: ...

Error Handling

Remina uses structured exceptions with error codes:

class ReminaError(Exception):
    message: str
    error_code: str
    details: Dict
    suggestion: str
    debug_info: Dict
 
# Specific exceptions
class ConfigurationError(ReminaError): ...
class StorageError(ReminaError): ...
class VectorStoreError(ReminaError): ...
class EmbeddingError(ReminaError): ...
class LLMError(ReminaError): ...
class CacheError(ReminaError): ...
class MemoryNotFoundError(ReminaError): ...
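
Callers can catch a specific subclass first and fall back to the base class; the failing call here is illustrative:

try:
    results = memory.search("coffee preferences", user_id="john_123")
except StorageError as e:
    print(f"[{e.error_code}] {e.message}")
    print(f"Hint: {e.suggestion}")
except ReminaError as e:
    # Any other Remina failure carries the same structured fields
    print(f"[{e.error_code}] {e.message}")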
