This module implements Stage 2.5: Procedural Memory Loop inspired by the ReasoningBank paper. The goal is to enable an RLM agent to improve over time by accumulating procedural knowledge (strategies, templates, debugging moves) without replacing evidence-based retrieval.
A MemoryItem represents a reusable procedural insight extracted from an RLM trajectory.
Constraints:
- Items must be small enough to inject into prompts
- Content should be procedural (steps/checklist), not a retelling
- Up to 3 items extracted per trajectory
A reusable procedural memory extracted from an RLM trajectory.
Attributes:
- id: Unique identifier (UUID)
- title: Concise identifier (≤10 words)
- description: One-sentence summary
- content: Procedural steps/checklist/template (Markdown)
- source_type: 'success' or 'failure'
- task_query: Original task that produced this memory
- created_at: ISO timestamp
- access_count: Number of times retrieved (for future consolidation)
- tags: Keywords for BM25 retrieval
- session_id: Optional session ID from DatasetMeta (links to dataset session)
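The class definition itself is not listed here; the following is a minimal sketch consistent with the attributes above. Field defaults and the `to_dict`/`from_dict` bodies are assumptions, not the module's actual implementation.

```python
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class MemoryItem:
    """A reusable procedural memory extracted from an RLM trajectory."""
    id: str
    title: str
    description: str
    content: str
    source_type: str                  # 'success' or 'failure'
    task_query: str
    created_at: str                   # ISO timestamp
    access_count: int = 0
    tags: list = field(default_factory=list)
    session_id: Optional[str] = None  # links to a DatasetMeta session

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> 'MemoryItem':
        return cls(**data)
```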
```python
# Test MemoryItem creation and serialization
test_item = MemoryItem(
    id='test-uuid',
    title='SPARQL Query Pattern',
    description='Template for searching entities by label.',
    content='- Use `rdfs:label` for human-readable names\n- Add FILTER for case-insensitive search',
    source_type='success',
    task_query='Find entities named "Activity"',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['sparql', 'search', 'rdfs'],
)

# Test roundtrip
data = test_item.to_dict()
restored = MemoryItem.from_dict(data)
assert restored.title == test_item.title
assert restored.tags == test_item.tags
print("✓ MemoryItem serialization works")
```
✓ MemoryItem serialization works
Memory Store
Persistent storage for procedural memories using a simple JSON file format.
Create a bounded trajectory artifact for the judge/extractor.
Summarizes each iteration’s code blocks into 1-2 line “action + outcome” entries, keeping only the ~10 most informative key steps.
Args:
- task: Original task query
- answer: Final answer from rlm_run
- iterations: List of RLMIteration objects
- ns: Final namespace dict
Returns:
Dictionary with keys:
- task: str
- final_answer: str
- iteration_count: int
- converged: bool (whether final_answer was set)
- key_steps: List of {iteration, action, outcome}
- variables_created: List of variable names in ns
- errors_encountered: List of error messages from stderr
```python
# Test with mock iterations
from rlm._rlmpaper_compat import CodeBlock, REPLResult

mock_block1 = CodeBlock(
    code="search('Activity')",
    result=REPLResult(stdout="Found 3 entities", stderr=None, locals={}),
)
mock_block2 = CodeBlock(
    code="describe_entity('prov:Activity')",
    result=REPLResult(stdout="prov:Activity is a class", stderr=None, locals={}),
)
mock_iteration = RLMIteration(
    prompt="test prompt",
    response="test response",
    code_blocks=[mock_block1, mock_block2],
    final_answer=None,
    iteration_time=0.5,
)
artifact = extract_trajectory_artifact(
    task="What is prov:Activity?",
    answer="prov:Activity is a class",
    iterations=[mock_iteration],
    ns={'result': 'prov:Activity is a class'},
)
assert artifact['task'] == "What is prov:Activity?"
assert artifact['iteration_count'] == 1
assert artifact['converged'] == True
assert len(artifact['key_steps']) == 2
assert 'search' in artifact['key_steps'][0]['action'].lower()
assert len(artifact['variables_created']) == 1
print("✓ Trajectory artifact extraction works")
```
✓ Trajectory artifact extraction works
Judge
Classify trajectory as success or failure with evidence-sensitivity.
Success criteria:
1. Answer directly addresses the task
2. Answer is grounded in retrieved evidence (not hallucinated)
3. Reasoning shows systematic exploration
Failure indicators:
1. No answer produced (didn’t converge)
2. Answer doesn’t address the task
3. Answer makes claims without supporting evidence
Evidence-sensitive: success requires grounding in retrieved evidence.
Args:
- artifact: Trajectory artifact from extract_trajectory_artifact()
- ns: Optional namespace for additional context
Returns:
Dictionary with keys:
- is_success: bool
- reason: str
- confidence: str ('high', 'medium', 'low')
- missing: list[str] (what evidence was lacking if failure)
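A sketch of the judge's shape, assuming an LLM callable (`llm: str -> str`) that returns the JSON schema above. The prompt wording and the non-converged fast path are assumptions, not the module's actual prompt.

```python
import json

JUDGE_PROMPT = """You are judging an agent trajectory.
Task: {task}
Final answer: {final_answer}
Key steps: {key_steps}
Errors: {errors}

Decide whether this is a success. Success requires that the answer addresses
the task AND is grounded in the retrieved evidence shown in the key steps.
Reply with JSON only: {{"is_success": true|false, "reason": "...",
"confidence": "high"|"medium"|"low", "missing": ["..."]}}"""

def judge_trajectory(artifact, llm=None, ns=None):
    """Judge a trajectory artifact; `llm` is any callable str -> str."""
    if not artifact.get('converged'):
        # No final answer: fail fast without an LLM call
        return {'is_success': False, 'reason': 'No final answer produced',
                'confidence': 'high', 'missing': ['final answer']}
    prompt = JUDGE_PROMPT.format(
        task=artifact['task'],
        final_answer=artifact['final_answer'],
        key_steps=json.dumps(artifact['key_steps']),
        errors=json.dumps(artifact['errors_encountered']),
    )
    return json.loads(llm(prompt))
```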
```python
# Test judge with real LLM (requires API key)
test_artifact = {
    'task': 'What is prov:Activity?',
    'final_answer': 'prov:Activity is a class representing activities in PROV ontology',
    'iteration_count': 2,
    'converged': True,
    'key_steps': [
        {'iteration': 1, 'action': "search('Activity')", 'outcome': 'Found 3 entities'},
        {'iteration': 2, 'action': "describe_entity('prov:Activity')", 'outcome': 'A class in PROV'},
    ],
    'variables_created': ['result'],
    'errors_encountered': [],
}
judgment = judge_trajectory(test_artifact)
print(f"Success: {judgment['is_success']}")
print(f"Reason: {judgment['reason']}")
print(f"Confidence: {judgment['confidence']}")
```
Extractor
Extract 1-3 reusable memory items from a trajectory.
For successes: Emphasize why the approach worked
For failures: Emphasize what to avoid and recovery strategies
Output format: Procedural (steps/checklist/template), NOT a retelling
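The extractor's body is not shown; a sketch follows, again assuming an LLM callable that returns a JSON list. The prompt text is illustrative, and the hard cap of 3 items mirrors the constraint stated above.

```python
import json

EXTRACT_PROMPT = """Extract 1-3 reusable procedural memories from this trajectory.
Task: {task}
Outcome: {verdict}
Key steps: {key_steps}

For a success, capture WHY the approach worked; for a failure, capture what to
avoid and how to recover. Each memory must be procedural (steps, checklist, or
template), not a retelling of the trajectory. Reply with a JSON list of objects
with keys: title (<=10 words), description (one sentence), content (Markdown
steps), tags (keywords)."""

def extract_memories(artifact, is_success, llm):
    """Ask the LLM for at most 3 MemoryItem-shaped dicts."""
    prompt = EXTRACT_PROMPT.format(
        task=artifact['task'],
        verdict='success' if is_success else 'failure',
        key_steps=json.dumps(artifact['key_steps']),
    )
    return json.loads(llm(prompt))[:3]  # enforce the 1-3 item cap
```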
Tokenizes task and searches over title + description + tags.
Args:
- store: MemoryStore instance
- task: Task query string
- k: Number of memories to retrieve
Returns: List of top-k MemoryItem objects (may be fewer if scores ≤ 0)
```python
# Test BM25 retrieval
test_store = MemoryStore()

# Add diverse memories
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='SPARQL query pattern for entity search',
    description='Use rdfs:label with FILTER for case-insensitive search.',
    content='- Step 1\n- Step 2',
    source_type='success',
    task_query='Find entities by name',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['sparql', 'search', 'entity'],
))
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='Property exploration strategy',
    description='Systematically explore properties using describe then probe.',
    content='- Action A\n- Action B',
    source_type='success',
    task_query='What properties does X have?',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['properties', 'exploration'],
))
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='Debugging failed SPARQL queries',
    description='Check syntax, namespaces, and endpoint first.',
    content='- Check 1\n- Check 2',
    source_type='failure',
    task_query='Query failed with error',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['sparql', 'debugging', 'error'],
))

# Test retrieval for different queries
results1 = retrieve_memories(test_store, 'How do I search for entities?', k=2)
assert len(results1) <= 2
assert any('search' in r.title.lower() or 'search' in r.tags for r in results1)
print(f"✓ Retrieved {len(results1)} memories for 'search for entities'")

results2 = retrieve_memories(test_store, 'My SPARQL query is broken', k=2)
assert len(results2) <= 2
assert any('sparql' in r.tags for r in results2)
print(f"✓ Retrieved {len(results2)} memories for 'SPARQL query broken'")

results3 = retrieve_memories(test_store, 'What properties does prov:Activity have?', k=2)
print(f"✓ Retrieved {len(results3)} memories for 'properties question'")

# Test access count increment
assert results1[0].access_count > 0
print("✓ Access count tracking works")
```
✓ Retrieved 2 memories for 'search for entities'
✓ Retrieved 2 memories for 'SPARQL query broken'
✓ Retrieved 2 memories for 'properties question'
✓ Access count tracking works
Injection Formatting
Format retrieved memories for bounded prompt injection.
Output includes:
- Assessment instruction
- Title + description + up to 3 key bullets from content
Never injects full content to maintain bounded prompt size.
Returns string with:
- Assessment instruction
- Title + description + key bullets from content (up to max_bullets)
Args:
- memories: List of MemoryItem objects to format
- max_bullets: Maximum bullets to extract from content
Returns: Formatted string for prompt injection
```python
# Test injection formatting
test_memories = [
    MemoryItem(
        id='test-1',
        title='SPARQL Search Pattern',
        description='Template for searching entities by label.',
        content="""- Use rdfs:label for human-readable names
- Add FILTER for case-insensitive matching
- Include LIMIT to avoid timeout
- Check for alternative label properties""",
        source_type='success',
        task_query='test',
        created_at=datetime.now(timezone.utc).isoformat(),
        tags=['sparql'],
    ),
    MemoryItem(
        id='test-2',
        title='Property Discovery',
        description='Systematic approach to finding properties.',
        content="""1. Start with describe_entity() for overview
2. Use get_properties() for full list
3. Check both domain and range
4. Look for inverse properties""",
        source_type='success',
        task_query='test',
        created_at=datetime.now(timezone.utc).isoformat(),
        tags=['properties'],
    ),
]
formatted = format_memories_for_injection(test_memories, max_bullets=3)

# Verify format
assert '## Relevant Prior Experience' in formatted
assert 'assess which of these strategies' in formatted
assert '### 1. SPARQL Search Pattern' in formatted
assert '### 2. Property Discovery' in formatted
assert 'Use rdfs:label' in formatted
assert 'Start with describe_entity' in formatted

# Verify bullet limiting (should have max 3 bullets per memory)
lines = formatted.split('\n')
bullet_count_mem1 = sum(
    1 for l in lines[lines.index('### 1. SPARQL Search Pattern'):lines.index('### 2. Property Discovery')]
    if l.strip().startswith('-')
)
assert bullet_count_mem1 <= 3
print("✓ Injection formatting works")
print("\nFormatted output:")
print(formatted[:300] + "...")
```
✓ Injection formatting works
Formatted output:
## Relevant Prior Experience
Before taking action, briefly assess which of these strategies apply to your current task and which do not.
### 1. SPARQL Search Pattern
Template for searching entities by label.
Key points:
- Use rdfs:label for human-readable names
- Add FILTER for case-insensitive ma...
Closed-loop cycle:
1. RETRIEVE: Get relevant memories via BM25
2. INJECT: Add to context/prompt
3. INTERACT: Run rlm_run()
4. EXTRACT: Judge + extract new memories
5. STORE: Persist new memories
NEW: Dataset persistence:
- If persist_dataset=True and dataset_path provided, loads snapshot before run
- After run, if dataset was modified, saves snapshot
- Stores snapshot path in extracted MemoryItem for lineage
Args:
- query: Task query string
- context: Context string (e.g., ontology summary)
- memory_store: MemoryStore instance for retrieval/storage
- ns: Optional namespace dict
- enable_memory_extraction: Whether to extract and store new memories (default True)
- persist_dataset: Whether to persist dataset snapshots (default False)
- dataset_path: Optional path for dataset snapshot
- **kwargs: Additional arguments for rlm_run()
Returns: Tuple of (answer, iterations, ns, new_memories)
```python
# Integration test (requires full RLM setup)
from rlm.ontology import setup_ontology_context
import tempfile

def test_memory_improves_convergence():
    """Second attempt should benefit from first attempt's memory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        store = MemoryStore(path=Path(tmpdir) / 'test_integration.json')

        # First run - no memories
        ns = {}
        setup_ontology_context('ontology/prov.ttl', ns, name='prov')
        answer1, iters1, ns1, mems1 = rlm_run_with_memory(
            "What is prov:Activity and what properties does it have?",
            ns['prov_meta'].summary(),
            store,
            ns=ns,
        )
        print(f"\nFirst run: {len(iters1)} iterations, {len(mems1)} memories extracted")
        for mem in mems1:
            print(f"  - {mem.title}")

        # Second run - similar task, should retrieve memories
        ns2 = {}
        setup_ontology_context('ontology/prov.ttl', ns2, name='prov')
        answer2, iters2, ns2, mems2 = rlm_run_with_memory(
            "What is prov:Entity and what properties does it have?",
            ns2['prov_meta'].summary(),
            store,
            ns=ns2,
        )
        print(f"\nSecond run: {len(iters2)} iterations")
        print(f"Total memories in store: {len(store.memories)}")

        # Verify memories were retrieved
        retrieved_for_second = retrieve_memories(
            store,
            "What is prov:Entity and what properties does it have?",
            k=3,
        )
        print(f"Memories that would be retrieved for second run: {len(retrieved_for_second)}")
        for mem in retrieved_for_second:
            print(f"  - {mem.title} (accessed {mem.access_count} times)")

# Run test
# test_memory_improves_convergence()
```
Usage Examples
End-to-end examples with PROV ontology.
```python
# Full example: Build up procedural memory over multiple queries
from rlm.ontology import setup_ontology_context
from pathlib import Path

# Initialize memory store
store = MemoryStore(path=Path('memories/prov_memories.json'))

# If store exists, load it
if store.path.exists():
    store = MemoryStore.load(store.path)
    print(f"Loaded {len(store.memories)} existing memories")

# Setup ontology context
ns = {}
setup_ontology_context('ontology/prov.ttl', ns, name='prov')

# Series of queries
queries = [
    "What is prov:Activity?",
    "What properties does prov:Activity have?",
    "How are prov:Activity and prov:Entity related?",
]

for i, query in enumerate(queries, 1):
    print(f"\n{'='*60}")
    print(f"Query {i}: {query}")
    print('='*60)
    answer, iterations, ns, new_memories = rlm_run_with_memory(
        query, ns['prov_meta'].summary(), store, ns=ns,
    )
    print(f"\nAnswer: {answer}")
    print(f"Iterations: {len(iterations)}")
    print(f"New memories extracted: {len(new_memories)}")
    for mem in new_memories:
        print(f"  - {mem.title}")

print(f"\n{'='*60}")
print(f"Final memory store: {len(store.memories)} memories")
print('='*60)

# Show all memories with access counts
for mem in store.memories:
    print(f"\n{mem.title}")
    print(f"  Source: {mem.source_type}")
    print(f"  Accessed: {mem.access_count} times")
    print(f"  Tags: {mem.tags}")
```
Bootstrap General Strategies
Architectural Role (2026-01-19 Refactor):
Universal ontology exploration patterns that should be loaded into memory on startup. These strategies were previously in reasoning_bank.CORE_RECIPES but were moved here to align with the ReasoningBank paper’s architecture.
Key Insight: General strategies are LEARNED (procedural memory), not AUTHORED (recipes).
Why Bootstrap?
Correct conceptual layer: Universal patterns belong in procedural_memory (Layer 1), not reasoning_bank (Layer 2)
Enable learning: Stored as MemoryItems, these can be:
Retrieved via BM25 (not always injected)
Updated with success_rate over time
Merged/consolidated with newly extracted patterns
Removed if ineffective
Future extensibility: New strategies extracted from successful runs can be added to memory_store automatically
Usage
```python
# One-time bootstrap at startup
memory_store = MemoryStore()
for strategy in bootstrap_general_strategies():
    memory_store.add(strategy)

# Use in RLM runs
from rlm.reasoning_bank import rlm_run_enhanced

answer, iters, ns = rlm_run_enhanced(
    query="What is Activity?",
    context=meta.summary(),
    sense=sense,
    memory_store=memory_store,  # General strategies retrieved via BM25
)
```
Note: These are seed strategies - the system can learn and add more over time via the memory extraction loop.
bootstrap_general_strategies
def bootstrap_general_strategies() -> list:
Create general strategy memories for bootstrapping.
These are universal patterns extracted from successful RLM runs that apply to all ontologies.
Returns: List of MemoryItem objects representing general strategies
```python
# Test bootstrap
strategies = bootstrap_general_strategies()
print(f"Bootstrapped {len(strategies)} general strategies:")
for s in strategies:
    print(f"  - {s.title}")
    print(f"    Tags: {s.tags}")
    print(f"    Task: {s.task_query}")

# Test that they can be stored
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    test_path = Path(tmpdir) / 'bootstrap_test.json'
    store = MemoryStore(path=test_path)
    for strategy in strategies:
        store.add(strategy)
    store.save()

    # Reload and verify
    loaded = MemoryStore.load(test_path)
    assert len(loaded.memories) == len(strategies)
    print(f"\n✓ Bootstrap strategies can be saved and loaded")
    print(f"✓ Total: {len(loaded.memories)} strategies")
```
Bootstrapped 7 general strategies:
- Describe Entity by Label
Tags: ['entity', 'search', 'describe', 'universal']
Task: entity_description
- Find Subclasses Using GraphMeta
Tags: ['hierarchy', 'subclass', 'graphmeta', 'universal']
Task: hierarchy
- Find Superclasses Using GraphMeta
Tags: ['hierarchy', 'superclass', 'graphmeta', 'universal']
Task: hierarchy
- Find Properties by Domain/Range
Tags: ['properties', 'domain', 'range', 'universal']
Task: property_discovery
- Pattern-Based Entity Search
Tags: ['search', 'pattern', 'multiple', 'universal']
Task: pattern_search
- Find Relationship Path Between Entities
Tags: ['relationships', 'path', 'connection', 'universal']
Task: relationship_discovery
- Navigate Class Hierarchy from Roots
Tags: ['hierarchy', 'exploration', 'roots', 'universal']
Task: hierarchy
✓ Bootstrap strategies can be saved and loaded
✓ Total: 7 strategies
Validation Functions
Validation gates to ensure quality and consistency of procedural memory.
Checks:
- Correct count (7 strategies)
- All are valid MemoryItem objects
- Unique titles (no duplicates)
- All tagged as 'universal'
- No hardcoded ontology-specific URIs
Returns: Dictionary with 'valid' flag and detailed checks
Uses title similarity to detect duplicates and decide action:
- add: No similar memories, safe to add
- merge: Similar memory exists, should combine insights
- skip: Similar memory exists and is better, don't add
- replace: New memory is better, replace existing
Args:
- new_memory: MemoryItem to check
- store: MemoryStore to check against
- threshold: Similarity threshold (0-1) for considering duplicate
Returns: Action string: 'add', 'merge', 'skip', or 'replace'