This module implements Stage 2.5: Procedural Memory Loop inspired by the ReasoningBank paper. The goal is to enable an RLM agent to improve over time by accumulating procedural knowledge (strategies, templates, debugging moves) without replacing evidence-based retrieval.
A MemoryItem represents a reusable procedural insight extracted from an RLM trajectory.
Constraints:
- Items must be small enough to inject into prompts
- Content should be procedural (steps/checklist), not a retelling
- Up to 3 items extracted per trajectory
A reusable procedural memory extracted from an RLM trajectory.
Attributes:
- id: Unique identifier (UUID)
- title: Concise identifier (≤10 words)
- description: One-sentence summary
- content: Procedural steps/checklist/template (Markdown)
- source_type: 'success' or 'failure'
- task_query: Original task that produced this memory
- created_at: ISO timestamp
- access_count: Number of times retrieved (for future consolidation)
- tags: Keywords for BM25 retrieval
- session_id: Optional session ID from DatasetMeta (links to dataset session)
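The class definition itself is not listed here; the following is a minimal sketch consistent with the attributes above. Field defaults and the `to_dict`/`from_dict` bodies are assumptions, not the module's actual implementation.

```python
import uuid
from dataclasses import dataclass, field, asdict
from datetime import datetime, timezone
from typing import Optional

@dataclass
class MemoryItem:
    """A reusable procedural memory extracted from an RLM trajectory."""
    id: str
    title: str
    description: str
    content: str
    source_type: str                  # 'success' or 'failure'
    task_query: str
    created_at: str                   # ISO timestamp
    access_count: int = 0
    tags: list = field(default_factory=list)
    session_id: Optional[str] = None  # links to a DatasetMeta session

    def to_dict(self) -> dict:
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> 'MemoryItem':
        return cls(**data)
```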
```python
# Test MemoryItem creation and serialization
test_item = MemoryItem(
    id='test-uuid',
    title='SPARQL Query Pattern',
    description='Template for searching entities by label.',
    content='- Use `rdfs:label` for human-readable names\n- Add FILTER for case-insensitive search',
    source_type='success',
    task_query='Find entities named "Activity"',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['sparql', 'search', 'rdfs'],
)

# Test roundtrip
data = test_item.to_dict()
restored = MemoryItem.from_dict(data)
assert restored.title == test_item.title
assert restored.tags == test_item.tags
print("✓ MemoryItem serialization works")
```
✓ MemoryItem serialization works
Memory Store
Persistent storage for procedural memories using a simple JSON file format.
Create a bounded trajectory artifact for the judge/extractor.
Summarizes each iteration’s code blocks into 1-2 line “action + outcome” entries, keeping only the ~10 most informative key steps.
Args:
- task: Original task query
- answer: Final answer from rlm_run
- iterations: List of RLMIteration objects
- ns: Final namespace dict
Returns:
Dictionary with keys:
- task: str
- final_answer: str
- iteration_count: int
- converged: bool (whether final_answer was set)
- key_steps: List of {iteration, action, outcome}
- variables_created: List of variable names in ns
- errors_encountered: List of error messages from stderr
```python
# Test with mock iterations
from rlm._rlmpaper_compat import CodeBlock, REPLResult

mock_block1 = CodeBlock(
    code="search('Activity')",
    result=REPLResult(stdout="Found 3 entities", stderr=None, locals={}),
)
mock_block2 = CodeBlock(
    code="describe_entity('prov:Activity')",
    result=REPLResult(stdout="prov:Activity is a class", stderr=None, locals={}),
)
mock_iteration = RLMIteration(
    prompt="test prompt",
    response="test response",
    code_blocks=[mock_block1, mock_block2],
    final_answer=None,
    iteration_time=0.5,
)
artifact = extract_trajectory_artifact(
    task="What is prov:Activity?",
    answer="prov:Activity is a class",
    iterations=[mock_iteration],
    ns={'result': 'prov:Activity is a class'},
)
assert artifact['task'] == "What is prov:Activity?"
assert artifact['iteration_count'] == 1
assert artifact['converged'] == True
assert len(artifact['key_steps']) == 2
assert 'search' in artifact['key_steps'][0]['action'].lower()
assert len(artifact['variables_created']) == 1
print("✓ Trajectory artifact extraction works")
```
✓ Trajectory artifact extraction works
Judge
Classify trajectory as success or failure with evidence-sensitivity.
Success criteria:
1. Answer directly addresses the task
2. Answer is grounded in retrieved evidence (not hallucinated)
3. Reasoning shows systematic exploration
Failure indicators:
1. No answer produced (didn’t converge)
2. Answer doesn’t address the task
3. Answer makes claims without supporting evidence
Evidence-sensitive: success requires grounding in retrieved evidence.
Args:
- artifact: Trajectory artifact from extract_trajectory_artifact()
- ns: Optional namespace for additional context
Returns:
Dictionary with keys:
- is_success: bool
- reason: str
- confidence: str ('high', 'medium', 'low')
- missing: list[str] (what evidence was lacking if failure)
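A sketch of the judge's shape, assuming an LLM callable (`llm: str -> str`) that returns the JSON schema above. The prompt wording and the non-converged fast path are assumptions, not the module's actual prompt.

```python
import json

JUDGE_PROMPT = """You are judging an agent trajectory.
Task: {task}
Final answer: {final_answer}
Key steps: {key_steps}
Errors: {errors}

Decide whether this is a success. Success requires that the answer addresses
the task AND is grounded in the retrieved evidence shown in the key steps.
Reply with JSON only: {{"is_success": true|false, "reason": "...",
"confidence": "high"|"medium"|"low", "missing": ["..."]}}"""

def judge_trajectory(artifact, llm=None, ns=None):
    """Judge a trajectory artifact; `llm` is any callable str -> str."""
    if not artifact.get('converged'):
        # No final answer: fail fast without an LLM call
        return {'is_success': False, 'reason': 'No final answer produced',
                'confidence': 'high', 'missing': ['final answer']}
    prompt = JUDGE_PROMPT.format(
        task=artifact['task'],
        final_answer=artifact['final_answer'],
        key_steps=json.dumps(artifact['key_steps']),
        errors=json.dumps(artifact['errors_encountered']),
    )
    return json.loads(llm(prompt))
```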
```python
# Test judge with real LLM (requires API key)
test_artifact = {
    'task': 'What is prov:Activity?',
    'final_answer': 'prov:Activity is a class representing activities in PROV ontology',
    'iteration_count': 2,
    'converged': True,
    'key_steps': [
        {'iteration': 1, 'action': "search('Activity')", 'outcome': 'Found 3 entities'},
        {'iteration': 2, 'action': "describe_entity('prov:Activity')", 'outcome': 'A class in PROV'},
    ],
    'variables_created': ['result'],
    'errors_encountered': [],
}
judgment = judge_trajectory(test_artifact)
print(f"Success: {judgment['is_success']}")
print(f"Reason: {judgment['reason']}")
print(f"Confidence: {judgment['confidence']}")
```
Extractor
Extract 1-3 reusable memory items from a trajectory.
For successes: Emphasize why the approach worked
For failures: Emphasize what to avoid and recovery strategies
Output format: Procedural (steps/checklist/template), NOT a retelling
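The extractor's body is not shown; a sketch follows, again assuming an LLM callable that returns a JSON list. The prompt text is illustrative, and the hard cap of 3 items mirrors the constraint stated above.

```python
import json

EXTRACT_PROMPT = """Extract 1-3 reusable procedural memories from this trajectory.
Task: {task}
Outcome: {verdict}
Key steps: {key_steps}

For a success, capture WHY the approach worked; for a failure, capture what to
avoid and how to recover. Each memory must be procedural (steps, checklist, or
template), not a retelling of the trajectory. Reply with a JSON list of objects
with keys: title (<=10 words), description (one sentence), content (Markdown
steps), tags (keywords)."""

def extract_memories(artifact, is_success, llm):
    """Ask the LLM for at most 3 MemoryItem-shaped dicts."""
    prompt = EXTRACT_PROMPT.format(
        task=artifact['task'],
        verdict='success' if is_success else 'failure',
        key_steps=json.dumps(artifact['key_steps']),
    )
    return json.loads(llm(prompt))[:3]  # enforce the 1-3 item cap
```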
Tokenizes task and searches over title + description + tags.
Args:
- store: MemoryStore instance
- task: Task query string
- k: Number of memories to retrieve
Returns: List of top-k MemoryItem objects (may be fewer if scores ≤ 0)
```python
# Test BM25 retrieval
test_store = MemoryStore()

# Add diverse memories
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='SPARQL query pattern for entity search',
    description='Use rdfs:label with FILTER for case-insensitive search.',
    content='- Step 1\n- Step 2',
    source_type='success',
    task_query='Find entities by name',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['sparql', 'search', 'entity'],
))
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='Property exploration strategy',
    description='Systematically explore properties using describe then probe.',
    content='- Action A\n- Action B',
    source_type='success',
    task_query='What properties does X have?',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['properties', 'exploration'],
))
test_store.add(MemoryItem(
    id=str(uuid.uuid4()),
    title='Debugging failed SPARQL queries',
    description='Check syntax, namespaces, and endpoint first.',
    content='- Check 1\n- Check 2',
    source_type='failure',
    task_query='Query failed with error',
    created_at=datetime.now(timezone.utc).isoformat(),
    tags=['sparql', 'debugging', 'error'],
))

# Test retrieval for different queries
results1 = retrieve_memories(test_store, 'How do I search for entities?', k=2)
assert len(results1) <= 2
assert any('search' in r.title.lower() or 'search' in r.tags for r in results1)
print(f"✓ Retrieved {len(results1)} memories for 'search for entities'")

results2 = retrieve_memories(test_store, 'My SPARQL query is broken', k=2)
assert len(results2) <= 2
assert any('sparql' in r.tags for r in results2)
print(f"✓ Retrieved {len(results2)} memories for 'SPARQL query broken'")

results3 = retrieve_memories(test_store, 'What properties does prov:Activity have?', k=2)
print(f"✓ Retrieved {len(results3)} memories for 'properties question'")

# Test access count increment
assert results1[0].access_count > 0
print("✓ Access count tracking works")
```
✓ Retrieved 2 memories for 'search for entities'
✓ Retrieved 2 memories for 'SPARQL query broken'
✓ Retrieved 2 memories for 'properties question'
✓ Access count tracking works
Injection Formatting
Format retrieved memories for bounded prompt injection.
Output includes:
- Assessment instruction
- Title + description + up to 3 key bullets from content
Never injects full content to maintain bounded prompt size.
Returns string with:
- Assessment instruction
- Title + description + key bullets from content (up to max_bullets)
Args:
- memories: List of MemoryItem objects to format
- max_bullets: Maximum bullets to extract from content
Returns: Formatted string for prompt injection
```python
# Test injection formatting
test_memories = [
    MemoryItem(
        id='test-1',
        title='SPARQL Search Pattern',
        description='Template for searching entities by label.',
        content="""- Use rdfs:label for human-readable names
- Add FILTER for case-insensitive matching
- Include LIMIT to avoid timeout
- Check for alternative label properties""",
        source_type='success',
        task_query='test',
        created_at=datetime.now(timezone.utc).isoformat(),
        tags=['sparql'],
    ),
    MemoryItem(
        id='test-2',
        title='Property Discovery',
        description='Systematic approach to finding properties.',
        content="""1. Start with describe_entity() for overview
2. Use get_properties() for full list
3. Check both domain and range
4. Look for inverse properties""",
        source_type='success',
        task_query='test',
        created_at=datetime.now(timezone.utc).isoformat(),
        tags=['properties'],
    ),
]
formatted = format_memories_for_injection(test_memories, max_bullets=3)

# Verify format
assert '## Relevant Prior Experience' in formatted
assert 'assess which of these strategies' in formatted
assert '### 1. SPARQL Search Pattern' in formatted
assert '### 2. Property Discovery' in formatted
assert 'Use rdfs:label' in formatted
assert 'Start with describe_entity' in formatted

# Verify bullet limiting (should have max 3 bullets per memory)
lines = formatted.split('\n')
bullet_count_mem1 = sum(
    1 for l in lines[lines.index('### 1. SPARQL Search Pattern'):lines.index('### 2. Property Discovery')]
    if l.strip().startswith('-')
)
assert bullet_count_mem1 <= 3
print("✓ Injection formatting works")
print("\nFormatted output:")
print(formatted[:300] + "...")
```
✓ Injection formatting works
Formatted output:
## Relevant Prior Experience
Before taking action, briefly assess which of these strategies apply to your current task and which do not.
### 1. SPARQL Search Pattern
Template for searching entities by label.
Key points:
- Use rdfs:label for human-readable names
- Add FILTER for case-insensitive ma...
Closed-loop cycle:
1. RETRIEVE: Get relevant memories via BM25
2. INJECT: Add to context/prompt
3. INTERACT: Run rlm_run()
4. EXTRACT: Judge + extract new memories
5. STORE: Persist new memories
NEW: Dataset persistence:
- If persist_dataset=True and dataset_path provided, loads snapshot before run
- After run, if dataset was modified, saves snapshot
- Stores snapshot path in extracted MemoryItem for lineage
Args:
- query: Task query string
- context: Context string (e.g., ontology summary)
- memory_store: MemoryStore instance for retrieval/storage
- ns: Optional namespace dict
- enable_memory_extraction: Whether to extract and store new memories (default True)
- persist_dataset: Whether to persist dataset snapshots (default False)
- dataset_path: Optional path for dataset snapshot
- **kwargs: Additional arguments for rlm_run()
Returns: Tuple of (answer, iterations, ns, new_memories)
```python
# Integration test (requires full RLM setup)
from rlm.ontology import setup_ontology_context
import tempfile

def test_memory_improves_convergence():
    """Second attempt should benefit from first attempt's memory."""
    with tempfile.TemporaryDirectory() as tmpdir:
        store = MemoryStore(path=Path(tmpdir) / 'test_integration.json')

        # First run - no memories
        ns = {}
        setup_ontology_context('ontology/prov.ttl', ns, name='prov')
        answer1, iters1, ns1, mems1 = rlm_run_with_memory(
            "What is prov:Activity and what properties does it have?",
            ns['prov_meta'].summary(),
            store,
            ns=ns,
        )
        print(f"\nFirst run: {len(iters1)} iterations, {len(mems1)} memories extracted")
        for mem in mems1:
            print(f"  - {mem.title}")

        # Second run - similar task, should retrieve memories
        ns2 = {}
        setup_ontology_context('ontology/prov.ttl', ns2, name='prov')
        answer2, iters2, ns2, mems2 = rlm_run_with_memory(
            "What is prov:Entity and what properties does it have?",
            ns2['prov_meta'].summary(),
            store,
            ns=ns2,
        )
        print(f"\nSecond run: {len(iters2)} iterations")
        print(f"Total memories in store: {len(store.memories)}")

        # Verify memories were retrieved
        retrieved_for_second = retrieve_memories(
            store,
            "What is prov:Entity and what properties does it have?",
            k=3,
        )
        print(f"Memories that would be retrieved for second run: {len(retrieved_for_second)}")
        for mem in retrieved_for_second:
            print(f"  - {mem.title} (accessed {mem.access_count} times)")

# Run test
# test_memory_improves_convergence()
```
Usage Examples
End-to-end examples with PROV ontology.
```python
# Full example: Build up procedural memory over multiple queries
from rlm.ontology import setup_ontology_context
from pathlib import Path

# Initialize memory store
store = MemoryStore(path=Path('memories/prov_memories.json'))

# If store exists, load it
if store.path.exists():
    store = MemoryStore.load(store.path)
    print(f"Loaded {len(store.memories)} existing memories")

# Setup ontology context
ns = {}
setup_ontology_context('ontology/prov.ttl', ns, name='prov')

# Series of queries
queries = [
    "What is prov:Activity?",
    "What properties does prov:Activity have?",
    "How are prov:Activity and prov:Entity related?",
]

for i, query in enumerate(queries, 1):
    print(f"\n{'='*60}")
    print(f"Query {i}: {query}")
    print('='*60)
    answer, iterations, ns, new_memories = rlm_run_with_memory(
        query, ns['prov_meta'].summary(), store, ns=ns,
    )
    print(f"\nAnswer: {answer}")
    print(f"Iterations: {len(iterations)}")
    print(f"New memories extracted: {len(new_memories)}")
    for mem in new_memories:
        print(f"  - {mem.title}")

print(f"\n{'='*60}")
print(f"Final memory store: {len(store.memories)} memories")
print('='*60)

# Show all memories with access counts
for mem in store.memories:
    print(f"\n{mem.title}")
    print(f"  Source: {mem.source_type}")
    print(f"  Accessed: {mem.access_count} times")
    print(f"  Tags: {mem.tags}")
```
Bootstrap General Strategies
Architectural Role (2026-01-19 Refactor):
Universal ontology exploration patterns that should be loaded into memory on startup. These strategies were previously in reasoning_bank.CORE_RECIPES but were moved here to align with the ReasoningBank paper’s architecture.
Key Insight: General strategies are LEARNED (procedural memory), not AUTHORED (recipes).
Why Bootstrap?
Correct conceptual layer: Universal patterns belong in procedural_memory (Layer 1), not reasoning_bank (Layer 2)
Enable learning: Stored as MemoryItems, these can be:
Retrieved via BM25 (not always injected)
Updated with success_rate over time
Merged/consolidated with newly extracted patterns
Removed if ineffective
Future extensibility: New strategies extracted from successful runs can be added to memory_store automatically
Usage
```python
# One-time bootstrap at startup
memory_store = MemoryStore()
for strategy in bootstrap_general_strategies():
    memory_store.add(strategy)

# Use in RLM runs
from rlm.reasoning_bank import rlm_run_enhanced

answer, iters, ns = rlm_run_enhanced(
    query="What is Activity?",
    context=meta.summary(),
    sense=sense,
    memory_store=memory_store,  # General strategies retrieved via BM25
)
```
Note: These are seed strategies - the system can learn and add more over time via the memory extraction loop.
bootstrap_general_strategies
def bootstrap_general_strategies() -> list:
Create general strategy memories for bootstrapping.
These are universal patterns extracted from successful RLM runs that apply to all ontologies.
Returns: List of MemoryItem objects representing general strategies
```python
# Test bootstrap
strategies = bootstrap_general_strategies()
print(f"Bootstrapped {len(strategies)} general strategies:")
for s in strategies:
    print(f"  - {s.title}")
    print(f"    Tags: {s.tags}")
    print(f"    Task: {s.task_query}")

# Test that they can be stored
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    test_path = Path(tmpdir) / 'bootstrap_test.json'
    store = MemoryStore(path=test_path)
    for strategy in strategies:
        store.add(strategy)
    store.save()

    # Reload and verify
    loaded = MemoryStore.load(test_path)
    assert len(loaded.memories) == len(strategies)
    print(f"\n✓ Bootstrap strategies can be saved and loaded")
    print(f"✓ Total: {len(loaded.memories)} strategies")
```
Bootstrapped 7 general strategies:
- Describe Entity by Label
Tags: ['entity', 'search', 'describe', 'universal']
Task: entity_description
- Find Subclasses Using GraphMeta
Tags: ['hierarchy', 'subclass', 'graphmeta', 'universal']
Task: hierarchy
- Find Superclasses Using GraphMeta
Tags: ['hierarchy', 'superclass', 'graphmeta', 'universal']
Task: hierarchy
- Find Properties by Domain/Range
Tags: ['properties', 'domain', 'range', 'universal']
Task: property_discovery
- Pattern-Based Entity Search
Tags: ['search', 'pattern', 'multiple', 'universal']
Task: pattern_search
- Find Relationship Path Between Entities
Tags: ['relationships', 'path', 'connection', 'universal']
Task: relationship_discovery
- Navigate Class Hierarchy from Roots
Tags: ['hierarchy', 'exploration', 'roots', 'universal']
Task: hierarchy
✓ Bootstrap strategies can be saved and loaded
✓ Total: 7 strategies
Validation Functions
Validation gates to ensure quality and consistency of procedural memory.
Checks:
- Correct count (7 strategies)
- All are valid MemoryItem objects
- Unique titles (no duplicates)
- All tagged as 'universal'
- No hardcoded ontology-specific URIs
Returns: Dictionary with 'valid' flag and detailed checks
Uses title similarity to detect duplicates and decide action:
- add: No similar memories, safe to add
- merge: Similar memory exists, should combine insights
- skip: Similar memory exists and is better, don't add
- replace: New memory is better, replace existing
Args:
- new_memory: MemoryItem to check
- store: MemoryStore to check against
- threshold: Similarity threshold (0-1) for considering duplicate
Returns: Action string: 'add', 'merge', 'skip', or 'replace'