```python
from rlm.dataset import setup_dataset_context

# Initialize dataset with mem/prov graphs
ns = {}
setup_dataset_context(ns)
print(ns['dataset_stats']())
```

```
Dataset 'ds' (session: ae4334a3)
  mem: 0 triples
  prov: 0 events
  work graphs: 0
  onto graphs: 0
```
This project implements the Recursive Language Model (RLM) architecture for querying RDF ontologies through progressive disclosure. The implementation follows the protocol from Zhang et al. (2025) while using claudette as the LLM backend.
The work is part of an ongoing investigation into how LLM agents can navigate large knowledge graphs without overwhelming their context windows. Rather than loading entire ontologies into the prompt, the agent iteratively explores through bounded REPL operations, delegating heavy summarization tasks to sub-LLMs.
The RLM architecture addresses a fundamental tension in using LLMs for knowledge graph tasks: ontologies and query results are often too large to fit in context, yet the model needs semantic understanding to construct correct queries.
The solution externalizes the large context to a REPL environment. The root LLM emits small code blocks that execute against the graph, receiving truncated results. When more detail is needed, it delegates to sub-LLMs via llm_query() calls that summarize specific chunks. The process iterates until the model returns a final answer.
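The loop described above can be sketched in a few lines. This is an illustrative simplification, not the project's actual API: the function names, the `FINAL:` convention, and the truncation limit are all assumptions made for the sketch.

```python
# Minimal sketch of the RLM loop: the root LLM emits code, sees bounded
# (truncated) results, and iterates until it signals a final answer.
# All names here are hypothetical, not the project's real interface.

def rlm_loop(question, repl_exec, root_llm, max_iters=10, view_limit=500):
    """Iterate root-LLM code emission against a REPL until an answer appears."""
    transcript = [f"Question: {question}"]
    for _ in range(max_iters):
        code = root_llm("\n".join(transcript))   # next exploration step
        if code.startswith("FINAL:"):            # model signals completion
            return code[len("FINAL:"):].strip()
        result = repl_exec(code)                 # run against the graph
        transcript.append(f">>> {code}\n{str(result)[:view_limit]}")  # bounded view
    return "No answer within iteration budget"
```

The key property is that only truncated views of results enter the transcript, so the root model's context stays bounded regardless of how large the underlying graph is.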
This implementation extends RLM with two complementary memory systems and a four-layer context injection strategy:
Dataset memory (RDF quads) stores domain facts discovered during exploration. An RDF Dataset provides named graphs for working memory (mem), provenance tracking (prov), and scratch space (work/*). Facts persist across queries and can be snapshotted for session continuity.
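The shape of this memory can be illustrated with a toy stand-in. The real implementation uses an RDF Dataset (e.g. rdflib's) with proper named graphs; the class below models the same separation of `mem`, `prov`, and scratch graphs using plain Python sets, with the method names chosen for this sketch only.

```python
# Toy stand-in for the RDF Dataset memory: named graphs as sets of triples.
# The real system uses an RDF Dataset; names and structure here are illustrative.
import json

class DatasetMemory:
    def __init__(self):
        # Working memory for domain facts, plus a provenance graph
        self.graphs = {"mem": set(), "prov": set()}

    def add(self, graph, triple):
        """Record a triple in a named graph, creating scratch graphs on demand."""
        self.graphs.setdefault(graph, set()).add(triple)

    def stats(self):
        return {name: len(triples) for name, triples in self.graphs.items()}

    def snapshot(self):
        """Serialize all graphs so a later session can restore them."""
        return json.dumps({g: sorted(map(list, t)) for g, t in self.graphs.items()})
```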
Procedural memory (ReasoningBank-style) stores reusable exploration strategies extracted from past trajectories. The system bootstraps with 7 universal strategies (describe entity, navigate hierarchy, find properties) stored as MemoryItem objects. After each RLM run, a judge evaluates success or failure, and an extractor distills new procedural insights. These are retrieved via BM25 for similar future tasks, allowing the agent to improve over time.
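The retrieval step can be sketched as follows. The `MemoryItem` fields and the `retrieve` helper are simplified assumptions for illustration; the BM25 scoring itself is standard Okapi BM25, implemented inline here rather than via a library.

```python
# Sketch of procedural-memory retrieval: score stored strategies against the
# task with Okapi BM25 and return the best matches. MemoryItem fields are
# a simplified assumption, not the project's actual schema.
import math
from dataclasses import dataclass

@dataclass
class MemoryItem:
    title: str
    strategy: str
    success_rate: float = 0.0

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 scores of each doc for the query (whitespace tokenization)."""
    tok = lambda s: s.lower().split()
    corpus = [tok(d) for d in docs]
    avgdl = sum(len(d) for d in corpus) / len(corpus)
    n = len(corpus)
    scores = [0.0] * n
    for term in set(tok(query)):
        df = sum(term in d for d in corpus)
        if df == 0:
            continue
        idf = math.log(1 + (n - df + 0.5) / (df + 0.5))
        for i, d in enumerate(corpus):
            tf = d.count(term)
            scores[i] += idf * tf * (k1 + 1) / (tf + k1 * (1 - b + b * len(d) / avgdl))
    return scores

def retrieve(items, query, k=2):
    """Top-k memory items for a query, ranked by BM25 over title + strategy."""
    scores = bm25_scores(query, [f"{m.title} {m.strategy}" for m in items])
    return [m for _, m in sorted(zip(scores, items), key=lambda t: -t[0])[:k]]
```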
Structured sense data provides compact ontology metadata (~600 chars) with 100% URI grounding validation. Instead of loading full ontologies into context, the system injects targeted sense cards with key classes, properties, and exploration hints—achieving 83% iteration reduction on entity queries.
Additional components include SPARQL result handles that expose metadata without materializing full result sets, SHACL shape indexing for schema discovery, and query template retrieval from sh:SPARQLExecutable examples.
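The result-handle idea can be sketched with a small lazy wrapper. The class and method names below are hypothetical; the point is that metadata and bounded peeks are available without pulling the whole result set into context.

```python
# Illustrative SPARQL result handle: expose columns and small previews without
# materializing the full result set. Names are assumptions for this sketch.

class ResultHandle:
    def __init__(self, rows, columns):
        self._it = iter(rows)   # lazy underlying iterator over result rows
        self._cache = []        # rows materialized so far
        self.columns = columns

    def peek(self, n=5):
        """Materialize at most n rows; the rest stay on the iterator."""
        while len(self._cache) < n:
            try:
                self._cache.append(next(self._it))
            except StopIteration:
                break
        return self._cache[:n]

    def __repr__(self):
        return f"<ResultHandle cols={self.columns} cached={len(self._cache)}>"
```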
To enable effective ontology exploration, this implementation uses a four-layer context injection strategy that provides the LLM with just enough information without overwhelming its context window:
Ont-Sense provides compact, programmatically extracted ontology metadata with 100% URI grounding validation. Instead of loading full ontologies, the system injects ~600-character sense cards containing key classes, properties, and exploration hints.
The sense card is auto-generated from GraphMeta scaffolding and validated to ensure all URIs exist in the ontology (zero hallucinations). Progressive disclosure automatically injects detailed sections (hierarchy overview, common patterns) when query keywords trigger them.
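The grounding check amounts to set subtraction: every URI the card mentions must already exist in the ontology. A minimal sketch, assuming a plain-text card and a precomputed set of known URIs (the function name and regex are illustrative):

```python
# Sketch of URI grounding validation: any URI mentioned in a sense card that
# is absent from the ontology is flagged, so validated cards cannot introduce
# hallucinated terms. Names here are assumptions for the sketch.
import re

def validate_grounding(sense_card, known_uris):
    """Return the set of URIs in the card that do not exist in the ontology."""
    mentioned = set(re.findall(r'https?://[\w./#-]+', sense_card))
    return mentioned - set(known_uris)
```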
Universal exploration patterns are bootstrapped as procedural memories and retrieved via BM25 when relevant to the query. These seven general strategies include describing an entity, navigating a class hierarchy, and finding properties.
These strategies are stored as MemoryItem objects (not hardcoded), enabling the system to learn new patterns over time and update success rates based on actual performance.
Domain-specific patterns (PROV Activity-Entity relationships, SIO measurement patterns) can be authored as Recipe objects and injected when working with specific ontologies. This layer is currently a placeholder, reserved for future ontology-specific guidance.
GraphMeta summary and ontology statistics provide foundational context about triple counts, class/property distributions, and index availability.
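The kind of statistics this layer surfaces can be sketched over a bare list of triples. The real implementation operates on an RDF graph; the function name and the prefixed-term representation below are simplifications for illustration.

```python
# Rough sketch of GraphMeta-style statistics: triple count plus the most
# frequent classes and properties. Triples are (s, p, o) tuples of prefixed
# names here purely for illustration.
from collections import Counter

RDF_TYPE = "rdf:type"

def graph_stats(triples):
    classes = Counter(o for s, p, o in triples if p == RDF_TYPE)
    props = Counter(p for s, p, o in triples)
    return {
        "triples": len(triples),
        "top_classes": classes.most_common(3),
        "top_properties": props.most_common(3),
    }
```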
This architecture achieves an 83% iteration reduction on entity description queries.
The four-layer approach maintains bounded context size (~1800 chars total) while providing targeted, relevant guidance for each query type.
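Assembling the layers under a character budget can be sketched as simple greedy truncation. The ordering and budget handling below are assumptions made for this sketch, not the documented behavior:

```python
# Hedged sketch of four-layer context assembly under a character budget.
# Layers are filled in priority order; whatever exceeds the budget is cut.

def build_context(sense_card, strategies, recipes, graph_summary, budget=1800):
    layers = [sense_card, strategies, recipes, graph_summary]
    out, used = [], 0
    for layer in layers:
        if not layer:
            continue
        take = layer[: max(0, budget - used)]   # truncate to remaining budget
        if take:
            out.append(take)
            used += len(take)
    return "\n\n".join(out)
```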
This project uses uv for package management with a shared environment. For development, you also need nbdev.
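The exact commands are not shown here; assuming a standard uv workflow and the shared environment path used later in this README, setup might look like the following (paths and the editable-install step are illustrative):

```shell
# Create and activate the shared virtual environment
uv venv ~/uvws/.venv
source ~/uvws/.venv/bin/activate

# Install the project in editable mode
uv pip install -e .

# Development extra: nbdev for notebook-driven development
uv pip install nbdev
```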
The following demonstrates loading an ontology into the dataset memory and using bounded view functions to explore it.
```python
# Get bounded description of a shape (first 10 properties)
desc = describe_shape(ns['dcat_shacl'], results[0]['uri'], limit=10)
print(f"Properties: {desc['property_count']} (showing {len(desc['properties'])})")
for p in desc['properties'][:5]:
    print(f"  {p['path'].split('/')[-1]}: min={p.get('minCount')}")
```

Query templates can be loaded from SHACL-AF examples and searched by keyword:
```python
from rlm.shacl_examples import load_query_examples, search_queries, get_query_text

# Load neXtProt SPARQL examples
load_query_examples('ontology/uniprot/examples/neXtProt', ns, 'nxq')
print(ns['nxq'].summary())

# Find queries about phosphorylation
queries = search_queries(ns['nxq'], 'phosphorylation', limit=2)
for q in queries:
    print(f"{q['uri'].split('/')[-1]}: {q['comment'][:60]}...")
```

For a complete walkthrough with working examples, see 91_tutorial.ipynb. The tutorial demonstrates:
- `llm_query()` and `rlm_run()`

All cells are executed with real Claude API calls showing actual outputs.
The project includes a comprehensive test suite with 110+ tests covering all components:
```
tests/
├── unit/                                 # Component-level tests
│   ├── test_sparql_handles.py
│   ├── test_session_tracking.py
│   ├── test_memory_store.py
│   ├── test_bootstrap_strategies.py      # NEW: Bootstrap validation
│   ├── test_memory_recipe_separation.py  # NEW: Architecture separation
│   └── test_sense_structured.py          # NEW: Sense data validation
├── integration/                          # Cross-component tests
│   ├── test_dataset_memory.py
│   ├── test_sparql_dataset.py
│   ├── test_memory_closed_loop.py
│   └── test_full_stack.py
├── live/                                 # API-required tests
│   └── test_memory_integration.py        # NEW: Memory-based architecture
└── test_quick_e2e.py                     # End-to-end validation
```
```shell
# Activate environment
source ~/uvws/.venv/bin/activate

# Run unit tests (no API calls)
pytest tests/unit/ -v

# Run integration tests (no API calls)
pytest tests/integration/ -v

# Run live tests (requires ANTHROPIC_API_KEY)
ANTHROPIC_API_KEY=sk-... pytest tests/live/ -v

# Run quick end-to-end test (with API calls)
python tests/test_quick_e2e.py

# Run notebook tests
nbdev_test
```

All tests pass, validating:

- Core RLM loop with Claude API
- Ontology loading and exploration
- Structured sense data with URI grounding ✅
- Bootstrap general strategies (7 universal patterns) ✅
- Memory-recipe separation validation ✅
- Four-layer context injection ✅
- Dataset memory persistence
- SPARQL result handles
- Procedural memory closed loop
- SHACL shape indexing
- End-to-end integration workflows
See tests/README.md for detailed test documentation.
This is preliminary research code under active development. The current implementation covers stages 1-5 of the trajectory.
Stage 6 (evaluation framework) is in progress with task-based eval system in evals/.
The code is developed through exploratory programming in Jupyter notebooks using nbdev. It targets integration with the Solveit platform but can run standalone.