RLM

Recursive Language Models for ontology-based query construction

This project implements the Recursive Language Model (RLM) architecture for querying RDF ontologies through progressive disclosure. The implementation follows the protocol from Zhang et al. (2025) while using claudette as the LLM backend.

The work is part of an ongoing investigation into how LLM agents can navigate large knowledge graphs without overwhelming their context windows. Rather than loading entire ontologies into the prompt, the agent iteratively explores through bounded REPL operations, delegating heavy summarization tasks to sub-LLMs.

Background

The RLM architecture addresses a fundamental tension in using LLMs for knowledge graph tasks: ontologies and query results are often too large to fit in context, yet the model needs semantic understanding to construct correct queries.

The solution externalizes the large context to a REPL environment. The root LLM emits small code blocks that execute against the graph, receiving truncated results. When more detail is needed, it delegates to sub-LLMs via llm_query() calls that summarize specific chunks. The process iterates until the model returns a final answer.
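The loop above can be sketched in a few lines. Everything here is a stand-in: `ask_llm` and `execute_code` are hypothetical hooks for the claudette-backed calls, and the `FINAL:` marker is an assumed completion signal, not the project's actual protocol.

```python
def rlm_run(question, ask_llm, execute_code, max_iters=10):
    """Minimal sketch of the root RLM loop: ask the LLM for a code block,
    run it against the graph, feed back a truncated result, repeat."""
    transcript = [f"Question: {question}"]
    for _ in range(max_iters):
        reply = ask_llm("\n".join(transcript))
        if reply.startswith("FINAL:"):                 # model signals completion
            return reply[len("FINAL:"):].strip()
        result = execute_code(reply)                   # run emitted code in the REPL
        transcript.append(f"Code:\n{reply}")
        transcript.append(f"Result: {str(result)[:500]}")  # bounded feedback
    return None

# Demo: a scripted "LLM" that emits one code block, then a final answer.
replies = iter(["len(graph)", "FINAL: 42"])
answer = rlm_run("How many triples?", lambda prompt: next(replies),
                 lambda code: 3)
print(answer)   # 42
```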

This implementation extends RLM with two complementary memory systems and a four-layer context injection strategy:

Dataset memory (RDF quads) stores domain facts discovered during exploration. An RDF Dataset provides named graphs for working memory (mem), provenance tracking (prov), and scratch space (work/*). Facts persist across queries and can be snapshotted for session continuity.

Procedural memory (ReasoningBank-style) stores reusable exploration strategies extracted from past trajectories. The system bootstraps with 7 universal strategies (describe entity, navigate hierarchy, find properties) stored as MemoryItem objects. After each RLM run, a judge evaluates success or failure, and an extractor distills new procedural insights. These are retrieved via BM25 for similar future tasks, allowing the agent to improve over time.

Structured sense data provides compact ontology metadata (~600 chars) with 100% URI grounding validation. Instead of loading full ontologies into context, the system injects targeted sense cards with key classes, properties, and exploration hints, achieving an 83% iteration reduction on entity queries.

Additional components include SPARQL result handles that expose metadata without materializing full result sets, SHACL shape indexing for schema discovery, and query template retrieval from sh:SPARQLExecutable examples.

Context Engineering: Ont-Sense & Memory-Based Architecture

To enable effective ontology exploration, this implementation uses a four-layer context injection strategy that provides the LLM with just enough information without overwhelming its context window:

Layer 0: Structured Sense Data

Ont-Sense provides compact, programmatically extracted ontology metadata with 100% URI grounding validation. Instead of loading full ontologies, the system injects ~600-character sense cards containing:

  • Key classes and properties (with URIs)
  • Available indexes (hierarchy, domains, ranges)
  • Label/description predicates
  • Quick exploration hints

The sense card is auto-generated from GraphMeta scaffolding and validated to ensure all URIs exist in the ontology (zero hallucinations). Progressive disclosure automatically injects detailed sections (hierarchy overview, common patterns) when query keywords trigger them.

Layer 1: General Strategies (Procedural Memory)

Universal exploration patterns are bootstrapped as procedural memories and retrieved via BM25 when relevant to the query. The seven general strategies include:

  • Describe Entity by Label
  • Find Subclasses/Superclasses Using GraphMeta
  • Find Properties by Domain/Range
  • Pattern-Based Entity Search
  • Find Relationship Paths
  • Navigate Class Hierarchy from Roots

These strategies are stored as MemoryItem objects (not hardcoded), enabling the system to learn new patterns over time and update success rates based on actual performance.

Layer 2: Ontology-Specific Recipes

Domain-specific patterns (PROV Activity-Entity relationships, SIO measurement patterns) can be authored as Recipe objects and injected when working with specific ontologies. This layer is currently a placeholder, reserved for future ontology-specific guidance.
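One possible shape for such a Recipe object is sketched below. It is purely illustrative, since this layer is a placeholder; the field names are assumptions.

```python
from dataclasses import dataclass, field

@dataclass
class Recipe:
    """Hypothetical container for ontology-specific guidance."""
    ontology: str                 # target ontology, e.g. "prov"
    name: str                     # human-readable pattern name
    guidance: str                 # text injected into the LLM context
    tags: list = field(default_factory=list)

prov_recipe = Recipe(
    ontology="prov",
    name="Activity-Entity relationships",
    guidance=("Traverse prov:used and prov:wasGeneratedBy between "
              "prov:Activity and prov:Entity nodes."),
    tags=["provenance", "lineage"],
)
print(prov_recipe.name)
```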

Layer 3: Base Context

GraphMeta summary and ontology statistics provide foundational context about triple counts, class/property distributions, and index availability.

Performance Results

This architecture achieves 83% iteration reduction on entity description queries:

  • Baseline (no enhancements): 6 iterations
  • With sense + memory: 1 iteration

The four-layer approach maintains bounded context size (~1800 chars total) while providing targeted, relevant guidance for each query type.

Installation

This project uses uv for package management with a shared environment:

source ~/uvws/.venv/bin/activate
uv pip install fastcore claudette rdflib rank-bm25
uv pip install -e .

For development, you also need nbdev:

uv pip install nbdev
nbdev_install_hooks

Example

The following demonstrates loading an ontology into the dataset memory and using bounded view functions to explore it.

from rlm.dataset import setup_dataset_context

# Initialize dataset with mem/prov graphs
ns = {}
setup_dataset_context(ns)
print(ns['dataset_stats']())
# Dataset 'ds' (session: ae4334a3)
# mem: 0 triples
# prov: 0 events
# work graphs: 0
# onto graphs: 0

# Mount an ontology (SHACL shapes are auto-indexed)
ns['mount_ontology']('ontology/dcat-ap/dcat-ap-SHACL.ttl', 'dcat')

# The SHACL index is now available
print(ns['dcat_shacl'].summary())

from rlm.shacl_examples import search_shapes, describe_shape

# Search for shapes related to datasets
results = search_shapes(ns['dcat_shacl'], 'dataset', limit=3)
for r in results:
    print(f"{r['uri'].split('#')[-1]}: targets {r['targets']}")
# Get bounded description of a shape (first 10 properties)
desc = describe_shape(ns['dcat_shacl'], results[0]['uri'], limit=10)
print(f"Properties: {desc['property_count']} (showing {len(desc['properties'])})")
for p in desc['properties'][:5]:
    print(f"  {p['path'].split('/')[-1]}: min={p.get('minCount')}")

Query templates can be loaded from SHACL-AF examples and searched by keyword:

from rlm.shacl_examples import load_query_examples, search_queries, get_query_text

# Load neXtProt SPARQL examples
load_query_examples('ontology/uniprot/examples/neXtProt', ns, 'nxq')
print(ns['nxq'].summary())

# Find queries about phosphorylation
queries = search_queries(ns['nxq'], 'phosphorylation', limit=2)
for q in queries:
    print(f"{q['uri'].split('/')[-1]}: {q['comment'][:60]}...")

Tutorial

For a complete walkthrough with working examples, see 91_tutorial.ipynb. The tutorial demonstrates:

  • Core RLM loop with llm_query() and rlm_run()
  • Ontology loading with bounded views
  • Progressive disclosure over RDF graphs
  • Structured sense data with 100% URI grounding
  • Four-layer context injection (sense + memory + recipes + base)
  • Memory-based general strategies and BM25 retrieval
  • Dataset memory for fact persistence
  • SPARQL result handles
  • Procedural memory closed loop (judge + extract)
  • SHACL shape indexing
  • Multi-ontology integration

All cells are executed with real Claude API calls showing actual outputs.

Testing

The project includes a comprehensive test suite with 110+ tests covering all components:

tests/
├── unit/                    # Component-level tests
│   ├── test_sparql_handles.py
│   ├── test_session_tracking.py
│   ├── test_memory_store.py
│   ├── test_bootstrap_strategies.py      # NEW: Bootstrap validation
│   ├── test_memory_recipe_separation.py  # NEW: Architecture separation
│   └── test_sense_structured.py          # NEW: Sense data validation
├── integration/             # Cross-component tests
│   ├── test_dataset_memory.py
│   ├── test_sparql_dataset.py
│   ├── test_memory_closed_loop.py
│   └── test_full_stack.py
├── live/                    # API-required tests
│   └── test_memory_integration.py        # NEW: Memory-based architecture
└── test_quick_e2e.py        # End-to-end validation

Running Tests

# Activate environment
source ~/uvws/.venv/bin/activate

# Run unit tests (no API calls)
pytest tests/unit/ -v

# Run integration tests (no API calls)
pytest tests/integration/ -v

# Run live tests (requires ANTHROPIC_API_KEY)
ANTHROPIC_API_KEY=sk-... pytest tests/live/ -v

# Run quick end-to-end test (with API calls)
python tests/test_quick_e2e.py

# Run notebook tests
nbdev_test

All tests pass, validating:

  • Core RLM loop with Claude API
  • Ontology loading and exploration
  • Structured sense data with URI grounding ✅
  • Bootstrap general strategies (7 universal patterns) ✅
  • Memory-recipe separation validation ✅
  • Four-layer context injection ✅
  • Dataset memory persistence
  • SPARQL result handles
  • Procedural memory closed loop
  • SHACL shape indexing
  • End-to-end integration workflows

See tests/README.md for detailed test documentation.

Status

This is preliminary research code under active development. The current implementation covers stages 1-5 of the trajectory:

  • Stage 1: Core RLM loop with claudette backend ✅
  • Stage 2: Bounded view primitives for progressive disclosure ✅
  • Stage 3: SPARQL handles with work-bound query execution ✅
  • Stage 4: SHACL shape indexing and query template retrieval ✅
  • Stage 5: Ont-Sense improvements & ReasoningBank integration ✅
    • Structured sense data with 100% URI grounding
    • Four-layer context injection (sense, memory, recipes, base context)
    • Memory-based general strategies (bootstrap + learning)
    • Validation pipeline and comprehensive test suite
    • 83% iteration reduction on entity queries

Stage 6 (evaluation framework) is in progress, with a task-based eval system in evals/.

The code is developed through exploratory programming in Jupyter notebooks using nbdev. It targets integration with the Solveit platform but can run standalone.

References