# RLM


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

This project implements the Recursive Language Model (RLM) architecture
for querying RDF ontologies through progressive disclosure. The
implementation follows the protocol from [Zhang et
al. (2025)](https://github.com/alexzhang13/rlm) while using
[claudette](https://claudette.answer.ai/) as the LLM backend.

The work is part of an ongoing investigation into how LLM agents can
navigate large knowledge graphs without overwhelming their context
windows. Rather than loading entire ontologies into the prompt, the
agent iteratively explores through bounded REPL operations, delegating
heavy summarization tasks to sub-LLMs.

## Background

The RLM architecture addresses a fundamental tension in using LLMs for
knowledge graph tasks: ontologies and query results are often too large
to fit in context, yet the model needs semantic understanding to
construct correct queries.

The solution externalizes the large context to a REPL environment. The
root LLM emits small code blocks that execute against the graph,
receiving truncated results. When more detail is needed, it delegates to
sub-LLMs via `llm_query()` calls that summarize specific chunks. The
process iterates until the model returns a final answer.
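The loop described above can be sketched in a few lines. Everything here is an illustrative stand-in, not the project's actual API: `fake_model` plays the root LLM, and the `FINAL:` convention marks the termination signal.

``` python
# Illustrative sketch of the RLM loop: a root model emits code blocks that
# run in a persistent namespace until it signals a final answer.
# `fake_model` and the FINAL: convention are stand-ins for this sketch only.

def llm_query(prompt, chunk):
    """Stand-in sub-LLM call: summarize a chunk on the root model's behalf."""
    return f"summary of {len(chunk)} chars"

def fake_model(history):
    """Stand-in root LLM: first asks for a summary, then answers."""
    if len(history) == 1:
        return "result = llm_query('summarize', big_text)"
    return "FINAL: " + ns['result']

def rlm_loop(task, big_text, max_iters=5):
    global ns
    ns = {'llm_query': llm_query, 'big_text': big_text}
    history = [task]
    for _ in range(max_iters):
        step = fake_model(history)
        if step.startswith('FINAL:'):
            return step[len('FINAL:'):].strip()
        exec(step, ns)                                 # run emitted code against the env
        history.append(repr(ns.get('result'))[:200])   # truncated observation only
    return None

print(rlm_loop('What is in the text?', 'x' * 10_000))
```

The key point the sketch preserves is that only truncated observations re-enter the model's history; the full `big_text` stays in the namespace.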

This implementation extends RLM with two complementary memory systems
and a four-layer context injection strategy:

**Dataset memory** (RDF quads) stores domain facts discovered during
exploration. An RDF Dataset provides named graphs for working memory
(`mem`), provenance tracking (`prov`), and scratch space (`work/*`).
Facts persist across queries and can be snapshotted for session
continuity.
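A minimal stand-in (plain tuples rather than an actual rdflib Dataset) illustrates the named-graph layout; the graph names `mem`, `prov`, and `work/*` come from the description above, while the example facts are invented for illustration.

``` python
# Illustrative stand-in for the RDF Dataset memory: quads tagged with a
# named graph ('mem', 'prov', 'work/<id>'). The real system uses rdflib.
quads = set()

def remember(s, p, o, graph='mem'):
    quads.add((s, p, o, graph))

def graph(name):
    return {(s, p, o) for s, p, o, g in quads if g == name}

# Facts discovered during exploration land in working memory...
remember('ex:Dataset1', 'rdf:type', 'dcat:Dataset')
# ...while provenance events record how and when we learned them.
remember('ex:Dataset1', 'prov:wasDerivedFrom', 'query-42', graph='prov')

print(len(graph('mem')), len(graph('prov')))
```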

**Procedural memory**
([ReasoningBank](https://arxiv.org/html/2509.25140v1)-style) stores
reusable exploration strategies extracted from past trajectories. The
system bootstraps with 7 universal strategies (describe entity, navigate
hierarchy, find properties) stored as `MemoryItem` objects. After each
RLM run, a judge evaluates success or failure, and an extractor distills
new procedural insights. These are retrieved via BM25 for similar future
tasks, allowing the agent to improve over time.
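The closed loop (judge, then extract) can be sketched as below. `MemoryItem` mirrors the object named above, but its fields and the judge/extractor logic are illustrative: in the real system both roles are played by an LLM.

``` python
from dataclasses import dataclass

# Sketch of the procedural-memory closed loop. The judge and extractor here
# are trivial stand-ins; the real system delegates both roles to an LLM.

@dataclass
class MemoryItem:
    title: str
    strategy: str
    uses: int = 0
    successes: int = 0

memory = [MemoryItem('Describe Entity by Label',
                     'Search labels first, then DESCRIBE the best match.')]

def judge(trajectory):
    """Stand-in judge: success if the run produced a final answer."""
    return trajectory.get('answer') is not None

def extract(trajectory):
    """Stand-in extractor: distill a one-line insight from the run."""
    return MemoryItem(title=f"Learned from: {trajectory['task'][:40]}",
                      strategy=trajectory['winning_move'])

run = {'task': 'Find all subclasses of dcat:Resource',
       'answer': ['dcat:Dataset', 'dcat:DataService'],
       'winning_move': 'Use the precomputed hierarchy index, not raw SPARQL.'}

if judge(run):
    memory.append(extract(run))
print(len(memory))
```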

**Structured sense data** provides compact ontology metadata (~600
chars) with 100% URI grounding validation. Instead of loading full
ontologies into context, the system injects targeted sense cards with
key classes, properties, and exploration hints—achieving 83% iteration
reduction on entity queries.

Additional components include SPARQL result handles that expose metadata
without materializing full result sets, SHACL shape indexing for schema
discovery, and query template retrieval from `sh:SPARQLExecutable`
examples.
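The handle idea can be shown with a small stand-in class: metadata and a summary are available immediately, while rows are only handed out in bounded slices. The class and method names are illustrative, not the project's actual API.

``` python
# Sketch of a SPARQL result handle: metadata up front, rows materialized
# only on demand and in bounded slices. Names are illustrative.

class ResultHandle:
    def __init__(self, columns, rows):
        self._columns, self._rows = columns, list(rows)

    def summary(self):
        return (f"{len(self._rows)} rows x {len(self._columns)} cols "
                f"({', '.join(self._columns)})")

    def head(self, n=5):
        """Bounded view: never hands the model more than n rows."""
        return self._rows[:n]

h = ResultHandle(['dataset', 'title'],
                 [(f'ex:d{i}', f'Title {i}') for i in range(1000)])
print(h.summary())
print(h.head(2))
```

The model sees the one-line summary; the thousand rows stay out of context unless explicitly sliced.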

## Context Engineering: Ont-Sense & Memory-Based Architecture

To enable effective ontology exploration, this implementation uses a
**four-layer context injection strategy** that provides the LLM with
just enough information without overwhelming its context window:

### Layer 0: Structured Sense Data

**Ont-Sense** provides compact, programmatically-extracted ontology
metadata with 100% URI grounding validation. Instead of loading full
ontologies, the system injects ~600 character sense cards containing:

- Key classes and properties (with URIs)
- Available indexes (hierarchy, domains, ranges)
- Label/description predicates
- Quick exploration hints

The sense card is auto-generated from GraphMeta scaffolding and
validated to ensure all URIs exist in the ontology (zero
hallucinations). Progressive disclosure automatically injects detailed
sections (hierarchy overview, common patterns) when query keywords
trigger them.
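The trigger mechanism can be sketched as a keyword lookup: the base sense card is always injected, and detail sections join only when the query mentions them. Section names, trigger words, and card contents below are illustrative.

``` python
# Sketch of keyword-triggered progressive disclosure. The base sense card
# is always present; detail sections are appended only when the query
# contains a trigger word. All strings here are illustrative.

SENSE_CARD = "Classes: dcat:Dataset, dcat:Distribution | Props: dct:title ..."
DETAIL_SECTIONS = {
    'hierarchy': ('subclass', 'superclass', 'hierarchy'),
    'patterns':  ('pattern', 'example', 'template'),
}
DETAIL_TEXT = {
    'hierarchy': 'Hierarchy overview: dcat:Resource > dcat:Dataset > ...',
    'patterns':  'Common patterns: label search, DESCRIBE, domain lookup ...',
}

def build_context(query):
    parts = [SENSE_CARD]
    q = query.lower()
    for section, triggers in DETAIL_SECTIONS.items():
        if any(t in q for t in triggers):
            parts.append(DETAIL_TEXT[section])
    return '\n'.join(parts)

print(build_context('Show the subclass hierarchy of Dataset'))
```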

### Layer 1: General Strategies (Procedural Memory)

Universal exploration patterns are **bootstrapped as procedural
memories** and retrieved via BM25 when relevant to the query. The seven
general strategies include patterns such as:

- Describe Entity by Label
- Find Subclasses/Superclasses Using GraphMeta
- Find Properties by Domain/Range
- Pattern-Based Entity Search
- Find Relationship Paths
- Navigate Class Hierarchy from Roots

These strategies are stored as `MemoryItem` objects (not hardcoded),
enabling the system to learn new patterns over time and update success
rates based on actual performance.
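Retrieval can be sketched as follows. The real system scores `MemoryItem` text with rank-bm25; this stand-in uses a simplified idf-weighted token overlap just to show the shape of the lookup.

``` python
import math

# Sketch of strategy retrieval. The real system uses rank-bm25 over
# MemoryItem text; this stand-in scores idf-weighted token overlap
# to illustrate the lookup, not to reproduce BM25 exactly.

strategies = [
    'Describe Entity by Label',
    'Find Subclasses Superclasses Using GraphMeta',
    'Find Properties by Domain Range',
    'Navigate Class Hierarchy from Roots',
]
docs = [s.lower().split() for s in strategies]

def idf(term):
    df = sum(term in d for d in docs)
    return math.log((len(docs) + 1) / (df + 1)) + 1

def retrieve(query, k=2):
    q = query.lower().split()
    scored = [(sum(idf(t) for t in q if t in d), s)
              for d, s in zip(docs, strategies)]
    return [s for score, s in sorted(scored, reverse=True)[:k] if score > 0]

print(retrieve('find subclasses of a class'))
```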

### Layer 2: Ontology-Specific Recipes

Domain-specific patterns (PROV Activity-Entity relationships, SIO
measurement patterns) can be authored as `Recipe` objects and injected
when working with specific ontologies. This layer is currently a
placeholder, reserved for future ontology-specific guidance.

### Layer 3: Base Context

GraphMeta summary and ontology statistics provide foundational context
about triple counts, class/property distributions, and index
availability.

### Performance Results

This architecture achieves **83% iteration reduction** on entity
description queries:

- **Baseline** (no enhancements): 6 iterations
- **With sense + memory**: 1 iteration

The four-layer approach maintains bounded context size (~1800 chars
total) while providing targeted, relevant guidance for each query type.
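Assembling the four layers under a size budget can be sketched like this; the ~1800-char budget comes from the figure above, while the layer contents and function name are illustrative.

``` python
# Sketch of four-layer context assembly under a character budget.
# Layer order (sense, strategies, recipes, base) and the 1800-char budget
# mirror the description above; all other names and strings are illustrative.

def assemble_context(sense, strategies, recipes, base, budget=1800):
    layers = [
        ('sense', sense),
        ('strategies', '\n'.join(strategies)),
        ('recipes', '\n'.join(recipes)),   # Layer 2 is currently empty
        ('base', base),
    ]
    out, used = [], 0
    for name, text in layers:
        if text and used + len(text) <= budget:
            out.append(f"## {name}\n{text}")
            used += len(text)
    return '\n\n'.join(out)

ctx = assemble_context('Key classes: dcat:Dataset ...',
                       ['Describe Entity by Label: search labels first.'],
                       [],
                       'Graph stats: triple counts, class/property totals.')
print(len(ctx) <= 1800)
```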

## Installation

This project uses [uv](https://github.com/astral-sh/uv) for package
management with a shared environment:

``` bash
source ~/uvws/.venv/bin/activate
uv pip install fastcore claudette rdflib rank-bm25
uv pip install -e .
```

For development, you also need nbdev:

``` bash
uv pip install nbdev
nbdev_install_hooks
```

## Example

The following demonstrates loading an ontology into the dataset memory
and using bounded view functions to explore it.

``` python
from rlm.dataset import setup_dataset_context

# Initialize dataset with mem/prov graphs
ns = {}
setup_dataset_context(ns)
print(ns['dataset_stats']())
```

    Dataset 'ds' (session: ae4334a3)
    mem: 0 triples
    prov: 0 events
    work graphs: 0
    onto graphs: 0

``` python
# Mount an ontology (SHACL shapes are auto-indexed)
ns['mount_ontology']('ontology/dcat-ap/dcat-ap-SHACL.ttl', 'dcat')

# The SHACL index is now available
print(ns['dcat_shacl'].summary())
```

``` python
from rlm.shacl_examples import search_shapes, describe_shape

# Search for shapes related to datasets
results = search_shapes(ns['dcat_shacl'], 'dataset', limit=3)
for r in results:
    print(f"{r['uri'].split('#')[-1]}: targets {r['targets']}")
```

``` python
# Get bounded description of a shape (first 10 properties)
desc = describe_shape(ns['dcat_shacl'], results[0]['uri'], limit=10)
print(f"Properties: {desc['property_count']} (showing {len(desc['properties'])})")
for p in desc['properties'][:5]:
    print(f"  {p['path'].split('/')[-1]}: min={p.get('minCount')}")
```

Query templates can be loaded from SHACL-AF examples and searched by
keyword:

``` python
from rlm.shacl_examples import load_query_examples, search_queries, get_query_text

# Load neXtProt SPARQL examples
load_query_examples('ontology/uniprot/examples/neXtProt', ns, 'nxq')
print(ns['nxq'].summary())

# Find queries about phosphorylation
queries = search_queries(ns['nxq'], 'phosphorylation', limit=2)
for q in queries:
    print(f"{q['uri'].split('/')[-1]}: {q['comment'][:60]}...")
```

## Tutorial

For a complete walkthrough with working examples, see
[91_tutorial.ipynb](91_tutorial.html). The tutorial demonstrates:

- Core RLM loop with `llm_query()` and `rlm_run()`
- Ontology loading with bounded views
- Progressive disclosure over RDF graphs
- **Structured sense data** with 100% URI grounding
- **Four-layer context injection** (sense + memory + recipes + base)
- **Memory-based general strategies** and BM25 retrieval
- Dataset memory for fact persistence
- SPARQL result handles
- Procedural memory closed loop (judge + extract)
- SHACL shape indexing
- Multi-ontology integration

All cells are executed with real Claude API calls showing actual
outputs.

## Testing

The project includes a comprehensive test suite with 110+ tests covering
all components:

    tests/
    ├── unit/                    # Component-level tests
    │   ├── test_sparql_handles.py
    │   ├── test_session_tracking.py
    │   ├── test_memory_store.py
    │   ├── test_bootstrap_strategies.py      # NEW: Bootstrap validation
    │   ├── test_memory_recipe_separation.py  # NEW: Architecture separation
    │   └── test_sense_structured.py          # NEW: Sense data validation
    ├── integration/             # Cross-component tests
    │   ├── test_dataset_memory.py
    │   ├── test_sparql_dataset.py
    │   ├── test_memory_closed_loop.py
    │   └── test_full_stack.py
    ├── live/                    # API-required tests
    │   └── test_memory_integration.py        # NEW: Memory-based architecture
    └── test_quick_e2e.py        # End-to-end validation

### Running Tests

``` bash
# Activate environment
source ~/uvws/.venv/bin/activate

# Run unit tests (no API calls)
pytest tests/unit/ -v

# Run integration tests (no API calls)
pytest tests/integration/ -v

# Run live tests (requires ANTHROPIC_API_KEY)
ANTHROPIC_API_KEY=sk-... pytest tests/live/ -v

# Run quick end-to-end test (with API calls)
python tests/test_quick_e2e.py

# Run notebook tests
nbdev_test
```

All tests pass, validating:

- Core RLM loop with Claude API
- Ontology loading and exploration
- **Structured sense data with URI grounding** ✅
- **Bootstrap general strategies (7 universal patterns)** ✅
- **Memory-recipe separation validation** ✅
- **Four-layer context injection** ✅
- Dataset memory persistence
- SPARQL result handles
- Procedural memory closed loop
- SHACL shape indexing
- End-to-end integration workflows

See `tests/README.md` for detailed test documentation.


## Status

This is preliminary research code under active development. The current
implementation covers stages 1-5 of the trajectory:

- Stage 1: Core RLM loop with claudette backend ✅
- Stage 2: Bounded view primitives for progressive disclosure ✅
- Stage 3: SPARQL handles with work-bound query execution ✅
- Stage 4: SHACL shape indexing and query template retrieval ✅
- Stage 5: Ont-Sense improvements & ReasoningBank integration ✅
  - Structured sense data with 100% URI grounding
  - Four-layer context injection (sense, memory, recipes, base context)
  - Memory-based general strategies (bootstrap + learning)
  - Validation pipeline and comprehensive test suite
  - 83% iteration reduction on entity queries

Stage 6 (evaluation framework) is in progress with task-based eval
system in `evals/`.

The code is developed through exploratory programming in Jupyter
notebooks using nbdev. It targets integration with the
[Solveit](https://solveit.ai) platform but can run standalone.

## References

- Zhang, A., et al. (2025). [Recursive Language
  Models](https://github.com/alexzhang13/rlm). The reference
  implementation this project follows.
- Wang, B., et al. (2025). [ReasoningBank: Self-Evolving Procedural
  Knowledge for Adaptive
  Reasoning](https://arxiv.org/html/2509.25140v1). Procedural memory
  approach for learning from trajectories.
- Anthropic. (2025). [Building Effective
  Agents](https://www.anthropic.com/research/building-effective-agents).
  Context engineering patterns for agentic systems.
- Howard, J. & Gugger, S. [nbdev](https://nbdev.fast.ai/). Literate
  programming framework.
- Howard, J. [claudette](https://claudette.answer.ai/). Claude API
  wrapper used as the LLM backend.
