# dataset


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Overview

This module implements RDF Dataset-based memory for RLM sessions using
named graphs:

- `onto/<name>` - Read-only ontology graphs
- `mem` - Mutable working memory for current session
- `prov` - Provenance/audit trail
- `work/<task_id>` - Scratch graphs for intermediate results
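The graph layout above can be sketched as a small URI builder. The tests further down this page address the `mem` graph as `urn:rlm:{name}:mem` and check that work graph URIs contain `work/<task_id>`. Applying the same `urn:rlm:{name}:` prefix to `prov`, `onto/<name>`, and `work/<task_id>` is an assumption here, and `graph_uri` is a hypothetical helper, not part of the module.

``` python
from typing import Optional


def graph_uri(dataset_name: str, kind: str, suffix: Optional[str] = None) -> str:
    """Build a named-graph URI (hypothetical helper; the prefix scheme is assumed)."""
    if kind in ("mem", "prov"):
        # Singleton graphs: exactly one per dataset, so no suffix is allowed.
        if suffix is not None:
            raise ValueError(f"{kind!r} graphs take no suffix")
        return f"urn:rlm:{dataset_name}:{kind}"
    if kind in ("onto", "work"):
        # Families of graphs: one per ontology name or task id.
        if not suffix:
            raise ValueError(f"{kind!r} graphs need a name or task id")
        return f"urn:rlm:{dataset_name}:{kind}/{suffix}"
    raise ValueError(f"unknown graph kind: {kind!r}")


uris = [
    graph_uri("ds", "mem"),
    graph_uri("ds", "prov"),
    graph_uri("ds", "onto", "prov"),
    graph_uri("ds", "work", "task1"),
]
```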

### Design Principles

- **Session-scoped**: `mem` is the working memory for the current RLM run
- **Handle-based access**: The model sees bounded views, never raw quads
- **Provenance tracking**: Every change to `mem` is recorded with a
  timestamp, source, and reason
- **Lazy indexing**: Indexes are built on demand and cached; the caches
  are invalidated on mutation
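The lazy-indexing principle can be illustrated with a version counter. This is a minimal sketch, not the module's implementation: the tests on this page only show that `ds_meta._version` increases after a mutation and that `graph_stats` is a cached property. The `LazyStats` class and its `rebuilds` counter are hypothetical.

``` python
class LazyStats:
    """Toy stand-in for a lazily cached index guarded by a version counter."""

    def __init__(self):
        self.triples = set()
        self._version = 0   # bumped on every mutation
        self._cache = None  # (version, stats) pair, or None before first use
        self.rebuilds = 0   # counts how often the index was recomputed

    def add(self, triple):
        self.triples.add(triple)
        self._version += 1  # any cached stats are now stale

    @property
    def graph_stats(self):
        # Rebuild only when nothing is cached or the cache is out of date.
        if self._cache is None or self._cache[0] != self._version:
            self.rebuilds += 1
            self._cache = (self._version, {"triples": len(self.triples)})
        return self._cache[1]


idx = LazyStats()
idx.graph_stats  # first access builds the index
idx.graph_stats  # second access is served from the cache
idx.add(("http://ex.org/alice", "http://ex.org/age", "30"))
stats_after = idx.graph_stats  # the mutation forces one rebuild
```

Comparing version numbers keeps reads cheap: a mutation only increments an integer, and the cost of rebuilding is paid by the next reader that actually needs the index.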

## Imports

## DatasetMeta

Meta-graph navigation for RDF Dataset with lazy-cached indexes.

------------------------------------------------------------------------

### DatasetMeta

``` python

def DatasetMeta(
    dataset:Dataset, name:str='ds', session_id:str=<factory>
)->None:

```

*Meta-graph navigation for RDF Dataset.*

Provides lazy-cached indexes and bounded views over named graphs.
Indexes are invalidated on any mutation to the `mem` graph.

## Setup Function

## Memory Operations

------------------------------------------------------------------------

### mem_add

``` python

def mem_add(
    ds_meta:DatasetMeta, subject, predicate, obj, source:str='agent', reason:str=None
)->str:

```

*Add fact to mem with provenance tracking.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `subject`: Subject URI or literal
- `predicate`: Predicate URI
- `obj`: Object URI or literal
- `source`: Source of this fact (default: `'agent'`)
- `reason`: Optional reason for adding

Returns: Summary string
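To make the provenance idea concrete, here is a minimal pure-Python sketch that pairs every write with an audit event. It assumes a set of triples for `mem` and a list of event dicts for `prov`. The real module stores provenance as RDF triples in the `prov` graph, so `add_with_provenance` and the event field names are illustrative only.

``` python
from datetime import datetime, timezone


def add_with_provenance(mem, prov, triple, source="agent", reason=None):
    """Add a triple to mem (a set) and append an audit event to prov (a list)."""
    mem.add(triple)
    event = {
        "action": "add",
        "triple": triple,
        "source": source,
        # Timezone-aware UTC timestamp of the change.
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }
    if reason is not None:
        event["reason"] = reason
    prov.append(event)
    return f"Added 1 triple to mem (source={source})"


mem, prov = set(), []
summary = add_with_provenance(
    mem,
    prov,
    ("http://ex.org/alice", "http://ex.org/knows", "http://ex.org/bob"),
    source="test",
    reason="Testing",
)
```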

------------------------------------------------------------------------

### mem_query

``` python

def mem_query(
    ds_meta:DatasetMeta, sparql:str, limit:int=100
)->list:

```

*Query mem graph, return bounded results.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `sparql`: SPARQL query string
- `limit`: Maximum results to return

Returns: List of result rows (as dicts)

------------------------------------------------------------------------

### mem_retract

``` python

def mem_retract(
    ds_meta:DatasetMeta, subject:NoneType=None, predicate:NoneType=None, obj:NoneType=None, source:str='agent',
    reason:str=None
)->str:

```

*Remove triples with provenance.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `subject`: Subject URI or None (wildcard)
- `predicate`: Predicate URI or None (wildcard)
- `obj`: Object URI/literal or None (wildcard)
- `source`: Source of this retraction
- `reason`: Optional reason for removing

Returns: Summary string
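The wildcard semantics can be sketched without any RDF library: `None` in a position matches anything, and every triple that matches all given positions is removed. The helper names `matches` and `retract` are hypothetical, and provenance recording is left out to keep the example short.

``` python
def matches(triple, subject=None, predicate=None, obj=None):
    """Return True if the triple agrees with every non-None pattern position."""
    pattern = (subject, predicate, obj)
    return all(want is None or want == got for want, got in zip(pattern, triple))


def retract(mem, subject=None, predicate=None, obj=None):
    """Remove all matching triples from mem (a set) and return how many were removed."""
    doomed = {t for t in mem if matches(t, subject, predicate, obj)}
    mem.difference_update(doomed)  # mutates the caller's set in place
    return len(doomed)


mem = {
    ("http://ex.org/alice", "http://ex.org/age", "30"),
    ("http://ex.org/bob", "http://ex.org/age", "25"),
    ("http://ex.org/alice", "http://ex.org/city", "Boston"),
}
removed = retract(mem, predicate="http://ex.org/age")
```

Under these semantics, leaving all three positions as `None` matches every triple, so a caller that wants a narrow retraction should always pin at least one position.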

------------------------------------------------------------------------

### mem_describe

``` python

def mem_describe(
    ds_meta:DatasetMeta, uri:str, limit:int=20
)->dict:

```

*Get bounded entity description from mem.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `uri`: URI of entity to describe
- `limit`: Maximum triples to include

Returns: Dict with `'as_subject'` and `'as_object'` triple lists
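A bounded description can be sketched as two capped scans over the triples. The dict keys mirror the output shown in the tests on this page (`uri`, `as_subject`, `as_object`). Whether the real `limit` applies per list or to both lists combined is not stated, so the per-list cap below is an assumption, and `describe` is a hypothetical name.

``` python
from itertools import islice


def describe(triples, uri, limit=20):
    """Collect at most `limit` triples where uri is the subject, and likewise for object."""
    # islice stops scanning as soon as the cap is reached.
    as_subject = list(islice((t for t in triples if t[0] == uri), limit))
    as_object = list(islice((t for t in triples if t[2] == uri), limit))
    return {"uri": uri, "as_subject": as_subject, "as_object": as_object}


triples = [
    ("http://ex.org/alice", "http://ex.org/knows", "http://ex.org/bob"),
    ("http://ex.org/alice", "http://ex.org/age", "30"),
    ("http://ex.org/bob", "http://ex.org/knows", "http://ex.org/alice"),
]
desc = describe(triples, "http://ex.org/alice", limit=1)
```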

## Scratch Graph Operations

------------------------------------------------------------------------

### work_create

``` python

def work_create(
    ds_meta:DatasetMeta, task_id:str=None
)->tuple:

```

*Create a scratch graph for intermediate results.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `task_id`: Task identifier (default: auto-generated)

Returns: (graph_uri, graph) tuple

------------------------------------------------------------------------

### work_cleanup

``` python

def work_cleanup(
    ds_meta:DatasetMeta, task_id:str=None, all:bool=False
)->str:

```

*Remove scratch graph(s).*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `task_id`: Specific task to clean up, or None
- `all`: If True, remove all `work/*` graphs

Returns: Summary string

------------------------------------------------------------------------

### work_to_mem

``` python

def work_to_mem(
    ds_meta:DatasetMeta, task_id:str, source:str='work', reason:str=None
)->str:

```

*Promote triples from scratch graph to mem with provenance.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `task_id`: Task identifier for work graph
- `source`: Source label for provenance
- `reason`: Optional reason for promotion

Returns: Summary string
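The create, promote, and cleanup lifecycle can be sketched with plain sets. This is a toy model under stated assumptions: `ScratchStore` is hypothetical, the auto-generated id format is a guess, and provenance recording is omitted. As in the tests on this page, promotion leaves the scratch graph in place until it is cleaned up explicitly.

``` python
import uuid


class ScratchStore:
    """Toy model: mem is a set of triples, work maps task_id -> set of triples."""

    def __init__(self):
        self.mem = set()
        self.work = {}

    def create(self, task_id=None):
        # Auto-generate an id when none is given (the id format is an assumption).
        task_id = task_id or uuid.uuid4().hex[:8]
        graph = self.work.setdefault(task_id, set())
        return task_id, graph

    def promote(self, task_id):
        # Copy scratch triples into mem; the scratch graph itself is left intact.
        before = len(self.mem)
        self.mem |= self.work[task_id]
        return len(self.mem) - before

    def cleanup(self, task_id=None, remove_all=False):
        if remove_all:
            doomed = list(self.work)
        elif task_id in self.work:
            doomed = [task_id]
        else:
            doomed = []
        for tid in doomed:
            del self.work[tid]
        return len(doomed)


store = ScratchStore()
task_id, graph = store.create("test_task")
graph.add(("http://ex.org/alice", "http://ex.org/temp", "value"))
promoted = store.promote(task_id)
removed = store.cleanup(task_id)
```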

## Snapshot Functions

------------------------------------------------------------------------

### snapshot_dataset

``` python

def snapshot_dataset(
    ds_meta:DatasetMeta, path:str=None, format:str='trig'
)->str:

```

*Serialize dataset to TriG/N-Quads for debugging.*

Args:

- `ds_meta`: DatasetMeta to snapshot
- `path`: Output path (default: auto-generated with timestamp)
- `format`: `'trig'` or `'nquads'`

Returns: Path to snapshot file

------------------------------------------------------------------------

### load_snapshot

``` python

def load_snapshot(
    path:str, ns:dict, name:str='ds'
)->str:

```

*Load dataset from TriG/N-Quads snapshot.*

Useful for debugging and replay. Note: the snapshot keeps the original
dataset name in its graph URIs, so to reuse that name, extract it from
the graph URIs.

Args:

- `path`: Path to snapshot file
- `ns`: Namespace dict where Dataset will be stored
- `name`: Variable name for the Dataset handle

Returns: Summary string

## Bounded View Functions

------------------------------------------------------------------------

### res_distinct

``` python

def res_distinct(
    result, column:str, limit:int=50
)->list:

```

*Get distinct values in a column.*

Args:

- `result`: ResultTable or list of dicts
- `column`: Column to get distinct values from
- `limit`: Maximum distinct values to return

Returns: List of distinct values

------------------------------------------------------------------------

### res_group

``` python

def res_group(
    result, column:str, limit:int=20
)->list:

```

*Get counts grouped by column value.*

Args:

- `result`: ResultTable or list of dicts
- `column`: Column to group by
- `limit`: Maximum groups to return

Returns: List of (value, count) tuples, sorted by count descending

------------------------------------------------------------------------

### res_where

``` python

def res_where(
    result, column:str, pattern:str=None, value:str=None, limit:int=100
)->list:

```

*Filter result rows by column value or regex pattern.*

Args:

- `result`: ResultTable or list of dicts
- `column`: Column name to filter on
- `pattern`: Optional regex pattern to match
- `value`: Optional exact value to match
- `limit`: Maximum matching rows to return (default: 100)

Returns: List of matching rows

------------------------------------------------------------------------

### res_head

``` python

def res_head(
    result, n:int=10
)->list:

```

*Get first N rows of a result set.*

Args:

- `result`: ResultTable, list of dicts, or list of tuples
- `n`: Number of rows to return

Returns: List of rows (same format as input)

------------------------------------------------------------------------

### ResultTable

``` python

def ResultTable(
    rows:list, columns:list, query:str, total_rows:int
)->None:

```

*Wrapper for SPARQL query results with bounded view operations.*

## Result Table Views (Stage 2)

Bounded view operations over SPARQL query results enable iterative
exploration without overwhelming context:

- **res_head()**: Preview first N rows
- **res_where()**: Filter by column value or regex
- **res_group()**: Aggregate and count by column
- **res_distinct()**: Find unique values

These work with a `ResultTable` wrapper or with the plain list of dicts
returned by `mem_query()`.

### Use Cases

- Previewing large result sets: `res_head(results, 10)`
- Finding specific entities:
  `res_where(results, 'name', pattern='Alice')`
- Understanding data distribution: `res_group(results, 'category')`
- Exploring unique values: `res_distinct(results, 'author')`
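For readers who want to see what these views amount to, here is a minimal pure-Python sketch over a list of dicts. It is not the module's code: the function names drop the `res_` prefix to mark them as stand-ins, `ResultTable` inputs are not handled, and returning distinct values in first-seen order is an assumption.

``` python
import re
from collections import Counter


def head(rows, n=10):
    """First n rows."""
    return rows[:n]


def where(rows, column, pattern=None, value=None, limit=100):
    """Rows whose column equals `value` and/or matches the regex `pattern`."""
    out = []
    for row in rows:
        if len(out) >= limit:
            break
        cell = str(row.get(column, ""))
        if value is not None and cell != value:
            continue
        if pattern is not None and not re.search(pattern, cell):
            continue
        out.append(row)
    return out


def group(rows, column, limit=20):
    """(value, count) pairs, most frequent first."""
    return Counter(row[column] for row in rows if column in row).most_common(limit)


def distinct(rows, column, limit=50):
    """Unique values in first-seen order (dict keys preserve insertion order)."""
    return list(dict.fromkeys(row[column] for row in rows if column in row))[:limit]


rows = [
    {"s": "http://ex.org/alice", "age": "30"},
    {"s": "http://ex.org/bob", "age": "25"},
    {"s": "http://ex.org/charlie", "age": "30"},
]
by_age = group(rows, "age")
alice_rows = where(rows, "s", pattern="alice")
ages = distinct(rows, "age")
```

Each view takes a `limit` (or `n`) so that the caller never receives more rows than it asked for, which is what keeps the exploration bounded.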

------------------------------------------------------------------------

### dataset_stats

``` python

def dataset_stats(
    ds_meta:DatasetMeta
)->str:

```

*Get dataset statistics summary.*

------------------------------------------------------------------------

### list_graphs

``` python

def list_graphs(
    ds_meta:DatasetMeta, pattern:str=None
)->list:

```

*List named graphs, optionally filtered.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `pattern`: Optional substring to filter graph URIs

Returns: List of (graph_uri, triple_count) tuples

------------------------------------------------------------------------

### graph_sample

``` python

def graph_sample(
    ds_meta:DatasetMeta, graph_uri:str, limit:int=10
)->list:

```

*Get sample triples from a graph.*

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `graph_uri`: URI of graph to sample
- `limit`: Maximum triples to return

Returns: List of (s, p, o) tuples as strings

## Ontology Integration

------------------------------------------------------------------------

### mount_ontology

``` python

def mount_ontology(
    ds_meta:DatasetMeta, ns:dict, path:str, ont_name:str, index_shacl:bool=True, index_queries:bool=True
)->str:

```

*Mount ontology into dataset as read-only `onto/<name>` graph.*

If `index_shacl=True` and SHACL content is detected, also builds a
SHACLIndex and stores it in `ns['{ont_name}_shacl']`.

If `index_queries=True` and `sh:SPARQLExecutable` is detected, also
builds a QueryIndex and stores it in `ns['{ont_name}_queries']`.

Args:

- `ds_meta`: DatasetMeta containing the dataset
- `ns`: Namespace dict (for compatibility with `setup_ontology_context`)
- `path`: Path to ontology file
- `ont_name`: Name for the ontology
- `index_shacl`: Whether to detect and index SHACL shapes (default: True)
- `index_queries`: Whether to detect and index query templates (default:
  True)

Returns: Summary string

------------------------------------------------------------------------

### setup_dataset_context

``` python

def setup_dataset_context(
    ns:dict, name:str='ds'
)->str:

```

*Initialize Dataset with mem/prov graphs, bind helper functions.*

Args:

- `ns`: Namespace dict where Dataset will be stored
- `name`: Variable name for the Dataset handle

Returns: Summary string describing what was created

``` python
# Test result table views
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

# Add test data
mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')
mem_add(ds_meta, 'http://ex.org/bob', 'http://ex.org/age', '25')
mem_add(ds_meta, 'http://ex.org/charlie', 'http://ex.org/age', '30')
mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/city', 'Boston')
mem_add(ds_meta, 'http://ex.org/bob', 'http://ex.org/city', 'NYC')

# Query and get results as list
results = mem_query(ds_meta, 'SELECT ?s ?age WHERE { ?s <http://ex.org/age> ?age }')

# Test res_head
head = res_head(results, n=2)
assert len(head) == 2
print(f"✓ res_head works: {len(head)} rows")

# Test res_where with exact value
filtered = res_where(results, 'age', value='30')
assert len(filtered) == 2
print(f"✓ res_where (exact) works: {len(filtered)} rows with age=30")

# Test res_where with pattern
filtered_pattern = res_where(results, 's', pattern='alice')
assert len(filtered_pattern) == 1
print(f"✓ res_where (pattern) works: {len(filtered_pattern)} rows matching 'alice'")

# Test res_group
groups = res_group(results, 'age')
assert len(groups) == 2  # Two distinct ages
assert groups[0][1] == 2  # Age '30' appears twice
print(f"✓ res_group works: {groups}")

# Test res_distinct
distinct_ages = res_distinct(results, 'age')
assert len(distinct_ages) == 2
assert '25' in distinct_ages and '30' in distinct_ages
print(f"✓ res_distinct works: {distinct_ages}")

# Test ResultTable wrapper
result_table = ResultTable(
    rows=results,
    columns=['s', 'age'],
    query='SELECT ?s ?age WHERE { ?s <http://ex.org/age> ?age }',
    total_rows=len(results)
)
assert len(result_table) == 3
print(f"✓ ResultTable works: {result_table}")

# Test result table views work with ResultTable
head_from_table = res_head(result_table, n=2)
assert len(head_from_table) == 2
print(f"✓ res_head works with ResultTable")
```

    ✓ res_head works: 2 rows
    ✓ res_where (exact) works: 2 rows with age=30
    ✓ res_where (pattern) works: 1 rows matching 'alice'
    ✓ res_group works: [('30', 2), ('25', 1)]
    ✓ res_distinct works: ['25', '30']
    ✓ ResultTable works: ResultTable(3 rows, columns=['s', 'age'])
    ✓ res_head works with ResultTable

    DeprecationWarning: datetime.datetime.utcnow() is deprecated and scheduled for removal in a future version. Use timezone-aware objects to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
      ds_meta.prov.add((event_uri, RLM_PROV.timestamp, Literal(datetime.utcnow().isoformat() + 'Z', datatype=XSD.dateTime)))
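The warning above comes from the `datetime.utcnow().isoformat() + 'Z'` expression used for provenance timestamps. A timezone-aware equivalent could look like the sketch below. `utc_timestamp` is a hypothetical helper, and this page does not show whether the module adopts such a change.

``` python
from datetime import datetime, timezone


def utc_timestamp() -> str:
    """ISO 8601 UTC timestamp with a trailing 'Z', built from an aware datetime."""
    # An aware UTC datetime serializes with a '+00:00' offset; swap it for 'Z'
    # to keep the same suffix as the utcnow()-based string.
    return datetime.now(timezone.utc).isoformat().replace("+00:00", "Z")


ts = utc_timestamp()
# Round-trip check: the string parses back to a timezone-aware datetime.
parsed = datetime.fromisoformat(ts.replace("Z", "+00:00"))
```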

``` python
# Test dataset creation
test_ns = {}
result = setup_dataset_context(test_ns, name='test_ds')
assert 'test_ds' in test_ns
assert 'test_ds_meta' in test_ns
assert len(test_ns['test_ds_meta'].session_id) == 8
print("✓ Dataset creation works")
print(result)
```

    ✓ Dataset creation works
    Created dataset 'test_ds' with session_id=cb576fb5

``` python
# Test mem_add with provenance
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

result = mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/knows', 'http://ex.org/bob', 
                 source='test', reason='Testing')
assert len(ds_meta.mem) == 1
assert len(ds_meta.prov) > 0
print("✓ mem_add works")
print(result)
```

    ✓ mem_add works
    Added triple to mem: (http://ex.org/alice, http://ex.org/knows, http://ex.org/bob)

``` python
# Test mem_query
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')
mem_add(ds_meta, 'http://ex.org/bob', 'http://ex.org/age', '25')

results = mem_query(ds_meta, 'SELECT ?s ?age WHERE { ?s <http://ex.org/age> ?age }')
assert len(results) == 2
assert all('s' in r and 'age' in r for r in results)
print("✓ mem_query works")
print(results)
```

    ✓ mem_query works
    [{'s': 'http://ex.org/alice', 'age': '30'}, {'s': 'http://ex.org/bob', 'age': '25'}]

``` python
# Test mem_retract
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')
assert len(ds_meta.mem) == 1

result = mem_retract(ds_meta, predicate='http://ex.org/age', source='test', reason='Correction')
assert len(ds_meta.mem) == 0
assert 'Removed 1 triples' in result
print("✓ mem_retract works")
print(result)
```

    ✓ mem_retract works
    Removed 1 triples from mem

``` python
# Test mem_describe
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/knows', 'http://ex.org/bob')
mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')

desc = mem_describe(ds_meta, 'http://ex.org/alice')
assert 'as_subject' in desc
assert 'as_object' in desc
assert len(desc['as_subject']) == 2
print("✓ mem_describe works")
print(desc)
```

    ✓ mem_describe works
    {'uri': 'http://ex.org/alice', 'as_subject': [('http://ex.org/alice', 'http://ex.org/knows', 'http://ex.org/bob'), ('http://ex.org/alice', 'http://ex.org/age', '30')], 'as_object': []}

``` python
# Test index invalidation
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

# Access cached property
initial_version = ds_meta._version
_ = ds_meta.graph_stats

# Mutate
mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')

# Check version incremented
assert ds_meta._version > initial_version
print("✓ Index invalidation works")
```

    ✓ Index invalidation works

    DeprecationWarning: Dataset.contexts is deprecated, use Dataset.graphs instead.
      for ctx in self.dataset.contexts():

``` python
# Test work graph lifecycle
from rdflib import Literal, URIRef  # needed to build the scratch triple below

test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

# Create work graph
uri, graph = work_create(ds_meta, task_id='test_task')
assert 'work/test_task' in uri
assert len(ds_meta.work_graphs) == 1

# Add some triples to work graph
graph.add((URIRef('http://ex.org/alice'), URIRef('http://ex.org/temp'), Literal('value')))
assert len(graph) == 1

# Promote to mem
result = work_to_mem(ds_meta, 'test_task', reason='Test promotion')
assert len(ds_meta.mem) == 1
assert 'Promoted 1 triples' in result

# Cleanup
result = work_cleanup(ds_meta, task_id='test_task')
assert 'Removed 1 work' in result
assert len(ds_meta.work_graphs) == 0

print("✓ Work graph lifecycle works")
```

    ✓ Work graph lifecycle works

``` python
# Test snapshot
import tempfile
import os

test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

# Add some data
mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')

# Take snapshot
with tempfile.NamedTemporaryFile(mode='w', suffix='.trig', delete=False) as f:
    snapshot_path = f.name

result = snapshot_dataset(ds_meta, path=snapshot_path)
assert os.path.exists(snapshot_path)
assert 'Snapshot saved' in result

# Load snapshot (let it auto-detect the name 'ds' from graph URIs)
test_ns2 = {}
result = load_snapshot(snapshot_path, test_ns2, name='restored')
assert 'restored' in test_ns2
assert 'restored_meta' in test_ns2
# Should auto-detect original name 'ds' and use it for URIs
assert len(test_ns2['restored_meta'].mem) == 1

# Also test loading with same name
test_ns3 = {}
result = load_snapshot(snapshot_path, test_ns3, name='ds')
assert 'ds' in test_ns3
assert 'ds_meta' in test_ns3
assert len(test_ns3['ds_meta'].mem) == 1

# Cleanup
os.unlink(snapshot_path)

print("✓ Snapshot roundtrip works")
```

    ✓ Snapshot roundtrip works

``` python
# Test bounded view functions
test_ns = {}
setup_dataset_context(test_ns)
ds_meta = test_ns['ds_meta']

# Add some data
mem_add(ds_meta, 'http://ex.org/alice', 'http://ex.org/age', '30')
work_create(ds_meta, 'task1')
work_create(ds_meta, 'task2')

# Test dataset_stats
stats = dataset_stats(ds_meta)
assert 'mem: 1 triples' in stats
assert 'work graphs: 2' in stats

# Test list_graphs
graphs = list_graphs(ds_meta)
assert len(graphs) >= 4  # mem, prov, work/task1, work/task2

# Test list_graphs with pattern
work_graphs = list_graphs(ds_meta, pattern='work/')
assert len(work_graphs) == 2

# Test graph_sample
mem_uri = f'urn:rlm:{ds_meta.name}:mem'
sample = graph_sample(ds_meta, mem_uri)
assert len(sample) == 1

print("✓ Bounded view functions work")
```

    ✓ Bounded view functions work

## Tests

## Usage Examples

``` python
# Basic usage in RLM context
ns = {}
setup_dataset_context(ns)

# RLM can now use: mem_add, mem_query, mem_describe, etc.
ns['mem_add']('http://ex.org/alice', 'http://ex.org/knows', 'http://ex.org/bob')
results = ns['mem_query']('SELECT ?s ?p ?o WHERE { ?s ?p ?o }')
print(results)
```

``` python
# Integration with ontology
from rlm.ontology import setup_ontology_context

ns = {}
setup_dataset_context(ns)
setup_ontology_context('ontology/prov.ttl', ns, name='prov')

# Mount ontology into dataset
ns['mount_ontology']('ontology/prov.ttl', 'prov')

# Now ontology is in dataset as onto/prov graph
graphs = ns['list_graphs']()
print(graphs)
```
