# SHACL Examples and Shape Indexing


<!-- WARNING: THIS FILE WAS AUTOGENERATED! DO NOT EDIT! -->

## Query Template Detection

Detect sh:SPARQLExecutable content in RDF graphs.

------------------------------------------------------------------------

### detect_sparql_executables

``` python

def detect_sparql_executables(
    graph:Graph
)->dict:

```

*Detect sh:SPARQLExecutable content in a graph.*

Args:

- `graph`: RDF graph to analyze

Returns: Dict with:

- `has_executables`: True if any SPARQLExecutable found
- `select_count`: Count of sh:SPARQLSelectExecutable instances
- `construct_count`: Count of sh:SPARQLConstructExecutable instances
- `ask_count`: Count of sh:SPARQLAskExecutable instances
- `total_count`: Total query count
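The counting behind this result can be sketched without an RDF library. This is a minimal illustration, not the project's implementation: the graph is modeled as a set of `(subject, predicate, object)` string tuples, and the helper name `count_executables` is hypothetical. Treating `has_executables` as "total count above zero" is also an assumption.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
SH = "http://www.w3.org/ns/shacl#"


def count_executables(triples):
    """Count SPARQL executable instances by class in a set of string triples."""
    kinds = {
        "select_count": SH + "SPARQLSelectExecutable",
        "construct_count": SH + "SPARQLConstructExecutable",
        "ask_count": SH + "SPARQLAskExecutable",
    }
    result = {}
    for key, cls in kinds.items():
        # Distinct subjects that are typed with this executable class
        subjects = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
        result[key] = len(subjects)
    # At this point result holds only the three per-type counts
    result["total_count"] = sum(result.values())
    result["has_executables"] = result["total_count"] > 0
    return result


triples = {
    ("http://example.org/q1", RDF_TYPE, SH + "SPARQLSelectExecutable"),
    ("http://example.org/q2", RDF_TYPE, SH + "SPARQLAskExecutable"),
    ("http://example.org/q1", "http://www.w3.org/2000/01/rdf-schema#comment", "List proteins"),
}
counts = count_executables(triples)
print(counts)
```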

------------------------------------------------------------------------

### QueryIndex

``` python

def QueryIndex(
    queries:List=<factory>, comments:Dict=<factory>, keywords:Dict=<factory>, endpoints:Dict=<factory>,
    query_text:Dict=<factory>, query_type:Dict=<factory>, source_file:Dict=<factory>
)->None:

```

*Index of sh:SPARQLExecutable query templates for retrieval.*

Attributes:

- `queries`: List of query URIs
- `comments`: Mapping from query URI to rdfs:comment description
- `keywords`: Inverted index from keyword to query URIs
- `endpoints`: Mapping from query URI to target endpoint URLs
- `query_text`: Mapping from query URI to sh:select/sh:construct text
- `query_type`: Mapping from query URI to type (`'select'`, `'construct'`, `'ask'`)
- `source_file`: Mapping from query URI to source file path
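The `<factory>` defaults in the signature indicate dataclass fields built with `default_factory`, so each instance gets its own containers. The sketch below mirrors the field names above. The class name `QueryIndexSketch`, the value types, and the `add` helper are assumptions made for illustration and are not part of the documented API.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class QueryIndexSketch:
    """Illustrative stand-in for QueryIndex; value types are assumptions."""
    queries: List[str] = field(default_factory=list)
    comments: Dict[str, str] = field(default_factory=dict)
    keywords: Dict[str, List[str]] = field(default_factory=dict)
    endpoints: Dict[str, List[str]] = field(default_factory=dict)
    query_text: Dict[str, str] = field(default_factory=dict)
    query_type: Dict[str, str] = field(default_factory=dict)
    source_file: Dict[str, str] = field(default_factory=dict)

    def add(self, uri: str, text: str, qtype: str, comment: str = "",
            keywords: tuple = ()) -> None:
        """Hypothetical helper: register one query and update the inverted index."""
        self.queries.append(uri)
        self.query_text[uri] = text
        self.query_type[uri] = qtype
        self.comments[uri] = comment
        for kw in keywords:
            # Inverted index: lowercase keyword -> list of query URIs
            self.keywords.setdefault(kw.lower(), []).append(uri)


a, b = QueryIndexSketch(), QueryIndexSketch()
a.add("http://example.org/q1", "SELECT * WHERE { ?s ?p ?o }", "select",
      comment="List all triples", keywords=("Triples", "all"))
print(a.keywords)
print(b.queries)  # b is unaffected because each instance has its own list
```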

## Query Keyword Extraction

Extract searchable keywords from query templates.

------------------------------------------------------------------------

### extract_query_keywords

``` python

def extract_query_keywords(
    graph:Graph, query_uri:URIRef, comment:str
)->List:

```

*Extract keywords from schema:keywords and rdfs:comment.*

Sources:

- schema:keywords (explicit tags)
- rdfs:comment (word extraction)
- Query URI local name

Args:

- `graph`: RDF graph containing the query
- `query_uri`: Query URI
- `comment`: rdfs:comment text

Returns: List of lowercase keywords
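A rough sketch of how the three sources might be combined into one lowercase keyword list. The function name `keywords_from_text`, the stopword list, the minimum word length, and the CamelCase split of the local name are all assumptions. The real function reads schema:keywords from the graph, whereas here explicit tags are passed in directly.

```python
import re

# Hypothetical stopword list; the project's actual filtering is not documented here
STOPWORDS = {"the", "a", "an", "of", "for", "and", "in", "to", "with", "all"}


def keywords_from_text(comment, query_uri, tags=()):
    """Collect lowercase keywords from explicit tags, comment words, and the URI local name."""
    words = [t.lower() for t in tags]
    # Words of two or more characters from the comment
    words.extend(re.findall(r"[a-z][a-z0-9]+", comment.lower()))
    # Local name is the part after the last '/' or '#', split on CamelCase boundaries
    local = re.split(r"[/#]", query_uri)[-1]
    words.extend(w.lower() for w in re.findall(r"[A-Za-z][a-z0-9]*", local))
    # Drop stopwords and very short tokens, de-duplicate while keeping order
    result = []
    for w in words:
        if w not in STOPWORDS and len(w) > 2 and w not in result:
            result.append(w)
    return result


kws = keywords_from_text(
    "Select all proteins with a phosphorylation site",
    "http://example.org/queries/PhosphoSites",
    tags=["PTM"],
)
print(kws)
```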

## Build Query Index

Build searchable index from sh:SPARQLExecutable templates.

------------------------------------------------------------------------

### build_query_index

``` python

def build_query_index(
    graph:Graph, source_path:str=None
)->QueryIndex:

```

*Build searchable index from sh:SPARQLExecutable templates.*

Extracts: sh:select, sh:construct, sh:ask, rdfs:comment,
schema:keywords, schema:target

Args:

- `graph`: RDF graph containing query templates
- `source_path`: Optional source file path for tracking

Returns: QueryIndex with indexed queries
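The extraction step can be illustrated on plain string triples. This is a sketch under stated assumptions: the function name `index_queries` and the dict-based result are hypothetical, the graph is a set of tuples rather than an RDF `Graph`, and only the query text, type, and comment are handled (keywords and endpoints are left out).

```python
SH = "http://www.w3.org/ns/shacl#"
RDFS_COMMENT = "http://www.w3.org/2000/01/rdf-schema#comment"

# Predicate carrying the query text -> query type label
QUERY_PREDICATES = {
    SH + "select": "select",
    SH + "construct": "construct",
    SH + "ask": "ask",
}


def index_queries(triples):
    """Build a small index of query text, type, and comment from string triples."""
    query_text, query_type, comments = {}, {}, {}
    # First pass: any subject with a query-text predicate is a query template
    for s, p, o in sorted(triples):
        if p in QUERY_PREDICATES:
            query_text[s] = o
            query_type[s] = QUERY_PREDICATES[p]
    # Second pass: keep comments only for subjects identified as queries
    for s, p, o in triples:
        if p == RDFS_COMMENT and s in query_text:
            comments[s] = o
    return {
        "queries": sorted(query_text),
        "query_text": query_text,
        "query_type": query_type,
        "comments": comments,
    }


triples = {
    ("ex:q1", SH + "select", "SELECT ?p WHERE { ?p a ex:Protein }"),
    ("ex:q1", RDFS_COMMENT, "List proteins"),
    ("ex:q2", SH + "ask", "ASK { ?p a ex:Protein }"),
    ("ex:other", RDFS_COMMENT, "Not a query"),
}
idx = index_queries(triples)
print(idx["queries"], idx["query_type"])
```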

## Query Bounded Views

Progressive disclosure functions for exploring query templates.

------------------------------------------------------------------------

### search_queries

``` python

def search_queries(
    index:QueryIndex, keyword:str, limit:int=5
)->list:

```

*Find query templates matching keyword.*

Args:

- `index`: QueryIndex to search
- `keyword`: Search term
- `limit`: Maximum number of results

Returns: List of dicts with: uri, comment, endpoints, matched_keyword
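A bounded search over an inverted index might look like the following sketch. The function name `search_inverted_index`, the case-insensitive substring matching, and the alphabetical scan order are assumptions, since the source does not state the matching rule. The `endpoints` field is omitted for brevity.

```python
def search_inverted_index(keywords, comments, keyword, limit=5):
    """Return up to `limit` hits whose indexed keyword contains the search term."""
    needle = keyword.lower()
    results, seen = [], set()
    for kw in sorted(keywords):
        if needle not in kw:
            continue
        for uri in keywords[kw]:
            if uri in seen:
                continue  # report each query once, under its first matching keyword
            seen.add(uri)
            results.append({
                "uri": uri,
                "comment": comments.get(uri, ""),
                "matched_keyword": kw,
            })
            if len(results) >= limit:
                return results  # stop early so the output stays bounded
    return results


keywords = {"protein": ["q1", "q2"], "proteins": ["q2", "q3"], "disease": ["q4"]}
comments = {"q1": "Find a protein", "q3": "List proteins"}
hits = search_inverted_index(keywords, comments, "Protein")
top2 = search_inverted_index(keywords, comments, "protein", limit=2)
print([h["uri"] for h in hits])
```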

------------------------------------------------------------------------

### describe_query

``` python

def describe_query(
    index:QueryIndex, query_uri:str
)->dict:

```

*Get bounded description of a query template.*

Args:

- `index`: QueryIndex to query
- `query_uri`: URI of query

Returns: Dict with: uri, comment, endpoints, query_type, keywords,
query_preview (200 chars)

------------------------------------------------------------------------

### get_query_text

``` python

def get_query_text(
    index:QueryIndex, query_uri:str
)->str:

```

*Get full SPARQL query text for execution.*

Args:

- `index`: QueryIndex to query
- `query_uri`: URI of query

Returns: Full SPARQL query text

------------------------------------------------------------------------

### load_query_examples

``` python

def load_query_examples(
    path:str, ns:dict, name:str='queries'
)->str:

```

*Load SPARQL example files from directory into QueryIndex.*

Recursively loads all .ttl files and builds combined index.

Args:

- `path`: Directory path containing .ttl example files
- `ns`: Namespace dict for storing the index
- `name`: Variable name for the index in `ns`

Returns: Status message
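The calling convention here (store the result in a caller-supplied dict, return only a short status string) can be sketched with the standard library alone. The function name `collect_ttl_files` is hypothetical, and the sketch only gathers file paths. Parsing the files and building the index, which the real function does, is deliberately left out.

```python
import tempfile
from pathlib import Path


def collect_ttl_files(path, ns, name="queries"):
    """Recursively find .ttl files, store the list in ns[name], return a status message."""
    files = sorted(Path(path).rglob("*.ttl"))
    # The large object goes into the namespace; the caller only sees a short summary
    ns[name] = [str(f) for f in files]
    return f"Found {len(files)} .ttl files under {path}, stored as '{name}'"


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "sub").mkdir()
    (root / "a.ttl").write_text("")
    (root / "sub" / "b.ttl").write_text("")
    (root / "notes.txt").write_text("")
    ns = {}
    status = collect_ttl_files(tmp, ns, "nxq")
    print(status)
```

Returning a status string rather than the index keeps the visible output small, which fits the progressive disclosure approach of the other functions in this section.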

## SHACLIndex Dataclass

The `SHACLIndex` holds indexed SHACL shapes for efficient retrieval.

------------------------------------------------------------------------

### SHACLIndex

``` python

def SHACLIndex(
    shapes:List=<factory>, targets:Dict=<factory>, properties:Dict=<factory>, keywords:Dict=<factory>,
    paradigm:str='unknown'
)->None:

```

*Index of SHACL shapes for retrieval.*

Attributes:

- `shapes`: List of shape URIs
- `targets`: Mapping from shape URI to target class URIs
- `properties`: Mapping from shape URI to property constraint dicts
- `keywords`: Inverted index from keyword to shape URIs
- `paradigm`: SHACL usage paradigm (`'validation'`, `'shacl-first'`, `'mixed'`)

## SHACL Detection

Detect whether a graph contains SHACL shapes and determine the usage
paradigm.

------------------------------------------------------------------------

### detect_shacl

``` python

def detect_shacl(
    graph:Graph
)->dict:

```

*Detect SHACL content in a graph.*

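Args:

- `graph`: RDF graph to analyze

Returns: Dict with:

- `has_shacl`: True if any SHACL patterns found
- `node_shapes`: Count of sh:NodeShape instances
- `property_shapes`: Count of sh:PropertyShape instances
- `paradigm`: `'validation'`, `'shacl-first'`, or `'mixed'` (the empty-graph test in the Tests section also expects `'none'` when no SHACL content is present)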

## Keyword Extraction

Extract searchable keywords from shape metadata.

------------------------------------------------------------------------

### extract_keywords

``` python

def extract_keywords(
    graph:Graph, shape:URIRef, target_classes:List, props:List
)->List:

```

*Extract searchable keywords from a shape.*

Args:

- `graph`: RDF graph containing the shape
- `shape`: Shape URI
- `target_classes`: Target class URIs
- `props`: Property constraint dicts

Returns: List of lowercase keywords

## Shape Index Building

Build a searchable index from SHACL shapes in a graph.

------------------------------------------------------------------------

### build_shacl_index

``` python

def build_shacl_index(
    graph:Graph
)->SHACLIndex:

```

*Build searchable index from SHACL shapes in graph.*

Args:

- `graph`: RDF graph containing SHACL shapes

Returns: SHACLIndex with indexed shapes

## Bounded View Functions

Progressive disclosure primitives for exploring SHACL shapes.

------------------------------------------------------------------------

### describe_shape

``` python

def describe_shape(
    index:SHACLIndex, shape_uri:str, limit:int=10
)->dict:

```

*Get bounded description of a SHACL shape.*

Args:

- `index`: SHACL index to query
- `shape_uri`: URI of shape to describe
- `limit`: Maximum number of properties to return

Returns: Dict with:

- `uri`: Shape URI
- `targets`: List of target class URIs
- `properties`: First `limit` property constraints
- `property_count`: Total property count
- `truncated`: True if property list was truncated
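The bounded-view idea (return a capped slice, plus enough metadata to tell that something was cut) is sketched below on plain dicts. The function name `describe_shape_sketch` and the shape of the property dicts are assumptions. The returned keys follow the description above.

```python
def describe_shape_sketch(targets, properties, shape_uri, limit=10):
    """Return a description of a shape with at most `limit` property constraints."""
    props = properties.get(shape_uri, [])
    return {
        "uri": shape_uri,
        "targets": targets.get(shape_uri, []),
        "properties": props[:limit],      # never more than `limit` entries
        "property_count": len(props),     # full count, so the caller knows what is missing
        "truncated": len(props) > limit,
    }


targets = {"ex:PersonShape": ["ex:Person"]}
properties = {"ex:PersonShape": [{"path": f"ex:p{i}"} for i in range(12)]}

desc = describe_shape_sketch(targets, properties, "ex:PersonShape", limit=10)
print(desc["property_count"], len(desc["properties"]), desc["truncated"])
```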

------------------------------------------------------------------------

### search_shapes

``` python

def search_shapes(
    index:SHACLIndex, keyword:str, limit:int=5
)->list:

```

*Find shapes matching keyword.*

Args:

- `index`: SHACL index to search
- `keyword`: Search term
- `limit`: Maximum number of results

Returns: List of dicts with:

- `uri`: Shape URI
- `targets`: Target class URIs
- `matched_keyword`: The keyword that matched

------------------------------------------------------------------------

### shape_constraints

``` python

def shape_constraints(
    index:SHACLIndex, shape_uri:str
)->str:

```

*Get human-readable property constraints for a shape.*

Args:

- `index`: SHACL index to query
- `shape_uri`: URI of shape

Returns: Formatted string with property constraints
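One possible rendering of property constraints as text is sketched below. The dict keys (`path`, `datatype`, `minCount`, `maxCount`) are assumed names modeled on the corresponding SHACL terms, and the function name `format_constraints` and the output layout are hypothetical. The project's actual format may differ.

```python
def format_constraints(shape_uri, props):
    """Format property constraint dicts as one indented line per property."""
    lines = [f"Shape: {shape_uri}"]
    for p in props:
        parts = [p.get("path", "?")]
        if "datatype" in p:
            parts.append(f"datatype={p['datatype']}")
        # A missing minCount/maxCount means no constraint, shown here as 0 and *
        lo = p.get("minCount", 0)
        hi = p.get("maxCount", "*")
        parts.append(f"[{lo}..{hi}]")
        lines.append("  - " + " ".join(parts))
    return "\n".join(lines)


props = [
    {"path": "ex:name", "datatype": "xsd:string", "minCount": 1, "maxCount": 1},
    {"path": "ex:knows"},
]
text = format_constraints("ex:PersonShape", props)
print(text)
```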

## SHACL Advanced Features Evaluation

This project uses SHACL-AF in two ways:

1.  **sh:SPARQLRule** - For inferencing/materialization rules (RDFS/OWL
    closure)
2.  **sh:SPARQLExecutable** - For reusable query templates with examples

### sh:SPARQLRule vs sh:SPARQLExecutable

<table>
<colgroup>
<col style="width: 20%" />
<col style="width: 33%" />
<col style="width: 46%" />
</colgroup>
<thead>
<tr>
<th>Feature</th>
<th>sh:SPARQLRule</th>
<th>sh:SPARQLExecutable</th>
</tr>
</thead>
<tbody>
<tr>
<td>Purpose</td>
<td>Inference/validation</td>
<td>Query templates</td>
</tr>
<tr>
<td>Attached to</td>
<td>sh:NodeShape via sh:rule</td>
<td>Standalone resource</td>
</tr>
<tr>
<td>Query types</td>
<td>CONSTRUCT (inference)</td>
<td>SELECT/CONSTRUCT/ASK</td>
</tr>
<tr>
<td>Example files</td>
<td>rdfsplus.rule.ttl (6 rules)</td>
<td>uniprot/examples (1,228+ queries)</td>
</tr>
<tr>
<td></td>
<td>owlrl-shacl.ttl (13 rules)</td>
<td></td>
</tr>
<tr>
<td></td>
<td>datacube.shapes.ttl (1 rule)</td>
<td></td>
</tr>
</tbody>
</table>

### Existing SHACL-AF Content

**rdfsplus.rule.ttl** - Basic RDFS inference:

- Subclass transitivity
- Type propagation via subclass
- Subproperty transitivity
- Property inclusion
- Domain/range typing

**owlrl-shacl.ttl** - OWL RL subset inference. All RDFS rules, plus:

- Equivalent class expansion
- Inverse properties
- Transitive properties
- Symmetric properties
- Functional properties → sameAs (optional)
- Property chain axioms (length-2)

**datacube.shapes.ttl** - RDF Data Cube validation:

- sh:SPARQLRule for copying component properties
- sh:sparql constraints for integrity checking
- Validates observations against data structure definitions
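To make the first two rdfsplus.rule.ttl rules concrete, the sketch below applies subclass transitivity and type propagation to a set of string triples until nothing new is derived. This is a plain-Python illustration of what such rules materialize, not the SPARQL CONSTRUCT text of the rule files. The function name `rdfs_closure` and the prefixed-name strings are assumptions.

```python
RDF_TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def rdfs_closure(triples):
    """Materialize subclass transitivity and type propagation to a fixpoint."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            if p != SUBCLASS:
                continue
            for s2, p2, o2 in inferred:
                # Transitivity: s subClassOf o, o subClassOf o2 => s subClassOf o2
                if p2 == SUBCLASS and s2 == o:
                    new.add((s, SUBCLASS, o2))
                # Type propagation: s2 type s, s subClassOf o => s2 type o
                if p2 == RDF_TYPE and o2 == s:
                    new.add((s2, RDF_TYPE, o))
        if new <= inferred:
            return inferred  # no new triples, so the closure is complete
        inferred |= new


facts = {
    ("ex:Student", SUBCLASS, "ex:Person"),
    ("ex:Person", SUBCLASS, "ex:Agent"),
    ("ex:alice", RDF_TYPE, "ex:Student"),
}
closure = rdfs_closure(facts)
for t in sorted(closure - facts):
    print(t)
```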

Sources:

- [SHACL Advanced Features](https://www.w3.org/TR/shacl-af/)
- [W3C SHACL GitHub - datacube.shapes.ttl](https://github.com/w3c/shacl/blob/main/shapes/datacube.shapes.ttl)
- [SIB Swiss SPARQL Examples Utils](https://github.com/sib-swiss/sparql-examples-utils)

## QueryIndex Dataclass

The `QueryIndex` holds indexed sh:SPARQLExecutable query templates for
retrieval and reuse.

## Tests

Basic tests for SHACL detection and indexing.

``` python
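# Test detect_shacl with empty graph
# (imports below assume rdflib; detect_shacl is defined earlier in this module)
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, SH

g = Graph()
result = detect_shacl(g)
assert not result['has_shacl']
assert result['paradigm'] == 'none'
print("✓ Empty graph detection works")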
```

    ✓ Empty graph detection works

``` python
# Test detect_shacl with a simple NodeShape
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
result = detect_shacl(g)
assert result['has_shacl']
assert result['node_shapes'] == 1
assert result['paradigm'] == 'validation'
print("✓ NodeShape detection works")
```

    ✓ NodeShape detection works

``` python
# Test build_shacl_index
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
g.add((EX.PersonShape, RDFS.label, Literal("Person Shape")))

index = build_shacl_index(g)
assert len(index.shapes) == 1
assert str(EX.PersonShape) in index.shapes
assert str(EX.Person) in index.targets[str(EX.PersonShape)]
assert 'person' in index.keywords or 'personshape' in index.keywords
print("✓ Index building works")
print(f"  {index.summary()}")
```

    ✓ Index building works
      SHACLIndex: 1 shapes, 3 keywords, paradigm=validation

``` python
# Test search_shapes
results = search_shapes(index, 'person')
assert len(results) >= 1
assert str(EX.PersonShape) == results[0]['uri']
print("✓ Shape search works")
print(f"  Found {len(results)} shapes for 'person'")
```

    ✓ Shape search works
      Found 1 shapes for 'person'

``` python
# Test describe_shape
desc = describe_shape(index, str(EX.PersonShape))
assert desc['uri'] == str(EX.PersonShape)
assert str(EX.Person) in desc['targets']
print("✓ Shape description works")
print(f"  Targets: {desc['targets']}")
print(f"  Property count: {desc['property_count']}")
```

    ✓ Shape description works
      Targets: ['http://example.org/Person']
      Property count: 0

``` python
# Test with DCAT-AP shapes
from pathlib import Path
dcat_path = Path('../ontology/dcat-ap/dcat-ap-SHACL.ttl')
if dcat_path.exists():
    g_dcat = Graph()
    g_dcat.parse(dcat_path)
    
    detection = detect_shacl(g_dcat)
    print(f"\n✓ DCAT-AP detection: {detection['node_shapes']} node shapes, paradigm={detection['paradigm']}")
    
    index_dcat = build_shacl_index(g_dcat)
    print(f"  {index_dcat.summary()}")
    
    # Search for Dataset shape
    dataset_shapes = search_shapes(index_dcat, 'dataset', limit=3)
    print(f"  Found {len(dataset_shapes)} shapes matching 'dataset':")
    for s in dataset_shapes[:3]:
        shape_name = s['uri'].split('/')[-1].split('#')[-1]
        print(f"    - {shape_name}")
else:
    print("\n(DCAT-AP shapes not found, skipping test)")
```


    (DCAT-AP shapes not found, skipping test)

## Demonstration: Query Template Workflow

This demonstrates the ‘retrieve example → adapt → run’ workflow for
discovering how to query unfamiliar SPARQL endpoints.

``` python
# Load UniProt example queries from neXtProt
from pathlib import Path

# Create ns before the existence check so that later cells can test
# `'nxq' in ns` without a NameError when the example files are missing
ns = {}
nxp_path = Path('../ontology/uniprot/examples/neXtProt')
if nxp_path.exists():
    result = load_query_examples(str(nxp_path), ns, 'nxq')
    print(result)
    print(f"\nIndex summary: {ns['nxq'].summary()}")
else:
    print(f"neXtProt examples not found at {nxp_path}")
```

    neXtProt examples not found at ../ontology/uniprot/examples/neXtProt

``` python
# Search for protein-related queries
if 'nxq' in ns:
    results = search_queries(ns['nxq'], 'protein', limit=3)
    print(f"Found {len(results)} protein-related queries:\n")
    for r in results:
        uri_short = r['uri'].split('/')[-1]
        print(f"  {uri_short}")
        print(f"    Comment: {r['comment']}")
        print(f"    Keyword: {r['matched_keyword']}\n")
```

``` python
# Describe a specific query
if 'nxq' in ns and results:
    query_uri = results[0]['uri']
    desc = describe_query(ns['nxq'], query_uri)
    
    print(f"Query: {query_uri.split('/')[-1]}")
    print(f"Type: {desc['query_type']}")
    print(f"Comment: {desc['comment'][:150]}...")
    if desc['endpoints']:
        print(f"Endpoint: {desc['endpoints'][0]}")
    print(f"\nKeywords: {', '.join(desc['keywords'][:5])}")
    print(f"\nQuery preview:\n{desc['query_preview']}")
```

``` python
# Get full query text for execution
if 'nxq' in ns and results:
    query_uri = results[0]['uri']
    full_query = get_query_text(ns['nxq'], query_uri)
    
    print(f"Full query ({len(full_query)} chars):\n")
    print(full_query[:500])
    if len(full_query) > 500:
        print("\n... (truncated for display)")
    
    print("\n# This query could now be adapted and executed:")
    print("# sparql_query(full_query, endpoint=desc['endpoints'][0], name='results', ns=ns)")
```

``` python
# Search for PTM-related queries (phosphorylation)
if 'nxq' in ns:
    ptm_results = search_queries(ns['nxq'], 'phosphorylation', limit=5)
    print(f"Found {len(ptm_results)} phosphorylation-related queries:\n")
    for r in ptm_results:
        uri_short = r['uri'].split('/')[-1]
        comment_short = r['comment'][:80] + '...' if len(r['comment']) > 80 else r['comment']
        print(f"  • {uri_short}: {comment_short}")
```
