SHACL Examples and Shape Indexing

Enables a ‘retrieve example → adapt → run → inspect’ workflow for discovering how to query unfamiliar datasets.

Query Template Detection

Detect sh:SPARQLExecutable content in RDF graphs.


detect_sparql_executables


def detect_sparql_executables(
    graph:Graph
)->dict:

Detect sh:SPARQLExecutable content in a graph.

Args:
    graph: RDF graph to analyze

Returns:
    Dict with:
        has_executables: True if any SPARQLExecutable found
        select_count: Count of sh:SPARQLSelectExecutable instances
        construct_count: Count of sh:SPARQLConstructExecutable instances
        ask_count: Count of sh:SPARQLAskExecutable instances
        total_count: Total query count
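To illustrate the counts described above, the following minimal sketch tallies rdf:type statements for the three executable classes. It is not the library's implementation: triples are modeled as plain Python tuples instead of an rdflib Graph so the example runs without dependencies, and `count_executables` is a hypothetical name. Only the result keys come from the documentation.

```python
SH = "http://www.w3.org/ns/shacl#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Map each SHACL-AF executable class to the count key documented above.
EXECUTABLE_KEYS = {
    SH + "SPARQLSelectExecutable": "select_count",
    SH + "SPARQLConstructExecutable": "construct_count",
    SH + "SPARQLAskExecutable": "ask_count",
}

def count_executables(triples):
    """Count rdf:type statements whose object is a SPARQL executable class.

    `triples` is an iterable of (subject, predicate, object) string tuples,
    an assumed stand-in for an RDF graph.
    """
    counts = {key: 0 for key in EXECUTABLE_KEYS.values()}
    for _subject, predicate, obj in triples:
        if predicate == RDF_TYPE and obj in EXECUTABLE_KEYS:
            counts[EXECUTABLE_KEYS[obj]] += 1
    total = sum(counts.values())
    return {"has_executables": total > 0, **counts, "total_count": total}

triples = [
    ("http://example.org/q1", RDF_TYPE, SH + "SPARQLSelectExecutable"),
    ("http://example.org/q2", RDF_TYPE, SH + "SPARQLAskExecutable"),
    ("http://example.org/q1", SH + "select", "SELECT * WHERE { ?s ?p ?o }"),
]
report = count_executables(triples)
```

Whether the real function also counts resources typed only as the generic sh:SPARQLExecutable class is not stated in the documentation, so the sketch leaves that case out.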


QueryIndex


def QueryIndex(
    queries:List=<factory>, comments:Dict=<factory>, keywords:Dict=<factory>, endpoints:Dict=<factory>,
    query_text:Dict=<factory>, query_type:Dict=<factory>, source_file:Dict=<factory>
)->None:

Index of sh:SPARQLExecutable query templates for retrieval.

Attributes:
    queries: List of query URIs
    comments: Mapping from query URI to rdfs:comment description
    keywords: Inverted index from keyword to query URIs
    endpoints: Mapping from query URI to target endpoint URLs
    query_text: Mapping from query URI to sh:select/sh:construct text
    query_type: Mapping from query URI to type (‘select’, ‘construct’, ‘ask’)
    source_file: Mapping from query URI to source file path
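The `<factory>` defaults in the signature indicate a dataclass whose container fields are created per instance. The sketch below shows one way such a structure could look. The class name `QueryIndexSketch`, the element types, and the `add` helper are assumptions made for illustration and are not part of the documented API.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class QueryIndexSketch:
    # default_factory gives every instance its own list/dict,
    # which is what "<factory>" in the rendered signature refers to.
    queries: List[str] = field(default_factory=list)
    comments: Dict[str, str] = field(default_factory=dict)
    keywords: Dict[str, List[str]] = field(default_factory=dict)
    endpoints: Dict[str, List[str]] = field(default_factory=dict)
    query_text: Dict[str, str] = field(default_factory=dict)
    query_type: Dict[str, str] = field(default_factory=dict)
    source_file: Dict[str, str] = field(default_factory=dict)

    def add(self, uri, text, query_type, comment="", keywords=(), endpoints=()):
        """Hypothetical helper: register one query and update the inverted index."""
        self.queries.append(uri)
        self.query_text[uri] = text
        self.query_type[uri] = query_type
        self.comments[uri] = comment
        self.endpoints[uri] = list(endpoints)
        for keyword in keywords:
            self.keywords.setdefault(keyword.lower(), []).append(uri)

a = QueryIndexSketch()
b = QueryIndexSketch()
a.add(
    "http://example.org/q1",
    "ASK { ?s ?p ?o }",
    "ask",
    comment="Is the store non-empty?",
    keywords=["Triples"],
)
```

Because the containers come from factories, adding a query to `a` leaves `b` empty.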

Query Keyword Extraction

Extract searchable keywords from query templates.


extract_query_keywords


def extract_query_keywords(
    graph:Graph, query_uri:URIRef, comment:str
)->List:

Extract keywords from schema:keywords and rdfs:comment.

Sources:
- schema:keywords (explicit tags)
- rdfs:comment (word extraction)
- Query URI local name

Args:
    graph: RDF graph containing the query
    query_uri: Query URI
    comment: rdfs:comment text

Returns: List of lowercase keywords
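The three sources listed above can be combined roughly as follows. This is a sketch under stated assumptions, not the documented function: it takes the explicit tags as a plain list instead of reading schema:keywords from a graph, and the stopword list, the minimum word length, and the names `local_name` and `keywords_from_sources` are all hypothetical.

```python
import re

# Assumed stopword list; the real extraction rules are not documented here.
STOPWORDS = {"the", "a", "an", "of", "for", "and", "in", "to", "with", "all", "that"}

def local_name(uri):
    """Return the part of a URI after the last '#' or '/'."""
    return re.split(r"[#/]", uri.rstrip("#/"))[-1]

def keywords_from_sources(explicit_tags, comment, query_uri, min_length=3):
    """Merge tags, comment words, and the URI local name into lowercase keywords."""
    candidates = [tag.strip() for tag in explicit_tags]
    candidates += re.findall(r"[A-Za-z][A-Za-z0-9]+", comment)
    candidates.append(local_name(query_uri))

    seen, result = set(), []
    for candidate in candidates:
        keyword = candidate.lower()
        if len(keyword) >= min_length and keyword not in STOPWORDS and keyword not in seen:
            seen.add(keyword)
            result.append(keyword)  # keep first-seen order, drop duplicates
    return result

kws = keywords_from_sources(
    ["Phosphorylation"],
    "Proteins with a phosphorylation site",
    "https://example.org/queries/NXQ_00001",
)
```

Lowercasing before deduplication means an explicit tag and the same word in the comment collapse into a single keyword.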

Build Query Index

Build searchable index from sh:SPARQLExecutable templates.


build_query_index


def build_query_index(
    graph:Graph, source_path:str=None
)->QueryIndex:

Build searchable index from sh:SPARQLExecutable templates.

Extracts: sh:select, sh:construct, sh:ask, rdfs:comment, schema:keywords, schema:target

Args:
    graph: RDF graph containing query templates
    source_path: Optional source file path for tracking

Returns: QueryIndex with indexed queries

Query Bounded Views

Progressive disclosure functions for exploring query templates.


search_queries


def search_queries(
    index:QueryIndex, keyword:str, limit:int=5
)->list:

Find query templates matching keyword.

Args:
    index: QueryIndex to search
    keyword: Search term
    limit: Maximum number of results

Returns: List of dicts with: uri, comment, endpoints, matched_keyword
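A keyword search over an inverted index can be sketched as below. The matching rule (case-insensitive substring match against indexed keywords), the ordering, and the function name `search_index` are assumptions; the documentation only fixes the `limit` parameter and the keys of each result dict.

```python
def search_index(keyword_index, comments, endpoints, keyword, limit=5):
    """Return up to `limit` hits whose indexed keyword contains `keyword`."""
    if limit <= 0:
        return []
    needle = keyword.lower()
    hits, seen = [], set()
    for indexed_keyword in sorted(keyword_index):
        if needle not in indexed_keyword:
            continue
        for uri in keyword_index[indexed_keyword]:
            if uri in seen:
                continue  # report each query once, under its first match
            seen.add(uri)
            hits.append({
                "uri": uri,
                "comment": comments.get(uri, ""),
                "endpoints": endpoints.get(uri, []),
                "matched_keyword": indexed_keyword,
            })
            if len(hits) == limit:
                return hits  # stop early so the result stays bounded
    return hits

keyword_index = {
    "protein": ["ex:q1", "ex:q2"],
    "proteins": ["ex:q2", "ex:q3"],
    "disease": ["ex:q4"],
}
comments = {"ex:q1": "Proteins by gene name"}
endpoints = {"ex:q1": ["https://example.org/sparql"]}

top_two = search_index(keyword_index, comments, endpoints, "Protein", limit=2)
all_hits = search_index(keyword_index, comments, endpoints, "protein")
```

Returning as soon as `limit` is reached keeps the response small regardless of index size, which is the point of a bounded view.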


describe_query


def describe_query(
    index:QueryIndex, query_uri:str
)->dict:

Get bounded description of a query template.

Args:
    index: QueryIndex to query
    query_uri: URI of query

Returns: Dict with: uri, comment, endpoints, query_type, keywords, query_preview (200 chars)


get_query_text


def get_query_text(
    index:QueryIndex, query_uri:str
)->str:

Get full SPARQL query text for execution.

Args:
    index: QueryIndex to query
    query_uri: URI of query

Returns: Full SPARQL query text
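The split between a bounded description and the full query text can be illustrated with two small accessors over a URI-to-text mapping. This is a sketch, not the documented functions: `preview_of` and `full_text_of` are hypothetical names, and returning an empty string for an unknown URI is an assumption. The 200-character preview length is taken from the describe_query documentation.

```python
PREVIEW_CHARS = 200  # preview length stated in the describe_query docs

# Assumed stand-in for the query_text mapping of a QueryIndex.
query_text = {
    "http://example.org/q1": (
        "SELECT ?protein WHERE {\n" + "  ?protein a ?type .\n" * 20 + "}"
    ),
}

def preview_of(texts, uri, max_chars=PREVIEW_CHARS):
    """Cheap view: at most `max_chars` characters of the query."""
    return texts.get(uri, "")[:max_chars]

def full_text_of(texts, uri):
    """Complete view: the whole query, fetched only when it will be run."""
    return texts.get(uri, "")

uri = "http://example.org/q1"
preview = preview_of(query_text, uri)
full = full_text_of(query_text, uri)
```

A caller would typically inspect the preview first and request the full text only for the query it decides to adapt.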


load_query_examples


def load_query_examples(
    path:str, ns:dict, name:str='queries'
)->str:

Load SPARQL example files from directory into QueryIndex.

Recursively loads all .ttl files and builds combined index.

Args:
    path: Directory path containing .ttl example files
    ns: Namespace dict for storing the index
    name: Variable name for the index in ns

Returns: Status message
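The calling convention (a namespace dict that receives the result under `name`, plus a status string as the return value) can be sketched with the standard library alone. The function `collect_ttl_files` is hypothetical and stores only the discovered file paths, not a real QueryIndex; the wording of the status messages is also an assumption.

```python
import tempfile
from pathlib import Path

def collect_ttl_files(directory, ns, name="queries"):
    """Recursively find .ttl files, store them in ns[name], return a status message."""
    root = Path(directory)
    if not root.is_dir():
        return f"Directory not found: {root}"
    files = sorted(str(p) for p in root.rglob("*.ttl"))  # rglob descends into subfolders
    ns[name] = files
    return f"Loaded {len(files)} .ttl files into '{name}'"

# Build a throwaway directory tree to exercise the function.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "nested").mkdir()
    (Path(tmp) / "a.ttl").write_text("")
    (Path(tmp) / "nested" / "b.ttl").write_text("")
    (Path(tmp) / "notes.txt").write_text("")

    ns = {}  # create the namespace before the call so later lookups never fail
    status = collect_ttl_files(tmp, ns, "nxq")
    missing = collect_ttl_files(Path(tmp) / "absent", ns, "other")
```

Creating `ns` before the call, and outside any conditional, means that checks such as `'nxq' in ns` remain valid even when loading is skipped.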

SHACLIndex Dataclass

The SHACLIndex holds indexed SHACL shapes for efficient retrieval.


SHACLIndex


def SHACLIndex(
    shapes:List=<factory>, targets:Dict=<factory>, properties:Dict=<factory>, keywords:Dict=<factory>,
    paradigm:str='unknown'
)->None:

Index of SHACL shapes for retrieval.

Attributes:
    shapes: List of shape URIs
    targets: Mapping from shape URI to target class URIs
    properties: Mapping from shape URI to property constraint dicts
    keywords: Inverted index from keyword to shape URIs
    paradigm: SHACL usage paradigm (‘validation’, ‘shacl-first’, ‘mixed’); defaults to ‘unknown’ until set

SHACL Detection

Detect whether a graph contains SHACL shapes and determine the usage paradigm.


detect_shacl


def detect_shacl(
    graph:Graph
)->dict:

Detect SHACL content in a graph.

Args:
    graph: RDF graph to analyze

Returns:
    Dict with:
        has_shacl: True if any SHACL patterns found
        node_shapes: Count of sh:NodeShape instances
        property_shapes: Count of sh:PropertyShape instances
        paradigm: ‘validation’, ‘shacl-first’, or ‘mixed’ (the empty-graph test in the Tests section expects ‘none’ when no SHACL content is found)

Keyword Extraction

Extract searchable keywords from shape metadata.


extract_keywords


def extract_keywords(
    graph:Graph, shape:URIRef, target_classes:List, props:List
)->List:

Extract searchable keywords from a shape.

Args:
    graph: RDF graph containing the shape
    shape: Shape URI
    target_classes: Target class URIs
    props: Property constraint dicts

Returns: List of lowercase keywords

Shape Index Building

Build a searchable index from SHACL shapes in a graph.


build_shacl_index


def build_shacl_index(
    graph:Graph
)->SHACLIndex:

Build searchable index from SHACL shapes in graph.

Args: graph: RDF graph containing SHACL shapes

Returns: SHACLIndex with indexed shapes

Bounded View Functions

Progressive disclosure primitives for exploring SHACL shapes.


describe_shape


def describe_shape(
    index:SHACLIndex, shape_uri:str, limit:int=10
)->dict:

Get bounded description of a SHACL shape.

Args:
    index: SHACL index to query
    shape_uri: URI of shape to describe
    limit: Maximum number of properties to return

Returns:
    Dict with:
        uri: Shape URI
        targets: List of target class URIs
        properties: First limit property constraints
        property_count: Total property count
        truncated: True if property list was truncated
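The truncation behavior described above (return the first `limit` properties, report the total, and flag whether anything was cut) can be sketched as follows. The function name `bounded_shape_view` and the `path` key inside each property dict are assumptions; only the keys of the returned dict follow the documentation.

```python
def bounded_shape_view(shape_uri, targets, properties, limit=10):
    """Return at most `limit` property dicts plus enough metadata to detect truncation."""
    shown = properties[:max(limit, 0)]
    return {
        "uri": shape_uri,
        "targets": list(targets),
        "properties": shown,
        "property_count": len(properties),          # total, not just what is shown
        "truncated": len(properties) > len(shown),  # True when something was cut
    }

# Twelve placeholder property constraints (assumed structure).
props = [{"path": f"http://example.org/p{i}"} for i in range(12)]

view = bounded_shape_view(
    "http://example.org/PersonShape",
    ["http://example.org/Person"],
    props,
)
small = bounded_shape_view("http://example.org/PersonShape", [], props[:3])
```

Reporting `property_count` alongside `truncated` lets a caller decide whether a second call with a larger `limit` is worthwhile.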


search_shapes


def search_shapes(
    index:SHACLIndex, keyword:str, limit:int=5
)->list:

Find shapes matching keyword.

Args:
    index: SHACL index to search
    keyword: Search term
    limit: Maximum number of results

Returns:
    List of dicts with:
        uri: Shape URI
        targets: Target class URIs
        matched_keyword: The keyword that matched


shape_constraints


def shape_constraints(
    index:SHACLIndex, shape_uri:str
)->str:

Get human-readable property constraints for a shape.

Args:
    index: SHACL index to query
    shape_uri: URI of shape

Returns: Formatted string with property constraints
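One plausible rendering of property constraints as text is sketched below. The layout of the output and the keys assumed in each property dict (`path`, `datatype`, `minCount`, `maxCount`) are guesses for illustration; the documentation does not specify either, and `format_constraints` is a hypothetical name.

```python
def format_constraints(shape_uri, properties):
    """Render property constraint dicts as an indented, human-readable listing."""
    lines = [f"Shape: {shape_uri}"]
    if not properties:
        lines.append("  (no property constraints)")
    for prop in properties:
        # Show only the constraint keys that are present on this property.
        details = [
            f"{key}={prop[key]}"
            for key in ("datatype", "minCount", "maxCount")
            if key in prop
        ]
        suffix = f" [{', '.join(details)}]" if details else ""
        lines.append(f"  - {prop.get('path', '?')}{suffix}")
    return "\n".join(lines)

text = format_constraints(
    "ex:PersonShape",
    [
        {"path": "foaf:name", "datatype": "xsd:string", "minCount": 1},
        {"path": "foaf:knows"},
    ],
)
empty = format_constraints("ex:EmptyShape", [])
```

Returning a single string keeps the view easy to print or to pass along as context without further processing.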

SHACL Advanced Features Evaluation

This project uses SHACL-AF in two ways:

  1. sh:SPARQLRule - For inferencing/materialization rules (RDFS/OWL closure)
  2. sh:SPARQLExecutable - For reusable query templates with examples

sh:SPARQLRule vs sh:SPARQLExecutable

Feature        sh:SPARQLRule                  sh:SPARQLExecutable
Purpose        Inference/validation           Query templates
Attached to    sh:NodeShape via sh:rule       Standalone resource
Query types    CONSTRUCT (inference)          SELECT/CONSTRUCT/ASK
Example files  rdfsplus.rule.ttl (6 rules)    uniprot/examples (1,228+ queries)
               owlrl-shacl.ttl (13 rules)
               datacube.shapes.ttl (1 rule)

Existing SHACL-AF Content

rdfsplus.rule.ttl - Basic RDFS inference:
- Subclass transitivity
- Type propagation via subclass
- Subproperty transitivity
- Property inclusion
- Domain/range typing

owlrl-shacl.ttl - OWL RL subset inference:
- All RDFS rules plus
- Equivalent class expansion
- Inverse properties
- Transitive properties
- Symmetric properties
- Functional properties → sameAs (optional)
- Property chain axioms (length-2)

datacube.shapes.ttl - RDF Data Cube validation:
- sh:SPARQLRule for copying component properties
- sh:sparql constraints for integrity checking
- Validates observations against data structure definitions

Sources:
- SHACL Advanced Features
- W3C SHACL GitHub
- datacube.shapes.ttl
- SIB Swiss SPARQL Examples Utils

QueryIndex Dataclass

The QueryIndex holds indexed sh:SPARQLExecutable query templates for retrieval and reuse.

Tests

Basic tests for SHACL detection and indexing.

# Test detect_shacl with empty graph
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, SH

g = Graph()
result = detect_shacl(g)
assert not result['has_shacl']
assert result['paradigm'] == 'none'
print("✓ Empty graph detection works")
✓ Empty graph detection works
# Test detect_shacl with a simple NodeShape
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
result = detect_shacl(g)
assert result['has_shacl']
assert result['node_shapes'] == 1
assert result['paradigm'] == 'validation'
print("✓ NodeShape detection works")
✓ NodeShape detection works
# Test build_shacl_index
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
g.add((EX.PersonShape, RDFS.label, Literal("Person Shape")))

index = build_shacl_index(g)
assert len(index.shapes) == 1
assert str(EX.PersonShape) in index.shapes
assert str(EX.Person) in index.targets[str(EX.PersonShape)]
assert 'person' in index.keywords or 'personshape' in index.keywords
print("✓ Index building works")
print(f"  {index.summary()}")
✓ Index building works
  SHACLIndex: 1 shapes, 3 keywords, paradigm=validation
# Test search_shapes
results = search_shapes(index, 'person')
assert len(results) >= 1
assert str(EX.PersonShape) == results[0]['uri']
print("✓ Shape search works")
print(f"  Found {len(results)} shapes for 'person'")
✓ Shape search works
  Found 1 shapes for 'person'
# Test describe_shape
desc = describe_shape(index, str(EX.PersonShape))
assert desc['uri'] == str(EX.PersonShape)
assert str(EX.Person) in desc['targets']
print("✓ Shape description works")
print(f"  Targets: {desc['targets']}")
print(f"  Property count: {desc['property_count']}")
✓ Shape description works
  Targets: ['http://example.org/Person']
  Property count: 0
# Test with DCAT-AP shapes
from pathlib import Path
dcat_path = Path('../ontology/dcat-ap/dcat-ap-SHACL.ttl')
if dcat_path.exists():
    g_dcat = Graph()
    g_dcat.parse(dcat_path)
    
    detection = detect_shacl(g_dcat)
    print(f"\n✓ DCAT-AP detection: {detection['node_shapes']} node shapes, paradigm={detection['paradigm']}")
    
    index_dcat = build_shacl_index(g_dcat)
    print(f"  {index_dcat.summary()}")
    
    # Search for Dataset shape
    dataset_shapes = search_shapes(index_dcat, 'dataset', limit=3)
    print(f"  Found {len(dataset_shapes)} shapes matching 'dataset':")
    for s in dataset_shapes[:3]:
        shape_name = s['uri'].split('/')[-1].split('#')[-1]
        print(f"    - {shape_name}")
else:
    print("\n(DCAT-AP shapes not found, skipping test)")

(DCAT-AP shapes not found, skipping test)

Demonstration: Query Template Workflow

This demonstrates the ‘retrieve example → adapt → run’ workflow for discovering how to query unfamiliar SPARQL endpoints.

# Load UniProt example queries from neXtProt
from pathlib import Path

# Create the namespace up front so later cells can test `'nxq' in ns`
# even when the example directory is missing.
ns = {}
nxp_path = Path('../ontology/uniprot/examples/neXtProt')
if nxp_path.exists():
    result = load_query_examples(str(nxp_path), ns, 'nxq')
    print(result)
    print(f"\nIndex summary: {ns['nxq'].summary()}")
else:
    print(f"neXtProt examples not found at {nxp_path}")
neXtProt examples not found at ../ontology/uniprot/examples/neXtProt
# Search for protein-related queries
if 'nxq' in ns:
    results = search_queries(ns['nxq'], 'protein', limit=3)
    print(f"Found {len(results)} protein-related queries:\n")
    for r in results:
        uri_short = r['uri'].split('/')[-1]
        print(f"  {uri_short}")
        print(f"    Comment: {r['comment']}")
        print(f"    Keyword: {r['matched_keyword']}\n")
# Describe a specific query
if 'nxq' in ns and results:
    query_uri = results[0]['uri']
    desc = describe_query(ns['nxq'], query_uri)
    
    print(f"Query: {query_uri.split('/')[-1]}")
    print(f"Type: {desc['query_type']}")
    print(f"Comment: {desc['comment'][:150]}...")
    if desc['endpoints']:
        print(f"Endpoint: {desc['endpoints'][0]}")
    print(f"\nKeywords: {', '.join(desc['keywords'][:5])}")
    print(f"\nQuery preview:\n{desc['query_preview']}")
# Get full query text for execution
if 'nxq' in ns and results:
    query_uri = results[0]['uri']
    full_query = get_query_text(ns['nxq'], query_uri)
    
    print(f"Full query ({len(full_query)} chars):\n")
    print(full_query[:500])
    if len(full_query) > 500:
        print("\n... (truncated for display)")
    
    print("\n# This query could now be adapted and executed:")
    print("# sparql_query(full_query, endpoint=desc['endpoints'][0], name='results', ns=ns)")
# Search for PTM-related queries (phosphorylation)
if 'nxq' in ns:
    ptm_results = search_queries(ns['nxq'], 'phosphorylation', limit=5)
    print(f"Found {len(ptm_results)} phosphorylation-related queries:\n")
    for r in ptm_results:
        uri_short = r['uri'].split('/')[-1]
        comment_short = r['comment'][:80] + '...' if len(r['comment']) > 80 else r['comment']
        print(f"  • {uri_short}: {comment_short}")