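# Test detect_shacl with empty graph
# rdflib imports used by the test cells in this section
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS, SH

g = Graph()
result = detect_shacl(g)
assert result['has_shacl'] == False
assert result['paradigm'] == 'none'
print("✓ Empty graph detection works")
✓ Empty graph detection works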
Detect sh:SPARQLExecutable content in RDF graphs.
Args: graph: RDF graph to analyze
Returns: Dict with:
- has_executables: True if any SPARQLExecutable found
- select_count: Count of sh:SPARQLSelectExecutable instances
- construct_count: Count of sh:SPARQLConstructExecutable instances
- ask_count: Count of sh:SPARQLAskExecutable instances
- total_count: Total query count
Index of sh:SPARQLExecutable query templates for retrieval.
Attributes:
- queries: List of query URIs
- comments: Mapping from query URI to rdfs:comment description
- keywords: Inverted index from keyword to query URIs
- endpoints: Mapping from query URI to target endpoint URLs
- query_text: Mapping from query URI to sh:select/sh:construct text
- query_type: Mapping from query URI to type (‘select’, ‘construct’, ‘ask’)
- source_file: Mapping from query URI to source file path
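To make the attribute list concrete, here is a minimal sketch of how such an index could be laid out as a dataclass. The class name `QueryIndexSketch`, the example URIs, and the `summary()` format are assumptions for illustration; only the field names come from the attribute list above.

```python
from dataclasses import dataclass, field


@dataclass
class QueryIndexSketch:
    """Hypothetical stand-in for QueryIndex; field names follow the attribute list."""
    queries: list[str] = field(default_factory=list)
    comments: dict[str, str] = field(default_factory=dict)
    keywords: dict[str, list[str]] = field(default_factory=dict)  # keyword -> query URIs
    endpoints: dict[str, list[str]] = field(default_factory=dict)
    query_text: dict[str, str] = field(default_factory=dict)
    query_type: dict[str, str] = field(default_factory=dict)
    source_file: dict[str, str] = field(default_factory=dict)

    def summary(self) -> str:
        # Assumed format; the real summary() may print something different.
        return f"QueryIndexSketch: {len(self.queries)} queries, {len(self.keywords)} keywords"


idx = QueryIndexSketch()
uri = "http://example.org/queries/q1"  # made-up query URI
idx.queries.append(uri)
idx.comments[uri] = "List proteins with a phosphorylation site"
idx.query_type[uri] = "select"
idx.query_text[uri] = "SELECT ?p WHERE { ?p a <http://example.org/Protein> }"
idx.endpoints[uri] = ["https://example.org/sparql"]
for kw in ("proteins", "phosphorylation"):
    # Inverted index: each keyword points back to the queries that mention it.
    idx.keywords.setdefault(kw, []).append(uri)

print(idx.summary())
```

Keeping every mapping keyed by the query URI means a search only has to return URIs, and each further level of detail is a dictionary lookup.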
Extract searchable keywords from query templates.
Extract keywords from schema:keywords and rdfs:comment.
Sources:
- schema:keywords (explicit tags)
- rdfs:comment (word extraction)
- Query URI local name

Args:
- graph: RDF graph containing the query
- query_uri: Query URI
- comment: rdfs:comment text

Returns: List of lowercase keywords
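The three keyword sources can be sketched without a graph at all. The function below is a simplified illustration, not the module's implementation: it takes the schema:keywords values as a plain list instead of reading them from an RDF graph, and the stopword list and three-character minimum are assumptions.

```python
import re

# Assumed stopword list and minimum length; the real function may filter differently.
STOPWORDS = {"the", "and", "for", "with", "that", "are", "all", "from"}


def extract_keywords_sketch(query_uri: str, comment: str, tags: list[str]) -> list[str]:
    """Collect lowercase keywords from explicit tags, comment words, and the URI local name."""
    keywords: list[str] = []

    def add(word: str) -> None:
        word = word.lower()
        if len(word) >= 3 and word not in STOPWORDS and word not in keywords:
            keywords.append(word)

    # 1. Explicit tags (the schema:keywords values, passed in as plain strings here).
    for tag in tags:
        add(tag.strip())
    # 2. Words taken from the rdfs:comment text.
    for word in re.findall(r"[A-Za-z][A-Za-z0-9]+", comment):
        add(word)
    # 3. Local name: the part of the URI after the last '/' or '#'.
    local_name = re.split(r"[/#]", query_uri)[-1]
    if local_name:
        add(local_name)
    return keywords


kws = extract_keywords_sketch(
    "http://example.org/queries/NXQ_00001",  # made-up URI
    "Proteins phosphorylated and located in the cytoplasm",
    ["PTM"],
)
print(kws)
```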
Build searchable index from sh:SPARQLExecutable templates.
Extracts: sh:select, sh:construct, sh:ask, rdfs:comment, schema:keywords, schema:target
Args:
- graph: RDF graph containing query templates
- source_path: Optional source file path for tracking

Returns: QueryIndex with indexed queries
Progressive disclosure functions for exploring query templates.
Find query templates matching keyword.
Args:
- index: QueryIndex to search
- keyword: Search term
- limit: Maximum number of results

Returns: List of dicts with: uri, comment, endpoints, matched_keyword
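A keyword search over an inverted index might look like the sketch below. It works on plain dictionaries rather than a QueryIndex, and the substring matching and alphabetical keyword order are assumptions; the real function may match or rank differently. The returned fields follow the Returns description above.

```python
def search_queries_sketch(keywords, comments, endpoints, keyword, limit=10):
    """Return up to `limit` matches from an inverted index (keyword -> list of URIs)."""
    needle = keyword.lower()
    results, seen = [], set()
    for kw in sorted(keywords):
        # Assumption: a search term matches any indexed keyword that contains it.
        if needle not in kw:
            continue
        for uri in keywords[kw]:
            if uri in seen:
                continue  # report each query once, under the first keyword that matched
            seen.add(uri)
            results.append({
                "uri": uri,
                "comment": comments.get(uri, ""),
                "endpoints": endpoints.get(uri, []),
                "matched_keyword": kw,
            })
            if len(results) >= limit:
                return results
    return results


# Made-up index contents for illustration.
kw_index = {"protein": ["ex:q1", "ex:q2"], "proteins": ["ex:q3"], "disease": ["ex:q2"]}
comments = {"ex:q1": "Proteins by gene", "ex:q2": "Proteins linked to a disease"}
endpoints = {"ex:q1": ["https://example.org/sparql"]}

top_two = search_queries_sketch(kw_index, comments, endpoints, "protein", limit=2)
everything = search_queries_sketch(kw_index, comments, endpoints, "Protein")
print([r["uri"] for r in top_two])
```

The `limit` cut-off keeps the result small enough to scan before asking for a description of any single query.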
Get bounded description of a query template.
Args:
- index: QueryIndex to query
- query_uri: URI of query

Returns: Dict with: uri, comment, endpoints, query_type, keywords, query_preview (200 chars)
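The bounded description can be sketched as a handful of lookups plus a slice. The index here is a plain dict of dicts standing in for a QueryIndex, and the `"unknown"` fallback is an assumption; the 200-character preview length is the one stated above.

```python
PREVIEW_CHARS = 200  # preview length stated in the Returns description


def describe_query_sketch(index: dict, query_uri: str) -> dict:
    """Return a bounded description; the full query text is deliberately left out."""
    text = index["query_text"].get(query_uri, "")
    return {
        "uri": query_uri,
        "comment": index["comments"].get(query_uri, ""),
        "endpoints": index["endpoints"].get(query_uri, []),
        "query_type": index["query_type"].get(query_uri, "unknown"),  # assumed fallback
        # Reverse lookup in the inverted index: every keyword that points at this query.
        "keywords": sorted(kw for kw, uris in index["keywords"].items() if query_uri in uris),
        "query_preview": text[:PREVIEW_CHARS],
    }


# Made-up index contents for illustration.
index = {
    "comments": {"ex:q1": "All triples, repeated to make a long query"},
    "endpoints": {"ex:q1": ["https://example.org/sparql"]},
    "query_type": {"ex:q1": "select"},
    "keywords": {"triples": ["ex:q1"], "all": ["ex:q1"], "disease": ["ex:q2"]},
    "query_text": {"ex:q1": "SELECT ?s WHERE { ?s ?p ?o } " * 20},
}

desc = describe_query_sketch(index, "ex:q1")
print(len(desc["query_preview"]), desc["keywords"])
```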
Get full SPARQL query text for execution.
Args:
- index: QueryIndex to query
- query_uri: URI of query

Returns: Full SPARQL query text
Load SPARQL example files from directory into QueryIndex.
Recursively loads all .ttl files and builds combined index.
Args:
- path: Directory path containing .ttl example files
- ns: Namespace dict for storing the index
- name: Variable name for the index in ns

Returns: Status message
The SHACLIndex holds indexed SHACL shapes for efficient retrieval.
Index of SHACL shapes for retrieval.
Attributes:
- shapes: List of shape URIs
- targets: Mapping from shape URI to target class URIs
- properties: Mapping from shape URI to property constraint dicts
- keywords: Inverted index from keyword to shape URIs
- paradigm: SHACL usage paradigm (‘validation’, ‘shacl-first’, ‘mixed’)
Detect whether a graph contains SHACL shapes and determine the usage paradigm.
Detect SHACL content in a graph.
Args: graph: RDF graph to analyze
Returns: Dict with:
- has_shacl: True if any SHACL patterns found
- node_shapes: Count of sh:NodeShape instances
- property_shapes: Count of sh:PropertyShape instances
- paradigm: ‘validation’, ‘shacl-first’, or ‘mixed’ (‘none’ for a graph with no SHACL content, as in the empty-graph test)
Extract searchable keywords from shape metadata.
Extract searchable keywords from a shape.
Args:
- graph: RDF graph containing the shape
- shape: Shape URI
- target_classes: Target class URIs
- props: Property constraint dicts

Returns: List of lowercase keywords
Build a searchable index from SHACL shapes in a graph.
Args: graph: RDF graph containing SHACL shapes
Returns: SHACLIndex with indexed shapes
Progressive disclosure primitives for exploring SHACL shapes.
Get bounded description of a SHACL shape.
Args:
- index: SHACL index to query
- shape_uri: URI of shape to describe
- limit: Maximum number of properties to return

Returns: Dict with:
- uri: Shape URI
- targets: List of target class URIs
- properties: First limit property constraints
- property_count: Total property count
- truncated: True if property list was truncated
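The truncation behaviour described above can be sketched in a few lines. The function takes the two relevant mappings directly instead of a SHACLIndex, and the keys inside each property dict (`path`, `minCount`) are assumptions for illustration.

```python
def describe_shape_sketch(targets: dict, properties: dict, shape_uri: str, limit: int = 10) -> dict:
    """Return at most `limit` property constraints and say whether any were cut off."""
    props = properties.get(shape_uri, [])
    return {
        "uri": shape_uri,
        "targets": targets.get(shape_uri, []),
        "properties": props[:limit],
        "property_count": len(props),      # total, not just the ones returned
        "truncated": len(props) > limit,
    }


# Made-up shape data for illustration.
targets = {"ex:PersonShape": ["ex:Person"]}
properties = {
    "ex:PersonShape": [
        {"path": "ex:name", "minCount": 1},
        {"path": "ex:email"},
        {"path": "ex:birthDate"},
    ]
}

short = describe_shape_sketch(targets, properties, "ex:PersonShape", limit=2)
full = describe_shape_sketch(targets, properties, "ex:PersonShape")
print(short["property_count"], short["truncated"])
```

Returning `property_count` alongside `truncated` lets a caller decide whether a second call with a larger `limit` is worthwhile.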
Find shapes matching keyword.
Args:
- index: SHACL index to search
- keyword: Search term
- limit: Maximum number of results

Returns: List of dicts with:
- uri: Shape URI
- targets: Target class URIs
- matched_keyword: The keyword that matched
Get human-readable property constraints for a shape.
Args:
- index: SHACL index to query
- shape_uri: URI of shape

Returns: Formatted string with property constraints
This project uses SHACL-AF in two ways:
| Feature | sh:SPARQLRule | sh:SPARQLExecutable |
|---|---|---|
| Purpose | Inference/validation | Query templates |
| Attached to | sh:NodeShape via sh:rule | Standalone resource |
| Query types | CONSTRUCT (inference) | SELECT/CONSTRUCT/ASK |
| Example files | rdfsplus.rule.ttl (6 rules), owlrl-shacl.ttl (13 rules), datacube.shapes.ttl (1 rule) | uniprot/examples (1,228+ queries) |
rdfsplus.rule.ttl - Basic RDFS inference:
- Subclass transitivity
- Type propagation via subclass
- Subproperty transitivity
- Property inclusion
- Domain/range typing

owlrl-shacl.ttl - OWL RL subset inference:
- All RDFS rules plus
- Equivalent class expansion
- Inverse properties
- Transitive properties
- Symmetric properties
- Functional properties → sameAs (optional)
- Property chain axioms (length-2)

datacube.shapes.ttl - RDF Data Cube validation:
- sh:SPARQLRule for copying component properties
- sh:sparql constraints for integrity checking
- Validates observations against data structure definitions

Sources:
- SHACL Advanced Features
- W3C SHACL GitHub - datacube.shapes.ttl
- SIB Swiss SPARQL Examples Utils
The QueryIndex holds indexed sh:SPARQLExecutable query templates for retrieval and reuse.
Basic tests for SHACL detection and indexing.
# Test detect_shacl with a simple NodeShape
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
result = detect_shacl(g)
assert result['has_shacl'] == True
assert result['node_shapes'] == 1
assert result['paradigm'] == 'validation'
print("✓ NodeShape detection works")
✓ NodeShape detection works
# Test build_shacl_index
g = Graph()
EX = Namespace("http://example.org/")
g.add((EX.PersonShape, RDF.type, SH.NodeShape))
g.add((EX.PersonShape, SH.targetClass, EX.Person))
g.add((EX.PersonShape, RDFS.label, Literal("Person Shape")))
index = build_shacl_index(g)
assert len(index.shapes) == 1
assert str(EX.PersonShape) in index.shapes
assert str(EX.Person) in index.targets[str(EX.PersonShape)]
assert 'person' in index.keywords or 'personshape' in index.keywords
print("✓ Index building works")
print(f" {index.summary()}")
✓ Index building works
SHACLIndex: 1 shapes, 3 keywords, paradigm=validation
✓ Shape search works
Found 1 shapes for 'person'
✓ Shape description works
Targets: ['http://example.org/Person']
Property count: 0
# Test with DCAT-AP shapes
from pathlib import Path

dcat_path = Path('../ontology/dcat-ap/dcat-ap-SHACL.ttl')
if dcat_path.exists():
    g_dcat = Graph()
    g_dcat.parse(dcat_path)
    detection = detect_shacl(g_dcat)
    print(f"\n✓ DCAT-AP detection: {detection['node_shapes']} node shapes, paradigm={detection['paradigm']}")
    index_dcat = build_shacl_index(g_dcat)
    print(f" {index_dcat.summary()}")
    # Search for Dataset shape
    dataset_shapes = search_shapes(index_dcat, 'dataset', limit=3)
    print(f" Found {len(dataset_shapes)} shapes matching 'dataset':")
    for s in dataset_shapes[:3]:
        shape_name = s['uri'].split('/')[-1].split('#')[-1]
        print(f" - {shape_name}")
else:
    print("\n(DCAT-AP shapes not found, skipping test)")
(DCAT-AP shapes not found, skipping test)
This demonstrates the ‘retrieve example → adapt → run’ workflow for discovering how to query unfamiliar SPARQL endpoints.
# Load UniProt example queries from neXtProt
from pathlib import Path

nxp_path = Path('../ontology/uniprot/examples/neXtProt')
ns = {}  # defined before the check so later cells can test "'nxq' in ns" even when loading is skipped
if nxp_path.exists():
    result = load_query_examples(str(nxp_path), ns, 'nxq')
    print(result)
    print(f"\nIndex summary: {ns['nxq'].summary()}")
else:
    print(f"neXtProt examples not found at {nxp_path}")
neXtProt examples not found at ../ontology/uniprot/examples/neXtProt
# Search for protein-related queries
if 'nxq' in ns:
    results = search_queries(ns['nxq'], 'protein', limit=3)
    print(f"Found {len(results)} protein-related queries:\n")
    for r in results:
        uri_short = r['uri'].split('/')[-1]
        print(f" {uri_short}")
        print(f" Comment: {r['comment']}")
        print(f" Keyword: {r['matched_keyword']}\n")
# Describe a specific query
if 'nxq' in ns and results:
    query_uri = results[0]['uri']
    desc = describe_query(ns['nxq'], query_uri)
    print(f"Query: {query_uri.split('/')[-1]}")
    print(f"Type: {desc['query_type']}")
    print(f"Comment: {desc['comment'][:150]}...")
    if desc['endpoints']:
        print(f"Endpoint: {desc['endpoints'][0]}")
    print(f"\nKeywords: {', '.join(desc['keywords'][:5])}")
    print(f"\nQuery preview:\n{desc['query_preview']}")
# Get full query text for execution
if 'nxq' in ns and results:
    query_uri = results[0]['uri']
    full_query = get_query_text(ns['nxq'], query_uri)
    print(f"Full query ({len(full_query)} chars):\n")
    print(full_query[:500])
    if len(full_query) > 500:
        print("\n... (truncated for display)")
    print("\n# This query could now be adapted and executed:")
    print("# sparql_query(full_query, endpoint=desc['endpoints'][0], name='results', ns=ns)")
# Search for PTM-related queries (phosphorylation)
if 'nxq' in ns:
    ptm_results = search_queries(ns['nxq'], 'phosphorylation', limit=5)
    print(f"Found {len(ptm_results)} phosphorylation-related queries:\n")
    for r in ptm_results:
        uri_short = r['uri'].split('/')[-1]
        # Shorten long comments to 80 characters for display
        comment_short = (r['comment'][:80] + '...') if len(r['comment']) > 80 else r['comment']
        print(f" • {uri_short}: {comment_short}")