Towards Trusted LLM based Curator Agents

Charles F. Vardeman II

Center for Research Computing, University of Notre Dame

2023-10-27

GitHub Repo

DoD Data Vision

Norquist, David L. n.d. “DOD Data Strategy.” https://media.defense.gov/2020/Oct/08/2002514180/-1/-1/0/DOD-DATA-STRATEGY.PDF.

Ontology Design Patterns as a Semantic Bridge

AI Agents for Interoperability

Tangi, Luca, Marco Combetto, Jaume Martin Bosch, and Paula Rodriguez Müller. 2023. “Artificial Intelligence for Interoperability in the European Public Sector.” JRC Publications Repository. October 4, 2023. https://doi.org/10.2760/633646.

Problem – Can we use LLM Based Cognitive Agents to accelerate and create “Active Metadata”?

Problem – How can LLM Based Cognitive Agents use Data Centric AI to be more FACTUAL through Retrieval Augmented Generation (RAG) and Tool Use?

Problem – Data Centric AI is Hard but necessary for Trusted AI – Can we use LLM Based Cognitive Agents to lower the barrier to Data Centric AI?

Problem – How can we Trust, Validate, and integrate Human in the loop for LLM Based Agents used for Data Curation?

Motivation: TAMMS KG

Starting Architecture…

AI Curator “Agents”: Team “LEMON”

Framework for architecture design of LLM Based Agents

Cognitive Architectures for Language Agents

LLM Powered Agents

Example

Activity Specific Agents: Visual Agents

Visual Agents Architecture: Different LLMs based on Role

Activity Specific Agents: Visual Agents Transition Graph

Different LLMs for Different Tasks

Local LLMs vs API based LLMs

LM Compatibility Tracking

LLMs fine tuned to be Agents

GPTQ model files for Knowledge Engineering Group's AgentLM 70B.

Tool Use (Calling Python Functions)
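
One way to picture tool use is a small dispatcher that maps a structured (JSON) tool call emitted by the model onto a registered Python function. The sketch below is illustrative only: the `lookup_part` tool, the `TOOLS` registry, and the shape of the JSON call are all assumptions made for this example, not the API of any particular agent framework.

```python
import json


def lookup_part(part_number: str) -> dict:
    """Hypothetical tool: return a stub maintenance record for a part."""
    return {"part_number": part_number, "status": "unknown"}


# Registry mapping tool names the model may emit to Python callables.
TOOLS = {"lookup_part": lookup_part}


def dispatch(tool_call_json: str) -> dict:
    """Parse a JSON tool call and invoke the matching registered function."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        # Refuse anything outside the registry instead of guessing.
        raise ValueError(f"unknown tool: {name}")
    return TOOLS[name](**call.get("arguments", {}))


# Stand-in for a model response; a real agent would receive this from the LLM.
model_output = '{"name": "lookup_part", "arguments": {"part_number": "A-100"}}'
result = dispatch(model_output)
print(result)
```

Keeping an explicit registry means the agent can only call functions that were deliberately exposed, which is one simple lever for trust.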

Structured Responses and LLMs

AWS Agents for Amazon Bedrock

Fully Managed Agents – Amazon Bedrock – AWS

Curation State Graphs?

Modeling The World!

Ontology Engineering: A View from the Trenches - WOP 2015 Keynote | PPT (slideshare.net)

Moo Architecture

We need to think through what “Trusted” means!

Frameworks – Data Engine

Copying Tesla’s Data Engine

Frameworks to Capture Provenance of Models!

  • SBoMs and AI BoMs for Agents
    • They are KGs Themselves!
  • Data Cards and Model Cards for Models
  • Agents will be exposed as Microservices themselves
  • We should be able to ask the Microservice Layer for “Trust Information”
  • Agents should store “Metadata” in the Graph Fragment they are constructing.
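
As a minimal sketch of the last bullet, an agent could attach provenance to the JSON-LD graph fragment it produces using W3C PROV-O terms. The function name `with_provenance`, the `ex:` namespace, the `ex:modelName` property, and the agent and model identifiers are hypothetical choices for illustration; only the `prov:` terms come from the PROV-O vocabulary.

```python
import json
from datetime import datetime, timezone


def with_provenance(fragment: dict, agent_id: str, model_name: str) -> dict:
    """Return a copy of a JSON-LD fragment annotated with PROV-O metadata."""
    annotated = dict(fragment)
    # Copy the context so the caller's fragment is not modified.
    context = dict(annotated.get("@context", {}))
    context["prov"] = "http://www.w3.org/ns/prov#"
    context.setdefault("ex", "https://example.org/")  # hypothetical namespace
    annotated["@context"] = context
    annotated["prov:wasAttributedTo"] = {
        "@id": agent_id,
        "@type": "prov:SoftwareAgent",
        "ex:modelName": model_name,  # hypothetical property
    }
    annotated["prov:generatedAtTime"] = datetime.now(timezone.utc).isoformat()
    return annotated


# Hypothetical fragment and identifiers, for illustration only.
fragment = {"@context": {"ex": "https://example.org/"}, "@id": "ex:record/1"}
annotated = with_provenance(fragment, "ex:agent/curator-1", "example-model")
print(json.dumps(annotated, indent=2))
```

Because the metadata travels inside the fragment itself, a downstream service can answer “who produced this?” by querying the graph rather than a separate log.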

Starting with a CSV (Navy Maintenance Data)

Korini, Keti, and Christian Bizer. 2023. “Column Type Annotation Using ChatGPT.” arXiv. http://arxiv.org/abs/2306.00745.
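
To make the idea of LLM-based column type annotation concrete, the sketch below samples a few values per CSV column and formats a prompt asking for a single type label. It is not the method from the cited paper: the sample rows, the candidate type list, and the prompt wording are all invented for illustration, and the call to an actual LLM is deliberately left out.

```python
import csv
import io

# Made-up sample rows standing in for a maintenance CSV (illustrative only).
SAMPLE_CSV = """equipment_id,work_date,cost_usd
EQ-001,2021-03-04,1200.50
EQ-002,2021-03-09,980.00
"""

# Hypothetical label set; a real project would draw labels from its ontology.
CANDIDATE_TYPES = ["Identifier", "Date", "MonetaryAmount", "Text"]


def column_samples(csv_text, max_values=3):
    """Collect the first few cell values for each column of a CSV string."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        return {}
    return {name: [row[name] for row in rows[:max_values]] for name in rows[0]}


def build_prompt(column, values, types):
    """Format one column-type-annotation question as a plain-text prompt."""
    return (
        "Classify the table column using exactly one of these types: "
        + ", ".join(types)
        + f".\nColumn header: {column}\nSample values: "
        + ", ".join(values)
        + "\nAnswer with the type name only."
    )


samples = column_samples(SAMPLE_CSV)
prompts = {
    col: build_prompt(col, vals, CANDIDATE_TYPES) for col, vals in samples.items()
}
print(prompts["work_date"])
```

Restricting the answer to a closed label set makes the model's response easy to validate before it is written into the graph.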

Context Matters!

Converting Legacy Enterprise Data into Knowledge Graphs with AI and JSON LD | Eliud Polanco

JSON-LD as a Bridge

Converting Legacy Enterprise Data into Knowledge Graphs with AI and JSON LD | Eliud Polanco

Aside: Curator AIs should be multimodal

  • Dr. Vardeman’s Law: Data “Lives” in different locations and formats – not every digital object can or should be in the KG layer. The Curator AI should “Catalog” this information.
  • Multimodal LLMs like AVIS can bridge that gap!

Semantic AI-based Micro Services

How can we create “Semantic Microservices”

Tim Berners-Lee, James Hendler, and Ora Lassila. “The Semantic Web.” Scientific American 284, no. 5 (2001): 34–43. https://lassila.org/publications/2001/SciAm.html

Semantic Web “Layer Cake”

John Sowa, “Semantics.” n.d. Accessed October 17, 2023. https://www.jfsowa.com/ikl/.

Aside: Sowa’s law of standards

“Whenever a major organization develops a new system as an official standard for X, the primary result is the widespread adoption of some simpler system as a de facto standard for X.”

Jano’s Layer Cake

Ontology Engineering: A View from the Trenches - WOP 2015 Keynote | PPT (slideshare.net)

Distributed Knowledge Graph Layer Cake

DKG Example

“Web 2.0 Architecture – Microservices”

RESTful web API design

Documenting REST-APIs

Example: HuggingFace Embedding Service

A blazing fast inference solution for text embeddings models

Example: HuggingFace Embedding Service

Text Generation Inference API

OpenAI “Plugins”

Microsoft and “OpenAI Plugins”

Create and run a ChatGPT plugin with Semantic Kernel | Microsoft Learn

Bridging REST to AI using JSON-LD

JSON-LD Best Practices

JSON as JSON-LD

GET /ordinary-json-document.json HTTP/1.1
Host: example.com
Accept: application/ld+json,application/json,*/*;q=0.1

====================================

HTTP/1.1 200 OK
...
Content-Type: application/json
Link: <https://json-ld.org/contexts/person.jsonld>; rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"

{
  "name": "Markus Lanthaler",
  "homepage": "http://www.markus-lanthaler.com/",
  "image": "http://twitter.com/account/profile_image/markuslanthaler"
}
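
On the client side, the exchange above could be handled roughly as follows: find the `Link` header entry whose `rel` is the JSON-LD context relation, then treat the plain JSON body as if it carried that context. This is a simplified sketch, not a conformant JSON-LD processor: the function names are invented here, the header parsing is a naive regular expression, and error cases such as multiple context links are not handled.

```python
import json
import re

JSONLD_CONTEXT_REL = "http://www.w3.org/ns/json-ld#context"


def find_context_link(link_header):
    """Return the URL of the first Link entry with the JSON-LD context rel."""
    # Naive parse: each '<url>' followed by its parameters up to the next '<'.
    for match in re.finditer(r"<([^>]+)>([^<]*)", link_header):
        url, params = match.group(1), match.group(2)
        rel = re.search(r'rel\s*=\s*"?([^";,]+)"?', params)
        if rel and JSONLD_CONTEXT_REL in rel.group(1).split():
            return url
    return None


def interpret_as_jsonld(body_text, content_type, link_header):
    """Attach the linked context to a plain JSON object, if one is advertised."""
    doc = json.loads(body_text)
    media_type = content_type.split(";")[0].strip().lower()
    # A document already served as JSON-LD does not use the linked context.
    if media_type == "application/ld+json":
        return doc
    context_url = find_context_link(link_header or "")
    if context_url is None:
        return doc
    return {"@context": context_url, **doc}


link = (
    '<https://json-ld.org/contexts/person.jsonld>; '
    'rel="http://www.w3.org/ns/json-ld#context"; type="application/ld+json"'
)
body = '{"name": "Markus Lanthaler", "homepage": "http://www.markus-lanthaler.com/"}'
doc = interpret_as_jsonld(body, "application/json", link)
print(json.dumps(doc, indent=2))
```

The appeal of this pattern is that an existing JSON API needs only one extra response header to become interpretable as linked data.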

Gorilla: Retrieval Aware Training for APIs

Gorilla: Retrieval Aware Training for APIs

Problem with REST – Interoperability, Scale and Queryability

“Semantic APIs for KGs”

SPARQL 1.1 Federated Queries
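
For readers unfamiliar with federation, a SPARQL 1.1 federated query delegates part of its graph pattern to a remote endpoint through the `SERVICE` keyword. The sketch below only assembles such a query string; the helper name `federated_query`, the endpoint URL, and the `ex:` vocabulary are placeholders invented for this example, and nothing is sent over the network.

```python
def federated_query(endpoint: str, remote_pattern: str, limit: int = 5) -> str:
    """Build a SPARQL query that evaluates one pattern at a remote endpoint."""
    return (
        "PREFIX ex: <https://example.org/vocab#>\n"  # hypothetical vocabulary
        "SELECT * WHERE {\n"
        f"  SERVICE <{endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        "}\n"
        f"LIMIT {limit}"
    )


# Placeholder endpoint and pattern, for illustration only.
query = federated_query(
    "https://example.org/sparql",
    "?record a ex:MaintenanceRecord .",
)
print(query)
```

An agent that can emit queries of this shape can combine local curation state with remote knowledge graphs without copying their data.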

How do we provide “Context” to LLMs to QUERY a KG?

SPARQL 1.1 Service Description to provide Context!

Example in the Wild

UniProt: https://sparql.uniprot.org/.well-known/void

ChatGPT “Plugin” Architecture as Example

Example Service – Retrieval Augmented Generation (We’re not doing this yet!)

SPARQL Interfaces

KG Interpretation in Contexts

FAIR Vocabularies and Ontologies