AI Success Factors: Engineering Trust in Deployments

Slides Link: https://la3d.github.io/nuggets/slideindex.html

Building “Stuff”…

Building Stuff?

Building Agents based on Large Language Models!

You are “almost” here ⬇️

Building “Agents” Involves Pre-trained Foundation Models…

How do we “program” a Large Language Model?

Autoregressive Large Language Model

“An autoregressive large language model (AR-LLM) is a type of neural network model that can generate natural language text. It has a very large number of parameters (billions or trillions) that are trained on a huge amount of text data from various sources. The main goal of an AR-LLM is to predict the next word or token based on the previous words or tokens in the input text. For example, if the input text is “The sky is”, the AR-LLM might predict “blue” as the next word. AR-LLMs can also generate text from scratch by sampling words from a probability distribution. For example, if the input text is empty, the AR-LLM might generate “Once upon a time, there was a princess who lived in a castle.” as the output text.”¹
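To make the quoted description concrete, here is a toy sketch of autoregressive generation. It assumes a hand-written bigram table in place of a neural network; the table, its probabilities, and the `generate` helper are all hypothetical. Each step looks up a distribution over the next token, either takes the most likely token (greedy decoding) or samples from the distribution, and appends the result to the context for the next step.

```python
import random
from typing import Dict, List

# Hypothetical toy "model": next-token probabilities conditioned only on the previous
# token (a bigram table). A real AR-LLM computes this distribution with a neural network
# conditioned on the whole context.
NEXT_TOKEN_PROBS: Dict[str, Dict[str, float]] = {
    "<s>": {"the": 0.6, "once": 0.4},
    "the": {"sky": 0.7, "princess": 0.3},
    "sky": {"is": 1.0},
    "is": {"blue": 0.8, "grey": 0.2},
    "once": {"upon": 1.0},
    "upon": {"a": 1.0},
    "a": {"time": 1.0},
    "princess": {"</s>": 1.0},
    "blue": {"</s>": 1.0},
    "grey": {"</s>": 1.0},
    "time": {"</s>": 1.0},
}


def generate(prompt: List[str], greedy: bool = False, seed: int = 0,
             max_tokens: int = 20) -> List[str]:
    """Extend the prompt one token at a time until the end token or the length limit."""
    rng = random.Random(seed)
    tokens = ["<s>"] + list(prompt)  # "<s>" marks the start, so an empty prompt works too
    for _ in range(max_tokens):
        dist = NEXT_TOKEN_PROBS[tokens[-1]]  # distribution over the next token
        if greedy:
            next_token = max(dist, key=dist.get)  # most likely token
        else:
            next_token = rng.choices(list(dist), weights=list(dist.values()), k=1)[0]
        if next_token == "</s>":  # end-of-sequence token stops generation
            break
        tokens.append(next_token)  # the output becomes part of the next step's input
    return tokens[1:]


print(generate(["the", "sky", "is"], greedy=True))  # ['the', 'sky', 'is', 'blue']
```

With sampling instead of greedy decoding, different seeds can continue “the sky is” with “grey”. A real model conditions on far more than the last token, but the predict, append, repeat loop is the same.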

AR-LLMs can simulate “Turing Machines”

Abstract: We show that transformer-based large language models are computationally universal when augmented with an external memory. Any deterministic language model that conditions on strings of bounded length is equivalent to a finite automaton, hence computationally limited. However, augmenting such models with a read-write memory creates the possibility of processing arbitrarily large inputs and, potentially, simulating any algorithm. We establish that an existing large language model, Flan-U-PaLM 540B, can be combined with an associative read-write memory to exactly simulate the execution of a universal Turing machine, \(U_{15,2}\). A key aspect of the finding is that it does not require any modification of the language model weights. Instead, the construction relies solely on designing a form of stored instruction computer that can subsequently be programmed with a specific set of prompts.
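The construction in the abstract pairs a model that, on its own, only maps a bounded input to an output (a finite automaton) with an external read-write memory that the surrounding loop reads and updates. The toy sketch below illustrates only that architecture. It assumes a lookup table in place of the prompted LLM and a Python dict as the associative memory; the table implements binary increment, not the paper's universal machine \(U_{15,2}\), and every name is hypothetical.

```python
from typing import Dict, Tuple

# (symbol to write, head move: -1 left / 0 stay / +1 right, next state)
Action = Tuple[str, int, str]

# Hypothetical stand-in for the prompted LLM: a finite table from (state, symbol) to an
# action. On its own, like a model reading a bounded context, it is only a finite automaton.
TRANSITIONS: Dict[Tuple[str, str], Action] = {
    ("carry", "1"): ("0", -1, "carry"),  # 1 + carry = 0, keep carrying to the left
    ("carry", "0"): ("1", 0, "halt"),    # 0 + carry = 1, done
    ("carry", "_"): ("1", 0, "halt"),    # past the leftmost digit: write a new leading 1
}


def model(state: str, symbol: str) -> Action:
    return TRANSITIONS[(state, symbol)]


def run(tape: str, max_steps: int = 10_000) -> str:
    """Increment a binary number by looping: read memory, query the model, write memory."""
    memory: Dict[int, str] = dict(enumerate(tape))  # external associative read-write memory
    head, state, steps = len(tape) - 1, "carry", 0
    while state != "halt":
        if steps == max_steps:
            raise RuntimeError("step budget exhausted")
        symbol = memory.get(head, "_")             # read ("_" is a blank cell)
        write, move, state = model(state, symbol)  # one bounded "model call"
        memory[head] = write                       # write back to memory
        head += move
        steps += 1
    cells = range(min(memory), max(memory) + 1)
    return "".join(memory.get(i, "_") for i in cells).strip("_")


print(run("1011"))  # 1100
```

The controller never grows; only the memory does, which is why the combination can handle arbitrarily long inputs.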

What Kind of LLM Agents are we trying to build?

Conversational Agents
vs. Cognitive Autonomous Agents
vs. Agents tuned for a Data Processing Task

We will focus on Conversational Agents…

The Best Advice we can Give…

Caveat: You are at the Edge of Research and Practice!

Prompt Engineering

“Prompt engineering is the process of designing and refining the prompts or input stimuli for a language model to generate specific types of output. Prompt engineering involves selecting appropriate keywords, providing context, and shaping the input in a way that encourages the model to produce the desired response and is a vital technique to actively shape the behavior and output of foundation models.”¹
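The elements named in the definition (keywords, context, and the shape of the input) can be made explicit in a prompt template. The sketch below is one hypothetical few-shot layout, not a standard format; the field labels, the `build_prompt` helper, and the sentiment task are illustrative assumptions.

```python
from typing import List, Tuple


def build_prompt(instruction: str, context: str,
                 examples: List[Tuple[str, str]], query: str) -> str:
    """Assemble a few-shot prompt: instruction, optional context, worked examples, query."""
    parts = [f"Instruction: {instruction}"]
    if context:
        parts.append(f"Context: {context}")
    # Input/output pairs show the model the exact response format we want.
    for example_input, example_output in examples:
        parts.append(f"Input: {example_input}\nOutput: {example_output}")
    # End on an open "Output:" so the most likely continuation is the answer itself.
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


prompt = build_prompt(
    instruction="Classify the sentiment of the review as positive or negative.",
    context="Reviews come from a hardware store website.",
    examples=[("The drill broke after one use.", "negative"),
              ("Great value, works perfectly.", "positive")],
    query="Arrived on time and does the job.",
)
print(prompt)
```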

(GPT-3) InstructGPT: Reinforcement Learning from Human Feedback

Instruction Tuning Helps Conversational Agents “Converse” in a Set Style!

Tools for “Prompt Engineering”

Trusted “Prompt Engineering” for Conversational Agents

“NeMo Guardrails is an open-source toolkit for easily adding programmable guardrails to LLM-based conversational systems. Guardrails (or “rails” for short) are specific ways of controlling the output of a large language model, such as not talking about politics, responding in a particular way to specific user requests, following a predefined dialog path, using a particular language style, extracting structured data, and more.”¹
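The idea of a topic rail such as “not talking about politics” can be sketched in plain Python. This is not the NeMo Guardrails API (NeMo defines rails in its own Colang language plus a configuration file); the keyword list, refusal text, and function names below are hypothetical, and keyword matching is a deliberately crude stand-in for the more robust matching a real toolkit would use.

```python
import re
from typing import Callable, Dict, Optional, Set

# Hypothetical rail configuration: topic name -> trigger phrases (illustrative only).
BLOCKED_TOPICS: Dict[str, Set[str]] = {
    "politics": {"election", "senator", "political party", "vote"},
}
REFUSAL = "Sorry, I can't discuss politics. Is there anything else I can help with?"


def blocked_topic(user_message: str) -> Optional[str]:
    """Input rail: return the first blocked topic the message mentions, else None."""
    text = user_message.lower()
    for topic, phrases in BLOCKED_TOPICS.items():
        # Word boundaries keep "vote" from matching inside "devote".
        if any(re.search(rf"\b{re.escape(p)}\b", text) for p in phrases):
            return topic
    return None


def guarded_chat(user_message: str, llm: Callable[[str], str]) -> str:
    """Run the input rail first; only messages that pass it ever reach the LLM."""
    if blocked_topic(user_message) is not None:
        return REFUSAL
    return llm(user_message)
```

Putting the check before the model call means a blocked request costs nothing and cannot be talked around by the model itself.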

Trusted “Prompt Engineering” for Conversational Agents

Building Trustworthy, Safe, and Secure LLM Conversational Systems: The core value of using NeMo Guardrails is the ability to write rails to guide conversations. You can choose to define the behavior of your LLM-powered application on specific topics and prevent it from engaging in discussions on unwanted topics.
Connect models, chains, services, and more via actions: NeMo Guardrails provides the ability to connect an LLM to other services (a.k.a. tools) seamlessly and securely.
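Connecting an LLM to a service usually means the model emits a structured request and the application, not the model, performs the call. A minimal sketch under that assumption: the model output is a JSON object with hypothetical `action` and `args` keys, and only functions registered in an allow-list can run. This is not NeMo Guardrails' actions API; `get_order_status` is an invented example service.

```python
import json
from typing import Any, Callable, Dict

# Allow-list of callable actions; the model can only request names registered here.
ACTIONS: Dict[str, Callable[..., Any]] = {}


def action(fn: Callable[..., Any]) -> Callable[..., Any]:
    """Decorator that registers a function as an action the model may request."""
    ACTIONS[fn.__name__] = fn
    return fn


@action
def get_order_status(order_id: str) -> str:
    # Hypothetical stand-in for a call to a real back-end service.
    return f"Order {order_id} has shipped."


def dispatch(model_output: str) -> str:
    """Parse a JSON action request from the model and run it if it is allowed."""
    try:
        request = json.loads(model_output)
        name, args = request["action"], request.get("args", {})
    except (json.JSONDecodeError, KeyError, TypeError, AttributeError):
        return "error: malformed action request"
    if not isinstance(name, str) or name not in ACTIONS:
        return f"error: unknown action {name!r}"
    try:
        return str(ACTIONS[name](**args))
    except TypeError as exc:  # wrong or missing arguments
        return f"error: bad arguments ({exc})"
```

Returning error strings instead of raising lets the application feed the failure back to the model as the next turn's input.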

(GPT-3) Large Language Models are Zero-Shot Reasoners (Chain-of-Thought Reasoning)

LLMs as Reasoners using Prompts!
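Zero-shot chain-of-thought prompting uses two model calls: the first appends the trigger “Let's think step by step.” to elicit reasoning, and the second feeds that reasoning back with an answer-extraction phrase. The sketch assumes a generic `llm` callable (hypothetical; any text-completion function would do) and uses an arithmetic-style extraction phrase; other task types would need a different phrase.

```python
from typing import Callable


def zero_shot_cot(question: str, llm: Callable[[str], str]) -> str:
    """Two-stage zero-shot chain-of-thought prompting with a text-completion function."""
    # Stage 1 (reasoning extraction): the trigger phrase invites step-by-step reasoning.
    reasoning_prompt = f"Q: {question}\nA: Let's think step by step."
    reasoning = llm(reasoning_prompt)
    # Stage 2 (answer extraction): append the reasoning and ask for the final answer only.
    answer_prompt = f"{reasoning_prompt} {reasoning}\nTherefore, the answer (arabic numerals) is"
    return llm(answer_prompt).strip()
```

Splitting the two stages keeps the final answer short and easy to parse, while the reasoning stays available for inspection.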

Prompt Engineering Guide

We want Large Language Models to be Factual!

Fine-Tuning: augment the behavior of the model
Retrieval: introduce new knowledge to the model
Retrieval Aware Training (RAT): fine-tune the model to use or ignore retrieved content

Fine-Tuning Foundation Models

“Foundation models are computationally expensive and trained on a large, unlabeled corpus. Fine-tuning a pre-trained foundation model is an affordable way to take advantage of their broad capabilities while customizing a model on your own small corpus. Fine-tuning is a customization method that involves further training and does change the weights of your model…”

“…There are two main approaches that you can take for fine-tuning depending on your use case and chosen foundation model. If you’re interested in fine-tuning your model on domain-specific data, see Domain adaptation fine-tuning. If you’re interested in instruction-based fine-tuning using prompt and response examples, see Instruction-based fine-tuning.”¹
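For instruction-based fine-tuning, the prompt and response examples are typically serialized into a training file. A minimal sketch, assuming JSON Lines with hypothetical `prompt` and `completion` keys; the exact schema and file layout depend on the fine-tuning service, so check its documentation before relying on these names.

```python
import json
from typing import Iterable, Tuple


def to_jsonl(pairs: Iterable[Tuple[str, str]]) -> str:
    """Serialize (prompt, response) pairs as JSON Lines, one training example per line."""
    lines = []
    for prompt, response in pairs:
        prompt, response = prompt.strip(), response.strip()
        if not prompt or not response:
            continue  # skip incomplete examples rather than train on them
        record = {"prompt": prompt, "completion": response}  # hypothetical field names
        lines.append(json.dumps(record, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


examples = [
    ("Summarize: The meeting moved to Tuesday at 10am.", "Meeting rescheduled to Tuesday, 10am."),
    ("Translate to French: Good morning", "Bonjour"),
    ("", "orphan response"),  # dropped: no prompt
]
print(to_jsonl(examples), end="")
```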

Retrieval Augmented Generation (RAG)

“Foundation models are usually trained offline, making the model agnostic to any data that is created after the model was trained. Additionally, foundation models are trained on very general domain corpora, making them less effective for domain-specific tasks. You can use Retrieval Augmented Generation (RAG) to retrieve data from outside a foundation model and augment your prompts by adding the relevant retrieved data in context. For more information about RAG model architectures…”¹
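The retrieve-then-augment loop can be sketched with the standard library alone. This sketch swaps in bag-of-words cosine similarity as the retriever, a stand-in for the embedding model and vector store a production system would use; the function names, the prompt wording, and the example documents are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def bag_of_words(text: str) -> Counter:
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query (stand-in for a vector search)."""
    q = bag_of_words(query)
    return sorted(documents, key=lambda d: cosine(q, bag_of_words(d)), reverse=True)[:k]


def augment_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Place the retrieved passages in the prompt so the model can ground its answer."""
    context = "\n".join(f"- {d}" for d in retrieve(query, documents, k))
    return ("Answer using only the context below. If the answer is not in the context, "
            "say you don't know.\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")


docs = [
    "Refunds are issued within 14 days of receiving a return.",
    "The support desk is open Monday to Friday, 9am to 5pm.",
    "Orders over 50 euros ship for free.",
]
print(augment_prompt("When is the support desk open?", docs, k=1))
```

The model's weights never change here: new or private knowledge reaches it only through the prompt, which is what distinguishes retrieval from fine-tuning.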

LlamaIndex to Build Hybrid KGs

Gorilla: Retrieval Aware Training for APIs
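Retrieval Aware Training fine-tunes the model to use or ignore retrieved content. One way to build such training data, sketched here under assumptions, is to append a retrieved API document to each instruction while keeping the target API call fixed, and sometimes to attach a wrong document so the model learns not to trust retrieval blindly. The reference phrasing, the field names, the example APIs, and the distractor rate are all hypothetical choices, not Gorilla's released pipeline.

```python
import json
import random
from typing import Dict, List


def make_rat_example(instruction: str, target_call: str, gold_doc: Dict[str, str],
                     doc_pool: List[Dict[str, str]], rng: random.Random,
                     p_distractor: float = 0.3) -> Dict[str, str]:
    """Build one retrieval-aware training example (prompt with a retrieved doc, fixed target)."""
    distractors = [d for d in doc_pool if d != gold_doc]
    # Occasionally attach a wrong document; the target stays correct, which teaches the
    # model to ignore retrieved content that does not fit the request.
    if distractors and rng.random() < p_distractor:
        doc = rng.choice(distractors)
    else:
        doc = gold_doc
    prompt = f"{instruction}\nUse this API documentation for reference: {json.dumps(doc)}"
    return {"prompt": prompt, "completion": target_call}


# Hypothetical API documents for illustration.
pool = [
    {"api_name": "image_classifier.load", "description": "Load a pretrained image classifier."},
    {"api_name": "speech_to_text.load", "description": "Load a pretrained speech recognizer."},
]
rng = random.Random(0)
example = make_rat_example("Classify the objects in photo.jpg", "image_classifier.load()",
                           pool[0], pool, rng)
print(example["prompt"])
```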

“Big Models” vs “Small Models”

Models as a service (Bedrock, OpenAI API, Anthropic Claude)
- Generally more difficult to “Fine-Tune” (GPT-3.5 Turbo)¹
- Models are generally more capable (Factuality, Instructions, Reasoning)
- “Coin-operated”: pay per token
“Open License” 7B-70B Parameter Models
- Mostly based on Meta AI LLaMA or Llama 2 models
- Require more effort to work consistently
- Models can run on more modest hardware
- Can be fine-tuned for task-specific workflows
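The “coin-operated” point is easy to quantify: hosted models are usually billed separately for input (prompt) and output (completion) tokens. The sketch below does the arithmetic; the per-1K-token prices and traffic figures in the example are hypothetical placeholders, not any provider's actual rates.

```python
def request_cost(prompt_tokens: int, completion_tokens: int,
                 usd_per_1k_prompt: float, usd_per_1k_completion: float) -> float:
    """Cost in USD of one request billed per 1,000 input and 1,000 output tokens."""
    return (prompt_tokens / 1000) * usd_per_1k_prompt + \
        (completion_tokens / 1000) * usd_per_1k_completion


def monthly_cost(requests_per_day: int, prompt_tokens: int, completion_tokens: int,
                 usd_per_1k_prompt: float, usd_per_1k_completion: float,
                 days: int = 30) -> float:
    """Scale the per-request cost to a month of traffic."""
    per_request = request_cost(prompt_tokens, completion_tokens,
                               usd_per_1k_prompt, usd_per_1k_completion)
    return per_request * requests_per_day * days


# Hypothetical prices: $0.0010 per 1K prompt tokens, $0.0020 per 1K completion tokens.
print(f"{request_cost(1500, 300, 0.0010, 0.0020):.4f}")          # 0.0021
print(f"{monthly_cost(10_000, 1500, 300, 0.0010, 0.0020):.2f}")  # 630.00
```

Because retrieved context is billed as prompt tokens, long RAG prompts can dominate the bill; the same arithmetic helps compare a hosted model against the fixed cost of running an open-license model on your own hardware.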

Small Models with custom grammar (llama.cpp)

JSON-Grammar

root   ::= object
value  ::= object | array | string | number | ("true" | "false" | "null") ws

object ::=
  "{" ws (
            string ":" ws value
    ("," ws string ":" ws value)*
  )? "}" ws

array  ::=
  "[" ws (
            value
    ("," ws value)*
  )? "]" ws

string ::=
  "\"" (
    [^"\\] |
    "\\" (["\\/bfnrt] | "u" [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F] [0-9a-fA-F]) # escapes
  )* "\"" ws

number ::= ("-"? ([0-9] | [1-9] [0-9]*)) ("." [0-9]+)? ([eE] [-+]? [0-9]+)? ws

# Optional space: by convention, applied in this grammar after literal chars when allowed
ws ::= ([ \t\n] ws)?
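To see exactly which strings the grammar above admits, it can be transcribed rule by rule into a recursive-descent recognizer. This is a hypothetical checker written for illustration, not part of llama.cpp: llama.cpp applies the grammar during sampling, so only tokens that keep the output valid can be chosen, whereas this sketch only checks a finished string. Two consequences of the grammar as written show up in the tests: the root must be an object, and no whitespace is allowed before the opening brace.

```python
from string import digits as DIGITS, hexdigits as HEX
from typing import Callable


class JsonGrammarRecognizer:
    def __init__(self, text: str) -> None:
        self.text, self.pos = text, 0

    def peek(self) -> str:
        return self.text[self.pos] if self.pos < len(self.text) else ""

    def accept(self, chars: str) -> bool:
        ok = self.peek() != "" and self.peek() in chars  # "" is a substring of every string
        self.pos += int(ok)  # consume the character only if it matched
        return ok

    def expect(self, ch: str) -> None:
        if not self.accept(ch):
            raise ValueError(f"expected {ch!r} at offset {self.pos}")

    def ws(self) -> None:  # ws ::= ([ \t\n] ws)?
        while self.accept(" \t\n"):
            pass

    def digit_run(self) -> None:  # [0-9]+
        if not self.accept(DIGITS):
            raise ValueError(f"expected a digit at offset {self.pos}")
        while self.accept(DIGITS):
            pass

    def value(self) -> None:
        c = self.peek()
        if c == "{":
            self.sequence("{", "}", self.member)  # object
        elif c == "[":
            self.sequence("[", "]", self.value)  # array
        elif c == '"':
            self.string()
        elif c != "" and c in "-" + DIGITS:
            self.number()
        else:  # ("true" | "false" | "null") ws
            for literal in ("true", "false", "null"):
                if self.text.startswith(literal, self.pos):
                    self.pos += len(literal)
                    self.ws()
                    return
            raise ValueError(f"unexpected {c!r} at offset {self.pos}")

    def sequence(self, open_: str, close: str, item: Callable[[], None]) -> None:
        self.expect(open_)  # object/array shape: open ws (item ("," ws item)*)? close ws
        self.ws()
        if self.peek() != close:
            item()
            while self.accept(","):
                self.ws()
                item()
        self.expect(close)
        self.ws()

    def member(self) -> None:  # string ":" ws value
        self.string()
        self.expect(":")
        self.ws()
        self.value()

    def string(self) -> None:
        self.expect('"')
        while not self.accept('"'):
            c = self.peek()
            if c == "":
                raise ValueError("unterminated string")
            self.pos += 1
            if c == "\\" and not self.accept('"\\/bfnrt'):  # not a one-character escape
                hex4 = self.text[self.pos + 1:self.pos + 5]  # so it must be "u" + 4 hex digits
                if self.peek() != "u" or len(hex4) != 4 or not all(h in HEX for h in hex4):
                    raise ValueError(f"bad escape at offset {self.pos}")
                self.pos += 5
        self.ws()

    def number(self) -> None:
        self.accept("-")
        if not self.accept("0"):  # [0-9] | [1-9] [0-9]*: a leading zero stands alone
            self.digit_run()
        if self.accept("."):
            self.digit_run()
        if self.accept("eE"):
            self.accept("+-")
            self.digit_run()
        self.ws()


def matches_json_grammar(text: str) -> bool:
    recognizer = JsonGrammarRecognizer(text)
    try:
        recognizer.sequence("{", "}", recognizer.member)  # root ::= object
    except ValueError:
        return False
    return recognizer.pos == len(text)  # the whole text must be consumed
```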

“The state of GPT” Recommendations

Open Source Community

On to the First Steps to Building LLM Based Applications…