Retrieval Augmented Generation – Part 1

Charles F. Vardeman II

Center for Research Computing, University of Notre Dame

2024-01-19

Hypothesis: Retrieval Augmented Generation Requires Curation

Knowledge Engineering Using Large Language Models

Allen, Bradley P, Lise Stork, and Paul Groth. 2023. “Knowledge Engineering Using Large Language Models.” arXiv.Org. October 1, 2023. https://arxiv.org/abs/2310.00637

Prompt Engineering as Knowledge Engineering

Allen, Bradley P, Lise Stork, and Paul Groth. 2023. “Knowledge Engineering Using Large Language Models.” arXiv.Org. October 1, 2023. https://arxiv.org/abs/2310.00637

Knowledge Engineering Practice

Allen, Bradley P, Lise Stork, and Paul Groth. 2023. “Knowledge Engineering Using Large Language Models.” arXiv.Org. October 1, 2023. https://arxiv.org/abs/2310.00637

Trusted AI, LLMs and KE

Allen, Bradley P, Lise Stork, and Paul Groth. 2023. “Knowledge Engineering Using Large Language Models.” arXiv.Org. October 1, 2023. https://arxiv.org/abs/2310.00637

Retrieval-Augmented Generation for Large Language Models: A Survey

Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, et al. 2024. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv. https://doi.org/10.48550/arXiv.2312.10997.

Sparse and Dense Representations

In a Post-Moore's Law world, how do data science and data engineering need to change? This talk presents design patterns for idiomatic programming in Python so that hardware can optimize machine learning workflows. We'll look at ways of handling data that are either "sparse" or "dense" depending on the stage of ML workflow – plus, how to leverage profiling tools in Python to understand how to take advantage of hardware. We'll also consider four key abstractions which are outside of most programming languages, but vital in data science work.

Paco Nathan, 2021, “Thinking Sparse and Dense”

Retrieval Augmented Generation – The Idea

Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, et al. 2024. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv. https://doi.org/10.48550/arXiv.2312.10997.

Naive RAG

  • Indexing

    • Data indexing: cleaning and extracting data from PDF, HTML, Word, Markdown, and image sources

    • Chunking: dividing text into smaller chunks that fit within the LLM's limited context window

    • Embedding and index creation: encoding the text/image chunks into vectors with a language model

  • Retrieval: given a user query, retrieve the most relevant chunks from the index

  • Generation: the user query and the retrieved documents are combined into a new prompt; the LLM generates a response grounded in this augmented context window.
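The three stages above can be sketched end-to-end in plain Python. Everything here is a toy stand-in: the bag-of-words `embed` replaces a real embedding model, the chunks are hand-picked fragments of the DoD example text used later, and the final prompt would be sent to an LLM rather than printed.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding': a word-count vector.
    Stands in for a real embedding model."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# Indexing: chunk the corpus and embed each chunk.
chunks = [
    "DoD must accelerate its progress towards becoming a data-centric organization.",
    "Systems must be designed with data interoperability as a key requirement.",
    "The Department must recruit new data experts and retain its developing force.",
]
index = [(chunk, embed(chunk)) for chunk in chunks]

# Retrieval: rank indexed chunks by similarity to the user query.
query = "What are the requirements for data interoperability?"
q_vec = embed(query)
best_chunk, _ = max(index, key=lambda pair: cosine(q_vec, pair[1]))

# Generation: query + retrieved context become a new prompt for the LLM.
prompt = f"Context:\n{best_chunk}\n\nQuestion: {query}\nAnswer:"
print(best_chunk)
```

A production pipeline replaces `embed` with a dense embedding model and the `max` scan with an approximate nearest-neighbor index, but the data flow is the same.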

Naive RAG Architecture

Langchain Q&A with RAG

Example Text to Chunk

DoD must accelerate its progress towards becoming a data-centric organization. DoD has lacked the enterprise data management to ensure that trusted, critical data is widely available to or accessible by mission commanders, warfighters, decision-makers, and mission partners in a real-time, useable, secure, and linked manner. This limits data-driven decisions and insights, which hinders the execution of swift and appropriate action.

Additionally, DoD software and hardware systems must be designed, procured, tested, upgraded, operated, and sustained with data interoperability as a key requirement. All too often these gaps are bridged with unnecessary human-machine interfaces that introduce complexity, delay, and increased risk of error. This constrains the Department’s ability to operate against threats at machine speed across all domains.

DoD also must improve skills in data fields necessary for effective data management. The Department must broaden efforts to assess our current talent, recruit new data experts, and retain our developing force while establishing policies to ensure that data talent is cultivated. We must also spend the time to increase the data acumen resident across the workforce and find optimal ways to promote a culture of data awareness.

“Chunking”

“Chunkviz”

“Chunking” with Overlap

“Chunkviz”
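A minimal fixed-size splitter with overlap can be written in a few lines; the `chunk_size` and `overlap` values here are arbitrary illustrations, and a real pipeline would typically use a library splitter rather than this sketch.

```python
def chunk_text(text, chunk_size=40, overlap=10):
    """Split text into fixed-size character chunks; consecutive chunks
    share `overlap` characters, so content cut at a boundary also
    appears whole at the start of the next chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = ("DoD must accelerate its progress towards "
        "becoming a data-centric organization.")
for c in chunk_text(text):
    print(repr(c))
```

The overlap is what keeps a sentence cut at position 40 from being lost: its tail reappears at the head of the next chunk.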

Smarter “Chunking”

LangChain - Recursively Split by Character

“Chunking” recursive character splitter

“Chunkviz”
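The recursive strategy can be sketched as follows. This is a simplified reimplementation of the idea behind LangChain's RecursiveCharacterTextSplitter, not the library's actual code: split on the coarsest separator first, merge pieces back up to the size limit, and only recurse to finer separators for pieces that are still too long.

```python
def recursive_split(text, chunk_size=100, separators=("\n\n", "\n", " ", "")):
    """Split on paragraph breaks first, then lines, then words, and
    finally raw characters, greedily merging pieces up to chunk_size."""
    if len(text) <= chunk_size:
        return [text]
    sep, rest = separators[0], separators[1:] or separators
    pieces = text.split(sep) if sep else list(text)
    chunks, current = [], ""
    for piece in pieces:
        candidate = f"{current}{sep}{piece}" if current else piece
        if len(candidate) <= chunk_size:
            current = candidate            # keep merging into this chunk
            continue
        if current:
            chunks.append(current)
        if len(piece) > chunk_size:        # still too big: recurse finer
            chunks.extend(recursive_split(piece, chunk_size, rest))
            current = ""
        else:
            current = piece
    if current:
        chunks.append(current)
    return chunks

doc = ("First paragraph about DoD data.\n\n"
       "A second, noticeably longer paragraph about interoperability.\n"
       "And a final short line.")
for c in recursive_split(doc, chunk_size=40):
    print(repr(c))
```

Because paragraph boundaries are tried first, chunks tend to end at natural breaks instead of mid-sentence, which is the whole advantage over the fixed-size splitter.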

“Chunking” with larger segment size

“Chunkviz”

Vector Indexing of the “Chunks”

from langchain_community.embeddings import FakeEmbeddings

# dod_text holds the DoD passage shown on the earlier slides
embeddings = FakeEmbeddings(size=1352)           # random vectors, illustration only
query_result = embeddings.embed_query(dod_text)  # encode the text as a vector
print(dod_text[:5])
query_result[:5]

Output:

DoD m
[0.28925496400357076,
 0.42954295410387294,
 -0.75042013219397,
 -0.21105104953004536,
 -0.655199848252018]

Figure 1: Vector representation of the text
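To show how such vectors are actually used, here is a self-contained stand-in for FakeEmbeddings plus a brute-force nearest-neighbor lookup (the `fake_embed` helper and the example chunks are inventions for this sketch). The hash-seeded vectors are deterministic but carry no semantics, which is exactly why FakeEmbeddings is useful only for testing pipeline plumbing, never for judging retrieval quality.

```python
import math
import random
import zlib

def fake_embed(text, size=8):
    """Stand-in for FakeEmbeddings: a vector seeded by a stable hash
    of the text. Deterministic, but the numbers carry no meaning."""
    rng = random.Random(zlib.crc32(text.encode("utf-8")))
    return [rng.uniform(-1.0, 1.0) for _ in range(size)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

chunks = [
    "DoD must accelerate towards a data-centric organization.",
    "Systems need data interoperability as a key requirement.",
    "The Department must recruit and retain data talent.",
]
index = [(c, fake_embed(c)) for c in chunks]

# A query identical to an indexed chunk maps to the identical vector,
# so it is its own nearest neighbor (cosine similarity of 1.0).
query_vec = fake_embed("Systems need data interoperability as a key requirement.")
best, _ = max(index, key=lambda pair: cosine(query_vec, pair[1]))
print(best)
```

With a trained embedding model, a *paraphrase* of an indexed chunk would also land nearby; with hash-seeded vectors it would not, because nothing about the vector reflects meaning.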

Trusted AI Point of View…

Failure points for RAG

Barnett, Scott, Stefanus Kurniawan, Srikanth Thudumu, Zach Brannelly, and Mohamed Abdelrazek. 2024. “Seven Failure Points When Engineering a Retrieval Augmented Generation System.”

Problem: a single global chunk-size constant ignores the semantic structure of a document; fixed boundaries can split a sentence, paragraph, or argument mid-thought.

“Agentic” Chunking

LangChain on X: Proposition-Based Retrieval

Agentic Example: Proposition Based Dense Retrieval

Chen, Tong, Hongwei Wang, Sihao Chen, Wenhao Yu, Kaixin Ma, Xinran Zhao, Hongming Zhang, and Dong Yu. 2023. “Dense X Retrieval: What Retrieval Granularity Should We Use?” arXiv.Org. December 11, 2023. https://arxiv.org/abs/2312.06648v2.

RAG Complexity Overview

Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, et al. 2024. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv. https://doi.org/10.48550/arXiv.2312.10997.

Comparison with other optimization methods

Gao, Yunfan, Yun Xiong, Xinyu Gao, Kangxiang Jia, Jinliu Pan, Yuxi Bi, Yi Dai, et al. 2024. “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv. https://doi.org/10.48550/arXiv.2312.10997.

LLMs and Trusted AI

Sun, Lichao, Yue Huang, Haoran Wang, Siyuan Wu, Qihui Zhang, Chujie Gao, Yixin Huang, et al. 2024. “TrustLLM: Trustworthiness in Large Language Models.” arXiv. http://arxiv.org/abs/2401.05561.

Graph Based Vector Retrieval