Data Centric AI

Charles F Vardeman II

2023-09-15

Trusted AI Frameworks for Knowledge Engineering

Foundational Components for Trusted AI

  • Continuous Integration and Deployment (CI/CD): Automate the integration and deployment of code, ensuring quality and operational efficiency.
  • Standardized Development Environments: Establish consistent, easily replicable environments to accelerate development and experimentation.
  • Data & Experiment Versioning: Implement robust systems to track changes in data and experiments, allowing for traceability and repeatability.
  • Model Lifecycle Management: Streamline the training, deployment, monitoring, and updating of machine learning models.
  • Flexibility Across Layers: Design the architecture to allow for different levels of customization, from high-level APIs to low-level controls, facilitating adaptability.

Data Centric AI

Data-centric AI Resource Hub

Aside: Lessons from the Semantic Web?

“38971912 · Towards a Framework for Data Excellence in Data-Centric AI: Lessons from the Semantic Web,” SlidesLive. (accessed Sep. 14, 2023).

DVC and Data Centric AI?

Data-centric AI is an emerging concept that emphasizes the importance of data quality and data engineering in building AI systems. Data-centric AI aims to improve the performance and robustness of AI models by systematically characterizing, evaluating, and monitoring the underlying data used to train and evaluate them⁴. Data-centric AI also involves using data-driven methods and tools to inform the considerations at each stage of the ML pipeline⁴.
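As a rough illustration of what "characterizing" a dataset can mean in practice, the following minimal sketch computes a few basic quality signals (missing fields, exact duplicates, label balance) over a list of records. The function name `profile_dataset`, the `label` field, and the toy records are hypothetical choices made for this example; the sketch also assumes all field values are hashable. Real data-centric workflows would track many more signals than these.

```python
from collections import Counter


def profile_dataset(rows, label_key="label"):
    """Summarize basic quality signals for a list of dict records.

    Hypothetical helper for illustration only; assumes hashable field values.
    """
    # Rows where at least one field is None or an empty string.
    rows_with_missing = sum(
        1 for r in rows if any(v is None or v == "" for v in r.values())
    )
    # Exact duplicates: records whose (key, value) pairs all match an earlier record.
    seen = Counter(tuple(sorted(r.items())) for r in rows)
    duplicate_rows = sum(count - 1 for count in seen.values())
    # Label distribution, ignoring missing labels.
    label_counts = Counter(
        r.get(label_key) for r in rows if r.get(label_key) not in (None, "")
    )
    return {
        "total": len(rows),
        "rows_with_missing": rows_with_missing,
        "duplicate_rows": duplicate_rows,
        "label_counts": dict(label_counts),
    }


# Toy records invented for this example.
rows = [
    {"text": "good movie", "label": "pos"},
    {"text": "good movie", "label": "pos"},
    {"text": "bad movie", "label": "neg"},
    {"text": "", "label": "pos"},
    {"text": "fine", "label": None},
]
report = profile_dataset(rows)
print(report)
```

Running a report like this on every new data snapshot is one simple way to monitor data over time rather than only inspecting it once.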

One tool that can help with data-centric AI is Data Version Control (DVC), a system for versioning machine learning models, data sets, and intermediate files. DVC connects them with code and uses various storage options to store file contents³. DVC allows users to track and reproduce experiments, share data and models, and collaborate effectively on AI projects³.
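The general idea behind connecting large data files to code can be sketched with a content-addressed cache: file contents are stored under their hash, and a small pointer file (which could be committed alongside the code) records which hash belongs to which path. The sketch below is not DVC's implementation or API; the function names, the `.ptr.json` pointer format, and the use of SHA-256 are all assumptions made for illustration.

```python
import hashlib
import json
import shutil
import tempfile
from pathlib import Path


def add_to_cache(data_file: Path, cache_dir: Path) -> Path:
    """Copy a file into a content-addressed cache and write a small pointer file."""
    digest = hashlib.sha256(data_file.read_bytes()).hexdigest()
    target = cache_dir / digest[:2] / digest[2:]
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():  # identical content is stored only once
        shutil.copy2(data_file, target)
    pointer = data_file.with_name(data_file.name + ".ptr.json")
    pointer.write_text(json.dumps({"hash": digest, "path": data_file.name}))
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the file version recorded in a pointer file."""
    meta = json.loads(pointer.read_text())
    source = cache_dir / meta["hash"][:2] / meta["hash"][2:]
    dest = pointer.with_name(meta["path"])
    shutil.copy2(source, dest)
    return dest


def demo() -> str:
    """Version a file twice, then roll back to the first version."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        cache = root / "cache"
        data = root / "train.csv"

        data.write_text("id,label\n1,cat\n")
        pointer = add_to_cache(data, cache)
        v1_pointer = pointer.read_text()  # stands in for a committed pointer file

        data.write_text("id,label\n1,cat\n2,dog\n")
        add_to_cache(data, cache)

        # Restoring the old pointer and checking out brings back the old data.
        pointer.write_text(v1_pointer)
        return checkout(pointer, cache).read_text()


restored = demo()
print(restored)
```

The design point is that only the tiny pointer file needs to live in source control, while the cache (or a remote copy of it) holds the actual data for every recorded version.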

For further reading on data-centric AI, see a survey paper that discusses its necessity, goals, methods, challenges, and benchmarks¹, a video lecture by Andrew Ng, who popularized the term data-centric AI⁶, and a website that provides a checklist and resources for applying data-centric AI in practice⁴.

Source: Conversation with Bing, 9/14/2023

  1. What is Data-Centric AI? - van der Schaar Lab.

  2. Data Version Control · DVC.

  3. [2303.10158] Data-centric Artificial Intelligence: A Survey - arXiv.org.

  4. The Principles of Data-Centric AI (DCAI) - arXiv.org.

  5. What is Data-Centric Architecture in AI? | Simplilearn.

Data Version Control (DVC)

Hugging Face

AI Testimony before US Senate

Clement Delangue Senate Statement

DVC and Hugging Face Integration (Team Frameworks – Peter)

JSON-LD Model and “AI Based Microservices”

Motivation…

How do we develop a curriculum for training large language models?

The “Pile”

LLaMA: Open and Efficient Foundation Language Models

Touvron, Hugo, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, et al. “LLaMA: Open and Efficient Foundation Language Models.” arXiv, February 27, 2023. https://doi.org/10.48550/arXiv.2302.13971.

(GPT-4) “Sparks of AGI”?

Sparks of AGI: Early Experiments with GPT-4, 2023. https://www.youtube.com/watch?v=qbIk7-JPB2c.

Textbooks Are All You Need!

Textbooks Are All You Need, 2023. https://www.youtube.com/watch?v=24O1KcIO3FM.

Textbooks Are All You Need!

Gunasekar, Suriya, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, et al. “Textbooks Are All You Need.” arXiv, June 20, 2023. http://arxiv.org/abs/2306.11644.

Textbooks are all you need II: phi-1.5

Y. Li, S. Bubeck, R. Eldan, A. Del Giorno, S. Gunasekar, and Y. T. Lee, “Textbooks Are All You Need II: phi-1.5 technical report.” arXiv, Sep. 11, 2023. Accessed: Sep. 12, 2023. [Online]. Available: http://arxiv.org/abs/2309.05463

“Coding Textbooks”

Improving data quality by using “Big Brain LLM”

Textbooks are all you need II: phi-1.5

Microsoft/phi-1_5

phi-1.5 Doesn’t want to kill us all…

Sebastien Bubeck on X