2023-08-07
Testing and continuous integration/continuous delivery (CI/CD) are software engineering practices that aim to ensure the quality, reliability, and security of software applications. They are especially important for AI applications, which depend on complex and changing data, models, and algorithms.
Software 1.0 refers to the traditional way of developing software: writing code that explicitly specifies the logic and rules of the application. Software 2.0 refers to the emerging way of developing software: using machine learning (ML) models whose behavior is learned from data rather than written by hand.
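The contrast can be made concrete with a toy sketch. Everything here is hypothetical (the spam rule, the feature, and the six labeled examples); it only illustrates that in Software 1.0 a person picks the threshold, while in Software 2.0 the threshold is fitted to data.

```python
from typing import List, Tuple


# Software 1.0: the rule is written by hand.
def is_spam_v1(num_links: int) -> bool:
    # Hypothetical threshold chosen by a developer.
    return num_links > 3


# Software 2.0: the rule (here, a single threshold) is learned from labeled data.
def fit_threshold(examples: List[Tuple[int, bool]]) -> int:
    """Return the threshold t for which the rule `x > t` labels the most examples correctly."""
    values = sorted({x for x, _ in examples})
    candidates = [values[0] - 1] + values

    def correct(t: int) -> int:
        return sum((x > t) == label for x, label in examples)

    # max() returns the first best candidate, so ties go to the smallest threshold.
    return max(candidates, key=correct)


# Hypothetical training data: (number of links, is spam).
data = [(0, False), (1, False), (2, False), (5, True), (7, True), (9, True)]
threshold = fit_threshold(data)


def is_spam_v2(num_links: int) -> bool:
    return num_links > threshold
```

Changing the behavior of `is_spam_v1` means editing code; changing the behavior of `is_spam_v2` means changing the data, which is why data and models need tests of their own.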
AI applications pose unique challenges and risks for testing and CI/CD, such as:
Testing and CI/CD for AI involve applying software engineering best practices to the data, model, and code components of AI applications. Some examples are:
Twitter: Andrej Karpathy
Learn More: [Tesla’s Data Engine and what we should learn from it]
Learn More: Effective testing for machine learning systems
YouTube Discussion: MLOps Chat: How Should We Test ML Models? with Data Scientist Jeremy Jordan
ML systems pose unique challenges and risks for testing, such as:
Testing ML systems can help address these challenges and risks by:
Testing ML systems involves applying software engineering best practices to the data, model, and code components of ML systems. Some examples are:
There are different tools and techniques that can help us test ML systems effectively and efficiently. Some examples are:
Learn More: Effective testing for machine learning systems
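As a minimal sketch of what testing the data, model, and code components might look like, the pytest-style tests below check a small dataset and a stand-in model. The column names, value ranges, and the `predict` function are all assumptions made for illustration; a real project would test its own schema and its trained model.

```python
from typing import Dict, List

Row = Dict[str, float]


def validate_rows(rows: List[Row]) -> List[str]:
    """Data test helper: return a list of human-readable problems found in the rows."""
    problems = []
    for i, row in enumerate(rows):
        if set(row) != {"age", "income"}:
            problems.append(f"row {i}: unexpected columns {sorted(row)}")
            continue
        if not 0 <= row["age"] <= 120:
            problems.append(f"row {i}: age out of range")
        if row["income"] < 0:
            problems.append(f"row {i}: negative income")
    return problems


def predict(row: Row) -> float:
    """Stand-in for a trained model: a hypothetical linear score clipped to [0, 1]."""
    score = 0.01 * row["age"] + 0.00001 * row["income"]
    return max(0.0, min(1.0, score))


def test_data_is_valid():
    rows = [{"age": 30, "income": 50_000}, {"age": 45, "income": 80_000}]
    assert validate_rows(rows) == []


def test_prediction_is_a_probability():
    assert 0.0 <= predict({"age": 30, "income": 50_000}) <= 1.0


def test_prediction_does_not_drop_with_income():
    # Directional expectation: for this model, more income should never lower the score.
    low = predict({"age": 30, "income": 20_000})
    high = predict({"age": 30, "income": 60_000})
    assert high >= low
```

Saved in a file such as `test_example.py` (a hypothetical name), these tests run with the `pytest` command and can be wired into a CI pipeline like any other unit test.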
Learn More: Microsoft Recommenders GitHub
The Microsoft Recommenders repository contains examples and best practices for building recommendation systems, provided as Jupyter notebooks. The repository also includes various tests to ensure the quality, reliability, and security of the code and the notebooks.
There are three types of tests in the Microsoft Recommenders repository:
Unit tests: these live in the unit folder and use pytest as the testing framework. They are triggered by pull requests to the main or staging branches.
Smoke tests: these live in the smoke folder and use papermill and scrapbook as the testing tools. They are run nightly on the main or staging branches.
Integration tests: these live in the integration folder and use papermill and scrapbook as the testing tools. They are run nightly on the main or staging branches.
The Microsoft Recommenders repository uses Azure DevOps as the testing infrastructure. Azure DevOps is a cloud-based platform that provides various services and tools for software development, such as version control, project management, testing, deployment, and monitoring.
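The notebook tests described above rely on papermill to execute a notebook with injected parameters and on scrapbook to read back values the notebook recorded. The sketch below shows one way such a test could be structured. It is not the repository's actual test code: the helper names and the tolerance are assumptions, and papermill and scrapbook are imported lazily so the comparison helper can be used without them.

```python
import math
from typing import Any, Dict, List, Mapping


def run_notebook(input_path: str, output_path: str, parameters: Dict[str, Any]) -> Dict[str, Any]:
    """Execute a notebook with papermill and return the values it recorded with scrapbook."""
    # Imported here so the pure-Python helper below works without these packages installed.
    import papermill as pm
    import scrapbook as sb

    # Run the notebook top to bottom, overriding the cell tagged "parameters".
    pm.execute_notebook(input_path, output_path, parameters=parameters)
    # Collect everything the notebook saved via scrapbook's glue().
    notebook = sb.read_notebook(output_path)
    return {name: scrap.data for name, scrap in notebook.scraps.items()}


def find_metric_mismatches(
    actual: Mapping[str, float], expected: Mapping[str, float], rel_tol: float = 0.05
) -> List[str]:
    """Return the names of expected metrics that are missing or outside the tolerance."""
    mismatches = []
    for name, want in expected.items():
        got = actual.get(name)
        if got is None or not math.isclose(got, want, rel_tol=rel_tol):
            mismatches.append(name)
    return mismatches
```

A smoke test would call `run_notebook` on a notebook with small parameter values (a small dataset, few epochs) and assert that `find_metric_mismatches(results, expected)` returns an empty list; an integration test would do the same with full-size parameters and tighter expectations.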
There are 19 pipelines for Linux tests and 19 pipelines for Windows tests, each corresponding to a different type of test, branch, and environment. For example, a given test type and environment typically has one pipeline for the main branch and a matching one for the staging branch.
Learn More: [Test Strategy · microsoft/recommenders Wiki · GitHub]
The pipelines use conda environments to manage dependencies and run tests. Conda is an open-source package and environment management system that allows us to create and use different configurations of software packages and libraries.
A script, generate_conda_file.py, is used to create conda environments for different combinations of CPU, GPU, and Spark. For example:
Learn More: [Conda — Conda documentation]
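To illustrate the idea of generating one environment per hardware and framework combination, here is a minimal sketch. It is not the repository's generate_conda_file.py: the function name, environment names, and package lists are assumptions chosen for illustration only.

```python
from itertools import product
from typing import Dict


def conda_environment_yaml(name: str, gpu: bool = False, pyspark: bool = False) -> str:
    """Build the text of a conda environment file for one configuration."""
    # Illustrative package lists; a real project would pin its own dependencies.
    packages = ["python=3.9", "numpy", "pandas", "pytest"]
    if gpu:
        packages.append("pytorch")
    if pyspark:
        packages.append("pyspark")

    lines = [f"name: {name}", "channels:", "  - conda-forge", "dependencies:"]
    lines += [f"  - {package}" for package in packages]
    return "\n".join(lines) + "\n"


# One environment file per combination of GPU and Spark support.
ENVIRONMENTS: Dict[str, str] = {}
for gpu, pyspark in product([False, True], repeat=2):
    env_name = "env" + ("_gpu" if gpu else "") + ("_pyspark" if pyspark else "")
    ENVIRONMENTS[env_name] = conda_environment_yaml(env_name, gpu=gpu, pyspark=pyspark)
```

Each generated string can be written to a `.yaml` file and turned into an environment with `conda env create -f <file>`, so every pipeline runs its tests against a known set of dependencies.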
The pipelines also use Azure Machine Learning (AML) to run some of the tests on different compute clusters. AML is a cloud-based service that provides various tools and features for ML development, such as data preparation, model training, model deployment, model management, and model monitoring.
AML provides scalable and flexible compute resources for ML development. For example:
Learn More: [What is Azure Machine Learning? - Azure Machine Learning | Microsoft Docs]
Learn More: How is ChatGPT’s behavior changing over time?
Learn More: AgentBench: Evaluating LLMs as Agents
Learn More: Deploy Llama 2 7B/13B/70B on Amazon SageMaker
Learn More: Generative AI in Jupyter
Learn More: Getting Started With LLMs
Learn More: Prompt Engineering Guide