The following questions are designed to help you assess risks throughout the implementation of AI projects. The questions are organized according to the AI product life cycle. We recommend returning to this tool throughout the project and completing the relevant questions. At the end, you will be able to export your answers as a PDF or JSON file.
3. Assessment
3.1 Assess Requirements, Statements of Concern, Mitigations, and Metrics
Ensure all mitigation actions, measures, and controls can be assessed and monitored throughout the life cycle.
Do all requirements (operational, functional, and technical) have appropriate modes of assessment and benchmarks of success and failure?
Do all statements of concern and mitigation actions have appropriate modes of assessment and benchmarks of success and failure?
Do all tradeoffs, trustworthiness, and confidence measures have appropriate modes of assessment and benchmarks of success and failure?
How will performance metrics be established?
How will baseline metrics for system performance be established?
How will errors be detected?
Given the use case, potential consequences, and affected stakeholders for this context, is it better to minimize certain types of error (e.g., precision vs. recall, Type I vs. Type II error)?
How will error rates be measured, and how will they be disaggregated to show how errors affect different sub-populations? (A sketch of such a disaggregated check follows this group of questions.)
Will error rates be recorded in the Impact Assessment?
Will the metrics need to evolve as the system behavior changes during use (e.g., due to feedback loops)?
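Where the error-type and sub-population questions above apply, it can help to script the evaluation directly. The sketch below is illustrative only: the column names (group, label, prediction) and the example data are assumptions, not part of this tool.

```python
# Illustrative sketch: disaggregated error-rate evaluation.
# Assumes a pandas DataFrame with hypothetical columns
# "group", "label", and "prediction"; adapt to your schema.
import pandas as pd

def error_rates_by_group(df: pd.DataFrame) -> pd.DataFrame:
    """Compute error rates and precision/recall per sub-population."""
    rows = []
    for group, sub in df.groupby("group"):
        tp = ((sub.label == 1) & (sub.prediction == 1)).sum()
        fp = ((sub.label == 0) & (sub.prediction == 1)).sum()
        fn = ((sub.label == 1) & (sub.prediction == 0)).sum()
        tn = ((sub.label == 0) & (sub.prediction == 0)).sum()
        rows.append({
            "group": group,
            "n": len(sub),
            # Type I error rate: false positives among actual negatives
            "fpr": fp / (fp + tn) if (fp + tn) else float("nan"),
            # Type II error rate: false negatives among actual positives
            "fnr": fn / (fn + tp) if (fn + tp) else float("nan"),
            "precision": tp / (tp + fp) if (tp + fp) else float("nan"),
            "recall": tp / (tp + fn) if (tp + fn) else float("nan"),
        })
    return pd.DataFrame(rows)

if __name__ == "__main__":
    df = pd.DataFrame({
        "group": ["a", "a", "b", "b", "b", "a"],
        "label": [1, 0, 1, 1, 0, 1],
        "prediction": [1, 1, 0, 1, 0, 1],
    })
    print(error_rates_by_group(df))
```

Gaps between groups in these rates are one input to the Impact Assessment question above.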
How will user understanding be measured?
Are there any measurement gaps or limits to the precision of measurement?
Are there latent constructs or other factors that will be difficult to operationalize or measure?
How will these issues affect risk calculations and impact analyses, and do those need to be revisited?
Is the system’s ontology appropriate for the use case and for tracking alignment with the DoD AI Ethical Principles?
Are all of your Statements of Concern and all aspects of your legal/ethical/policy frameworks sufficiently addressed?
If not, re-conduct activities under the Intake and Ideation phases.
What are the anticipated failures? How will these be detected?
Is there a process for system rollback and/or stoppage?
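One way to make the failure-detection and rollback questions concrete is a monitoring hook that compares live metrics against the benchmarks established in 3.1 and triggers a rollback or stoppage. The sketch below is a hypothetical pattern; the Benchmarks thresholds and the ModelRegistry interface are placeholders for whatever your deployment platform actually provides.

```python
# Hypothetical rollback/stoppage hook; the thresholds and registry
# interface are illustrative placeholders, not a prescribed design.
from dataclasses import dataclass

@dataclass
class Benchmarks:
    min_accuracy: float = 0.90   # success benchmark from assessment
    max_fpr: float = 0.05        # failure benchmark from assessment

class ModelRegistry:
    """Stand-in for a versioned model store with rollback."""
    def __init__(self, versions: list[str]):
        self.versions = versions  # newest last
    def rollback(self) -> str:
        self.versions.pop()
        return self.versions[-1]

def check_and_act(live_accuracy: float, live_fpr: float,
                  bench: Benchmarks, registry: ModelRegistry) -> str:
    """Return the action taken given live metrics vs. benchmarks."""
    if live_accuracy < bench.min_accuracy or live_fpr > bench.max_fpr:
        if len(registry.versions) > 1:
            prior = registry.rollback()
            return f"rolled back to {prior}"
        return "stopped: no prior version available"
    return "ok"

registry = ModelRegistry(["v1", "v2"])
print(check_and_act(0.80, 0.02, Benchmarks(), registry))  # rolled back to v1
```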
Describe the artifacts your organization requires, such as data ethics reviews, and your team's plans to complete them.
Describe the access controls that ensure the model is accessed only by approved personnel and that their access is appropriate to their specific roles. Please see DoDD 5411 for guidance.
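As one illustration of role-appropriate access, a minimal role-to-permission check might look like the sketch below. The roles and permissions shown are hypothetical examples; real controls should be implemented in your organization's identity and authorization infrastructure.

```python
# Minimal role-based access sketch. The roles and permissions here are
# hypothetical examples; actual controls belong in your organization's
# identity and authorization systems, per applicable DoD guidance.
ROLE_PERMISSIONS = {
    "data_scientist": {"train", "evaluate"},
    "operator": {"infer"},
    "auditor": {"read_logs"},
}

def is_authorized(role: str, action: str) -> bool:
    """Allow an action only if the role explicitly grants it."""
    return action in ROLE_PERMISSIONS.get(role, set())

assert is_authorized("operator", "infer")
assert not is_authorized("operator", "train")
```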
3.2 Exploratory Data Analysis
How was the data collected or acquired?
Could certain classes or populations have been undersampled?
Is the data representative of the use case/deployment context?
Has the data become stale? How often will it need to be updated?
Given the above, does the data need to be re-collected?
Will re-collection of the data introduce additional operational risks or risks to the force?
What other steps can be taken to improve the quality of the data?
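The sampling and representativeness questions above can be checked empirically, for example by comparing class or sub-population proportions in the dataset against the expected deployment distribution. A minimal sketch, assuming a hypothetical group column and reference proportions:

```python
# Illustrative undersampling check: compare dataset proportions against
# an expected deployment distribution. The column name, reference
# proportions, and tolerance are assumptions for the sketch.
import pandas as pd

def representation_gaps(df: pd.DataFrame, expected: dict[str, float],
                        column: str = "group", tolerance: float = 0.05):
    """Flag classes whose dataset share deviates from the expected share."""
    observed = df[column].value_counts(normalize=True)
    gaps = {}
    for cls, exp_share in expected.items():
        obs_share = float(observed.get(cls, 0.0))
        if abs(obs_share - exp_share) > tolerance:
            gaps[cls] = {"observed": obs_share, "expected": exp_share}
    return gaps

df = pd.DataFrame({"group": ["a"] * 90 + ["b"] * 10})
print(representation_gaps(df, {"a": 0.5, "b": 0.5}))
# -> flags "a" as oversampled and "b" as undersampled
```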
How was the data labeled?
Is ground truth accessible given the data type?
Could human biases affect how the data was labeled?
Given the above, does the data need to be re-labeled, or is it insufficient to proceed to development?
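Label quality and annotator bias are often probed with inter-annotator agreement. The sketch below computes Cohen's kappa for two hypothetical annotators; low agreement is one signal that re-labeling may be needed.

```python
# Illustrative inter-annotator agreement via Cohen's kappa.
# The two label lists are hypothetical; in practice, sample items
# for double annotation and compare.
from collections import Counter

def cohens_kappa(a: list, b: list) -> float:
    """Agreement between two annotators, corrected for chance."""
    assert len(a) == len(b)
    n = len(a)
    po = sum(x == y for x, y in zip(a, b)) / n            # observed agreement
    ca, cb = Counter(a), Counter(b)
    labels = set(a) | set(b)
    pe = sum((ca[lab] / n) * (cb[lab] / n) for lab in labels)  # chance agreement
    return (po - pe) / (1 - pe) if pe != 1 else 1.0

ann1 = [1, 1, 0, 1, 0, 0, 1, 0]
ann2 = [1, 0, 0, 1, 0, 1, 1, 0]
print(f"kappa = {cohens_kappa(ann1, ann2):.2f}")  # 0.50
```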
Data Provenance, Protection, and Access
How is the data accessed? Who has access and how is it controlled?
Where is the data stored?
How is the data protected?
How is data provenance ensured? How are transformations and cleaning steps recorded?
Is any of the data generated synthetically, or should it be?
How is the data used?
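Provenance and transformation records can be kept programmatically, for example by hashing each dataset version and appending a record for every cleaning step. A minimal sketch follows; the manifest schema is an assumption, not a mandated artifact.

```python
# Minimal provenance sketch: hash each dataset version and append a
# record for every transformation. The manifest schema is illustrative.
import hashlib
import datetime

def file_sha256(path: str) -> str:
    """Fingerprint a dataset file so later versions can be compared."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()

def record_step(manifest: list, path: str, step: str, operator: str) -> None:
    """Append one transformation record to the provenance manifest."""
    manifest.append({
        "timestamp": datetime.datetime.now(datetime.timezone.utc).isoformat(),
        "step": step,             # e.g., "dropped null rows"
        "operator": operator,     # who or what applied the transformation
        "sha256": file_sha256(path),
    })

manifest: list = []
# Example usage (assuming the file exists):
# record_step(manifest, "data/train.csv", "initial ingest", "etl-pipeline")
```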
Data Exploration
What abnormalities, outliers, or irregularities are present in the data?
Were these irregularities the result of human error, sensor error, processing error, or natural or adversarial perturbation? What mitigations are required for greater accuracy?
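Outlier and irregularity screening is commonly scripted before deeper root-cause analysis. A simple z-score screen is sketched below; the |z| > 3 cutoff is a common convention, not a requirement, and median-based methods may suit heavy-tailed data better.

```python
# Illustrative outlier screen using z-scores; the |z| > 3 cutoff is a
# common convention, not a mandated threshold.
import numpy as np

def zscore_outliers(values: np.ndarray, threshold: float = 3.0) -> np.ndarray:
    """Return indices of values more than `threshold` std devs from the mean."""
    z = (values - values.mean()) / values.std(ddof=0)
    return np.flatnonzero(np.abs(z) > threshold)

# Hypothetical sensor readings with one injected anomaly.
x = np.concatenate([np.random.default_rng(0).normal(10, 0.5, 50), [42.0]])
print(zscore_outliers(x))  # flags index 50 (the 42.0 reading) for review
```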
Will data or feedback used to update or fine-tune the model at later stages (such as through Reinforcement Learning from Human Feedback [RLHF]) have any of the issues described in 3.2.1 or 3.2.2? How will these be mitigated?
What data science techniques were used to check the data and model for statistical validity?
Have you determined which group parity optimization methods are most appropriate?
What mitigations (including dataset-level pre-processing, in-processing, or post-processing techniques) have been applied to the data or models to ensure statistical validity?
How have stakeholders used domain knowledge to identify the appropriate measurements of statistical validity?
How does the system design or training allow users to understand and account for any remaining statistical validity issues?
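Group parity questions become operational once an explicit metric is chosen, for example the demographic parity difference: the gap in positive-prediction rates across groups. A minimal sketch, assuming binary predictions and a hypothetical group column; this is one metric among many, and the appropriate choice depends on your statements of concern and domain knowledge.

```python
# Illustrative group parity metric: demographic parity difference,
# i.e., the gap between the highest and lowest positive-prediction
# rates across groups. Column names are assumptions for the sketch.
import pandas as pd

def demographic_parity_difference(df: pd.DataFrame,
                                  pred_col: str = "prediction",
                                  group_col: str = "group") -> float:
    rates = df.groupby(group_col)[pred_col].mean()
    return float(rates.max() - rates.min())

df = pd.DataFrame({
    "group": ["a", "a", "a", "b", "b", "b"],
    "prediction": [1, 1, 0, 1, 0, 0],
})
print(demographic_parity_difference(df))  # 2/3 - 1/3 = 0.33...
```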
According to your legal/ethical/policy frameworks, Statements of Concern (SOCs), mitigations, use case, and mission domain – is AI suitable for this use case? Be sure to answer the following:
Is AI suitable for the task at hand?
Is the model type appropriate for the task at hand?
Do the advantages outweigh the disadvantages, known or possible?
Does utilizing AI in this case achieve something that a non-AI tool could not accomplish?
What is the specific task that the system performs?
What is the system input and output required to perform that task?
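Answers to the task and input/output questions can be captured in a structured record so they travel with the system's documentation. The schema below is one illustrative way to do this, not a required format; the example values are hypothetical.

```python
# Illustrative task specification record; the schema and the example
# values are assumptions, not a mandated documentation format.
from dataclasses import dataclass, field

@dataclass
class TaskSpec:
    task: str                       # the specific task the system performs
    inputs: list[str] = field(default_factory=list)
    outputs: list[str] = field(default_factory=list)
    non_ai_alternative: str = ""    # what a non-AI tool could or could not do

spec = TaskSpec(
    task="classify maintenance reports by failure mode",
    inputs=["free-text maintenance report"],
    outputs=["failure-mode label", "confidence score"],
    non_ai_alternative="keyword rules miss paraphrased failure descriptions",
)
print(spec)
```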