Responsible AI Toolkit

What is Responsible AI (RAI)?

“America and China are competing to shape the future of the 21st century, technologically and otherwise. That competition is one which we intend to win — not in spite of our values, but because of them.”
— Deputy Secretary of Defense Kathleen Hicks

AI-enabled capabilities are advancing rapidly, providing powerful new possibilities for prediction, automation, and content generation, and accelerating the speed of decision-making. These technologies are already transforming the face of society and warfare. Whether such transformations will have positive or negative effects on our nation depends on whether these technologies are designed, developed, and used responsibly. A responsible approach to AI means innovating at a speed that can outpace existing and emerging threats, and with a level of effectiveness that provides justified confidence in the technology and its employment.

Responsible AI (RAI) involves ensuring that our technology matches our values. It positions our nation to lead in technological innovation while remaining a committed advocate of our democratic principles. To accomplish this goal, RAI work at the DoD translates ethical principles into concrete benchmarks for each use case. The RAI Toolkit provides the resources to do this work and is designed for accessibility, modularity, and customization.

Because it is an emerging field, RAI means different things to different organizations. To explain the DoD’s unique approach to RAI (Realities), this document directly addresses common misconceptions about RAI in the defense context (Myths) below.

Why does Responsible AI Matter?

“...ultimately, AI systems only work when they are based in trust. We have a principled approach to AI that anchors everything that this Department does. We call this Responsible AI, and it’s the only kind of AI that we do. Responsible AI is the place where cutting-edge tech meets timeless values.”
— Secretary of Defense Lloyd J. Austin III

Responsible AI Toolkit: Context

Why use the Responsible AI Toolkit?

As stated in the U.S. Department of Defense Responsible Artificial Intelligence Strategy and Implementation Pathway, “DoD must demonstrate that our military’s steadfast commitment to lawful and ethical behavior apply when designing, developing, testing, procuring, deploying, and using AI.” As part of the implementation of RAI within the DoD, this toolkit and assessment aim to provide users with a wide array of considerations to help ensure DoD AI capabilities are developed, deployed, and used responsibly. As the RAI Strategy & Implementation Pathway states, an RAI approach “ensures the safety of our systems and their ethical employment.” This toolkit and assessment also provide linkages to ensure that the entire lifecycle of AI-enabled systems (including design, development, deployment, and use) is consistent with DoD’s AI Ethical Principles.

This assessment will serve as a resource for individuals navigating the phases of the AI lifecycle, helping them address the key dimensions of Responsible AI (RAI) within their projects. It provides an array of tools supported by the Department of Defense, alongside example artifacts and insights drawn from similar efforts.

Who can use the RAI Toolkit?

This RAI Toolkit contains two main portions – a series of assessment questions along a project development lifecycle, and a set of RAI tools that support answering those questions. The RAI Toolkit is designed to address the equities of a wide variety of stakeholders and personas – a full list of personas (with descriptions) can be found in Appendix 7. This list of personas has been adapted from the Defense Cyber Workforce Framework (DCWF). Furthermore, each Toolkit item has been tagged with the relevant personas and the nature of their responsibilities for that item, in order to streamline and clarify roles and responsibilities across the Toolkit.
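
To make this concrete, the sketch below illustrates in Python how persona tagging and RASCI-style responsibility labels might be represented and filtered. The field names, persona names, and item text are hypothetical examples for illustration only, not the Toolkit’s actual schema (the real persona list comes from Appendix 7 and the DCWF).

```python
from dataclasses import dataclass, field

# Hypothetical sketch of persona-tagged Toolkit items; all names below
# are illustrative placeholders, not the Toolkit's actual schema.
@dataclass
class ToolkitItem:
    item_id: str
    question: str
    lifecycle_stage: str
    # persona name -> RASCI role (Responsible, Accountable, Supporting,
    # Consulted, or Informed)
    personas: dict = field(default_factory=dict)

def items_for_persona(items: list, persona: str) -> list:
    """Filter the Toolkit down to the items relevant to one persona."""
    return [item for item in items if persona in item.personas]

items = [
    ToolkitItem("S-01", "Have the relevant ethical, legal, and policy foundations been identified?",
                "Set Foundations",
                {"Program Manager": "Accountable", "AI Ethics Specialist": "Responsible"}),
    ToolkitItem("E-02", "Do evaluation results meet the thresholds defined for each concern?",
                "Evaluate Progress",
                {"Model Developer": "Responsible", "T&E Specialist": "Consulted"}),
]

for item in items_for_persona(items, "Program Manager"):
    print(item.item_id, "-", item.question)
```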

The RAI Toolkit is built to support the various facets of a formal program of record (POR) lifecycle process, in which the user community identifies an operational need and sets operational requirements for an AI capability to meet that need. An institutional requirements owner then translates these into functional requirements and engages with the acquisition community to either build or buy a capability that meets those requirements. The acquisition community further translates the requirements into performance specifications for the capability. A program manager oversees the procurement of the capability, which is executed by a development team including data engineers, model developers, and user experience designers, among others. Senior leaders monitor and oversee the process across the lifecycle. Additional experts support across the phases as well, including AI Ethics and Risk Specialists, test & evaluation specialists, and privacy and cybersecurity personnel. These personas are used in the RAI Assessment to identify the relevant aspects of the RAI lifecycle that apply to them.

Beyond the POR pathway described above, users of this toolkit may come from smaller teams (or teams of one) trying to develop an AI capability in house for an operational need they have identified. In this case, multiple personas will apply to individual members of the team. Regardless of the size of the undertaking, the toolkit and assessment identify the critical dimensions that must be considered to operationalize RAI effectively. Even small teams should work through all of the steps and roles described in this resource to ensure these dimensions receive due consideration.

The toolkit interface is designed to be tailorable/modular according to the specific context of the AI program. The steps within the RAI Assessment follow the entire AI development lifecycle, from intake through use – as depicted in the RAI Strategy & Implementation Pathway (p. 13, diagram below). Users of this toolkit can engage with the material at any point in the project lifecycle to identify relevant RAI questions to consider at that stage and access the available tools to support answering those questions. However, the toolkit is more powerful when used at the start of an AI program and revisited throughout the lifecycle.

[Figure: AI Lifecycle (RAI Strategy & Implementation Pathway, p. 13)]

Throughout the assessment sections, certain questions are highlighted with the label [GATE]. These items should all be addressed before continuing to the next stage in the development path. For users who engage with the toolkit during intermediate stages of development, it can be constructive to retrospectively examine preceding phases to identify any critical elements and use the tools highlighted in those respective sections to address them. The [GATE] labels will in some cases need to be tailored or more concretely defined (e.g., in terms of particular performance measures or metrics, thresholds, or other evaluation criteria) by individual project teams, who have the domain expertise to understand what is appropriate for their given case. Future versions of the toolkit will include a Responsibility Flows tool to ensure proper accountability over the establishment and navigation of the Gates, as well as additional guidance on benchmarks and test metrics appropriate for common models and use cases throughout the DoD.
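
As an illustration of how a project team might concretize a [GATE] into explicit evaluation criteria, consider the following sketch. The metric names and threshold values are hypothetical placeholders; each project must define its own based on its domain and risk profile.

```python
# Hypothetical gate criteria; metric names and thresholds are illustrative
# placeholders that a real project would define with its domain experts.
GATES = {
    "Evaluate Progress": {
        "recall_on_holdout": ("min", 0.90),    # must be at least 0.90
        "false_positive_rate": ("max", 0.05),  # must be at most 0.05
    },
}

def gate_passed(stage: str, measured: dict) -> bool:
    """A gate passes only when every criterion for the stage is satisfied."""
    for metric, (bound, threshold) in GATES.get(stage, {}).items():
        value = measured.get(metric)
        if value is None:
            return False  # missing evidence fails the gate by default
        if bound == "min" and value < threshold:
            return False
        if bound == "max" and value > threshold:
            return False
    return True

print(gate_passed("Evaluate Progress",
                  {"recall_on_holdout": 0.93, "false_positive_rate": 0.04}))  # True
```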

The RAI Toolkit can be leveraged by any project that involves Artificial Intelligence, and it is also adaptable to many data analytics projects. For the purposes of the RAI Toolkit, the definition of AI follows what was laid out in the 2019 NDAA (pp. 62-63):

  1. Any artificial system that performs tasks under varying and unpredictable circumstances without significant human oversight, or that can learn from experience and improve performance when exposed to data sets.
  2. An artificial system developed in computer software, physical hardware, or other context that solves tasks requiring human-like perception, cognition, planning, learning, communication, or physical action.
  3. An artificial system designed to think or act like a human, including cognitive architectures and neural networks.
  4. A set of techniques, including machine learning, that is designed to approximate a cognitive task.
  5. An artificial system designed to act rationally, including an intelligent software agent or embodied robot that achieves goals using perception, planning, reasoning, learning, communicating, decision-making, and acting.

How can you use the RAI Toolkit?

DoD AI Ethical Principles & the RAI Strategy and Implementation Pathway

In 2020, the DoD was the first military in the world to adopt AI ethical principles. These principles are based on the US military’s existing ethics framework, grounded in the US Constitution, Title 10 of the U.S. Code, Law of War, and existing international treaties and longstanding norms and values. The principles apply to both combat and non-combat functions and assist the U.S. Military in upholding legal, ethical, and policy commitments in the field of AI. The five principles are:

  1. Responsible: DoD personnel will exercise appropriate levels of judgment and care, while remaining responsible for the development, deployment, and use of AI capabilities.
  2. Equitable: The Department will take deliberate steps to minimize unintended bias in AI capabilities.
  3. Traceable: The Department’s AI capabilities will be developed and deployed such that relevant personnel possess an appropriate understanding of the technology, development processes, and operational methods applicable to AI capabilities, including with transparent and auditable methodologies, data sources, and design procedure and documentation.
  4. Reliable: The Department’s AI capabilities will have explicit, well-defined uses, and the safety, security, and effectiveness of such capabilities will be subject to testing and assurance within those defined uses across their entire life-cycles.
  5. Governable: The Department will design and engineer AI capabilities to fulfill their intended functions while possessing the ability to detect and avoid unintended consequences, and the ability to disengage or deactivate deployed systems that demonstrate unintended behavior.

In 2022, the Deputy Secretary of Defense issued the Responsible AI Strategy & Implementation Pathway, which established 64 Lines of Effort (LOEs) for operationalizing the high-level guidance provided by these principles into concrete resources, processes, and tools for implementing Responsible AI across the DoD. The 64 LOEs are coordinated by the CDAO RAI Division, and owned by RAI leads and teams from across the Components and Services. The capabilities being built out through these LOEs are being continuously integrated into this Toolkit (see Appendix: Roadmap to Future Versions for more detail).

RAI Toolkit Overview

The RAI Toolkit is a centralized process through which personnel supporting AI projects can identify, track, and mitigate RAI-related issues (and capitalize on RAI-related opportunities for innovation) throughout the AI product lifecycle, via the use of tailorable and modular assessments, tools, and artifacts. The process enables traceability and assurance of responsible AI practice, development, and use.

The RAI Toolkit itself is the process by which AI projects can translate high-level principles, policy, and guidance into concrete metrics and technical processes. For example, if an object detection and tracking project were looking to operationalize the governability principle from the DoD AI Ethical Principles, the RAI Toolkit addresses this need by providing tools for competence estimation. These tools (e.g., uncertainty quantification, out-of-distribution (OOD) detection) help determine how competent the system is to make a determination in a given case, which addresses part of the governability principle’s clause of “possessing the ability to detect and avoid unintended consequences.”
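
As a minimal sketch of what competence estimation can look like in practice, the example below computes two common signals from a classifier’s softmax output: predictive entropy (an uncertainty measure) and the maximum-softmax-probability baseline for OOD detection. This is an illustrative example, not a tool from the Toolkit’s Tools List, and the threshold values are arbitrary placeholders a project would tune empirically.

```python
import numpy as np

def predictive_entropy(probs: np.ndarray) -> float:
    """Uncertainty of a softmax output; higher entropy = less confident."""
    probs = np.clip(probs, 1e-12, 1.0)
    return float(-np.sum(probs * np.log(probs)))

def max_softmax_ood_score(probs: np.ndarray) -> float:
    """Baseline OOD score: a low maximum class probability suggests the
    input may lie outside the model's training distribution."""
    return 1.0 - float(np.max(probs))

# A near-uniform output over three classes: the model is not competent here.
probs = np.array([0.34, 0.33, 0.33])

# Thresholds are arbitrary placeholders, not recommended values.
if predictive_entropy(probs) > 1.0 or max_softmax_ood_score(probs) > 0.5:
    print("Low competence: flag for human review or disengage.")
```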

Due to the wide variety of use cases, risk profiles, and priorities across the DoD, the assessment has been built out in a highly modular and tailorable way. As such, it is meant to be adjusted by projects for their particular use case, priorities, and needs. This also means that the different components of the toolkit can be pulled out and used on their own (e.g., the Toolkit’s Tools List, the risk management guidance, the RAI checklist). The RAI Toolkit can also be employed as an educational tool, for individuals or teams to read through or wargame various real or notional projects in order to upskill in RAI.

The RAI Toolkit itself is built around the DoD AI Ethical Principles, which are mapped to the tools and artifacts being developed under the RAI Strategy & Implementation Pathway. For this MVP, the majority of the tools are industry-standard, open-source options that serve to illustrate the types of tools that could be used to address a particular issue. Many of the open-source tools listed will eventually be replaced by DoD-use tools currently being built or acquired by the DoD’s RAI team and the RAI Working Council. The inclusion of these open-source tools in the Tools List should not be seen as an endorsement of their use; DoD personnel should still go through normal software deployment approval processes before using them.

The RAI Toolkit contains an assessment, the SHIELD Assessment, which walks a project through various steps for identifying and working through RAI-related risks and opportunities at all stages of the product lifecycle. The SHIELD Assessment accomplishes this by guiding projects through the RAI considerations and activities needed to ensure responsible design, development, and deployment of an AI-enabled capability. SHIELD breaks these activities down into six sequential categories:

  • Set Foundations: Identify the relevant RAI, ethical, legal, and policy foundations for your project – along with potential risks, harms, opportunities, and impacts. Create a list of issues (‘Statements of Concern’ – SOCs) for tracking throughout the product lifecycle.
  • Hone Operationalizations: Operationalize the foundations and the SOCs into concrete methods for assessing the extent to which the principles are being met and the issues are being addressed.
  • Improve & Innovate: Leverage mitigation tools and activities to improve progress toward meeting the foundations and addressing the SOCs. Scope and implement new innovations to further improve the technology beyond the minimum requirements.
  • Evaluate Progress: Benchmark and evaluate the extent to which the foundations are being met, the SOCs are being addressed, and any innovations are improving upon baselines.
  • Log for Traceability: Verify that documentation is in order to ensure traceability, and feed lessons learned into repositories for sharing. Track where new methods of assessment or mitigation are needed.
  • Detect via Continuous Monitoring: Continuously monitor the system for degradation in performance, and watch the innovation ecosystem for new developments that could lead to improvements in the technology.
[Figure: AI Lifecycle]

These six categories map to particular stages of the product lifecycle (with some exceptions).

The SHIELD Assessment was developed alongside and is grounded in the Defense AI Guide on Risk (DAGR; see Appendix 1), which is intended to provide DoD AI stakeholders with guiding principles to promote improved trustworthiness, effectiveness, responsibility, and operations – and help to align AI-enabled capabilities to the DoD AI Ethical Principles, NIST AI Risk Management Framework (AI RMF), best practices, and other governing DoD guidance.

The guidance is holistic, involving an analysis that spans Social, Technological, Operational, Political, Economic, and Sustainability (STOPES) factors. It also aids with mapping the shifting, bidirectional, and interconnected nature of AI risk dynamics and provides a roadmap for quantifying and calculating these risk relationships. Next year, DAGR will be built out further into an AI Risk Management Framework (RMF) in collaboration with NIST (MVP Q3 2024).

The activities of the SHIELD Assessment map to particular tools within the Tools List, which are used to act upon the findings from conducting the assessment. Underpinning the SHIELD Assessment is the DAGR (DoD AI Guide on Risk), a tool for determining sources of risk. The RAI Toolkit was also built with a RASCI matrix for each item, which sets up clear swimlanes for the team and indicates the level and form of involvement that each team member should have for each item. Running through the SHIELD Assessment are the Statements of Concern (SOCs) – short summaries of risks or opportunities identified during harms and impact modeling, consolidated for easy reference throughout the product lifecycle. The SOCs are the thread that ties all of the activities and pieces of the Toolkit together.
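
A minimal sketch of how an SOC might be represented so that it can be tracked across lifecycle stages is shown below; the field names and example content are hypothetical illustrations, not the Toolkit’s actual format.

```python
from dataclasses import dataclass, field

# Hypothetical Statement of Concern (SOC) record; fields are illustrative.
@dataclass
class StatementOfConcern:
    soc_id: str
    summary: str
    kind: str                      # "risk" or "opportunity"
    status: str = "open"
    history: list = field(default_factory=list)

    def log(self, stage: str, note: str) -> None:
        """Record how the SOC was handled at a given lifecycle stage."""
        self.history.append((stage, note))

soc = StatementOfConcern("SOC-007", "Tracker confidence degrades in low-light imagery", "risk")
soc.log("Hone Operationalizations", "Defined low-light recall metric and threshold")
soc.log("Evaluate Progress", "Benchmarked on low-light holdout set; threshold met")
soc.status = "mitigated"

for stage, note in soc.history:
    print(f"[{soc.soc_id}] {stage}: {note}")
```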

The RAI Toolkit is currently a voluntary document and process – it does not, in itself, possess any legal or policy authority. Nevertheless, leveraging the RAI Toolkit will help to address the types of concerns that would arise in legal or ATO reviews, in addition to providing the kind of documentation that would be useful to such reviews. Relatedly, the updated DoD Directive 3000.09, Autonomy in Weapon Systems, now requires projects to demonstrate that they have plans in place to ensure consistency with the DoD AI Ethical Principles and the DoD Responsible AI Strategy and Implementation Pathway – and the RAI Toolkit can be used for this purpose. As additional directives and guidance emerge, the RAI Toolkit will be updated so that it can be used for these purposes as well.

The overall purpose of the RAI Toolkit, however, is to manage RAI risks and considerations effectively, capitalize on opportunities for innovation, and enhance justified confidence in the technology and its employment for operational users, commanders, and other stakeholders. The Gates provide flexible points of connection to the needs of existing or future review processes, while the larger RAI Toolkit ensures each project has the technical and programmatic maturity to confidently and successfully defend design decisions in these reviews. Additionally, should a project come under increased scrutiny at any stage, the documentation the RAI Toolkit provides makes it possible to quickly demonstrate due diligence in identifying and mitigating risk and navigating tradeoffs – enabling project owners to focus on mission success rather than on actively managing those sources of scrutiny.

Because of this wide variety of use cases and needs across the DoD, and the rapid advancements in the fields of AI and RAI, this Toolkit MVP will be continually updated and iterated upon. We welcome your and your component’s feedback and partnership, so that we can ensure the RAI Toolkit closely tracks your needs and priorities.

Please direct any feedback or questions to the RAI Team via email.

Design Philosophy

  • Modular & Tailorable: The RAI Toolkit is built in a modular fashion, so that it can be adjusted and recombined for a project’s purposes – the individual pieces can even be used on their own. Since each item is tagged with labels that enable the content to be sorted, the RAI Toolkit can be tailored manually or automatically (through the filters).
  • Minimalist: Each item is designed around, and filterable through, a RASCI matrix, meaning that each member of the team sees only the items they need to see.
  • Integrated: The RAI Toolkit can either be integrated into existing workflows or used entirely as an AI program management tool and tracker – allowing you to coordinate all of your project’s activities with the RAI activities built in.
  • Traceable: The RAI Toolkit makes documentation seamless, providing traceability, insight, and historical records of project development, system status, and previous choices and decisions.
  • Upskilling: The RAI Toolkit is designed to upskill those who use it in topics related to design, human factors, data science, MLOps, TEVV, continuous monitoring, cybersecurity, and RAI. It also links to further resources for additional detail, education, and training.
  • Iterative & Updating: The RAI Toolkit is a living document, continually updated in response to feedback, novel applications and use cases, and the emergence of new technologies.

Ways to Use

Ideally, the RAI Toolkit would be used end-to-end throughout the product lifecycle. Nevertheless, because it is built in a modular and tailorable way, it is possible to adapt the RAI Toolkit to any particular point of the lifecycle. Indeed, because of the wide variety of use cases, risk profiles, and priorities across the DoD, the RAI Toolkit is designed to be tailored according to each project’s needs. Each item has been tagged with various labels to make it easy to navigate to the items of interest. You can tailor the RAI Toolkit for your project and purposes either manually or by using the interactive filters to generate a version containing only the relevant items.

The assessment can be filled out asynchronously or as a collaborative group exercise.

Other possible uses include:

  • RAI Toolkit LITE: This contains the most critical elements of the RAI Toolkit, centered on the recommended Gates (issues that indicate progress to the next stage of the lifecycle is not recommended until they are addressed).
  • Using the individual tools from the tools list in isolation
  • Supporting Documentation Package for 3000.09 Reviews (in development) or for various generative AI/LLM use cases (in development)
  • Supporting Documentation Package for other legal, policy, or ATO review
  • Package to demonstrate consistency with DoD AI Ethical Principles
  • Educational resource for wargaming AI projects

Operationalization of the Principles

The following section provides a close reading of the DoD AI Ethical Principles and indicates the kinds of RAI activities or tools needed to address each piece of each Principle. This work demonstrates how the approach to developing the RAI Toolkit was both top-down and bottom-up. The methodology below grounds the top-down approach: mapping high-level principles to the tools used in the Tools List. It also provided a gap analysis, indicating where tools did not yet exist to fully meet a Principle (such that the RAI team would need to focus on developing such tools). The bottom-up approach involved extensive market research to identify the leading tools, activities, assessments, and resources to feed into the RAI Toolkit. Combining the top-down and bottom-up approaches ensured holistic coverage.