GAO-25-107651 ARTIFICIAL INTELLIGENCE
Generative artificial intelligence (AI) can create content such as text, images,
audio, or video when prompted by a user. Generative AI differs from other AI
systems in its ability to generate novel content, in the vast volumes of data it
requires for training, and in the greater size and complexity of its models.
Commercial developers have created a wide range of generative AI models that
produce text, code, image, and video outputs, as well as products and services
that enhance existing products or support customized development and
refinement of models. Use of generative AI has exploded, with one commercial
developer stating that it has reached more than 200 million weekly active users
for one of its models. Commercial development of generative AI technologies has
rapidly accelerated, with industry continually updating models with new features
and capabilities. However, some stakeholders have raised trust, safety, and
privacy concerns over the use of training data for models and the potential for
harmful outputs.
For this technology assessment, we were asked to describe commercial
development of generative AI technologies. This report provides an overview of
common generative AI development practices, limitations with these technologies
and their susceptibility to attack, and processes commercial developers follow to
collect, use, and store training data for generative AI technologies. This report is
the second in a body of work looking at generative AI. In future reports, we plan
to assess (1) societal and environmental effects of the use of generative AI and
(2) federal research, development, and adoption of generative AI technologies.
• The common practices developers use to facilitate responsible development
and deployment of generative AI technologies include benchmark tests;
development of trust, privacy, and safety policies; use of multi-disciplinary
teams; and red teaming (testing efforts to identify flaws or vulnerabilities).
• Commercial developers face several limitations when developing generative
AI technologies. Commercial developers recognize that despite efforts to
continuously monitor models after deployment, their models may be
susceptible to attacks or may produce outputs that are factually incorrect or
exhibit bias.
• Developers collect data from a variety of sources to train their generative AI
models, including publicly available information, data sourced from third
parties, and user-provided data. However, specifics of the training data used
by commercial developers are not entirely available to the public.
U.S. Government Accountability Office
Artificial Intelligence: Generative AI Training,
Development, and Deployment
Considerations
GAO-25-107651
Report to Congressional Requesters
22, 2024