:::info Author:
(1) Andrew J. Peterson, University of Poitiers ([email protected]).
:::
Table of Links

- The media, filter bubbles and echo chambers
- Network effects and Information Cascades
- Appendix
Abstract

While artificial intelligence has the potential to process vast amounts of data, generate new insights, and unlock greater productivity, its widespread adoption may entail unforeseen consequences. We identify conditions under which AI, by reducing the cost of access to certain modes of knowledge, can paradoxically harm public understanding. While large language models are trained on vast amounts of diverse data, they naturally generate output towards the ‘center’ of the distribution. This is generally useful, but widespread reliance on recursive AI systems could lead to a process we define as “knowledge collapse”, which we argue could harm innovation and the richness of human understanding and culture. However, unlike AI models that cannot choose what data they are trained on, humans may strategically seek out diverse forms of knowledge if they perceive them to be worthwhile. To investigate this, we provide a simple model in which a community of learners or innovators choose to use traditional methods or to rely on a discounted AI-assisted process, and identify conditions under which knowledge collapse occurs. In our default model, a 20% discount on AI-generated content generates public beliefs 2.3 times further from the truth than when there is no discount. Finally, based on the results, we consider further research directions to counteract such outcomes.
Introduction

Before the advent of generative AI, all text and artwork was produced by humans, in some cases aided by tools or computer systems. The capability of large language models (LLMs) to generate text with near-zero human effort, however, along with models to generate images, audio, and video, suggests that the data to which humans are exposed may come to be dominated by AI-generated or AI-aided processes.
\ Researchers have noted that the recursive training of AI models on synthetic text may lead to degeneration, known as “model collapse” (Shumailov et al., 2023). Our interest is in the inverse of this concern, focusing instead on the equilibrium effects on the distribution of knowledge within human society. We ask under what conditions the rise of AI-generated content and AI-mediated access to information might harm the future of human thought, information-seeking, and knowledge.
\ The initial effect of AI-generated information is presumably limited, and existing work on the harms of AI rightly focuses on the immediate effects of false information spread by “deepfakes” (Heidari et al., 2023), bias in AI algorithms (Nazer et al., 2023), and political misinformation (Chen and Shu, 2023). Our focus has a somewhat longer time horizon and probes the impact of widespread, rather than marginal, adoption.
\ Researchers and engineers are currently building a variety of systems whereby AI would mediate our experience with other humans and with information sources. These range from learning from LLMs (Chen, Chen, and Lin, 2020), ranking or summarizing search results with LLMs (Sharma, Liao, and Xiao, 2024), suggesting search terms or words to write as with traditional autocomplete (Graham, 2023; Chonka, Diepeveen, and Haile, 2023), designing systems to pair collaborators (Ball and Lewis, 2018), LLM-based completion of knowledge bases sourced from Wikipedia (Chen, Razniewski, and Weikum, 2023), interpreting government data (Fisher, 2024) and aiding journalists (Opdahl et al., 2023), to cite only a few from an ever-growing list.
\ Over time, dependence on these systems, and the existence of multifaceted interactions among them, may create a “curse of recursion” (Shumailov et al., 2023), in which our access to the original diversity of human knowledge is increasingly mediated by a partial and increasingly narrow subset of views. With increasing integration of LLM-based systems, certain popular sources or beliefs which were common in the training data may come to be reinforced in the public mindset (and within the training data), while other “long-tail” ideas are neglected and eventually forgotten.
\ Such a process might be reinforced by an ‘echo chamber’ or information cascade effect, in which repeated exposure to this restricted set of information leads individuals to believe that the neglected, unobserved tails of knowledge are of little value. To the extent AI can radically discount the cost of access to certain kinds of information, it may further generate harm through the “streetlight effect”, in which a disproportionate amount of search is done under the lighted area not because it is more likely to contain one’s keys but because it’s easier to look there. We argue that the resulting curtailment of the tails of human knowledge would have significant effects on a range of concerns, including fairness, inclusion of diversity, lost gains in innovation, and the preservation of the heritage of human culture.
\ In our simulation model, however, we also consider the possibility that humans are strategic in actively curating their information sources. If, as we argue, there is significant value in the ‘tail’ areas of knowledge that come to be neglected by AI-generated content, some individuals may put in additional effort to realize the gains, assuming they are sufficiently informed about the potential value.
Summary of Main Contributions

We identify a dynamic whereby AI, despite only reducing the cost of access to certain kinds of information, may lead to “knowledge collapse,” neglecting the long tails of knowledge and creating a degenerately narrow perspective over generations. We provide a model with positive knowledge spillovers in which individuals decide whether to rely on cheaper AI technology or invest in samples from the full distribution of true knowledge. We examine through simulations the conditions under which individuals are sufficiently informed to prevent knowledge collapse within society. Finally, we conclude with an overview of possible solutions to prevent knowledge collapse in the AI era.
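To make the mechanism concrete, the sketch below simulates a highly simplified version of this choice: the true state of knowledge is a standard normal distribution, AI-assisted access returns samples only from its truncated center, and each learner compares the cost saving from the AI discount against a perceived value of full-distribution search. The distribution, truncation width, decision rule, and error measure are illustrative assumptions for exposition, not the paper’s exact specification or calibration.

```python
import numpy as np

# Illustrative parameters (assumptions, not the paper's calibration).
N_AGENTS = 100              # community of learners / innovators
N_ROUNDS = 50               # rounds of knowledge accumulation
FULL_COST = 1.0             # cost of traditional, full-distribution search
AI_DISCOUNT = 0.2           # 20% discount for AI-assisted access
TRUNCATION = 1.0            # AI content only covers |x| <= 1 standard deviation
PERCEIVED_TAIL_VALUE = 0.1  # perceived extra benefit of sampling the tails (assumption)

rng = np.random.default_rng(0)

def sample_full():
    """Traditional search: a draw from the full true distribution N(0, 1)."""
    return rng.normal(0.0, 1.0)

def sample_ai():
    """AI-assisted search: a draw restricted to the truncated center of N(0, 1)."""
    while True:
        x = rng.normal(0.0, 1.0)
        if abs(x) <= TRUNCATION:
            return x

def run(discount):
    """Pool all agents' samples and return the error in the estimated spread."""
    pooled = []
    for _ in range(N_ROUNDS):
        for _ in range(N_AGENTS):
            # Strategic choice: use AI only if the cost saving exceeds the
            # perceived value of seeing the full distribution.
            use_ai = discount * FULL_COST > PERCEIVED_TAIL_VALUE
            pooled.append(sample_ai() if use_ai else sample_full())
    # Error measure: distance of the public estimate of the spread from the truth (std = 1).
    return abs(np.std(pooled) - 1.0)

print("error with 20% AI discount:", round(run(AI_DISCOUNT), 3))
print("error with no discount:    ", round(run(0.0), 3))
```

Under these assumptions, the 20% discount leads every agent to choose the cheaper truncated source, so the pooled public estimate of the distribution’s spread collapses toward the center, while with no discount full-distribution sampling keeps the estimate close to the truth.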
\
:::info This paper is available on arxiv under CC BY-NC-SA 4.0 DEED license.
:::
\