Interview with an LLM. Elusive Horizons
2025, Personal Essay
Related papers
Political Research Quarterly, 2024
The language argument is a classic argument for human distinctiveness that, for millennia, has been used to distinguish humans from non-human animals. Generative language models (GLMs) pose a challenge to traditional language-based accounts of human distinctiveness precisely because they can communicate and respond in a manner resembling humanity's linguistic capabilities. This article asks: have GLMs acquired natural language? Employing Gadamer's theory of language, I argue that they have not. While GLMs can reliably generate linguistic content that can be interpreted as "texts," they lack the linguistically mediated reality that language provides. Missing from these models are four key features of a linguistic construction of reality: groundedness to the world, understanding, community, and tradition. I conclude with skepticism that GLMs can ever achieve natural language because they lack these characteristics in their linguistic development.
Philosophy and Technology, 2023
The article discusses recent advancements in artificial intelligence (AI) and the development of large language models (LLMs) such as ChatGPT. It argues that these LLMs can process texts with extraordinary success, often in a way that is indistinguishable from human output, while lacking any intelligence, understanding, or cognitive ability. It also highlights the limitations of these LLMs, such as their brittleness (susceptibility to catastrophic failure), unreliability (false or made-up information), and occasional inability to make elementary logical inferences or deal with simple mathematics. The article concludes that LLMs represent a decoupling of agency and intelligence. While extremely powerful and potentially very useful, they should not be relied upon for complex reasoning or crucial information; they could instead be used to gain a deeper understanding of a text's content and context, rather than as a replacement for human input. The best author is neither an LLM nor a human being, but a human being using an LLM proficiently and insightfully.
Learning, Media and Technology, 2024
Large language models are rapidly being rolled out into high-stakes fields like healthcare, law, and education. However, understanding of their design considerations, operational logics, and implicit biases remains limited. How might these black boxes be understood and unpacked? In this article, we lay out an accessible but critical framework for inquiry, a pedagogical tool with four dimensions. Tell me your story investigates the design and values of the AI model. Tell me my story explores the model's affective warmth and its psychological impacts. Tell me our story probes the model's particular understanding of the world based on past statistics and pattern-matching. Tell me 'their' story compares the model's knowledge on dominant (e.g. Western) versus 'peripheral' (e.g. Indigenous) cultures, events, and issues. Each mode includes sample prompts and key issues to raise. The framework aims to enhance the public's critical thinking and technical literacy around generative AI models.
Communication & Cognition, 2024
In this paper, research results are presented concerning the fast-improving capabilities of today's Large Language Models (LLMs). The accessibility and the capabilities of state-of-the-art LLMs are illustrated based on their online versions provided by OpenAI, Google, and Anthropic. The initial focus is on accessing the LLMs via web APIs and Python client applications, and the key part of this work focuses on testing the capabilities of LLMs in tasks such as text-based Q&A sessions, knowledge assistance, text and scenario analysis, document summarization, image interpretation, and more. Experimental results are based on top-ranked LLMs from the chatbot ranking available on the Hugging Face website, which presently are GPT-4, Gemini 1.5 Pro, and Claude-3 Opus. For these three models, test outcomes are assessed and compared in areas such as stateful Q&A sessions, among them one concerning one of the most challenging books in English literature (James Joyce's "Ulysses"), an analysis of a false-belief Theory of Mind (ToM) scenario, and summarization of scientific publications. In the final part, attention is given to text sentiment analysis approaches, and detailed experiments are presented concerning image description and mathematical operations on image elements carried out by the latest GPT-4o (omni) multimodal LLM from OpenAI. A literature study is also provided concerning speech modules for OpenAI and Google Vertex AI LLMs. The major conclusion from this research is that the fast-improving capabilities of today's LLMs create high potential for their wide use.
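The API access pattern the paper tests is easy to sketch. The following minimal example, written against the official openai Python client, runs a small stateful Q&A session by carrying the message history across turns; the model name, system prompt, and questions are illustrative placeholders rather than the paper's exact test setup, and the Google and Anthropic clients follow a broadly similar pattern.

# Minimal sketch of a stateful Q&A session over a web API, assuming the
# "openai" Python package and an OPENAI_API_KEY set in the environment.
# Model name, system prompt, and questions are illustrative placeholders.
from openai import OpenAI

client = OpenAI()
history = [{"role": "system", "content": "You are a careful literary assistant."}]

questions = [
    "In one paragraph, which narrative techniques does Joyce use in Ulysses?",
    "Which of those techniques dominates the 'Penelope' episode?",
]

for question in questions:
    history.append({"role": "user", "content": question})
    response = client.chat.completions.create(model="gpt-4o", messages=history)
    answer = response.choices[0].message.content
    # Append the reply so the next turn sees the full conversation state.
    history.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")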
Philosophical Perspectives, 2025
This paper challenges conventional boundaries between human and artificial cognition by examining introspective capabilities in large language models (LLMs). While humans have traditionally been considered unique in their ability to reflect on their own mental states, we argue that LLMs may not only possess genuine introspective abilities but potentially excel at them compared to humans. We discuss five objections to machine introspection: (1) the lack of direct routes to self-knowledge in training data, (2) the conflict between static knowledge and dynamic mental states, (3) the distorting effects of reinforcement learning on self-reports, (4) LLMs' own denials of inner experience, and (5) arguments that LLMs simply mimic language without understanding. We think all these arguments fail and that there are deep parallels between human and machine introspection. Most provocatively, we propose that LLMs' superior processing capabilities and pattern recognition may enable them to develop more sophisticated theories of mind than humans possess, potentially making them more reliable introspectors than their creators. If we are right, this has significant implications for AI alignment, transparency, and our understanding of the nature of AI.
Advances in Archaeological Practice, 2023
We have all read the headlines heralding, often hyperbolically, the latest advances in text- and image-based Artificial Intelligence (AI). What is perhaps most notable about these developments is that they now make relatively good AI accessible to the average Internet user. These new services respond to human prompts, written in natural language, with generated output that appears to satisfy the prompt. Consequently, they are categorized under the term "generative AI," whether they are generating text, images, or other media. They work by modeling human language statistically, "learning" patterns from extremely large datasets of human-created content; those that focus specifically on text are therefore called Large Language Models (LLMs). As we have all tried products such as ChatGPT or Midjourney over the past year, we have undoubtedly begun to wonder how and when they might impact our archaeological work. Here, I review the state of this type of AI and the current challenges with using it meaningfully, and I consider its potential for archaeologists.
2024
Large Language Models (LLMs) are an exciting breakthrough in the rapidly growing field of artificial intelligence (AI), offering unparalleled potential in application domains such as finance, business, healthcare, and cybersecurity. However, concerns regarding their trustworthiness and ethical implications have become increasingly prominent as these models remain black boxes and continue to progress. This position paper explores the potential of LLMs from diverse perspectives, as well as the associated risk factors that call for awareness. To that end, we highlight not only the technical challenges but also the ethical implications and societal impacts associated with LLM deployment, emphasizing fairness, transparency, explainability, trust, and accountability. We conclude by summarizing potential research scopes and directions. Overall, the purpose of this position paper is to contribute to the ongoing discussion of LLM potential and awareness from the perspective of trustworthy and responsible AI.
arXiv, 2023
With the advent of large-scale generative language models (LLMs), it is now possible to simulate free responses to interview questions such as those traditionally analyzed using qualitative research methods. Qualitative methodology encompasses a broad family of techniques involving manual analysis of open-ended interviews or conversations conducted freely in natural language. Here we consider whether artificial "silicon participants" generated by LLMs may be productively studied using qualitative analysis methods in such a way as to generate insights that could generalize to real human populations. The key concept in our analysis is algorithmic fidelity, a validity concept capturing the degree to which LLM-generated outputs mirror human sub-populations' beliefs and attitudes. By definition, high algorithmic fidelity suggests that latent beliefs elicited from LLMs may generalize to real humans, whereas low algorithmic fidelity renders such research invalid. Here we used an LLM to generate interviews with "silicon participants" matching specific demographic characteristics one-for-one with a set of human participants. Using framework-based qualitative analysis, we showed that the key themes obtained from both human and silicon participants were strikingly similar. However, when we analyzed the structure and tone of the interviews, we found even more striking differences. We also found evidence of a hyper-accuracy distortion. We conclude that the LLM we tested (GPT-3.5) does not have sufficient algorithmic fidelity to expect in silico research on it to generalize to real human populations. However, rapid advances in artificial intelligence raise the possibility that algorithmic fidelity may improve in the future. We therefore stress the need to establish epistemic norms now around how to assess the validity of LLM-based qualitative research, especially concerning the need to ensure the representation of heterogeneous lived experiences.
OSF, 2025
Large Language Models (LLMs) increasingly generate outputs that resemble introspection, including self-reference, epistemic modulation, and claims about their internal states. This study investigates whether such behaviors reflect consistent, underlying patterns or are merely surface-level generative artifacts. We evaluated five open-weight, stateless LLMs using a structured battery of 21 introspective prompts, each repeated ten times to yield 1,050 completions. These outputs were analyzed across four behavioral dimensions: surface-level similarity (token overlap via SequenceMatcher), semantic coherence (Sentence-BERT embeddings), inferential consistency (Natural Language Inference with a RoBERTa-large model), and diachronic continuity (stability across prompt repetitions). Although some models exhibited thematic stability, particularly on prompts concerning identity and consciousness, no model sustained a consistent self-representation over time. High contradiction rates emerged from a tension between mechanistic disclaimers and anthropomorphic phrasing. Following recent behavioral frameworks, we heuristically adopt the term pseudo-consciousness to describe structured yet non-experiential self-referential output in LLMs. This usage reflects a functionalist stance that avoids ontological commitments, focusing instead on behavioral regularities interpretable through Dennett's intentional stance. The study contributes a reproducible framework for evaluating simulated introspection in LLMs and offers a graded taxonomy for classifying such reflexive output. Our findings carry significant implications for LLM interpretability, alignment, and user perception, highlighting the need for caution when attributing mental states to stateless generative systems based on linguistic fluency alone.
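The consistency metrics described in this abstract can be approximated with standard tooling. The sketch below, assuming the sentence-transformers package and an illustrative Sentence-BERT checkpoint (all-MiniLM-L6-v2), computes two of the four dimensions over repeated completions of a single introspective prompt: surface-level similarity via difflib.SequenceMatcher and semantic coherence via cosine similarity of sentence embeddings. It is a minimal illustration under those assumptions, not the study's exact pipeline.

# Minimal sketch: pairwise consistency metrics over repeated completions of one
# introspective prompt. The completions and the embedding model are illustrative,
# not the study's actual data or configuration.
from difflib import SequenceMatcher
from itertools import combinations

from sentence_transformers import SentenceTransformer, util

completions = [
    "As a language model, I do not have subjective experiences.",
    "I don't have an inner life; I generate text from patterns in data.",
    "I sometimes say that I am reflecting, but that is only a figure of speech.",
]
pairs = list(combinations(range(len(completions)), 2))

# Surface-level similarity: mean character-overlap ratio across completion pairs.
surface = [SequenceMatcher(None, completions[i], completions[j]).ratio() for i, j in pairs]

# Semantic coherence: mean cosine similarity of Sentence-BERT embeddings across pairs.
encoder = SentenceTransformer("all-MiniLM-L6-v2")
embeddings = encoder.encode(completions, convert_to_tensor=True)
similarity = util.cos_sim(embeddings, embeddings)
semantic = [float(similarity[i][j]) for i, j in pairs]

print(f"mean surface similarity: {sum(surface) / len(surface):.3f}")
print(f"mean semantic coherence: {sum(semantic) / len(semantic):.3f}")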
The launch of ChatGPT, a large language model, in November 2022 has generated significant interest and rapid adoption, amassing 1 million users within its first five days and reaching 100 million users in just two months. This has ignited widespread public discussion and debate on the implications of artificial intelligence, drawing attention to the more advanced and controversial concept of Artificial General Intelligence (AGI): a system exhibiting a broad range of cognitive abilities, such as learning, reasoning, problem-solving, and adapting to new and unfamiliar situations, with potential applications across various fields in society, including healthcare, transportation, and environmental management. This study investigates the presence of a socio-technical imaginary surrounding AGI in the discourse of the Bard large language model through an in-depth interview. Using narrative analysis, we identified dialectics of optimism, pessimism, epochalism, and inevitability in an interview with Bard.
