
The Turing Deception

2022, arXiv (Cornell University)

https://doi.org/10.48550/ARXIV.2212.06721

Abstract

This research revisits the classic Turing test and evaluates recent large language models such as ChatGPT for their ability to reproduce human-level comprehension and compelling text generation. Two task challenges (summarization and question answering) prompt ChatGPT to produce original content (98-99%) from a single text entry and from the sequential questions originally posed by Turing in 1950. We score the original and generated content against the 2019 OpenAI GPT-2 Output Detector and establish multiple cases in which the generated content proves original and undetectable (98%). The question of whether a machine can fool a human judge recedes in this work relative to the question of how one would prove it. The original contribution of the work presents a metric and a simple grammatical set for understanding the writing mechanics of chatbots, evaluating their readability, statistical clarity, engagement, delivery, overall quality, and plagiarism risk. While Turing's original prose scores at least 14% below the machine-generated output, whether an algorithm displays hints of Turing's true initial thoughts (the "Lovelace 2.0" test) remains unanswerable.
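The abstract describes scoring human and machine prose on readability and statistical clarity. The paper's exact metric is not reproduced here; as a minimal illustrative sketch, one widely used proxy for readability is the Flesch reading-ease formula, which combines average sentence length with average syllables per word (the syllable counter below is a crude vowel-group heuristic, not a dictionary-accurate one):

```python
import re

def count_syllables(word: str) -> int:
    """Crude heuristic: count vowel groups, discount a silent trailing 'e'."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    n = len(groups)
    if word.lower().endswith("e") and n > 1:
        n -= 1  # treat trailing 'e' as silent ("machine" -> 2, not 3)
    return max(n, 1)  # every word has at least one syllable

def flesch_reading_ease(text: str) -> float:
    """Flesch reading ease: higher scores indicate easier text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))
```

Under this kind of measure, short declarative prose scores higher than polysyllabic academic prose, which is the sense in which one text can be said to score a given percentage "below" another.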

References (25)

  1. Turing, A. M. (1950). Computing Machinery and Intelligence. https://redirect.cs.umbc.edu/courses/471/papers/turing.pdf
  2. Riedl, M. O. (2014). The Lovelace 2.0 test of artificial creativity and intelligence. arXiv preprint arXiv:1410.6142.
  3. Loebner, H. (2009). How to hold a Turing Test contest. In Parsing the Turing test (pp. 173-179). Springer, Dordrecht.
  4. Bradeško, L., & Mladenić, D. (2012, October). A survey of chatbot systems through a Loebner prize competition. In Proceedings of Slovenian language technologies society eighth conference of language technologies (pp. 34-37). Ljubljana, Slovenia: Institut Jožef Stefan.
  5. Russell, S. J., & Norvig, P. (2003). Artificial Intelligence: A Modern Approach (Harlow).
  6. Lowe, R., Noseworthy, M., Serban, I. V., Angelard-Gontier, N., Bengio, Y., & Pineau, J. (2017). Towards an automatic Turing test: Learning to evaluate dialogue responses. arXiv preprint arXiv:1708.07149.
  7. McCoy, J. P., & Ullman, T. D. (2018). A minimal Turing test. Journal of Experimental Social Psychology, 79, 1-8.
  8. Elkins, K., & Chun, J. (2020). Can GPT-3 pass a Writer's Turing test?. Journal of Cultural Analytics, 5(2), 17212.
  9. French, R. M. (2000). The Turing Test: the first 50 years. Trends in cognitive sciences, 4(3), 115-122.
  10. Radford, A., Wu, J., Child, R., Luan, D., Amodei, D., & Sutskever, I. (2019). Language models are unsupervised multitask learners. OpenAI blog, 1(8), 9.
  11. Radford, A., Narasimhan, K., Salimans, T., & Sutskever, I. (2018). Improving language understanding by generative pre-training. OpenAI, https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf
  12. Floridi, L., & Chiriatti, M. (2020). GPT-3: Its nature, scope, limits, and consequences. Minds and Machines, 30(4), 681-694.
  13. Rousseaux, F., Barkati, K., Bonardi, A., & Vincent, A. (2012). From Alan Turing's Imitation Game to Contemporary Lifestreaming Attempts. Computational Creativity, Concept Invention, and General Intelligence, 1, 45.
  14. BBC News, (2014), Computer AI passes Turing test in 'world first', https://www.bbc.com/news/technology-27762088
  15. Big Think, (2022), "The Turing test: AI still hasn't passed the 'imitation game'", https://bigthink.com/the-future/turing-test-imitation-game/
  16. OpenAI, (2022), "ChatGPT: Optimizing Language Models for Dialogue", https://openai.com/blog/chatgpt/
  17. Thunstrom, A. O. (2022). We asked GPT-3 to write an academic paper about itself: Then we tried to get it published. Scientific American, 30.
  18. Grammarly, (2022), "How Grammarly's Performance Reports Make You a Stronger Writer", https://www.grammarly.com/blog/grammarly-performance-report/
  19. Google Scholar (2022), https://scholar.google.com/citations?scioq=turing+COMPUTING+MACHINERY+AND+INTELLIGENCE
  20. Nova, M. (2018). Utilizing Grammarly in evaluating academic writing: A narrative research on EFL students' experience. Premise: Journal of English Education and Applied Linguistics, 7(1), 80-96.
  21. Schramowski, P., Turan, C., Andersen, N., Rothkopf, C. A., & Kersting, K. (2022). Large pre-trained language models contain human-like biases of what is right and wrong to do. Nature Machine Intelligence, 4(3), 258-268.
  22. Bostrom, N., & Yudkowsky, E. (2018). The ethics of artificial intelligence. In Artificial intelligence safety and security (pp. 57-69). Chapman and Hall/CRC.
  23. Solaiman, I., Brundage, M., Clark, J., Askell, A., Herbert-Voss, A., Wu, J., ... & Wang, J. (2019). Release strategies and the social impacts of language models. arXiv preprint arXiv:1908.09203. https://github.com/openai/gpt-2-output-dataset/tree/master/detector and https://huggingface.co/openai-detector
  24. Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., ... & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv preprint arXiv:1907.11692.
  25. Searle, J. (2009). Chinese room argument. Scholarpedia, 4(8), 3100.