
Error-Correction for AI Safety

2020, Artificial General Intelligence

https://doi.org/10.1007/978-3-030-52152-3_2

Abstract

The complex socio-technological debate underlying safety-critical and ethically relevant issues in AI development and deployment extends across heterogeneous research subfields and involves partly conflicting positions. In this context, it seems expedient to generate a minimalistic joint transdisciplinary basis that disambiguates references to specific subtypes of AI properties and risks, enabling error-correction in the transmission of ideas. In this paper, we introduce a high-level transdisciplinary system clustering of ethical distinction between antithetical clusters of Type I and Type II systems, which extends a cybersecurity-oriented AI safety taxonomy with considerations from psychology. Moreover, we review relevant Type I AI risks, reflect upon possible epistemological origins of hypothetical Type II AI from a cognitive sciences perspective, and discuss the related human moral perception. Strikingly, our nuanced transdisciplinary analysis yields the figurative formulation of the so-called AI safety paradox, identifying AI control and value alignment as conjugate requirements in AI safety. Against this backdrop, we craft versatile multidisciplinary recommendations with ethical dimensions tailored to Type II AI safety. Overall, we suggest proactive and, importantly, corrective instead of prohibitive methods as a common basis for both Type I and Type II AI safety.
