The role of explainable AI in the context of the AI Act
2023 ACM Conference on Fairness, Accountability, and Transparency
https://doi.org/10.1145/3593013.3594069

Abstract
The proposed EU regulation on Artificial Intelligence (AI), the AI Act, has sparked debate about the role of explainable AI (XAI) in high-risk AI systems. Some argue that black-box AI models will have to be replaced with transparent ones; others argue that using XAI techniques might help achieve compliance. This work aims to bring clarity regarding XAI in the context of the AI Act, focusing in particular on the AI Act's requirements for transparency and human oversight. After outlining the key points of the debate and describing the current limitations of XAI techniques, the paper carries out an interdisciplinary analysis of how the AI Act addresses the issue of opaque AI systems. In particular, we argue that the AI Act neither mandates a requirement for XAI, which remains the subject of intense scientific research and is not without technical limitations, nor bans the use of black-box AI systems. Instead, the AI Act aims to achieve its stated policy objectives through a focus on transparency (including documentation) and human oversight. Finally, to concretely illustrate our findings and conclusions, a use case on AI-based proctoring is presented.