Against Purposeful Artificial Intelligence Failures
Keywords: AI Accident, AI Failure, AI Safety, AI Terrorism, Superintelligence, X-Risk

Abstract
Thousands of researchers currently hold the opinion that advanced artificial intelligence could cause significant damage if developed without appropriate safety measures, yet such measures are neither deployed nor even fully developed. A fringe theory suggests that a severe AI accident could serve as a fire alarm prompting humanity to take the existential dangers of AI seriously, and that it is therefore desirable to create such a failure on purpose as soon as possible in order to prevent greater harm in the future. In this paper we rely on an analogy to inoculation theory to argue against creating purposeful AI failures.
References
Yampolskiy, R.V., On monitorability of AI. AI and Ethics, 2024: p. 1-19.
Baum, S., A. Barrett, and R.V. Yampolskiy, Modeling and interpreting expert disagreement about artificial superintelligence. Informatica, 2017. 41(7): p. 419-428.
Yampolskiy, R.V., AI: Unexplainable, Unpredictable, Uncontrollable. 2024: CRC Press.
Yampolskiy, R.V., Untestability of AI. Available at: https://www.researchgate.net/publication/378126414_Untestability_of_AI_and_Unfalsifiability_of_AI_Safety_Claims.
Brcic, M. and R.V. Yampolskiy, Impossibility Results in AI: a survey. ACM Computing Surveys, 2023. 56(1): p. 1-24.
Bengio, Y., et al., Managing AI risks in an era of rapid progress. arXiv preprint arXiv:2310.17688, 2023.
Yampolskiy, R.V. AI Risk Skepticism. in Conference on Philosophy and Theory of Artificial Intelligence. 2021. Springer.
Ambartsoumean, V.M. and R.V. Yampolskiy, AI Risk Skepticism, A Comprehensive Survey. arXiv preprint arXiv:2303.03885, 2023.
Yudkowsky, E., There’s No Fire Alarm for Artificial General Intelligence. October 14, 2017: Available at: https://intelligence.org/2017/10/13/fire-alarm/.
Pihelgas, M., Mitigating risks arising from false-flag and no-flag cyber attacks. CCD COE, NATO, Tallinn, 2015.
Pistono, F. and R.V. Yampolskiy. Unethical Research: How to Create a Malevolent Artificial Intelligence. in 25th International Joint Conference on Artificial Intelligence (IJCAI-16). Ethics for Artificial Intelligence Workshop (AI-Ethics-2016). 2016.
Hubinger, E., et al., Sleeper agents: Training deceptive llms that persist through safety training. arXiv preprint arXiv:2401.05566, 2024.
Chessen, M., We Need To Build Doomsday AI. November 26, 2023: Available at: https://solarpunkfuture.substack.com/p/we-need-to-build-doomsday-ai.
Pistono, F., Coronavirus is a Tragedy. But it May Save Humanity. March 22, 2020: Available at: https://medium.com/@FedericoPistono/conoravirus-is-tragedy-but-it-may-save-humanity-6f105a1d5fee.
Whitby, B., Reflections on artificial intelligence: the legal, moral and ethical dimensions. 1996: Intellect Ltd.
Barrat, J., Our final invention: Artificial intelligence and the end of the human era. 2013: Macmillan.
Ord, T., The precipice: Existential risk and the future of humanity. 2020: Hachette Books.
Anonymous, The Case for Inspiring Disasters. June 7, 2020: Available at: https://forum.effectivealtruism.org/posts/zCvWn6f8NdT3mxiZg/the-case-for-inspiring-disasters.
McGuire, W.J., Resistance to persuasion conferred by active and passive prior refutation of the same and alternative counterarguments. The Journal of Abnormal and Social Psychology, 1961. 63(2): p. 326.
Compton, J., Inoculation theory. The SAGE handbook of persuasion: Developments in theory and practice, 2013. 2: p. 220-237.
Yampolskiy, R.V., On the controllability of artificial intelligence: An analysis of limitations. Journal of Cyber Security and Mobility, 2022. 11(3): p. 321-404.
Yampolskiy, R.V., Unpredictability of AI: On the impossibility of accurately predicting all actions of a smarter agent. Journal of Artificial Intelligence and Consciousness, 2020. 7(01): p. 109-118.
Yampolskiy, R.V., Behavioral modeling: an overview. American Journal of Applied Sciences, 2008. 5(5): p. 496-503.
Yampolskiy, R.V. and V. Govindaraju. Use of behavioral biometrics in intrusion detection and online gaming. in Biometric Technology for Human Identification III. 2006. SPIE.
Yampolskiy, R.V., Predicting future AI failures from historic examples. Foresight, 2019. 21(1): p. 138-152.
Scott, P.J. and R.V. Yampolskiy, Classification schemas for artificial intelligence failures. Delphi, 2019. 2: p. 186.
Williams, R. and R. Yampolskiy, Understanding and avoiding ai failures: A practical guide. Philosophies, 2021. 6(3): p. 53.
License
Copyright (c) 2024 Roman Yampolskiy
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.