Against Purposeful Artificial Intelligence Failures
Keywords: AI Accident, AI Failure, AI Safety, AI Terrorism, Superintelligence, X-Risk

Abstract
Thousands of researchers currently hold the opinion that advanced artificial intelligence could cause significant damage if developed without appropriate safety measures, yet such measures are neither deployed nor even fully developed. A fringe theory suggests that a severe AI accident could serve as a fire alarm prompting humanity to take the existential dangers of AI seriously, and that it is therefore desirable to create such a failure on purpose as soon as possible in order to prevent greater harm in the future. In this paper we rely on an analogy to inoculation theory to argue against creating purposeful AI failures.
References
Yampolskiy, R.V., On monitorability of AI. AI and Ethics, 2024: p. 1-19.
Baum, S., A. Barrett, and R.V. Yampolskiy, Modeling and interpreting expert disagreement about artificial superintelligence. Informatica, 2017. 41(7): p. 419-428.
Yampolskiy, R.V., AI: Unexplainable, Unpredictable, Uncontrollable. 2024: CRC Press.
Yampolskiy, R.V., Untestability of AI. Available at: https://www.researchgate.net/publication/378126414_Untestability_of_AI_and_Unfalsifiability_of_AI_Safety_Claims.
Brcic, M. and R.V. Yampolskiy, Impossibility Results in AI: a survey. ACM Computing Surveys, 2023. 56(1): p. 1-24.
Bengio, Y., et al., Managing AI risks in an era of rapid progress. arXiv preprint arXiv:2310.17688, 2023.
Yampolskiy, R.V. AI Risk Skepticism. in Conference on Philosophy and Theory of Artificial Intelligence. 2021. Springer.
Ambartsoumean, V.M. and R.V. Yampolskiy, AI Risk Skepticism, A Comprehensive Survey. arXiv preprint arXiv:2303.03885, 2023.
Yudkowsky, E., There’s No Fire Alarm for Artificial General Intelligence. October 14, 2017: Available at: https://intelligence.org/2017/10/13/fire-alarm/.
Pihelgas, M., Mitigating risks arising from false-flag and no-flag cyber attacks. CCD COE, NATO, Tallinn, 2015.
Pistono, F. and R.V. Yampolskiy. Unethical Research: How to Create a Malevolent Artificial Intelligence. in 25th International Joint Conference on Artificial Intelligence (IJCAI-16). Ethics for Artificial Intelligence Workshop (AI-Ethics-2016). 2016.
Hubinger, E., et al., Sleeper agents: Training deceptive llms that persist through safety training. arXiv preprint arXiv:2401.05566, 2024.
Chessen, M., We Need To Build Doomsday AI. November 26, 2023: Available at: https://solarpunkfuture.substack.com/p/we-need-to-build-doomsday-ai.
Pistono, F., Coronavirus is a Tragedy. But it May Save Humanity. March 22, 2020: Available at: https://medium.com/@FedericoPistono/conoravirus-is-tragedy-but-it-may-save-humanity-6f105a1d5fee.
Whitby, B., Reflections on artificial intelligence: the legal, moral and ethical dimensions. 1996: Intellect Ltd.
Barrat, J., Our final invention: Artificial intelligence and the end of the human era. 2013: Macmillan.
Ord, T., The precipice: Existential risk and the future of humanity. 2020: Hachette Books.
Anonymous, The Case for Inspiring Disasters. June 7, 2020: Available at: https://forum.effectivealtruism.org/posts/zCvWn6f8NdT3mxiZg/the-case-for-inspiring-disasters.
McGuire, W.J., Resistance to persuasion conferred by active and passive prior refutation of the same and alternative counterarguments. The Journal of Abnormal and Social Psychology, 1961. 63(2): p. 326.
Compton, J., Inoculation theory. The SAGE handbook of persuasion: Developments in theory and practice, 2013. 2: p. 220-237.
Yampolskiy, R.V., On the controllability of artificial intelligence: An analysis of limitations. Journal of Cyber Security and Mobility, 2022. 11(3): p. 321-404.
Yampolskiy, R.V., Unpredictability of AI: On the impossibility of accurately predicting all actions of a smarter agent. Journal of Artificial Intelligence and Consciousness, 2020. 7(01): p. 109-118.
Yampolskiy, R.V., Behavioral modeling: an overview. American Journal of Applied Sciences, 2008. 5(5): p. 496-503.
Yampolskiy, R.V. and V. Govindaraju. Use of behavioral biometrics in intrusion detection and online gaming. in Biometric Technology for Human Identification III. 2006. SPIE.
Yampolskiy, R.V., Predicting future AI failures from historic examples. Foresight, 2019. 21(1): p. 138-152.
Scott, P.J. and R.V. Yampolskiy, Classification schemas for artificial intelligence failures. Delphi, 2019. 2: p. 186.
Williams, R. and R. Yampolskiy, Understanding and avoiding ai failures: A practical guide. Philosophies, 2021. 6(3): p. 53.
License
Copyright (c) 2024 Roman Yampolskiy
This work is licensed under a Creative Commons Attribution-NoDerivatives 4.0 International License.