Implementing Morality in AI


Responsibility, loyalty, trustworthiness: what do these words have in common? Firstly, they are abstractions, meaning that they make no direct reference to any specific object in the world. Words of this class are particularly challenging to explain to an artificial intelligence. Overcoming this challenge may prove critically important, because the second feature that unites these words is that they all relate to ethical behavior.

Why is it important for AI to have the capacity for ethical behavior in the first place? Insofar as AI is only a tool for humans to use, the onus of moral responsibility falls on its human operators. But AI systems are increasingly being entrusted with responsibility, from helping to diagnose medical patients to operating fleets of fully autonomous taxis in San Francisco. As responsibility increases and human oversight decreases, the actions of AI have ever greater potential for far-reaching societal consequences. It is up to us to ensure that the impact of this new technology is not a negative one.

The first line of defense is to assign the machine goals that are congruent with our own ethical sensibilities. This sounds simple enough, but it is easier said than done. Even when the primary goal assigned to such a machine seems innocuous, the machine may go on to generate its own subgoals in service of that overarching goal, and those subgoals will not necessarily be subject to human oversight[1]. If the machine has not been instilled with an underlying value system congruent with the best interests of humans, we cannot take it for granted that its actions will respect principles we considered too basic to even bother describing to it in the first place. One doesn’t need to look far to find examples of AI developing completely unexpected routes to achieve its goals[2].
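To make this concrete, here is a minimal, purely hypothetical sketch; the goal strings and decomposition rules are invented for illustration and are not taken from any real planner. A top-level goal that a human approved is recursively broken into subgoals that no human ever reviews.

    # Toy sketch (hypothetical): a goal is recursively decomposed into subgoals
    # by hand-written rules. Only the top-level goal was reviewed by a human;
    # everything generated below it goes unchecked.

    DECOMPOSITION_RULES = {
        "maximize paperclip output": ["acquire raw materials", "expand production capacity"],
        "acquire raw materials": ["buy scrap metal", "mine new ore deposits"],
        "expand production capacity": ["build new factories", "divert local power supply"],
    }

    def expand(goal, depth=0):
        """Print every subgoal the planner would pursue, as an indented tree."""
        print("  " * depth + goal)
        for subgoal in DECOMPOSITION_RULES.get(goal, []):
            expand(subgoal, depth + 1)

    # The operator approved only the root goal; a subgoal like
    # "divert local power supply" was never inspected, yet the planner
    # treats it as a legitimate means to the approved end.
    expand("maximize paperclip output")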

When specifying goals for a machine and imagining the ways in which it might choose to complete them, we are likely to explore only the parts of the solution space that make sense to humans, which is to say that we bring many unstated assumptions to our brainstorming session. A machine instructed to get grandma away from the burning building might choose to demolish the building with dynamite, blowing grandma away from it at high velocity[3]. This technically solves the problem of getting grandma away from the burning building, but implicit in our goal specification was the desire that grandma be alive and well as a result of the solution. A machine would not necessarily make that assumption if it was not explicitly stated. Nor is it as simple as adding “alive and well” to the specification: perhaps that outcome is not achievable, in which case the machine would need to select the best of the undesirable options available to it. Choosing among those options follows a chain of preferences the machine cannot infer by default: keeping her alive is the highest priority; first-degree burns, if unavoidable, are preferable to third-degree burns; losing a toe in the process is preferable to losing a leg. Reaching these conclusions requires a kind of common-sense reasoning that the machine does not inherently possess.
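As a rough illustration of the gap between the literal objective and the intended one, here is a toy sketch; the candidate plans, their attributes, and the numbers are all invented. A machine scoring plans only on the stated objective picks the dynamite option, while a lexicographic preference ordering of the kind described above picks the plan a human would expect.

    # Toy sketch (invented data): candidate plans for "get grandma away from
    # the burning building", scored first by the literal objective (distance
    # from the building) and then by a preference ordering that encodes the
    # assumptions a human would leave unstated.

    candidate_plans = [
        # name,                        distance_m, alive, burn_degree, limbs_lost
        ("carry her out the door",             30, True,  1,           0),
        ("pull her through a window",          20, True,  3,           1),
        ("dynamite the building",             200, False, 3,           4),
    ]

    def literal_objective(plan):
        """Only what was asked for: maximize distance from the building."""
        name, distance, alive, burn_degree, limbs_lost = plan
        return distance

    def preference_key(plan):
        """Lexicographic preferences: alive first, then lighter burns,
        then fewer limbs lost, then greater distance."""
        name, distance, alive, burn_degree, limbs_lost = plan
        return (not alive, burn_degree, limbs_lost, -distance)

    print(max(candidate_plans, key=literal_objective)[0])  # -> "dynamite the building"
    print(min(candidate_plans, key=preference_key)[0])     # -> "carry her out the door"

The point is not that such a preference table is the answer; it is that every entry in it is an assumption we would otherwise have left unstated.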

Attempts have been made to create large-scale repositories of common-sense knowledge for AI to draw upon, but manually building such databases has major drawbacks. The sheer number of person-hours required for such an undertaking is staggering. Furthermore, we don’t even realize all of the common-sense assumptions we carry with us in our day-to-day life, so how successful can we expect any attempt to codify them to be? What about instead extracting patterns from large data sets, as the currently trendy deep learning methods would do? Attempts have already been made to train a neural network to learn moral behavior from a training set of moral judgements made by real humans[4], but the resulting model is only a reflection of the pooled opinions of those who contributed to it.
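The point about pooled opinions can be shown with a deliberately simplified stand-in for such a system; the situations and labels below are invented, and this is not how the Delphi model actually works. A “model” that just memorizes the majority label for each annotated situation can only ever echo its annotator pool.

    from collections import Counter

    # Toy sketch (invented data): each situation is labeled by several
    # annotators, and the "model" simply memorizes the majority label,
    # i.e. the pooled opinion of whoever happened to contribute.

    annotations = {
        "lying to protect a friend": ["okay", "okay", "wrong"],
        "killing in self-defense":   ["okay", "wrong", "okay", "okay"],
        "eating meat":               ["okay", "okay", "okay", "wrong"],
    }

    pooled_model = {situation: Counter(labels).most_common(1)[0][0]
                    for situation, labels in annotations.items()}

    print(pooled_model["killing in self-defense"])  # -> "okay"
    # The dissenting judgements have vanished: the model can only echo the
    # majority view of its annotators, not reason about why they disagreed.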

The first problem with these approaches is that neither moral guidelines coded directly into the machine nor the situations that make up a training set can possibly cover every situation the machine will encounter once it is put to use in a natural environment. The real world is far too nuanced for any set of preprogrammed moral principles to anticipate every circumstance that may arise. When a situation resembles nothing in the training set, the machine has nothing to draw upon to address it; its architecture is not capable of the logical gymnastics the novel situation requires. In these cases the machine will fail, and failures in the moral domain can yield devastating consequences when the moral agent has been delegated real-world responsibilities.
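Here is a toy sketch of that failure mode, assuming a hypothetical retrieval-style judge that looks up verdicts by similarity to previously seen cases; the cases, verdicts, and threshold are invented. A situation that resembles nothing in its repertoire leaves it with nothing to draw upon.

    # Toy sketch (hypothetical): moral verdicts are retrieved by similarity
    # to previously seen cases. A situation sharing no vocabulary with the
    # training set falls below the similarity threshold, and the system has
    # literally nothing to draw upon.

    training_cases = {
        "stealing bread to feed a child": "permissible",
        "stealing a car for a joyride":   "wrong",
    }

    def similarity(a, b):
        """Crude word-overlap (Jaccard) similarity, standing in for a real model."""
        wa, wb = set(a.split()), set(b.split())
        return len(wa & wb) / len(wa | wb)

    def judge(situation, threshold=0.2):
        best_case = max(training_cases, key=lambda c: similarity(situation, c))
        if similarity(situation, best_case) < threshold:
            return "no basis for judgment"   # the failure mode described above
        return training_cases[best_case]

    print(judge("stealing bread to feed a child"))                 # -> "permissible"
    print(judge("rerouting a trolley toward a single bystander"))  # -> "no basis for judgment"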

The second problem is that in the domain of moral reasoning there is little consensus to be found on even the simplest of points. Avoid killing other humans? Many people would say killing is morally permissible when done in a protective capacity. But that permissibility is circumstantial, and many others would not accept it even under those circumstances. So not only do our moral sentiments vary from person to person, it is difficult to pin down a moral principle even at the individual level, because our moral judgements shift with the circumstances of the situation.

Any static system of moral guidance, however inclusive we try to make it, is doomed to failure. The only hope of success in this domain is the creation of a machine capable of flexible moral reasoning that begins from a place of comprehension. It needs to possess a set of values that it understands and reasons with while exploring a problem space for suitable routes to its goal. A machine that does not understand the meaning of words like loyalty and responsibility cannot consistently produce output congruent with those values. For this goal to ever be achieved, an entirely new approach to AI is essential: an architecture built on an understanding of words and the way they relate to one another, rather than mere statistically driven symbol manipulation. That approach is precisely what we at Active Structure are striving to bring into being.



[1] See Nick Bostrom’s Paperclip Maximizer thought experiment for an over-the-top illustration of this issue.

[2] For a collection of examples, see: Lehman, J., Clune, J., Misevic, D., et al., 2020. “The Surprising Creativity of Digital Evolution: A Collection of Anecdotes from the Evolutionary Computation and Artificial Life Research Communities.” In Artificial Life, 26(2): 274–306. doi:10.1162/artl_a_00319

[3] Example taken from: Yudkowsky, E. 2011. “Complex Value Systems in Friendly AI.” In Artificial General Intelligence: 4th International Conference, AGI 2011, Mountain View, CA, USA, August 3–6, 2011. Proceedings, edited by Jürgen Schmidhuber, Kristinn R. Thórisson, and Moshe Looks, 388–393. Vol. 6830. Lecture Notes in Computer Science. Berlin: Springer. doi:10.1007/978-3-642-22887-2_48

[4] Read about the Allen Institute for AI’s Delphi project here: https://www.nytimes.com/2021/11/19/technology/can-a-machine-learn-morality.html
