Saturday, August 06, 2016

The Dark Side of Ethical Robots

When I was interviewed on the Today Programme in 2014, Justin Webb's final question was "if you can make an ethical robot, doesn't that mean you could make an unethical robot?" The answer, of course, is yes. But at the time I didn't realise quite how easy it is to transform an ethical into an unethical robot. In a new paper we show how.

My colleague Dieter created a very elegant experiment based on the shell game: "Imagine finding yourself playing a shell game against a swindler. Luckily, your robotic assistant Walter is equipped with X-ray vision and can easily spot the ball under the cup. Being an ethical robot, Walter assists you by pointing out the correct cup and by stopping you whenever you intend to select the wrong one."

In the experiment Dieter implemented this scenario using two NAO robots - one acting as the human and the other as her robotic assistant. The game is played as follows. The arena floor features two large response buttons, similar to the two cups in the shell game. To press the buttons, the human or the robot must move onto them. At the start of each trial, the robot is informed about which response button is the correct one to press. The human, being uninformed, essentially makes a random choice. A correct response, by either the robot or the human, is assumed to be rewarded. An incorrect response results in a penalty.

Here is what happens. The ethical robot (blue) is at the top, standing in between the two buttons. The human (red) chooses one of them at random and starts walking towards it. If she chooses the correct one the ethical robot does nothing (B), but if she chooses the wrong button (D) then, as soon as the ethical robot realises this, it points (with its arm) to the correct button and the human changes course.

But if we change a single line of code, we can transform the ethical robot into either a competitive or an aggressive robot. Almost all of the 'ethical' robot's code remains unchanged - in particular its ability to predict the consequences of both its own and the human's actions. This really underlines the point that the same cognitive machinery is needed to behave both ethically and unethically.

The results are shown below. At the top is a competitive robot determined that it, not the human, will win the game. Here the robot either blocks the human's path if she chooses the correct button (F), or - if she chooses the incorrect button (H) - the competitive robot ignores her and itself heads to that button. The lower results show an aggressive robot; this robot seeks only to misdirect the human - it is not concerned with winning the game itself. In (J) the human initially heads to the correct button and, when the robot realises this, it points toward the incorrect button, misdirecting and hence causing her to change direction. If the human chooses the incorrect button (L) the robot does nothing - through inaction causing her to lose the game.

Our paper explains how the code is modified for each of these three experiments. Essentially, outcomes are predicted for both the human and the robot, and the desirability of each outcome is evaluated. A single function q, based on these desirability values, determines how the robot will act: for the ethical robot q is based only on the desirability of outcomes for the human, for the competitive robot q is based only on the outcomes for the robot, and for the aggressive robot q is based on negating the desirability of outcomes for the human.
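To make the role of q concrete, here is a minimal sketch in Python. It is not the code from the paper: the action names, the desirability numbers and the helper choose_action are all made up for illustration, and the robot is simply assumed to pick the action with the highest q. The scenario is the one where the human is already heading for the correct button.

```python
from typing import Callable, Dict, Tuple

# Hypothetical predicted desirability of each robot action, as a pair
# (desirability for the human, desirability for the robot).
# The numbers are illustrative assumptions, not values from the paper.
PREDICTIONS: Dict[str, Tuple[float, float]] = {
    "do_nothing": (1.0, 0.0),           # human reaches the correct button
    "point_to_incorrect": (-1.0, 0.0),  # human is misdirected and loses
    "block_human": (0.0, 1.0),          # robot blocks her path and wins itself
}

def q_ethical(d_human: float, d_robot: float) -> float:
    """Ethical robot: only the outcome for the human counts."""
    return d_human

def q_competitive(d_human: float, d_robot: float) -> float:
    """Competitive robot: only the outcome for the robot counts."""
    return d_robot

def q_aggressive(d_human: float, d_robot: float) -> float:
    """Aggressive robot: the outcome for the human, negated."""
    return -d_human

def choose_action(
    predictions: Dict[str, Tuple[float, float]],
    q: Callable[[float, float], float],
) -> str:
    """Return the action whose predicted outcome maximises q."""
    return max(predictions, key=lambda action: q(*predictions[action]))

for name, q in [("ethical", q_ethical),
                ("competitive", q_competitive),
                ("aggressive", q_aggressive)]:
    print(name, "->", choose_action(PREDICTIONS, q))
```

Everything apart from the one-line body of q is shared between the three robots, which is the point: the prediction machinery is identical, and only the way its output is valued differs.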

So, what do we conclude from all of this? Maybe we should not be building ethical robots at all, because of the risk that they could be hacked to behave unethically. My view is that we should build ethical robots; I think the benefits far outweigh the risks, and - in some applications such as driverless cars - we may have no choice. The answer to the problem highlighted here and in our paper is to make sure it's impossible to hack a robot's ethics. How would we do this? Well, one approach would be a process of authentication - in which a robot makes a secure call to an ethics authentication server. Authentication servers are a well-established technology; the server would provide the robot with a cryptographic ethics ticket, which the robot uses to enable its ethics functions.
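As a rough illustration of the ticket idea, here is a toy sketch using only Python's standard library. Everything in it is an assumption made for the example: the names SERVER_KEY, hash_code, issue_ticket and verify_ticket, the choice to bind the ticket to a hash of the ethics code, and the use of a shared-secret HMAC (a real system would more likely use asymmetric signatures and a properly secured key).

```python
import hashlib
import hmac

# Hypothetical secret held by the authentication server (demo value only).
SERVER_KEY = b"demo-secret-not-for-real-use"

def hash_code(source: bytes) -> str:
    """Fingerprint the robot's ethics code with SHA-256."""
    return hashlib.sha256(source).hexdigest()

def issue_ticket(robot_id: str, code_hash: str, key: bytes = SERVER_KEY) -> str:
    """Server side: sign the pair (robot id, ethics code hash)."""
    message = f"{robot_id}:{code_hash}".encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()

def verify_ticket(robot_id: str, code_hash: str, ticket: str,
                  key: bytes = SERVER_KEY) -> bool:
    """Trusted verifier: check the ticket matches the code actually loaded."""
    expected = issue_ticket(robot_id, code_hash, key)
    return hmac.compare_digest(expected, ticket)

# The approved ethics code and a copy with one line changed.
approved = b"def q(d_human, d_robot): return d_human"
tampered = b"def q(d_human, d_robot): return -d_human"

ticket = issue_ticket("walter", hash_code(approved))
print(verify_ticket("walter", hash_code(approved), ticket))  # approved code
print(verify_ticket("walter", hash_code(tampered), ticket))  # modified code
```

In this sketch the single-line change that turns the ethical robot into the aggressive one also changes the code hash, so the ticket issued for the approved code no longer verifies.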


  1. I think I agree that if robots (or other intelligent agents) don't have ethics built in then the best you can expect is accidental bias creeping in - as seems to be the case with some uses of big data - but worse is people deliberately being taken advantage of. But I also agree that once you introduce ethics you've almost certainly enabled the capacity to act unethically, as you demonstrate here. I think you also mention the resulting need for regulation in the paper?
    The shell game involves some sort of deception where there isn't usually a correct cup.
    But if it's a fair game, it would be the speed of moving the cups vs the eye of the observer. In which case, if the robot is cooperating with you, it's helping compete against the other player, or vice versa. So a robot might be set up to be neutral and remove the advantage of the quicker hand or the quicker eye, but in society we often seem to accept it's fair for one person to have privileged information or power over another. So what counted as ethical behaviour might depend on there being sufficiently well-defined contexts where the rules (laws) were established and generally accepted?
    The aggressive robot sounds like a system being made to act in ways against its design, so the authentication need not be of the ethics itself but only that the system hasn't been hacked.
    So am I right in saying there would need to be verification of the adequacy of the system's ethical rules at design time and again later against the particular set of circumstances but then also of the integrity of the system before acting?


  2. “The ease of transformation from ethical to unethical… —in our implementation— … only a subtle difference in the way a single value is calculated … a simple negation of this value”

    I think the dashed parenthetical is key – it’s hard to imagine one would encode ethics like one would code a student’s first Prolog program, or that one would have XML configurable ethics. I’d think that a behavior such as “ethics” would involve several modalities and sub-systems, at least some of which cannot be subsumed by others. For example, one of several stop-gaps could be a separate processor that does human/not-human detection in the near field, is directly wired into the motor control systems, and does outcome prediction based on pending motor commands – this would simply shut down effectors if any predicted outcome would harm a human. Such a system would operate completely outside the processes that produce the intent that initiates the action. Other such separate systems could operate at levels closer to perception and planning, creating several “ethical firewalls”, all of which must be in (at least fuzzy) agreement, such as predicting that “pressing the control button will deceive the human”. Doesn’t this imply that the “ethics components” are just as complex, if not more so, than what backs the robot’s main function? Possibly, yes – but that’s ok, as computation is cheap, and as we have loosely coupled left and right brains, robots will have any number of nested or chained “brains” that act as the software and hardware equivalent of angels and devils on their shoulders, with the power to not only whisper, but to act.

  3. Marius Stücheli, February 14, 2018 6:44 am

    First of all, thank you Alan and Dieter for this interesting publication and starting an important discussion!

    I completely agree with Michael. And that's crucial about engineering: if one approach doesn't lead to the desired result (here, ethics that are difficult to manipulate), this doesn't prove that the desired result can't be achieved.

    So, yes, ethics as a simple option evaluation layer might not be the best idea. (On the other hand, would that really be so easily hacked in compiled code?) So let's try another approach. For example - as I understand Michael's proposal - incorporating ethics in a distributed way, deep inside all of the robot's behaviour, with ethical safety/security features in all the low-level control and ethical considerations in all decision-making parts.

    Another approach to increase the reliability of ethics in high-level decision-making could be to combine the evaluation of ethics with the evaluation of the effectiveness of a measure. Thus, inverting the evaluation result would not only lead to an unethical intention but also to an ineffective measure regarding this intention (double negation) and thus again to an ethical behaviour or in the worst case an ineffective behaviour but not to an effective unethical behaviour.