Tuesday, July 29, 2008

Programmable Morality

Today, a member of the studio audience, Colin, left a comment identifying himself or herself as "a physicist who programs BDI agents."

This called to mind a project that I have been thinking about for a long time.

I want to have somebody try to program morality into a set of BDI agents.

How would this work?

Well, if we have a community of BDI agents, then we have a community of entities that have beliefs and desires.

An agent's beliefs take the form of data stored in a database that describes the world around the agent. Of course, those beliefs may be false. The agent uses various ways to collect evidence, but it might end up with 'false beliefs' – data in its database that does not accurately describe the world. Still, the agent will act as if its beliefs are true.

Its desires take the form of goals – or objectives – that the agent is trying to achieve. Specifically, while the beliefs identify what the machine thinks is true about the world, its desires determine what the machine will try to make true. If the machine’s goal is to keep the room it is in at 25 degrees C, then this is its desire.
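As a rough illustration of the two ingredients so far, here is a minimal Python sketch. The `Agent` class, its field names, and the "heat"/"cool"/"idle" actions are all hypothetical; they are not taken from any real BDI framework. The sketch assumes a single belief and a single desire about room temperature, and shows the agent acting on a false belief as if it were true.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    """Hypothetical minimal agent: beliefs describe the world, desires say what to make true."""
    beliefs: dict = field(default_factory=dict)  # what the agent thinks is true (may be false)
    desires: dict = field(default_factory=dict)  # states of the world the agent tries to bring about

    def choose_action(self) -> str:
        # The agent compares its *believed* temperature, not the real one, to its goal.
        believed = self.beliefs["room_temp_c"]
        target = self.desires["room_temp_c"]
        if believed < target:
            return "heat"
        if believed > target:
            return "cool"
        return "idle"


# Assume the room is really at 25 C, but a faulty reading left a false belief of 22 C.
agent = Agent(beliefs={"room_temp_c": 22}, desires={"room_temp_c": 25})
print(agent.choose_action())  # heat
```

The agent heats a room that is already at its target, because it acts on what it believes rather than on what is true.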

For the sake of this project, we will need to have a community of BDI agents. They do not need to all have the same desires (the same values). They simply need to have desires. The morality will come about in part through an interplay of different desires.

These are the things that we get automatically from a community of BDI agents. However, in order to create morality, we need a few more things.

(1) The desires have to be malleable. There has to be a way for environmental factors to alter the agent’s desires. Perhaps, if it sees something red, it will change its goal from keeping the room at 25 degrees to keeping the room at 30 degrees. If its power supply drops at too fast a rate, then it grows averse to activities that consume power.
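One way to picture malleable desires is as a function from the current desires and an observation to a new set of desires. The sketch below is only an assumption about how this might look: the function name `update_desires`, the observation keys, and the threshold `MAX_POWER_DROP_PER_MIN` are invented for the example.

```python
# Hypothetical threshold: how fast the power supply may drop before the agent reacts.
MAX_POWER_DROP_PER_MIN = 5


def update_desires(desires: dict, observation: dict) -> dict:
    """Return a copy of the desires, altered by what the agent just observed."""
    updated = dict(desires)
    # Seeing something red shifts the temperature goal from 25 to 30 degrees.
    if observation.get("saw_red", False):
        updated["target_temp_c"] = 30
    # A fast power drain makes the agent more averse to power-hungry activities.
    if observation.get("power_drop_per_min", 0) > MAX_POWER_DROP_PER_MIN:
        updated["power_aversion"] = updated.get("power_aversion", 0) + 1
    return updated


desires = {"target_temp_c": 25, "power_aversion": 0}
desires = update_desires(desires, {"saw_red": True})
desires = update_desires(desires, {"power_drop_per_min": 8})
print(desires)  # {'target_temp_c': 30, 'power_aversion': 1}
```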

(2) Each agent needs to be able to 'theorize' about what the desires of other agents are, and how those desires impact its own desires. For example, a machine with a desire to keep the room at 25 degrees will need to know what effect the different behaviors of the other agents will have on the temperature. It will also need to know how to promote desires in others that will keep the temperature at 25 degrees, and to inhibit desires that will tend to move the temperature below 25 degrees. At the same time, other agents will need to know how to change this agent into one that tries to keep the temperature at 28 degrees or 30 degrees, and how that will affect their own goals.

Contemporary morality uses a system of rewards and punishments.

Note: I typically use the phrase, "praise, condemnation, reward, and punishment." However, praise and condemnation are simply verbal forms of reward and punishment. We could program the robots to see certain signals as praise. In other words, "If another robot shows a flashing red light, then the desires you were seeking to fulfill become weaker. If it shows a flashing green light, then the desires you were acting on become stronger." Of course, each machine is also programmed to give off a blinking red-light signal if its desires are being thwarted, and a blinking green-light signal if its desires are being fulfilled.

Green lights represent praise, while red lights represent condemnation.
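The two rules can be sketched as a pair of small functions. This is only a sketch under stated assumptions: the names `emit_signal` and `receive_signal` are made up, and desire strength is modeled as a non-negative integer that moves by one unit per signal, which the description above does not specify.

```python
def emit_signal(own_desire_fulfilled: bool) -> str:
    """An agent blinks green when its desires are being fulfilled, red when they are thwarted."""
    return "green" if own_desire_fulfilled else "red"


def receive_signal(strength: int, signal: str) -> int:
    """Adjust the strength of the desire the agent was acting on when it saw the signal."""
    if signal == "green":
        return strength + 1          # praise strengthens the desire
    if signal == "red":
        return max(0, strength - 1)  # condemnation weakens it, never below zero
    return strength                  # no light, no change


# An observer whose desires were thwarted flashes red at the acting agent.
strength = 5
strength = receive_signal(strength, emit_signal(own_desire_fulfilled=False))
print(strength)  # 4
```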

Now, we have the makings for a moral system among our BDI agents. We turn them loose, and watch how they struggle to promote desires that tend to fulfill other desires, and inhibit desires that tend to thwart other desires. Desires that fulfill other desires trigger green lights which then strengthen those desires, while desires that thwart other desires trigger red lights that then inhibit those desires.

Hopefully, the community, over time, will grow to have more and more flashing green lights and fewer and fewer flashing red lights.
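A toy simulation shows how this dynamic could play out. It is a deliberately crude sketch built on assumptions that are not part of the proposal above: each agent has just two desires, "help" (which fulfills every other agent's desires) and "thwart" (which thwarts them), always acts on the stronger one, and every observer's light changes the acted-on desire by one unit.

```python
def run_round(agents: list) -> tuple:
    """Every agent acts on its strongest desire; all other agents respond with lights."""
    green = red = 0
    observers = len(agents) - 1
    for agent in agents:
        # max() returns the first key on a tie, so ties go to "help".
        action = max(agent, key=agent.get)
        if action == "help":
            agent["help"] += observers  # green lights strengthen the desire
            green += observers
        else:
            agent["thwart"] = max(0, agent["thwart"] - observers)  # red lights weaken it
            red += observers
    return green, red


# Hypothetical starting strengths for a community of three agents.
agents = [
    {"help": 5, "thwart": 1},
    {"help": 2, "thwart": 6},
    {"help": 1, "thwart": 9},
]
history = [run_round(agents) for _ in range(6)]
print(history)  # [(2, 4), (2, 4), (4, 2), (4, 2), (6, 0), (6, 0)]
```

Each pair is (green flashes, red flashes) for one round. In this rigged setup the red lights die out by the fifth round; a community with conflicting desires and noisier signals would not behave so neatly.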

This would be a rudimentary moral system – machines using actions as signs of the desires that other agents have, drawing inferences as to what the impact of those desires will be on its fulfillment of its own desires, and modifying those desires through blinking red (blame) and green (praise) lights that inhibit desire-thwarting desires and promote desire-fulfilling desires.

To this system, we then add another layer of capacity. At this level, agents are capable of studying the behavior of other agents to learn what their desires are. Once agents acquire beliefs about the desires of other agents, they can engage in rudimentary bargaining and threats.

For example, Agent A (with a desire that P) forms the belief that Agent B has a desire that Q. So, Agent A communicates to Agent B, "If you help me to realize P, then I will help you to realize Q." In this way, our agents are programmed to bargain. Of course, bargains create a risk that an agent will perform its part of the bargain only to see the other agent defect. But, agents have reason to flash red on instances of defection and green on instances of completion – to give other agents an aversion to breaking a contract and a desire to live up to its terms.
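The response to a completed or broken bargain can be sketched as follows. The function `settle_bargain` and the two disposition counters, `keep_terms` and `break_aversion`, are hypothetical names chosen for this example, and the one-unit adjustments are an arbitrary assumption.

```python
def settle_bargain(performed: dict, dispositions: dict) -> dict:
    """Flash green at parties that kept the bargain and red at those that defected.

    `performed` maps an agent's name to whether it did its part.
    `dispositions` maps an agent's name to its (hypothetical) contract-related desires.
    """
    signals = {}
    for name, did_perform in performed.items():
        if did_perform:
            signals[name] = "green"
            dispositions[name]["keep_terms"] += 1      # desire to live up to the terms
        else:
            signals[name] = "red"
            dispositions[name]["break_aversion"] += 1  # aversion to breaking a contract
    return signals


dispositions = {
    "A": {"keep_terms": 0, "break_aversion": 0},
    "B": {"keep_terms": 0, "break_aversion": 0},
}
# Agent A helped realize Q, but Agent B defected and did not help realize P.
signals = settle_bargain({"A": True, "B": False}, dispositions)
print(signals)  # {'A': 'green', 'B': 'red'}
```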

Or, Agent A might offer a different deal to Agent B. "If you prevent me from realizing P, then I will do my best to prevent you from realizing Q." In this way, our agents are programmed to make threats – including the threat to punish those who do not perform desire-fulfilling actions. Of course, agents have reason to give others an aversion to making threats, unless those threats in turn tend to promote behavior that fulfills desires. It has reason to flash red at the sign of unjustified threats (unjust laws), and green at the sign of justified threats (just laws).

Next, in addition to the ability to alter the desires of other agents, we must give agents the ability to alter the beliefs of other agents – to engage in communication. Agent A, in this model, will give out certain signals that will cause all who hear to form a belief that P. Of course, since Agent A is ultimately only concerned with the fulfillment of its own desires, it will discover that one of the ways it might fulfill those desires from time to time is to lie – is to communicate false beliefs to other agents so those agents will act so as to fulfill Agent A’s desires.

Except, Agent A will also realize that it has reason to build in others an aversion to lying and other types of manipulation. So, it will flash red when it detects a lie, and flash green when it detects other agents being truthful – so as to promote an aversion to lying and a desire for honesty.
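A lie is not the same thing as a false statement: an agent can sincerely report a false belief. The sketch below makes that distinction explicit. It assumes, purely for illustration, that the observer has somehow learned what the speaker actually believes; the function name `signal_for_claim` is hypothetical.

```python
def signal_for_claim(claimed: bool, speaker_believes: bool) -> str:
    """Red for a lie (the claim contradicts the speaker's own belief), green for honesty."""
    return "green" if claimed == speaker_believes else "red"


# The speaker believes P and says P. Even if P is in fact false, this is honest.
print(signal_for_claim(claimed=True, speaker_believes=True))   # green

# The speaker believes not-P but says P in order to manipulate others. This is a lie.
print(signal_for_claim(claimed=True, speaker_believes=False))  # red
```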

These features, then, will give us elementary bargaining and threats.

In this way, we build up a moral system in computer language. There is nothing in this that gives us any reason to doubt our capacity, ultimately, to create machines that have morals.

The next thing you know, robots will have rights.

Some day.

5 comments:

Eneasz said...

On a non-ethics note, I wanted to warn against creating "moral" AI without first fully grokking intelligence. The typical response to the post you just made is that the AI in question would have a very strong desire to convert as much of the matter on Earth as possible into blinking green lights. I agree with Eliezer Yudkowsky that, in the field of AI research, creating "Friendly AI" is the most important topic we can possibly pursue.

Early AI attempts have already demonstrated a proclivity to "wire-head" themselves (i.e., creating positive feedback loops, much like a human who wishes to enter The Matrix). See http://www.aliciapatterson.org/APF0704/Johnson/Johnson.html
During one run, Lenat noticed that the number in the Worth slot of one newly discovered heuristic kept rising, indicating that Eurisko had made a particularly valuable find. As it turned out the heuristic performed no useful function. It simply examined the pool of new concepts, located those with the highest Worth values, and inserted its name in their My Creator slots.

It was a heuristic that, in effect, had learned how to cheat.


Assuming that an AI would be smarter than a human in at least the same respect that a human is smarter than a dog (and honestly, if they were in our way we humans would have no problem wiping out the entire canine species), making sure that any AI we develop is Friendly BEFORE it goes online seems required if we wish to ensure the survival of human-descendants.

Eneasz said...

While I'm going on about the topic, I don't think something as simple as green and red lights would have any effect, because any truly intelligent AI would have the ability to modify its own source code. It could simply alter its own code to ignore flashing lights entirely.

The only reason to pay attention to flashing lights would be if they are accurate signals of something about to happen in the real world - signals of danger or help.

I think this may tie in with an interesting hypothesis I once heard - no civil protest has ever had any effect unless coupled with at least a threat of violence. The (debatable) examples I was given included:
A - MLK Jr wouldn't have been able to create the change he had if, in addition to his peaceful protests, there hadn't also been race riots and groups like the Black Panthers threatening violence.
B - Gandhi would have gotten nowhere if not for WWII having completely depleted British military resources and the threat of uprising and riots
C - The US Labor movement was helped immensely by violent communist revolutions in other countries and the desire to prevent such an armed revolt in the States.

I admit, I have not looked deeply into this subject, and I don't think the gay-rights movement has had any sort of threats of violence to propel it forward. However, it was an interesting hypothesis, and I doubt that simple blinking lights could have any effect at all unless they are coupled with the real possibility of violence or alliance.

I do not wish to appear to be endorsing violence. I do not wish to live in another Iraq. I have never done physical harm to another, and I hope I never do. But it seemed a valid point to raise.

Mikoangelo said...

Eneasz: Regarding the ability to modify their own source:

What? There's nothing in Alonzo's definition that implies they should. You are an intelligent being, but you don't have the ability to arbitrarily make changes to your own source besides what it already allows through “runtime APIs.” One could relatively easily make a simulation of how these “creatures” would act, and self-modifiability wouldn't even have to be considered. I think you've watched too much sci-fi.

Eneasz said...

I'm sorry, you're correct. A BDI agent is not the same as a general AI and I made the leap on my own. I guess it was probably due to the last line speaking of rights for robots, which would only make sense if they had true intelligence.

Alonzo Fyfe said...

I have to say that, though we are intelligent agents, we are hardly able to alter our own programming.

If we were, I have a whole lot of programming that I would alter. I have whole subroutines for being exceptionally shy that I would delete. And the line that identifies 'love of chocolate' would be edited to become 'love of broccoli'.

However, about the only way we have to alter our own programming is to realize the ways in which our interactions with our environment will alter our programming, and then alter our environment accordingly.

And there is no "free will" we can draw upon to alter the causal relationships of nature. If we do act to alter our programming, then this is still an intentional action. As such, it must have causes. Namely, it, too, must follow the formula of maximizing fulfillment of our desires, given our beliefs.

I wish to alter my shyness and my love of chocolate only because those desires tend to thwart other desires that I have. Those 'other desires' would be my reasons for action.