RECON-NSE-Mitigation

Mitigating Side Effects in Multi-Agent Systems Using Blame Assignment

Collaborative Robotics and Intelligent Systems (CoRIS) Institute
Oregon State University

Abstract

When independently trained or designed robots are deployed in a shared environment, their combined actions can lead to unintended negative side effects (NSEs). To ensure safe and efficient operation, robots must optimize task performance while minimizing the penalties associated with NSEs, balancing individual objectives with collective impact. We model the problem of mitigating NSEs in a cooperative multi-agent system as a bi-objective lexicographic decentralized Markov decision process. We assume independence of transitions and rewards with respect to the robots' tasks, but the joint NSE penalty creates a form of dependence in this setting. To improve scalability, the joint NSE penalty is decomposed into individual penalties for each robot using credit assignment, which facilitates decentralized policy computation. We empirically demonstrate, using mobile robots and in simulation, the effectiveness and scalability of our approach in mitigating NSEs.

Approach Overview

Agents first independently compute policies that complete their assigned tasks, described by \(R_1^i\) (the naive policy). The NSE Monitor then computes the NSE penalty for the joint policy \(\vec{\pi}\). The Blame Resolver assigns a blame value to each agent by evaluating counterfactual scenarios specific to that agent, as illustrated with warehouse robots handling different-sized boxes. An individual penalty function \(R_N^i\) is derived for each agent from its estimated blame. Agents then recompute their policies by solving the bi-objective problem with \(R_1^i \succ R_N^i\), where \(\succ\) denotes a preference ordering over the objectives and their associated reward functions.
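The blame and re-planning steps above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The blame rule (largest penalty reduction an agent could achieve by changing only its own local state, rescaled so the blames sum to the joint penalty), the slack parameter in the lexicographic choice, and all names (`assign_blame`, `lexicographic_action`, `warehouse_penalty`) are assumptions made for illustration.

```python
from typing import Callable, Dict, Hashable, List, Sequence, Tuple

State = Hashable
JointState = Tuple[State, ...]


def assign_blame(
    joint_state: JointState,
    alternatives: Sequence[Sequence[State]],
    joint_penalty: Callable[[JointState], float],
) -> List[float]:
    """Split the joint NSE penalty at `joint_state` among the agents.

    Assumed rule: agent i's counterfactuals replace only its own local state
    with each entry of alternatives[i]; its raw blame is the largest penalty
    reduction it could have achieved alone.
    """
    actual = joint_penalty(joint_state)
    raw = []
    for i, options in enumerate(alternatives):
        best_counterfactual = min(
            (joint_penalty(joint_state[:i] + (alt,) + joint_state[i + 1:])
             for alt in options),
            default=actual,  # no alternatives: the agent could change nothing
        )
        raw.append(max(0.0, actual - best_counterfactual))
    total = sum(raw)
    if total == 0.0:
        # No agent could reduce the penalty alone: share it equally.
        return [actual / len(raw)] * len(raw) if raw else []
    # Rescale so the individual blames sum to the joint penalty.
    return [actual * r / total for r in raw]


def lexicographic_action(
    q_task: Dict[str, float], q_nse: Dict[str, float], slack: float = 0.0
) -> str:
    """Pick an action with the task objective strictly prioritized.

    Keeps the actions whose task value is within `slack` of the best one
    (the slack is an assumption), then maximizes the NSE value among them.
    """
    best = max(q_task.values())
    candidates = [a for a, q in q_task.items() if q >= best - slack]
    return max(candidates, key=lambda a: q_nse[a])


def warehouse_penalty(joint_state: JointState) -> float:
    """Hypothetical penalty: each local state is (loaded, on_marked_cell)."""
    on_x = sum(1 for loaded, marked in joint_state if loaded and marked)
    # Each loaded robot on a marked cell costs 5; two together cost 10 more.
    return 5.0 * on_x + (10.0 if on_x == 2 else 0.0)
```

In this toy setting, a robot that had an alternative off the marked cell absorbs more blame than one that had no such option, which is the intuition behind using agent-specific counterfactuals rather than splitting the penalty evenly.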

Experiments with Mobile Robots

We demonstrate the effectiveness of our approach in mitigating NSEs using mobile robots in a warehouse scenario. The robots are tasked with transporting boxes of different sizes (encoded as different colors) to their respective destinations. An NSE penalty is incurred when one or both robots pass over locations marked with an X while carrying a box. We compare our approach against a naive policy, a difference reward, and a considerate reward. We also evaluate our generalized RECON approach with and without counterfactual data. The videos below show the robots executing the tasks under the different reward functions.

Naive Policy

Difference Reward

Considerate Reward

RECON

Generalized RECON w/o CF data

Generalized RECON w/ CF data
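For reference, the difference-reward baseline is commonly written in the following standard form from the multi-agent credit assignment literature. How it was adapted to NSE penalties in these experiments is not stated on this page, and the symbols \(G\), \(z\), \(z_{-i}\), and \(c_i\) are notation introduced here for illustration only.

\[
D_i(z) = G(z) - G(z_{-i} \cup c_i)
\]

Here \(G\) is the global (joint) objective, \(z\) is the joint state-action of all agents, \(z_{-i}\) is the same with agent \(i\) removed, and \(c_i\) is a fixed default that replaces agent \(i\)'s contribution.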

@article{rustagi2024mitigating,
  title={Mitigating Negative Side Effects in Multi-Agent Systems Using Blame Assignment},
  author={Rustagi, Pulkit and Saisubramanian, Sandhya},
  journal={arXiv preprint arXiv:2405.04702},
  year={2024}
}

Unanticipated negative side effects in multi-agent systems can create safety hazards. Current approaches scale poorly and are inefficient; our method uses a novel blame assignment technique to mitigate these side effects in a decentralized manner.

Experiments in Simulation

Domains used for simulation experiments

Effect of generalization on NSE mitigation

Scalability