Domains used for simulation experiments

When independently trained or designed robots are deployed in a shared environment, their combined actions can lead to unintended negative side effects (NSEs). To operate safely and efficiently, the robots must optimize task performance while minimizing the penalties associated with NSEs, balancing their individual objectives against their collective impact. We model the problem of mitigating NSEs in a cooperative multi-agent system as a bi-objective lexicographic decentralized Markov decision process. We assume that transitions and rewards are independent with respect to the robots' tasks; the joint NSE penalty, however, introduces a dependence between the robots. To improve scalability, the joint NSE penalty is decomposed into individual penalties for each robot using credit assignment, which enables decentralized policy computation. We empirically demonstrate the effectiveness and scalability of our approach in mitigating NSEs, both with mobile robots and in simulation.
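The lexicographic ordering in this formulation can be illustrated at a single decision point. The sketch below is a simplification under stated assumptions, not the authors' algorithm: it assumes the task objective takes priority and that the NSE penalty is minimized only among actions whose task value lies within a slack of the best one. The function `lexicographic_actions`, the Q-value dictionaries, and the `slack` parameter are hypothetical; a full lexicographic Dec-MDP solver would apply the slack to value functions over the whole state space rather than to one state's action values.

```python
from typing import Dict, List


def lexicographic_actions(
    q_task: Dict[str, float],
    q_nse: Dict[str, float],
    slack: float,
) -> List[str]:
    """Hypothetical one-state lexicographic action filter.

    Stage 1 keeps actions whose task value is within `slack` of the best
    task value (the task objective has priority). Stage 2 keeps, among
    those, the actions with the lowest NSE penalty.
    """
    if slack < 0:
        raise ValueError("slack must be non-negative")
    best_task = max(q_task.values())
    # Stage 1: actions that sacrifice at most `slack` task value.
    admissible = [a for a, v in q_task.items() if v >= best_task - slack]
    # Stage 2: minimize the NSE penalty within the admissible set.
    best_nse = min(q_nse[a] for a in admissible)
    return [a for a in admissible if q_nse[a] == best_nse]


q_task = {"north": 10.0, "east": 9.5, "south": 7.0}
q_nse = {"north": 4.0, "east": 0.0, "south": 0.0}
print(lexicographic_actions(q_task, q_nse, slack=1.0))  # ['east']
print(lexicographic_actions(q_task, q_nse, slack=0.0))  # ['north']
```

Setting `slack=0.0` recovers pure task optimization, while a larger slack lets a robot give up a small amount of task value in exchange for a lower NSE penalty.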
We demonstrate the effectiveness of our approach in mitigating NSEs using mobile robots in a warehouse scenario. The robots are tasked with transporting boxes of different sizes (encoded as different colors) to their respective destinations. An NSE penalty is incurred when one or both robots pass over locations marked with X while carrying boxes. We compare our approach against a naive policy, a difference reward, and a considerate reward. We also evaluate our generalized RECON approach with and without counterfactual data. The videos below show the robots executing the tasks under each of these reward functions.
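The difference reward baseline is a standard credit assignment rule: each robot's share of a joint penalty G is the change in G when that robot is counterfactually removed, D_i = G(z) - G(z_{-i}). The sketch below applies this textbook rule to a hypothetical joint NSE penalty that grows with the square of the number of box-carrying robots on X-marked cells. The penalty function, the robot names, and the helper `difference_penalties` are illustrative assumptions, not the penalty or implementation used in the paper.

```python
from typing import Callable, Dict, FrozenSet, Iterable


def joint_nse_penalty(on_x_with_box: FrozenSet[str]) -> float:
    """Hypothetical joint NSE penalty.

    Grows quadratically with the number of box-carrying robots that are
    on X-marked cells at the same time, so joint occupancy costs more
    than the sum of individual occupancies.
    """
    k = len(on_x_with_box)
    return float(k * k)


def difference_penalties(
    robots: Iterable[str],
    on_x_with_box: Iterable[str],
    penalty: Callable[[FrozenSet[str]], float],
) -> Dict[str, float]:
    """Blame each robot with the difference rule D_i = G(z) - G(z_{-i}).

    The counterfactual z_{-i} removes robot i from the set of robots on
    X cells; a robot that is not on an X cell therefore gets zero blame.
    """
    joint = frozenset(on_x_with_box)
    total = penalty(joint)
    return {r: total - penalty(joint - {r}) for r in robots}


# Both robots carry boxes over X cells at the same time.
print(difference_penalties(["r1", "r2"], ["r1", "r2"], joint_nse_penalty))
# {'r1': 3.0, 'r2': 3.0}

# Only r1 is on an X cell.
print(difference_penalties(["r1", "r2"], ["r1"], joint_nse_penalty))
# {'r1': 1.0, 'r2': 0.0}
```

With a superlinear penalty like this one, the individual difference penalties need not sum to the joint penalty: when both robots are on X cells, each is blamed 3.0 while the joint penalty is 4.0.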
@article{rustagi2024mitigating,
title={Mitigating Negative Side Effects in Multi-Agent Systems Using Blame Assignment},
author={Rustagi, Pulkit and Saisubramanian, Sandhya},
journal={arXiv preprint arXiv:2405.04702},
year={2024}
}