Title: Performance of Bounded-Rational Agents With the Ability to Self-Modify
Abstract: Self-modification of agents embedded in complex environments is hard to avoid, whether it happens via direct means (e.g. modification of their own code) or indirectly (e.g. influencing the operator, exploiting bugs, or exploiting the environment). While it has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances work towards the same goals, it is not clear whether this also applies in non-dualistic scenarios, where the agent is embedded in the environment. The problem of self-modification safety is raised by Bostrom in Superintelligence (2014) in the context of safe AGI deployment.
In contrast to Everitt et al. (2016), who formally show that providing an option to self-modify is harmless for perfectly rational agents, we show that for agents with bounded rationality, self-modification may cause exponential deterioration in performance and gradual misalignment of a previously aligned agent. We investigate how the size of this effect depends on the type and magnitude of imperfections in the agent's rationality (types 1-4 below). We also discuss the model assumptions and the wider space of problems and framings.
Specifically, we introduce several types of bounded-rational agents, which either (1) do not always choose the optimal action, (2) are not perfectly aligned with human values, (3) have an inaccurate model of the environment, or (4) use the wrong temporal discounting factor. We show that while in cases (2)-(4) the misalignment caused by the agent's imperfection does not worsen over time, in case (1) the misalignment may grow exponentially.
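The intuition behind case (1) can be illustrated with a toy simulation (a hypothetical sketch, not the paper's actual model): an agent earns reward while its goal remains intact, but with some small error probability it picks a wrong action each step. If a wrong action can also trigger a self-modification that permanently corrupts the goal, the probability of staying aligned decays like (1 - eps)^t, so losses compound exponentially rather than staying proportional to the error rate.

```python
import random

def run_episode(steps, eps, can_self_modify, rng):
    """One episode: the agent gets reward 1 per step while aligned.
    With probability eps it takes a wrong action. If self-modification
    is available, a wrong action permanently corrupts the goal
    (reward 0 from then on); otherwise it only loses that step's reward."""
    aligned = True
    total = 0.0
    for _ in range(steps):
        err = rng.random() < eps
        if aligned and not err:
            total += 1.0
        if err and can_self_modify:
            aligned = False  # goal permanently corrupted
    return total

rng = random.Random(0)
n, steps, eps = 2000, 100, 0.05
no_mod = sum(run_episode(steps, eps, False, rng) for _ in range(n)) / n
with_mod = sum(run_episode(steps, eps, True, rng) for _ in range(n)) / n
# Without self-modification, expected loss is linear in eps
# (about 5 of 100 steps); with it, expected reward is roughly
# sum over t of (1 - eps)^(t+1), i.e. around 19 instead of 95.
print(no_mod, with_mod)
```

The parameter names (`eps`, `steps`) and the all-or-nothing corruption rule are simplifying assumptions chosen for illustration; the paper's actual setting and bounds are more general.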
Authors: Jakub Tětek, Marek Sklenka