Performance of Bounded-Rational Agents With the Ability to Self-Modify (CS AI)

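Self-modification by agents embedded in complex environments is hard to avoid, whether it happens through direct means (e.g. modifying their own code) or indirect means (e.g. influencing the operator, or exploiting bugs or the environment). It has been argued that intelligent agents have an incentive to avoid modifying their utility function, so that their future instances keep working towards the same goals, but it is unclear whether this also holds in non-dualistic scenarios, where the agent is embedded in the environment. Bostrom raised the problem of self-modification safety in Superintelligence (2014) in the context of safe AGI deployment. Everitt et al. (2016) formally show that offering the option to self-modify is harmless for perfectly rational agents. In contrast, we show that for agents with bounded rationality, self-modification may cause exponential deterioration in performance and gradual misalignment of a previously aligned agent. We investigate how the size of this effect depends on the type and magnitude of the imperfections in the agent's rationality (1-4 below). We also discuss the model assumptions and the wider problem and framing space. Specifically, we introduce several types of bounded-rational agent, which either (1) does not always choose the optimal action, (2) is not perfectly aligned with human values, (3) has an inaccurate model of the environment, or (4) uses the wrong temporal discounting factor. We show that in cases (2)-(4) the misalignment caused by the agent's imperfection does not worsen over time, whereas in case (1) the misalignment may grow exponentially.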

Original title: Performance of Bounded-Rational Agents With the Ability to Self-Modify

Original abstract: Self-modification of agents embedded in complex environments is hard to avoid, whether it happens via direct means (e.g. own code modification) or indirectly (e.g. influencing the operator, exploiting bugs or the environment). While it has been argued that intelligent agents have an incentive to avoid modifying their utility function so that their future instances will work towards the same goals, it is not clear whether this also applies in non-dualistic scenarios, where the agent is embedded in the environment. The problem of self-modification safety is raised by Bostrom in Superintelligence (2014) in the context of safe AGI deployment.

In contrast to Everitt et al. (2016), who formally show that providing an option to self-modify is harmless for perfectly rational agents, we show that for agents with bounded rationality, self-modification may cause exponential deterioration in performance and gradual misalignment of a previously aligned agent. We investigate how the size of this effect depends on the type and magnitude of imperfections in the agent's rationality (1-4 below). We also discuss model assumptions and the wider problem and framing space.

Specifically, we introduce several types of a bounded-rational agent, which either (1) doesn't always choose the optimal action, (2) is not perfectly aligned with human values, (3) has an inaccurate model of the environment, or (4) uses the wrong temporal discounting factor. We show that while in the cases (2)-(4) the misalignment caused by the agent's imperfection does not worsen over time, with (1) the misalignment may grow exponentially.
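As a rough illustration of imperfection (4) only, the sketch below shows how an agent that evaluates plans with the wrong temporal discounting factor can prefer a different plan than one using the intended factor. It is not taken from the paper: the function names, the two reward sequences, and the discount values 0.9 and 0.5 are all hypothetical, and the only standard ingredient assumed is the usual discounted return, the sum of gamma**t * r_t.

```python
def discounted_return(rewards, gamma):
    """Finite-horizon discounted return: sum of gamma**t * r_t."""
    return sum((gamma ** t) * r for t, r in enumerate(rewards))


def preferred_plan(plans, gamma):
    """Name of the plan with the highest discounted return under gamma."""
    return max(plans, key=lambda name: discounted_return(plans[name], gamma))


# Hypothetical reward sequences, chosen only for illustration.
plans = {
    "immediate": [1.0, 0.0, 0.0, 0.0],  # small reward right away
    "delayed": [0.0, 0.0, 0.0, 2.0],    # larger reward three steps later
}

intended_gamma = 0.9  # assumed "correct" discount factor
agent_gamma = 0.5     # assumed mis-specified discount factor

print(preferred_plan(plans, intended_gamma))
print(preferred_plan(plans, agent_gamma))
```

With these made-up numbers the two discount factors rank the plans differently. The sketch says nothing about how such a mismatch evolves under self-modification, which is the question the abstract addresses.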

Original authors: Jakub Tětek, Marek Sklenka

Original URL: https://arxiv.org/abs/2011.06275

