Conference: International Conference on Information Security Practice and Experience

Secure Best Arm Identification in Multi-armed Bandits

Abstract

The stochastic multi-armed bandit is a classical decision-making model in which an agent repeatedly chooses an action (pulls a bandit arm) and the environment responds with a stochastic outcome (a reward) drawn from an unknown distribution associated with the chosen action. A popular objective for the agent is to identify the arm with the maximum expected reward, known as the best-arm identification problem. We address the inherent privacy concerns that arise in best-arm identification when the data and computations are outsourced to an honest-but-curious cloud. Our main contribution is a distributed protocol that computes the best arm while guaranteeing that (i) no cloud node can learn information about both the rewards and the arm ranking at the same time, and (ii) no information about the rewards or the ranking can be learned by analyzing the messages exchanged between the cloud nodes. In other words, these two properties ensure that the protocol has no single point of failure with respect to security. We rely on the partially homomorphic property of the well-known Paillier cryptosystem as a building block of our protocol. We prove the correctness of our protocol and present proof-of-concept experiments suggesting its practical feasibility.
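
To make the building block concrete, the following is a minimal sketch of Paillier's additive homomorphism applied to per-arm reward sums. It is not the distributed protocol described in the abstract: a single party holds the private key here, so it does not provide the two security guarantees above. The tiny fixed primes, the Bernoulli rewards, the arm means, and all function names are assumptions made for illustration only.

```python
import math
import random
import secrets

def keygen(p=1789, q=1861):
    """Toy Paillier key generation. The tiny fixed primes are an
    illustrative assumption and are NOT secure."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    g = n + 1
    # With g = n + 1, L(g^lam mod n^2) = lam mod n, so mu = lam^{-1} mod n.
    mu = pow(lam, -1, n)
    return (n, g), (lam, mu)

def encrypt(pub, m):
    """Encrypt an integer 0 <= m < n as g^m * r^n mod n^2."""
    n, g = pub
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    return (pow(g, m, n2) * pow(r, n, n2)) % n2

def decrypt(pub, priv, c):
    """Recover m = L(c^lam mod n^2) * mu mod n, where L(u) = (u - 1) / n."""
    n, _ = pub
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(pub, c1, c2):
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    n, _ = pub
    return (c1 * c2) % (n * n)

def encrypted_reward_sums(pub, means, pulls, rng):
    """Pull every arm `pulls` times, keeping only an encrypted sum per arm."""
    sums = [encrypt(pub, 0) for _ in means]
    for arm, mean in enumerate(means):
        for _ in range(pulls):
            reward = 1 if rng.random() < mean else 0  # Bernoulli reward
            sums[arm] = add_encrypted(pub, sums[arm], encrypt(pub, reward))
    return sums

if __name__ == "__main__":
    pub, priv = keygen()
    means = [0.2, 0.5, 0.8]  # hypothetical expected rewards of three arms
    sums = encrypted_reward_sums(pub, means, pulls=200, rng=random.Random(0))
    totals = [decrypt(pub, priv, c) for c in sums]
    best = max(range(len(totals)), key=totals.__getitem__)
    print("decrypted totals:", totals, "empirically best arm:", best)
```

The party that accumulates the sums only ever multiplies ciphertexts and never sees an individual reward in the clear. Every arm is pulled the same number of times, so the largest decrypted sum corresponds to the largest empirical mean.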
