Econometrica

Incomplete learning from endogenous data in dynamic allocation

Abstract

This paper studies the problem of learning from endogenous data by an economic agent who chooses actions sequentially from a finite set {a_1, ..., a_k}, such that the reward R(a_j) of action a_j has a probability distribution depending on an unknown parameter θ_j with prior distribution Π^(j). The agent's objective is to maximize the total discounted reward ∫…∫ E_{θ_1,…,θ_k}{ Σ_{t=0}^∞ β^t R(X_{t+1}) } dΠ^(1)(θ_1) … dΠ^(k)(θ_k), where 0 < β < 1 is a discount factor and X_t denotes the action chosen by the agent at time t. The optimal solution to this problem, commonly called the "discounted multi-armed bandit problem," was shown by Gittins and Jones (1974) and Gittins (1979) to be the "index rule," which chooses at each stage the action with the largest "dynamic allocation index" (DAI). The theory of multi-armed bandits has been applied to decision making in labor markets (cf. Jovanovic (1979), Mortensen (1985)), to general search problems involving nondurable goods (cf. Banks and Sundaram (1992)), and to pricing under demand uncertainty (cf. Rothschild (1974)).
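As an illustration of the index rule (a sketch, not the paper's own method), the DAI of a single Bernoulli arm with a Beta posterior can be approximated by Gittins' calibration idea: find the known per-period reward λ of a "standard" arm at which the agent is indifferent between retiring on λ forever (value λ/(1−β)) and continuing to sample the uncertain arm. The truncation horizon and the boundary approximation below are my own simplifying assumptions.

```python
def gittins_index_beta(a, b, beta=0.9, horizon=40, tol=1e-4):
    """Approximate the Gittins index (DAI) of a Bernoulli arm whose
    success probability has a Beta(a, b) posterior.

    Calibration: bisect on the retirement reward lam; the index is the
    lam at which playing the arm and retiring have equal value. The
    Beta-Bernoulli state space is truncated after `horizon` further
    observations (an approximation; raise `horizon` for accuracy).
    """
    def retire_value(lam):
        return lam / (1.0 - beta)

    def value(lam):
        # Backward induction over posterior states (s successes,
        # f failures observed so far), from the truncation boundary.
        V = {}
        for n in range(horizon, -1, -1):
            for s in range(n + 1):
                f = n - s
                p = (a + s) / (a + s + b + f)  # posterior mean
                if n == horizon:
                    # Boundary approximation: play forever at the mean.
                    cont = p / (1.0 - beta)
                else:
                    cont = (p * (1.0 + beta * V[(s + 1, f)])
                            + (1.0 - p) * beta * V[(s, f + 1)])
                V[(s, f)] = max(retire_value(lam), cont)
        return V[(0, 0)]

    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        lam = 0.5 * (lo + hi)
        if value(lam) > retire_value(lam) + 1e-12:
            lo = lam  # playing still beats retiring: index exceeds lam
        else:
            hi = lam
    return 0.5 * (lo + hi)
```

With k such arms, the index rule plays at each stage the arm whose current posterior state (a_j, b_j) has the largest index; because the exploration value is positive, the index exceeds the posterior mean whenever the arm is still uncertain.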
