...
首页> 外文期刊>JMLR: Workshop and Conference Proceedings >Efficient Parameter Estimation of Truncated Boolean Product Distributions
【24h】

Efficient Parameter Estimation of Truncated Boolean Product Distributions

机译:截断布尔产品分布的有效参数估计

获取原文
   

获取外文期刊封面封底 >>

       

摘要

We study the problem of estimating the parameters of a Boolean product distribution in $d$ dimensions, when the samples are truncated by a set $S subset {0, 1}^d$ accessible through a membership oracle. This is the first time that the computational and statistical complexity of learning from truncated samples is considered in a discrete setting. We introduce a natural notion of emph{fatness} of the truncation set $S$, under which truncated samples reveal enough information about the true distribution. We show that if the truncation set is sufficiently fat, samples from the true distribution can be generated from truncated samples. A stunning consequence is that virtually any statistical task (e.g., learning in total variation distance, parameter estimation, uniformity or identity testing) that can be performed efficiently for Boolean product distributions, can also be performed from truncated samples, with a small increase in sample complexity. We generalize our approach to ranking distributions over $d$ alternatives, where we show how fatness implies efficient parameter estimation of Mallows models from truncated samples. Exploring the limits of learning discrete models from truncated samples, we identify three natural conditions that are necessary for efficient identifiability: (i) the truncation set $S$ should be rich enough; (ii) $S$ should be accessible through membership queries; and (iii) the truncation by $S$ should leave enough randomness in all directions. By carefully adapting the Stochastic Gradient Descent approach of (Daskalakis et al., FOCS 2018), we show that these conditions are also sufficient for efficient learning of truncated Boolean product distributions.
机译:我们研究了$ D $维度以$ S subset {0,1 } ^ d $可访问的$ d $维度在$ d $维度中估算BOOLEAN产品分发参数的问题。这是第一次在离散的设置中考虑从截断样本中学习的计算和统计复杂性。我们介绍了截断设置$ S $的 emph {Fatness}的自然概念,截断样本显示有关真实分布的足够信息。我们表明,如果截断集足够脂肪,则可以从截断的样本生成来自真正分布的样本。令人惊叹的结果是,可以从截断的样本执行可以有效地执行的任何统计任务(例如,在总变化距离,参数估计,均匀性或身份测试中的总体距离,参数估计,均匀性或身份测试),并且样本的小幅增加复杂。我们概括了我们对以D $替代品进行排序的方法,我们展示了肥胖意味着截断样本的Mallows模型的有效参数估计。探讨从截断样品学习离散模型的限制,我们确定了有效可识别性所需的三种自然条件:(i)截断设置$ S $应该富裕; (ii)应通过会员查询访问$ S $; (iii)以美元的截断为单位应留下足够的随机性。通过仔细调整(Daskalakis等,Focs 2018)的随机梯度血清方法,我们表明这些条件也足以高效地学习截断的布尔产品分布。

著录项

相似文献

  • 外文文献
  • 中文文献
  • 专利
获取原文

客服邮箱:kefu@zhangqiaokeyan.com

京公网安备:11010802029741号 ICP备案号:京ICP备15016152号-6 六维联合信息科技 (北京) 有限公司©版权所有
  • 客服微信

  • 服务号