Journal: ACM Transactions on Knowledge Discovery from Data

Identifying Linear Models in Multi-Resolution Population Data Using Minimum Description Length Principle to Predict Household Income

Abstract

One shirt size cannot fit everybody, yet resource limitations make it impossible to tailor a unique shirt for each person. The same holds for policy making. Policy makers cannot craft a single policy that solves all problems for all regions, because each region has its own issues. At the other extreme, they cannot craft a separate policy for every small village, again because of resource limitations. Would it not be better to find a set of largest regions such that the population of each region in the set shares common issues, so that a single policy can be made for them? In this work, we propose a framework that uses regression analysis and the Minimum Description Length (MDL) principle to find a set of largest areas that share common indicators, which can be used to predict household incomes efficiently. Given a set of household features and a multi-resolution partition that represents administrative divisions, our framework reports a set C* of largest subdivisions that have a common predictive model of population income. We formalize the problem of finding C* and propose an algorithm that finds C* correctly. We use both simulated datasets and a real-world dataset of household information on Thailand's population to demonstrate the framework's performance and application. The results show that our framework performs better than the baseline methods. Moreover, we demonstrate that the results of our method can be used to find indicators of income prediction for many areas in Thailand. By adjusting these indicator values through policies, we expect people in these areas to earn higher incomes. Hence, policy makers can use the indicators in our results as a guideline for policies that address low-income issues. Our framework can also support policy makers in making policies regarding dependent variables other than income, in order to combat poverty and other issues.
We provide the R package MRReg, an implementation of our framework in the R language. The MRReg package comes with documentation for anyone interested in linear regression analysis on multi-resolution population data.
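
The central comparison described in the abstract, whether one linear model for a large region describes the data more compactly than separate models for its subdivisions, can be illustrated with a small sketch. This is not the authors' algorithm and not the MRReg API. It is written in Python rather than R, uses a single feature, and approximates the description length with a BIC-style two-part code. All function names and the synthetic data are hypothetical.

```python
import math


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b, rss)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x
    rss = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))
    return a, b, rss


def description_length(rss, n, n_params):
    """BIC-style two-part code length in bits (assumed stand-in for MDL)."""
    rss = max(rss, 1e-12)  # guard against log(0) on a perfect fit
    data_bits = 0.5 * n * math.log2(rss / n)    # cost of the residuals
    model_bits = 0.5 * n_params * math.log2(n)  # cost of the coefficients
    return data_bits + model_bits


def prefer_shared_model(children):
    """children: list of (xs, ys) pairs, one per subdivision.

    Returns True when a single model for the whole parent region has a
    code length no longer than one model per subdivision.
    """
    all_x = [x for xs, _ in children for x in xs]
    all_y = [y for _, ys in children for y in ys]
    n = len(all_x)
    shared_rss = fit_line(all_x, all_y)[2]
    split_rss = sum(fit_line(xs, ys)[2] for xs, ys in children)
    shared = description_length(shared_rss, n, 2)
    split = description_length(split_rss, n, 2 * len(children))
    return shared <= split


def make_region(intercept, slope, noise, n=200):
    """Synthetic subdivision: a linear income trend plus deterministic noise."""
    xs = [10.0 * i / (n - 1) for i in range(n)]
    ys = [intercept + slope * x + noise(i) for i, x in enumerate(xs)]
    return xs, ys


# Two subdivisions that follow the same relationship.
similar = [
    make_region(1.0, 2.0, lambda i: math.sin(i)),
    make_region(1.0, 2.0, lambda i: math.cos(1.7 * i)),
]
# Two subdivisions that follow clearly different relationships.
different = [
    make_region(1.0, 2.0, lambda i: math.sin(i)),
    make_region(8.0, -1.0, lambda i: math.cos(1.7 * i)),
]

same_model = prefer_shared_model(similar)
diff_model = prefer_shared_model(different)
print(same_model, diff_model)
```

In this sketch the shared model wins for the first pair because the extra coefficients of the split model cost more bits than they save, while for the second pair the pooled fit leaves large residuals and the split is preferred. A multi-resolution search would apply a comparison of this kind repeatedly over the administrative hierarchy. How the published framework does so is not specified in the abstract.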
