Source: The Scientific World Journal (U.S. National Institutes of Health literature)

Application of Multiple Unsupervised Models to Validate Clusters Robustness in Characterizing Smallholder Dairy Farmers



Abstract

The heterogeneity of smallholder dairy production systems complicates service provision, information sharing, and dissemination of new technologies, especially those needed to maximize productivity and profitability. To obtain homogeneous groups within which interventions can be made, it is necessary to define clusters of farmers who undertake similar management activities. This paper explores the robustness of production cluster definitions across several unsupervised learning algorithms to identify the best approach for defining clusters. Data were collected from 8179 smallholder dairy farms in Ethiopia and Tanzania. From a total of 500 variables, the 35 used to define production clusters and assign households to them were selected by Principal Component Analysis and domain expert knowledge. Three clustering algorithms, K-means, fuzzy, and Self-Organizing Maps (SOM), were compared in terms of their grouping consistency and prediction accuracy. The model with the least household reallocation between clusters for training and testing data was deemed the most robust. Prediction accuracy was obtained by fitting a fixed-effects model that included production cluster as a predictor of milk yield, sales, and choice of breeding method. Results indicated that, for the Ethiopian dataset, clusters derived from the fuzzy algorithm had the highest predictive power (77% for milk yield and 48% for milk sales), while for the Tanzanian dataset, clusters derived from Self-Organizing Maps performed best. The average cluster membership reallocation was 15%, 12%, and 34% for K-means, SOM, and fuzzy, respectively, for households in Ethiopia. Given the divergent performance of the algorithms evaluated, it is evident that, despite similar information being available for the study populations, the uniqueness of the data from each country had an overriding influence on cluster robustness and prediction accuracy.
The results obtained in this study demonstrate the difficulty of generalizing model application and use across countries and production systems, despite seemingly similar information being collected.
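
The robustness criterion described in the abstract (household reallocation between clusters for training and testing data) can be illustrated with a minimal sketch. The abstract does not spell out how reallocation was computed, so the definition used here is an assumption: clusters are fitted separately on the training and testing households, the testing households are labelled by each fit, and the reallocation rate is the share whose cluster changes after the two sets of labels are matched. Only K-means is shown; the synthetic two-variable data, the deterministic farthest-point initialization, and all function names are hypothetical and are not taken from the paper.

```python
import random
from itertools import permutations


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centroids):
    """Label each point with the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda j: sq_dist(p, centroids[j]))
            for p in points]


def init_centroids(points, k):
    """Deterministic farthest-point initialization (an illustrative choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, iters=50):
    """Plain Lloyd's algorithm; returns the fitted centroids."""
    centroids = init_centroids(points, k)
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for j in range(k):
            members = [p for p, l in zip(points, labels) if l == j]
            # Keep the old centroid if a cluster ends up empty.
            new.append(tuple(sum(c) / len(members) for c in zip(*members))
                       if members else centroids[j])
        if new == centroids:
            break
        centroids = new
    return centroids


def reallocation_rate(labels_a, labels_b, k):
    """Share of items whose cluster differs between two labelings.

    Cluster indices are arbitrary, so every relabeling of labels_b is tried
    and the one with the most agreement is used (feasible only for small k).
    """
    best = max(
        sum(1 for a, b in zip(labels_a, labels_b) if perm[b] == a)
        for perm in permutations(range(k)))
    return 1 - best / len(labels_a)


def make_households(n_per_group, seed=0):
    """Synthetic stand-in data: two well-separated groups, two variables."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_per_group):
        data.append((rng.uniform(-1, 1), rng.uniform(-1, 1)))
        data.append((10 + rng.uniform(-1, 1), 10 + rng.uniform(-1, 1)))
    return data


households = make_households(40)
train, test = households[:60], households[60:]
k = 2

# Label the testing households with centroids from each fit, then compare.
labels_from_train = assign(test, kmeans(train, k))
labels_from_test = assign(test, kmeans(test, k))
rate = reallocation_rate(labels_from_train, labels_from_test, k)
print(f"reallocation rate: {rate:.2%}")
```

Under this assumed definition, a lower rate means the cluster structure found in the training data carries over to unseen households. The same comparison could in principle be repeated with fuzzy and SOM clusterings in place of `kmeans`.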
