Optimal string clustering based on a Laplace-like mixture and EM algorithm on a set of strings

Abstract

In this study, we address the problem of clustering string data in an unsupervised manner by developing a theory of a mixture model and an EM algorithm for string data based on probability theory on a topological monoid of strings developed in our previous studies. We first construct a parametric distribution on a set of strings in the motif of the Laplace distribution on a set of real numbers and reveal its basic properties. This Laplace-like distribution has two parameters: a string that represents the location of the distribution and a positive real number that represents the dispersion. It is difficult to explicitly write maximum likelihood estimators of the parameters because their log likelihood function is a complex function, the variables of which include a string; however, we construct estimators that almost surely converge to the maximum likelihood estimators as the number of observed strings increases and demonstrate that the estimators strongly consistently estimate the parameters. Next, we develop an iteration algorithm for estimating the parameters of the mixture model of the Laplace-like distributions and demonstrate that the algorithm almost surely converges to the EM algorithm for the Laplace-like mixture and strongly consistently estimates its parameters as the numbers of observed strings and iterations increase. Finally, we derive a procedure for unsupervised string clustering from the Laplace-like mixture that is asymptotically optimal in the sense that the posterior probability of making correct classifications is maximized.
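
The final step described above, assigning each string to the mixture component with the highest posterior probability, can be illustrated with a rough sketch. The abstract does not give the density of the Laplace-like distribution, so the code below makes several assumptions that are not taken from the paper: it uses the Levenshtein distance as the distance between strings, it scores a string with the unnormalized kernel exp(-d(s, location) / dispersion), and it ignores the normalizing constant, which in general depends on the parameters. The function names and the example components are hypothetical, and the parameters are given by hand rather than estimated by the iteration algorithm.

```python
import math


def levenshtein(a: str, b: str) -> int:
    """Edit distance computed with the standard two-row dynamic program."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def posteriors(s, components):
    """Approximate posterior of each component for the string s.

    components is a list of (weight, location_string, dispersion) tuples.
    ASSUMPTION: the component score is weight * exp(-distance / dispersion),
    without the normalizing constant of the actual distribution.
    """
    log_scores = [math.log(w) - levenshtein(s, loc) / rho
                  for w, loc, rho in components]
    top = max(log_scores)  # subtract the maximum to avoid underflow
    scores = [math.exp(x - top) for x in log_scores]
    total = sum(scores)
    return [x / total for x in scores]


def classify(s, components):
    """Index of the component with the largest approximate posterior."""
    post = posteriors(s, components)
    return max(range(len(post)), key=post.__getitem__)


# Hypothetical two-component mixture with hand-picked parameters.
components = [(0.5, "aaaa", 1.0), (0.5, "zzzz", 1.0)]
labels = [classify(s, components) for s in ["aaab", "zzzy", "aaa"]]
```

With these hand-picked components, strings close to "aaaa" in edit distance receive label 0 and strings close to "zzzz" receive label 1. A faithful implementation would replace the kernel with the density defined in the paper and estimate the locations, dispersions, and weights from data.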
