
Web information extraction based on hidden Markov model


Abstract

This paper proposes a hidden Markov model based on semantic blocks. Semantic blocks are segmented from the information gathered from various websites, exploiting the semi-structured nature of their pages. The model uses the semantic block, rather than the word, as the basic element of an observation sequence, in order to improve the accuracy and efficiency of the transition matrix. It also improves the observation probability distribution and the accuracy of the estimated state transition sequence by adopting a “voting strategy” and modifying the Viterbi algorithm. Experimental results show that the new model and algorithms achieve satisfactory recall and precision for web information extraction.
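
To make the decoding step concrete, the following is a minimal sketch of the standard (unmodified) Viterbi algorithm applied to a sequence of semantic blocks. The abstract does not specify the paper's modification to Viterbi or its “voting strategy”, so neither is reproduced here. The state names, block labels, and all probabilities are hypothetical values chosen only for illustration.

```python
import math


def viterbi(observations, states, start_p, trans_p, emit_p, floor=1e-12):
    """Standard Viterbi decoding in log space.

    Returns the most likely state sequence for the observations.
    Probabilities below `floor` (including unseen emissions) are
    clamped to `floor` so that log() is always defined.
    """
    def log(p):
        return math.log(max(p, floor))

    # best[t][s] = (log-probability of the best path ending in s, previous state)
    first = observations[0]
    best = [{s: (log(start_p[s]) + log(emit_p[s].get(first, 0.0)), None)
             for s in states}]

    for obs in observations[1:]:
        prev = best[-1]
        column = {}
        for s in states:
            # Pick the predecessor r that maximizes the path score into s.
            score, arg = max((prev[r][0] + log(trans_p[r][s]), r) for r in states)
            column[s] = (score + log(emit_p[s].get(obs, 0.0)), arg)
        best.append(column)

    # Backtrack from the best final state.
    path = [max(states, key=lambda s: best[-1][s][0])]
    for column in reversed(best[1:]):
        path.append(column[path[-1]][1])
    path.reverse()
    return path


# Hypothetical example: hidden states are the fields to extract, and each
# observation is a semantic block label rather than a single word.
states = ["title", "price"]
start_p = {"title": 0.8, "price": 0.2}
trans_p = {
    "title": {"title": 0.1, "price": 0.9},
    "price": {"title": 0.3, "price": 0.7},
}
emit_p = {
    "title": {"heading_block": 0.9, "number_block": 0.1},
    "price": {"heading_block": 0.1, "number_block": 0.9},
}
blocks = ["heading_block", "number_block", "number_block"]

path = viterbi(blocks, states, start_p, trans_p, emit_p)
print(path)
```

Because each observation is a whole block, the observation sequence for a page is much shorter than its word sequence, which is one plausible reading of the efficiency gain the abstract mentions.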
