Published in: Journal of Chinese Computer Systems (小型微型计算机系统)

A Multi-Domain Web Text Feature Extraction Model for e-Science Environments

Abstract

Traditional domain-specific information extraction methods usually rely on domain dictionaries to discover text features. This makes experiments hard to reproduce and makes the methods hard to port to, or adopt in, multi-domain environments, which severely limits their scope of application. To address these shortcomings, this paper proposes e-WTDE, a multi-domain Web text feature extraction model for e-Science environments. The model introduces dictionary-free Chinese word segmentation into multi-domain text feature discovery, which removes the dependence on domain dictionaries. It extracts and classifies the common and individual features of domain topics and their specific events. On that basis, the model dynamically tracks the occurrence and evolution of domain events and eventually forms several regional data centers. Cooperative scheduling of domain knowledge across these data centers substantially improves how efficiently domain information is used at the global scale. Validation experiments on multi-domain feature extraction, dynamic topic feature tracking, and cooperative scheduling of domain knowledge confirm the model's effectiveness and demonstrate its practical value in e-Science environments.
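The abstract does not say how e-WTDE segments Chinese text without a dictionary. One widely used dictionary-free approach, shown here purely as an illustration and not as the authors' algorithm, scores candidate substrings on three criteria. The first is frequency. The second is internal cohesion, measured as pointwise mutual information across the weakest split. The third is boundary freedom, measured as the entropy of the characters seen immediately to the left and right. The function name `discover_words` and the thresholds `max_len`, `min_count`, `min_pmi`, and `min_entropy` are hypothetical and would need tuning on a real corpus.

```python
import math
from collections import Counter, defaultdict


def discover_words(corpus, max_len=4, min_count=2, min_pmi=1.0, min_entropy=0.5):
    """Hypothetical dictionary-free word discovery (NOT the e-WTDE algorithm).

    corpus: list of raw strings with no word boundaries.
    Returns {candidate_word: frequency} for substrings of length 2..max_len
    that are frequent, internally cohesive, and free at both boundaries.
    """
    ngram_counts = Counter()
    left_ctx = defaultdict(Counter)   # characters seen just before each n-gram
    right_ctx = defaultdict(Counter)  # characters seen just after each n-gram
    for text in corpus:
        n = len(text)
        for i in range(n):
            for length in range(1, max_len + 1):
                j = i + length
                if j > n:
                    break
                gram = text[i:j]
                ngram_counts[gram] += 1
                if length >= 2:
                    if i > 0:
                        left_ctx[gram][text[i - 1]] += 1
                    if j < n:
                        right_ctx[gram][text[j]] += 1

    # Simplification: every n-gram probability is normalised by the number of
    # characters, a common approximation in this family of methods.
    total_chars = sum(c for g, c in ngram_counts.items() if len(g) == 1)

    def prob(gram):
        return ngram_counts[gram] / total_chars

    def entropy(counter):
        total = sum(counter.values())
        if total == 0:
            return 0.0
        return -sum(c / total * math.log(c / total) for c in counter.values())

    words = {}
    for gram, count in ngram_counts.items():
        if len(gram) < 2 or count < min_count:
            continue
        # Cohesion: PMI across the weakest of all binary splits of the gram.
        pmi = min(
            math.log(prob(gram) / (prob(gram[:k]) * prob(gram[k:])))
            for k in range(1, len(gram))
        )
        # Boundary freedom: the less varied of the two neighbour distributions.
        boundary = min(entropy(left_ctx[gram]), entropy(right_ctx[gram]))
        if pmi >= min_pmi and boundary >= min_entropy:
            words[gram] = count
    return words


if __name__ == "__main__":
    docs = ["我们研究数据挖掘方法", "数据挖掘需要大量数据", "网格计算支持数据共享"]
    print(discover_words(docs))
```

A substring that always appears in the same context gets zero boundary entropy and is rejected, even when it is frequent. This is what lets the method separate real words from fragments without a lexicon. Raising the thresholds trades recall for precision.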