A Multi-Domain Web Text Feature Extraction Model for the e-Science Environment

翁彧; 胡长军; 席强; 张学春

Abstract

Traditional domain-specific information extraction methods usually rely on domain dictionaries to discover text features. This makes experiments hard to reproduce and the methods hard to port to multi-domain environments, which severely limits their scope of application. To address these shortcomings, a multi-domain Web text feature extraction model for the e-Science environment is proposed (named e-WTDE). The model introduces dictionary-free Chinese word segmentation into multi-domain text feature discovery, which removes the dependence on domain dictionaries. By extracting and classifying the common and individual features of domain topics and their concrete events, the model dynamically tracks the emergence and development of domain events and eventually forms several local data centers. Cooperative scheduling of domain knowledge across these data centers markedly improves how efficiently domain information is used at the global scale. Validation experiments on multi-domain feature extraction, dynamic tracking of topic features, and cooperative scheduling of domain knowledge verify each component and further demonstrate the model's practicality in the e-Science environment.
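
The abstract does not say which dictionary-free segmentation algorithm e-WTDE uses, so the following is only a minimal sketch of the general idea, assuming a common stand-in technique: character bigrams are scored by pointwise mutual information (PMI) and kept as candidate terms without consulting any dictionary, and the resulting terms are then split into features shared by every domain and features specific to one domain. The function names `candidate_terms` and `split_common_individual`, the thresholds, and the sample texts are all hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def char_runs(text):
    """Yield runs of letters/digits; punctuation and spaces act as boundaries."""
    run = []
    for ch in text:
        if ch.isalnum():          # True for CJK characters as well
            run.append(ch)
        elif run:
            yield run
            run = []
    if run:
        yield run


def candidate_terms(text, min_count=2, min_pmi=1.0):
    """Return {bigram: PMI} for character bigrams that recur and cohere.

    Assumption: PMI over adjacent characters stands in for the paper's
    unspecified dictionary-free segmentation step.
    """
    unigrams, bigrams = Counter(), Counter()
    for run in char_runs(text):
        unigrams.update(run)
        bigrams.update(a + b for a, b in zip(run, run[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())

    terms = {}
    for bigram, count in bigrams.items():
        if count < min_count:
            continue
        p_pair = count / n_bi
        p_first = unigrams[bigram[0]] / n_uni
        p_second = unigrams[bigram[1]] / n_uni
        pmi = math.log2(p_pair / (p_first * p_second))
        if pmi >= min_pmi:
            terms[bigram] = pmi
    return terms


def split_common_individual(domain_terms):
    """Split terms into those found in every domain and those left per domain."""
    sets = [set(terms) for terms in domain_terms.values()]
    common = set.intersection(*sets) if sets else set()
    individual = {d: set(t) - common for d, t in domain_terms.items()}
    return common, individual


# Hypothetical two-domain sample, made up for illustration only.
docs = {
    "finance": "股票上涨。股票下跌。数据显示股票活跃。数据公开。",
    "sports": "球队获胜。球队失利。数据显示球队强大。数据公开。",
}
terms = {domain: set(candidate_terms(text)) for domain, text in docs.items()}
common, individual = split_common_individual(terms)
print(sorted(common))
print({domain: sorted(found) for domain, found in individual.items()})
```

On this toy input the shared term is 数据 ("data"), while 股票 ("stock") and 球队 ("team") remain as domain-specific terms. A real system would need longer n-grams, boundary statistics, and far more text than this sketch handles.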

Bibliographic details

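Source: 《小型微型计算机系统》 | 2011, Issue 1 | pp. 17-23 | 7 pages
Authors: 翁彧; 胡长军; 席强; 张学春
Author affiliations:
School of Information Engineering, University of Science and Technology Beijing, Beijing 100083
School of Information Engineering, Minzu University of China, Beijing 100081
Original format: PDF
Language of text: Chinese
CLC classification: Programming; software engineering
Keywords: e-Science environment; feature discovery; multi-domain data model; Web text mining
Date indexed: 2023-07-24 22:44:23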

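Similar literature

1. Chinese Web text topic feature extraction method based on domain ontology [J]. 朱恒民, 马静, 黄卫东. 情报理论与实践. 2008, Issue 2.
2. A data center model for domain-oriented Web services [J]. 计算机系统应用. 2013, Issue 6.
3. A multi-domain service discovery model supporting highly reliable Web service composition [J]. 申德荣, 寇月, 聂铁铮. 小型微型计算机系统. 2008, Issue 3.
4. Domain-oriented structured analysis of Web text [J]. 杨春磊, 刘念唐, 林雨. 合肥工业大学学报（自然科学版）. 2013, Issue 3.
5. Research on the DON model for automatic keyword extraction from Web text [J]. 彭浩, 蔡美玲, 王瑞龙. 计算机工程与应用. 2012, Issue 31.
6. An ontology-based Web service matching method for specific domains [C]. 2008全国软件与应用学术会议(NASAC'08). 2008.
7. Research on a domain-oriented, component-based Web-assisted development environment [A]. 昂卫武. 2001.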
