
Information Classification and Extraction on Official Web Pages of Organizations


Abstract

As real-time, authoritative sources, the official Web pages of organizations contain a large amount of information. Because Web content and formats are so diverse, pre-processing is essential to obtain unified, attributed data that is valuable for organizational analysis and mining. Existing research handles multiple Web scenarios poorly and falls short on accuracy. This paper proposes a method for transforming organizational official Web pages into attributed data. After the active blocks in a Web page are located, structural and content features are used to classify the information with a specific model. Extraction methods based on a trigger lexicon and LSTM (Long Short-Term Memory) then process the classified information efficiently and extract the data that matches each attribute. Together, these steps form an accurate and efficient method for classifying and extracting information from organizational official Web pages. Experimental results show that the approach improves the performance indicators and exceeds the state of the art on a real data set drawn from organizational official Web pages.
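
To make the idea of trigger-lexicon extraction concrete, the following is a minimal sketch and not the paper's implementation. It assumes that a classified block has already been split into text lines and that an attribute value follows a trigger word and a colon. The lexicon contents, the attribute names, and the function `extract_attributes` are all hypothetical and chosen only for illustration; the LSTM-based extractor mentioned in the abstract is not covered here.

```python
import re

# Hypothetical trigger lexicon (an assumption, not taken from the paper):
# attribute name -> trigger words that may introduce its value.
TRIGGER_LEXICON = {
    "email": ["email", "e-mail"],
    "phone": ["tel", "phone", "telephone"],
    "address": ["address", "addr"],
}


def extract_attributes(lines, lexicon=TRIGGER_LEXICON):
    """Return {attribute: value} for lines shaped like '<trigger>: <value>'."""
    # Invert the lexicon so each lowercase trigger points at its attribute.
    trigger_to_attr = {
        trigger: attr
        for attr, triggers in lexicon.items()
        for trigger in triggers
    }
    # Try longer triggers first so "telephone" is preferred over "tel".
    ordered = sorted(trigger_to_attr, key=len, reverse=True)
    alternation = "|".join(re.escape(t) for t in ordered)
    # Accept both the ASCII colon and the full-width colon used on Chinese pages.
    pattern = re.compile(
        rf"^\s*({alternation})\s*[:：]\s*(.+?)\s*$", re.IGNORECASE
    )

    result = {}
    for line in lines:
        match = pattern.match(line)
        if match:
            attr = trigger_to_attr[match.group(1).lower()]
            # Keep the first value seen for each attribute.
            result.setdefault(attr, match.group(2))
    return result


block = [
    "Tel: +86 451 0000 0000",
    "E-mail: info@example.org",
    "Welcome to our site",
    "Address: 1 Example Road",
]
print(extract_attributes(block))
```

Lines without a known trigger, such as the welcome text, are ignored. A real lexicon would presumably be much larger and tuned to the attributes of interest.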

Bibliographic details

  • Source
    Computers, Materials & Continua | 2020, Issue 3 | pp. 2057-2073 | 17 pages
  • Author affiliations

    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China;

    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China;

    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China;

    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China;

    School of Computer Science and Technology, Harbin Institute of Technology, Harbin 150006, China;

    China Electronic Equipment System Engineering Company, Beijing 100039, China;

  • Indexing information
  • Original format: PDF
  • Text language: English
  • Chinese Library Classification
  • Keywords

    Web pre-process; feature classification; data extraction; trigger lexicon; LSTM;
