Conference: International Conference on Enterprise Information Systems

Natural Language Processing Techniques for Document Classification in IT Benchmarking: Automated Identification of Domain Specific Terms


Abstract

In the domain of IT benchmarking, collected data are often stored as natural language text and are therefore intrinsically unstructured. To ease data analysis and evaluation across different types of IT benchmarking approaches, a semantic representation of this information is crucial. Thus, the identification of conceptual (semantic) similarities is the first step in the development of integrative data management in this domain. As an ontology is a specification of such a conceptualization, an association of terms, relations between terms, and related instances must be developed. Building on previous research, we present an approach for automated term extraction using natural language processing (NLP) techniques. Terms are automatically extracted from existing IT benchmarking documents, leading to a domain-specific dictionary. These extracted terms are representative of each document, describe the purpose and content of each file, and serve as a basis for the ontology development process in the domain of IT benchmarking.
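
The abstract does not say which NLP techniques are used for term extraction, so the following is only an illustrative sketch of one common approach: scoring words by TF-IDF and merging the top terms of each document into a dictionary. The function names (`tokenize`, `top_terms`, `build_dictionary`), the stopword list, and the sample documents are all hypothetical and are not taken from the paper.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not from the paper).
STOPWORDS = {"the", "of", "and", "a", "in", "to", "is", "for", "per", "are"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def top_terms(documents, k=3):
    """Return the k highest-scoring TF-IDF terms for each document."""
    tokenized = [tokenize(doc) for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: number of documents that contain each term.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    result = []
    for tokens in tokenized:
        counts = Counter(tokens)
        scores = {
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        }
        # Sort by descending score, then alphabetically for a stable order.
        ranked = sorted(scores, key=lambda t: (-scores[t], t))
        result.append(ranked[:k])
    return result


def build_dictionary(documents, k=3):
    """Merge the top terms of all documents into one sorted term list."""
    return sorted({term for terms in top_terms(documents, k) for term in terms})


if __name__ == "__main__":
    # Hypothetical benchmarking snippets, invented for illustration.
    docs = [
        "Server costs per server and server uptime",
        "Helpdesk tickets per user and helpdesk staff costs",
    ]
    print(top_terms(docs, k=1))
    print(build_dictionary(docs, k=1))
```

A term that appears in every document (here "costs") gets a score of zero, which is why TF-IDF tends to surface terms that characterize an individual document rather than the corpus as a whole.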
