Labeling Turkish news stories with CRF

Abstract

The drastic increase in the number of documents on the Web calls for semantic web applications that help the Web reach its full potential. Extracting the important phrases of a document makes it easier to find the information one is looking for. This paper introduces a new approach that labels the main subject, main predicate, main location and main date of an electronic document. The main subject label tells whom or what the document is about. The main predicate label tells what the subject is or does. The main location label tells where the events took place, and the main date label tells when they took place. This methodology extracts not only a high-level description of the content but also the attribute of each phrase in a document. Turkish news stories were selected as the experimental set. Human annotators labeled them manually to produce the training and test sets. A separate model was then implemented for each label to extract the labels automatically. The models' output was compared with the manually labeled results to evaluate the study.
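The abstract does not say which metric is used to compare the automatic labels with the manual ones. The sketch below is one illustration only. It assumes token-level scoring, where every token carries one of four hypothetical tags (`MAIN_SUBJECT`, `MAIN_PREDICATE`, `MAIN_LOCATION`, `MAIN_DATE`) or `O`. Precision, recall and F1 are then computed per label against the annotators' tags. The tag names, the token-level granularity and the example sentence are assumptions, not the paper's actual setup.

```python
from collections import Counter

# Hypothetical tag names for the four labels described in the abstract;
# "O" marks tokens that carry none of them (assumption, not the paper's scheme).
LABELS = ("MAIN_SUBJECT", "MAIN_PREDICATE", "MAIN_LOCATION", "MAIN_DATE")


def per_label_scores(gold, predicted):
    """Token-level precision, recall and F1 for each label.

    gold and predicted are lists of documents; each document is a list of
    per-token tags (manual and automatic, respectively).
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted contain different numbers of documents")
    tp, fp, fn = Counter(), Counter(), Counter()
    for gold_doc, pred_doc in zip(gold, predicted):
        if len(gold_doc) != len(pred_doc):
            raise ValueError("a gold and predicted document differ in length")
        for g, p in zip(gold_doc, pred_doc):
            if g == p:
                if g != "O":
                    tp[g] += 1  # correct non-O tag
            else:
                if p != "O":
                    fp[p] += 1  # model assigned a label the annotator did not
                if g != "O":
                    fn[g] += 1  # model missed the annotator's label
    scores = {}
    for label in LABELS:
        predicted_count = tp[label] + fp[label]
        gold_count = tp[label] + fn[label]
        precision = tp[label] / predicted_count if predicted_count else 0.0
        recall = tp[label] / gold_count if gold_count else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        scores[label] = (precision, recall, f1)
    return scores


# Hypothetical example: "Başbakan dün Ankara'da açıklama yaptı"
# ("The Prime Minister made a statement in Ankara yesterday").
gold = [["MAIN_SUBJECT", "MAIN_DATE", "MAIN_LOCATION",
         "MAIN_PREDICATE", "MAIN_PREDICATE"]]
pred = [["MAIN_SUBJECT", "O", "MAIN_LOCATION", "O", "MAIN_PREDICATE"]]
scores = per_label_scores(gold, pred)
for label, (p, r, f) in scores.items():
    print(f"{label:15} P={p:.2f} R={r:.2f} F1={f:.2f}")
```

Scoring each label separately fits the abstract's use of a separate model per label. A span-level metric, which counts a phrase as correct only if all of its tokens match, would be a stricter alternative.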