International Conference on Data and Software Engineering

Handling Out of Vocabulary in supervised event extraction on Indonesian tweets: Using word representation, word list, word context and document level features



Abstract

Extracting event information from Twitter remains promising because many Twitter accounts exist solely to disseminate event information widely. The most difficult part of extracting event information is the Out-of-Vocabulary (OOV) problem, especially for event names. Here, we enhance the features used in our supervised event extraction. These features include word representations (a skip-gram model and Brown clusters), word lists (event names and event locations), word context, and document-level features. Using a conditional random field (CRF) as the classification algorithm, 4-fold cross-validation, and 1,300 tweets, the best F-measure achieved for OOV cases was 0.6, a significant improvement over the baseline of 0.445. The enhanced features also improved the F-measure for the all-vocabulary case from 0.693 (baseline) to 0.814 (proposed).
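The abstract names the feature groups but not their exact templates. The following is a minimal Python sketch, under stated assumptions, of how such groups could be turned into per-token feature dictionaries, one list of dicts per tweet, which is the input shape CRF wrappers such as sklearn-crfsuite accept. The gazetteers, Brown cluster bit strings, skip-gram cluster ids, and document-level flags are hypothetical toy values, and discretizing skip-gram vectors into cluster ids is an assumed design choice, not necessarily the paper's.

```python
from typing import Dict, List, Set

# Hypothetical word lists (gazetteers); not the paper's actual lists.
EVENT_NAMES: Set[str] = {"seminar", "konser", "festival"}
EVENT_LOCATIONS: Set[str] = {"jakarta", "bandung", "gedung"}

# Hypothetical Brown cluster paths (bit strings) keyed by lowercased word.
BROWN_PATHS: Dict[str, str] = {"seminar": "0110", "konser": "0111", "jakarta": "1010"}

# Hypothetical cluster ids, assumed to come from clustering skip-gram vectors.
SKIPGRAM_CLUSTERS: Dict[str, int] = {"seminar": 3, "konser": 3, "workshop": 3}


def word_shape(word: str) -> str:
    """Map chars to X/x/d (others kept) and collapse repeats, e.g. 'Jakarta2019' -> 'Xxd'."""
    shape: List[str] = []
    for ch in word:
        if ch.isupper():
            c = "X"
        elif ch.islower():
            c = "x"
        elif ch.isdigit():
            c = "d"
        else:
            c = ch
        if not shape or shape[-1] != c:
            shape.append(c)
    return "".join(shape)


def token_features(tokens: List[str], i: int, window: int = 2) -> Dict[str, object]:
    """Word-level, word-list, word-representation and context features for tokens[i]."""
    word = tokens[i]
    low = word.lower()
    cid = SKIPGRAM_CLUSTERS.get(low)
    feats: Dict[str, object] = {
        "bias": 1.0,
        "word.lower": low,
        "word.shape": word_shape(word),
        "word.is_title": word.istitle(),
        # Word-list (gazetteer) membership.
        "in_event_list": low in EVENT_NAMES,
        "in_location_list": low in EVENT_LOCATIONS,
        # Skip-gram cluster id as a categorical string; "none" if no vector.
        "skipgram.cluster": f"c{cid}" if cid is not None else "none",
    }
    # Brown cluster prefixes at two granularities; unknown words get none.
    path = BROWN_PATHS.get(low)
    if path is not None:
        for p in (2, 4):
            feats[f"brown.prefix{p}"] = path[:p]
    # Word context: neighbouring tokens in a symmetric window, padded at edges.
    for offset in range(-window, window + 1):
        if offset == 0:
            continue
        j = i + offset
        feats[f"ctx[{offset:+d}]"] = tokens[j].lower() if 0 <= j < len(tokens) else "<pad>"
    return feats


def document_features(tokens: List[str]) -> Dict[str, object]:
    """Hypothetical tweet-level flags shared by every token of the tweet."""
    return {
        "doc.has_url": any(t.startswith("http") for t in tokens),
        "doc.has_hashtag": any(t.startswith("#") for t in tokens),
        "doc.has_digit": any(ch.isdigit() for t in tokens for ch in t),
    }


def tweet_to_features(tokens: List[str]) -> List[Dict[str, object]]:
    """One feature dict per token, each extended with the document-level flags."""
    doc = document_features(tokens)
    seq: List[Dict[str, object]] = []
    for i in range(len(tokens)):
        feats = token_features(tokens, i)
        feats.update(doc)
        seq.append(feats)
    return seq


if __name__ == "__main__":
    tweet = "Ikuti Seminar Nasional di Jakarta 12/5 http://t.co/x".split()
    features = tweet_to_features(tweet)
    print(features[1]["in_event_list"], features[1]["ctx[+2]"])  # True di
```

Because an unseen word still receives shape, context, word-list, cluster, and document-level features even when its surface form never appeared in training, the CRF has evidence to label it. With sklearn-crfsuite, for example, the per-tweet outputs of `tweet_to_features` would be passed as `X` to `CRF.fit(X, y)` together with per-token label sequences.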

