Conference: International Conference on Knowledge and Systems Engineering

Keyphrase generation for Vietnamese administrative documents: a collaborative approach

Abstract

The keyphrases of a document can be regarded as its condensed summary. Unsupervised models extract keyphrases using only the information contained in that document, without drawing on other documents. Supervised models that perform well, by contrast, require massive effort to build training data and do not generalize to new domains. Moreover, in terms of human perception, a user comprehends the topic of a document better after reading other documents on the same topic. Based on this idea, we propose a collaborative keyphrase generation system (CollabKG): a novel semi-supervised method that leverages limited labeled data. The amount of labeled data is enriched over time by the user. We conduct our research on a large-scale dataset of 500,000 Vietnamese administrative documents. In CollabKG, each document is represented as a feature vector, and a cluster pruning algorithm is employed to speed up the search for the most similar documents. The generated keyphrases were manually evaluated for relevance and accuracy, and the evaluators rated the results highly. We therefore conclude that CollabKG performs well and is suitable for a real-time system.
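
The abstract does not say how the feature vectors are built or how the cluster pruning step is configured. As a rough illustration only, the textbook form of cluster pruning can be sketched as follows. The sketch assumes small dense vectors, cosine similarity, and about sqrt(N) randomly chosen leaders. The names `build_index` and `most_similar` are hypothetical and are not taken from CollabKG.

```python
import math
import random


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_index(vectors, seed=0):
    """Pick about sqrt(N) random leaders; attach every document to its nearest leader."""
    rng = random.Random(seed)
    n_leaders = max(1, math.isqrt(len(vectors)))
    leaders = rng.sample(range(len(vectors)), n_leaders)
    followers = {leader: [] for leader in leaders}
    for doc_id, vec in enumerate(vectors):
        nearest = max(leaders, key=lambda l: cosine(vec, vectors[l]))
        followers[nearest].append(doc_id)
    return followers


def most_similar(query, vectors, followers, k=3):
    """Compare the query with the leaders only, then rank the winning cluster."""
    best_leader = max(followers, key=lambda l: cosine(query, vectors[l]))
    candidates = followers[best_leader]
    ranked = sorted(candidates, key=lambda d: cosine(query, vectors[d]), reverse=True)
    return ranked[:k]


# Toy feature vectors (assumed for illustration; real documents would use
# whatever representation the system defines).
docs = [
    [1.0, 0.0, 0.1],
    [0.9, 0.1, 0.0],
    [0.0, 1.0, 0.1],
    [0.1, 0.9, 0.0],
]
index = build_index(docs)
neighbours = most_similar([1.0, 0.05, 0.0], docs, index, k=2)
```

The speed-up comes from the query being compared with roughly sqrt(N) leaders plus the members of one cluster instead of all N documents. The price is that the true nearest neighbour can be missed when it sits in a different cluster.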
