Venue: International Conference on Knowledge Engineering and Applications
Knowledge extraction through etymological networks: Synonym discovery in Sino-Korean words

Abstract

Extracting knowledge from text is a very active area of research. Techniques such as word embeddings and latent semantic analysis (LSA) have brought major breakthroughs and are used in applications such as automatic translation. We propose a novel approach to knowledge extraction from text that relies on a graph to express the complex etymological structures formed by the historical roots of words. Our approach is especially well suited to the study of Sino-Korean vocabulary, where the etymological roots of words are clearly visible in their writing. We use it to build a bipartite graph based on the Chinese etymological roots of Sino-Korean words, and then use the network structure to extract features describing pairs of nodes. We use these features in a classification scheme to discover pairs of nodes that represent synonym characters. Our model is simpler than previous work on synonym discovery with Chinese characters, and it obtains good results. The code and data for our work are openly available.
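The pipeline described in the abstract (bipartite graph, then pairwise node features) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the six-word dataset is a toy example, the function names `build_bipartite` and `pair_features` are hypothetical, and the three features shown (shared words, shared partner characters, and their Jaccard ratio) are stand-ins, since the abstract does not say which features the paper uses.

```python
from collections import defaultdict

# Toy data for illustration only (not the paper's dataset):
# each Sino-Korean word is mapped to the Chinese characters it is written with.
WORDS = {
    "학교": "學校",  # school
    "학생": "學生",  # student
    "선생": "先生",  # teacher
    "가옥": "家屋",  # house
    "가족": "家族",  # family
    "옥상": "屋上",  # rooftop
}


def build_bipartite(words):
    """Build both sides of the word/character bipartite graph.

    Returns (char_to_words, word_to_chars), each a dict of sets.
    """
    char_to_words = defaultdict(set)
    word_to_chars = {}
    for word, hanja in words.items():
        word_to_chars[word] = set(hanja)
        for ch in hanja:
            char_to_words[ch].add(word)
    return dict(char_to_words), word_to_chars


def pair_features(char_to_words, word_to_chars, a, b):
    """Compute simple structural features for a pair of character nodes.

    The feature set is an assumption made for this sketch.
    """
    words_a, words_b = char_to_words[a], char_to_words[b]
    # Characters that appear in a word together with a (resp. b).
    partners_a = set().union(*(word_to_chars[w] for w in words_a)) - {a}
    partners_b = set().union(*(word_to_chars[w] for w in words_b)) - {b}
    shared = partners_a & partners_b
    union = partners_a | partners_b
    return {
        "shared_words": len(words_a & words_b),
        "shared_partners": len(shared),
        "partner_jaccard": len(shared) / len(union) if union else 0.0,
    }


if __name__ == "__main__":
    c2w, w2c = build_bipartite(WORDS)
    print(pair_features(c2w, w2c, "學", "先"))
```

Feature vectors like these, computed for labeled character pairs, would then be passed to a classifier of one's choice to predict whether a pair is synonymous; the abstract does not name the classifier, so that step is left out here.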