Automatic Extraction of Domain Specific Terminology from a Large Corpus

Abstract

A method of extracting jargon from a document corpus stored in a database using a processor and a user interface is described herein. A sub-domain input is entered through the user interface to initiate a review of the document corpus stored in the database. The processor separates the document corpus into at least one sub-corpus and a remainder corpus. The at least one sub-corpus is defined by the sub-domain input. A first topic model and a second topic model are built to generate respective topic similarity scores for at least one term extracted from the at least one sub-corpus and at least one corresponding term extracted from the remainder corpus. The respective topic similarity scores are compared by the processor to identify jargon terms and thereby provide a list of jargon terms through the user interface.
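
The abstract does not say how the topic models are built or how the scores are compared, so the following is only a minimal sketch of one possible reading. It assumes each document carries a `domain` label and a `text` field, and it swaps in simple co-occurrence context vectors as a stand-in for a trained topic model. A term whose context in the sub-corpus has low cosine similarity to its context in the remainder corpus is flagged as a jargon candidate. The names `split_corpus`, `context_vectors`, `jargon_candidates`, and the `threshold` value are all hypothetical.

```python
from collections import Counter, defaultdict
from math import sqrt


def split_corpus(docs, sub_domain):
    """Separate the corpus into a sub-corpus (matching sub_domain) and a remainder."""
    sub = [d["text"] for d in docs if d["domain"] == sub_domain]
    rest = [d["text"] for d in docs if d["domain"] != sub_domain]
    return sub, rest


def context_vectors(texts):
    """Stand-in for a topic model (assumption): per-term counts of co-occurring terms."""
    vectors = defaultdict(Counter)
    for text in texts:
        tokens = text.lower().split()
        for i, term in enumerate(tokens):
            for j, other in enumerate(tokens):
                if i != j:
                    vectors[term][other] += 1
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def jargon_candidates(docs, sub_domain, threshold=0.3):
    """Flag terms used differently in the sub-corpus than in the remainder corpus."""
    sub, rest = split_corpus(docs, sub_domain)
    sub_vecs, rest_vecs = context_vectors(sub), context_vectors(rest)
    # Only terms present in both corpora have a "corresponding term" to compare.
    scores = {t: cosine(sub_vecs[t], rest_vecs[t]) for t in sub_vecs if t in rest_vecs}
    return sorted(t for t, s in scores.items() if s < threshold)


# Toy data, invented purely for illustration.
docs = [
    {"domain": "powertrain", "text": "boost pressure turbo engine"},
    {"domain": "powertrain", "text": "turbo boost engine pressure"},
    {"domain": "marketing", "text": "boost sales campaign revenue"},
    {"domain": "service", "text": "engine turbo pressure maintenance"},
]
print(jargon_candidates(docs, "powertrain"))
```

In this toy corpus, "boost" appears alongside engine vocabulary in the sub-corpus and alongside sales vocabulary elsewhere, so it falls below the threshold, while "engine", "turbo", and "pressure" keep similar contexts in both corpora and are not flagged. A real system would replace `context_vectors` with an actual topic model and tune the threshold on labeled data.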
