Automatic Extraction of Domain Specific Terminology from a Large Corpus
Abstract
A method of extracting jargon from a document corpus stored in a database, using a processor and a user interface, is described herein. A sub-domain input is entered through the user interface to initiate a review of the document corpus stored in the database. The processor separates the document corpus into at least one sub-corpus, defined by the sub-domain input, and a remainder corpus. A first topic model and a second topic model are built to generate respective topic similarity scores for at least one term extracted from the at least one sub-corpus and at least one corresponding term extracted from the remainder corpus. The processor compares the respective topic similarity scores to identify jargon terms and provides a list of those jargon terms through the user interface.
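The flow described in the abstract can be illustrated with a minimal Python sketch. The abstract does not specify the topic model, the similarity score, or the comparison rule, so everything below is an assumption made for illustration: a simple co-occurrence count stands in for a trained topic model, the score is the cosine similarity between a term's co-occurrence vector and its corpus's overall term counts, and a term is flagged when its sub-corpus score exceeds its remainder score by a hypothetical `margin`. The function names, the `domain` and `text` fields, and the sample documents are likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive whitespace tokenizer; keeps purely alphabetic tokens only.
    return [w for w in text.lower().split() if w.isalpha()]


def split_corpus(docs, sub_domain):
    # Separate the corpus into the sub-corpus named by the sub-domain
    # input and the remainder corpus.
    sub = [d["text"] for d in docs if d["domain"] == sub_domain]
    rest = [d["text"] for d in docs if d["domain"] != sub_domain]
    return sub, rest


def cosine(a, b):
    # Cosine similarity between two sparse count vectors.
    dot = sum(v * b[k] for k, v in a.items() if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_model(texts):
    # ASSUMPTION: stand-in for a topic model. Keeps the corpus-wide term
    # counts plus, for each term, the counts of words it co-occurs with.
    centroid = Counter()
    contexts = defaultdict(Counter)
    for text in texts:
        counts = Counter(tokenize(text))
        centroid.update(counts)
        for term in counts:
            context = counts.copy()
            del context[term]  # a term is not its own context
            contexts[term].update(context)
    return centroid, contexts


def similarity_scores(model):
    # ASSUMPTION: a term's score is how closely its co-occurrence vector
    # matches the overall term distribution of its corpus.
    centroid, contexts = model
    return {term: cosine(ctx, centroid) for term, ctx in contexts.items()}


def extract_jargon(docs, sub_domain, margin=0.1):
    # Build one model per corpus, then compare the respective scores.
    sub, rest = split_corpus(docs, sub_domain)
    sub_scores = similarity_scores(build_model(sub))
    rest_scores = similarity_scores(build_model(rest))
    # Terms missing from the remainder corpus get a score of 0.0 there.
    return sorted(
        term
        for term, score in sub_scores.items()
        if score - rest_scores.get(term, 0.0) > margin
    )


# Hypothetical sample corpus, for illustration only.
docs = [
    {"domain": "cardiology", "text": "the stent was placed in the artery"},
    {"domain": "cardiology", "text": "the stent reduced artery blockage"},
    {"domain": "cooking", "text": "the recipe uses the oven"},
    {"domain": "cooking", "text": "the oven was hot"},
]
jargon = extract_jargon(docs, "cardiology")
print(jargon)
```

In this sketch, terms that appear only in the sub-corpus have no score in the remainder model, so they clear the margin easily. A real implementation would likely filter stop words and use a trained topic model in place of the raw co-occurrence counts.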