Optimization of text-based training set selection for language processing modules
Abstract
A device and a method provide for selecting a database from a corpus using an optimization function. The method includes defining a size of the database, calculating a distance using a distance function for each pair in a set of pairs, and executing an optimization function that uses the distance to select each entry saved in the database, until the number of saved entries equals the size of the database. Each pair in the set of pairs includes either two entries selected from the corpus, or one entry selected from the set of previously selected entries and another entry selected from the remaining portion of the corpus. The distance function may be a Levenshtein distance function or a generalized Levenshtein distance function.
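The selection loop described above can be illustrated with a minimal Python sketch. The abstract does not state which optimization function is used, so this example assumes a greedy max-min criterion: it seeds the database with the two most distant corpus entries, then repeatedly adds the remaining entry whose smallest distance to the already selected entries is largest. The names `levenshtein` and `select_database` are hypothetical, and the distance uses unit edit costs; a generalized Levenshtein distance would presumably assign different costs to individual edit operations.

```python
from itertools import combinations


def levenshtein(a: str, b: str) -> int:
    """Unit-cost edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        current = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(previous[j] + 1,          # delete ch_a
                               current[j - 1] + 1,       # insert ch_b
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


def select_database(corpus, size, distance=levenshtein):
    """Greedily pick `size` mutually distant entries (assumed criterion)."""
    entries = list(dict.fromkeys(corpus))  # drop duplicates, keep order
    if size <= 0:
        return []
    if size >= len(entries):
        return entries
    if size == 1:
        return entries[:1]

    # Pairs of the first kind: two entries taken from the corpus.
    first, second = max(combinations(entries, 2),
                        key=lambda pair: distance(*pair))
    selected = [first, second]
    remaining = [e for e in entries if e not in (first, second)]

    # Pairs of the second kind: one selected entry, one remaining entry.
    # nearest[e] is the distance from e to its closest selected entry.
    nearest = {e: min(distance(e, s) for s in selected) for e in remaining}

    while len(selected) < size:
        best = max(remaining, key=nearest.__getitem__)
        selected.append(best)
        remaining.remove(best)
        del nearest[best]
        for e in remaining:
            nearest[e] = min(nearest[e], distance(e, best))
    return selected
```

Calling `select_database(["aaaa", "aaab", "zzzz"], 2)` keeps `"aaaa"` and `"zzzz"` and drops the near-duplicate `"aaab"`. The seeding step compares every pair in the corpus, so for a large corpus a cheaper seed (for example, a fixed first entry) may be preferable.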