Learning word segmentation from non-white space languages corpora
Abstract
Illustrative embodiments provide a computer implemented method, apparatus, and computer program product for learning word segmentation from non-white space language corpora. In one illustrative embodiment, the computer implemented method receives text input characters and calculates a ratio-measure for each pair of characters in the input. The method then determines whether the ratio-measure of each pair of characters is less than a predetermined threshold value. Responsive to determining that the ratio-measure is both less than the predetermined threshold value and a local minimum, the method identifies the pair as a weak pair and breaks the weak pair of characters.
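
The abstract does not define the ratio-measure, so the following is only a minimal sketch of the thresholding and local-minimum steps under stated assumptions. It assumes that "pair of characters" means adjacent characters, and it substitutes a hypothetical ratio-measure, count(ab) / (count(a) * count(b)) over a reference corpus, for whatever formula the patent actually claims. The names `ratio_measures` and `segment` are illustrative and do not come from the source.

```python
from collections import Counter


def ratio_measures(text, corpus):
    """Score each adjacent character pair in `text`.

    ASSUMPTION: the score is count(ab) / (count(a) * count(b)), with counts
    taken from `corpus`. The patent's real ratio-measure may differ.
    """
    unigrams = Counter(corpus)
    bigrams = Counter(corpus[i:i + 2] for i in range(len(corpus) - 1))
    scores = []
    for a, b in zip(text, text[1:]):
        denom = unigrams[a] * unigrams[b]
        # Characters unseen in the corpus get a score of 0.0.
        scores.append(bigrams[a + b] / denom if denom else 0.0)
    return scores


def segment(text, corpus, threshold):
    """Break `text` at every weak pair.

    A pair is treated as weak when its score is below `threshold` and is not
    greater than the scores of its neighbouring pairs (a local minimum).
    """
    if not text:
        return []
    scores = ratio_measures(text, corpus)
    words, start = [], 0
    for i, value in enumerate(scores):
        left = scores[i - 1] if i > 0 else float("inf")
        right = scores[i + 1] if i + 1 < len(scores) else float("inf")
        is_local_min = value <= left and value <= right
        if value < threshold and is_local_min:
            # Pair i spans text[i] and text[i + 1]; break between them.
            words.append(text[start:i + 1])
            start = i + 1
    words.append(text[start:])
    return words


print(segment("abcd", "abcdabxcd", threshold=0.4))  # -> ['ab', 'cd']
```

In this toy corpus the pairs "ab" and "cd" each score 0.5 while "bc" scores 0.25, so "bc" is the only pair that is both below the threshold and a local minimum, and the text is broken between "b" and "c". Requiring a local minimum in addition to the threshold keeps a run of several low-scoring pairs from being broken at every position.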