Mining Pinyin-to-character conversion rules from large-scale corpus: a rough set approach

Xiaolong W.; Chen Qingcai; Yeung D.S.



Abstract

The paper introduces a rough set technique for mining Pinyin-to-character (PTC) conversion rules. It first presents a text-structuring method that constructs a language information table from a corpus for each pinyin, and then applies this method to a free-form textual corpus. Data generalization and rule extraction algorithms are then used to eliminate redundant information and extract consistent PTC conversion rules. The design of the model also addresses several important issues, such as the long-distance dependency problem, the storage requirements of the rule base, and the consistency of the extracted rules. The performance of the extracted rules and the effects of different model parameters are evaluated experimentally. The results show that, with the smoothing method, high conversion precision (0.947) and recall (0.84) can be achieved even for rules represented directly by pinyin rather than words. A comparison with the baseline tri-gram model also shows that the method and the tri-gram language model complement each other well.
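The idea of an information table per pinyin, followed by generalization and consistent-rule extraction, can be illustrated with a minimal sketch. This is not the paper's algorithm: the toy rows, the attribute names `left` and `right` (neighbouring toneless pinyin), and the helper names `consistent_rules` and `mine_rules` are all assumptions made for illustration. The sketch groups rows into indiscernibility classes over a chosen attribute subset, keeps only classes whose rows agree on the decision character, and prefers rules that use fewer attributes.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy information table for the pinyin "shi" (not from the paper).
# Condition attributes: the pinyin to the left and right; decision: the character.
ROWS = [
    {"left": "lao", "right": "shuo", "char": "师"},
    {"left": "lao", "right": "hao", "char": "师"},
    {"left": "kao", "right": "hen", "char": "试"},
    {"left": "kao", "right": "shuo", "char": "试"},
    {"left": "ta", "right": "shuo", "char": "是"},
    {"left": "ta", "right": "shuo", "char": "事"},  # conflicts with the row above
]


def consistent_rules(rows, attrs, decision="char"):
    """Map each indiscernibility class over `attrs` to its decision value,
    keeping only classes whose rows all share a single decision."""
    classes = defaultdict(set)
    for row in rows:
        key = tuple(row[a] for a in attrs)
        classes[key].add(row[decision])
    return {key: next(iter(vals)) for key, vals in classes.items() if len(vals) == 1}


def mine_rules(rows, attrs=("left", "right")):
    """Collect consistent rules, trying smaller attribute subsets first.
    A rule is skipped if every row it matches is already covered."""
    rules = []
    covered = set()
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            for key, char in consistent_rules(rows, subset).items():
                matched = {
                    i for i, row in enumerate(rows)
                    if tuple(row[a] for a in subset) == key
                }
                if not matched <= covered:
                    rules.append((dict(zip(subset, key)), char))
                    covered |= matched
    return rules


if __name__ == "__main__":
    for condition, char in mine_rules(ROWS):
        print(condition, "->", char)
```

In this toy table the left neighbour alone separates 师 from 试, so the right-neighbour attribute is redundant for those rows. The two conflicting "ta ... shuo" rows yield no rule, which is one simple way to keep the extracted rule set consistent.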


Bibliographic details

Source: IEEE transactions on systems, man, and cybernetics. Part B, Cybernetics, 2004, Issue 2, pp. 834-844 (11 pages)
Authors: Xiaolong W.; Chen Qingcai; Yeung D.S.
Original format: PDF
Language: English
Chinese Library Classification: Basic theory of automation
Keywords: data mining; natural languages; rough set theory; text analysis; Pinyin-to-character conversion rule mining; baseline tri-gram model; consistent PTC conversion rules; data generalization; free-form textual corpus; high precision conversion; language information ta;


Similar literature

1. Usage of Fuzzy, Rough, and Soft Set Approach in Association Rule Mining [J]. Satya Ranjan Dash, Satchidananda Dehuri, Uma kant Sahoo. International journal of artificial life research. 2012, Issue 3
2. A New Data Mining Approach Combined with Extension Set and Rough Set [J]. Zhi-hang Tang, Wen-bin Tian. Journal of software. 2014, Issue 2
3. A New Data Mining Approach Combined with Extension Set and Rough Set [J]. Zhi-hang Tang, Wen-bin Tian. Journal of Computers. 2014, Issue 2
4. Approach for Mining Fault Rules of Power Grid based on the Combination of Rough Set Theory and Association Rule [C]. Yongchao Liang, Xijia Zhang, Zhou Peng. International Conference on Measurement, Instrumentation and Automation. 2013
5. Learning rules from examples under uncertainty--an approach based on rough-set boundaries and entropy. [D]. Chan, Chien-Chung. 1989
6. Preference Mining Using Neighborhood Rough Set Model on Two Universes [O]. Kai Zeng. 2016
7. Mining Pinyin-to-Character Conversion Rules From Large-Scale Corpus: A Rough Set Approach [O]. Wang Xiaolong, Chen Qingcai, Daniel S. Yeung. 2014
8. Measuring uncertainty by extracting fuzzy rules using rough sets and extracting fuzzy rules under uncertainty and measuring definability using rough sets [R]. Worm, Jeffrey A., Culas, Donald E. 1991
