Published in: International Conference on Computing and Information Technology

Thai Word Safe Segmentation with Bounding Extension for Data Indexing in Search Engine

Abstract

Word segmentation ambiguity in the Thai language affects data indexing because the inverted index is built from the segmentation results. This leads to unreasonable search results. This article proposes a dictionary-based Thai word Safe segmentation algorithm that solves this problem by making all the different terms in an ambiguous part of a sentence queryable. Next, it presents the bounding extension, which improves Safe segmentation performance. It also compares several off-the-shelf implementations of the trie data structure, which we believe is the best data structure for dictionary-based Thai word segmentation, and compares the efficiency of serialization libraries for deserializing the trie when the analyzer is initialized. Finally, it evaluates Safe segmentation with several implementations, called Safe Analyzer. The experimental results show that the linked-list trie and the Protostuff library give outstanding results. Safe segmentation solves the ambiguity problem, but it still cannot accurately handle misspellings within the text.
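The idea of keeping every term in an ambiguous span queryable can be illustrated with a small sketch. This is not the authors' algorithm, which the abstract does not spell out; it is a minimal illustration that assumes the text is fully covered by dictionary words and ignores the bounding extension. The names `build_trie`, `matches_from`, and `all_terms` are hypothetical, and the four-word dictionary is made up for the example. The sketch uses a trie to find dictionary words at each offset, then keeps every word that lies on at least one complete segmentation of the text.

```python
class TrieNode:
    """One node of a dictionary trie (hypothetical helper for this sketch)."""
    __slots__ = ("children", "is_word")

    def __init__(self):
        self.children = {}
        self.is_word = False


def build_trie(words):
    """Insert every dictionary word into a fresh trie and return its root."""
    root = TrieNode()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_word = True
    return root


def matches_from(root, text, start):
    """Yield each end offset where text[start:end] is a dictionary word."""
    node = root
    for end in range(start, len(text)):
        node = node.children.get(text[end])
        if node is None:
            return
        if node.is_word:
            yield end + 1


def all_terms(root, text):
    """Return (start, end, term) for every dictionary word that appears in
    at least one complete segmentation of text.

    Assumption: text can be covered entirely by dictionary words;
    unknown or misspelled spans are not handled here.
    """
    n = len(text)
    # Candidate words, ordered by start offset.
    edges = [(s, e) for s in range(n) for e in matches_from(root, text, s)]

    # Offsets reachable from the beginning of the text.
    reach_fwd = {0}
    for s, e in edges:
        if s in reach_fwd:
            reach_fwd.add(e)

    # Offsets from which the end of the text is reachable.
    reach_bwd = {n}
    for s, e in reversed(edges):
        if e in reach_bwd:
            reach_bwd.add(s)

    return [(s, e, text[s:e]) for s, e in edges
            if s in reach_fwd and e in reach_bwd]


# "ตากลม" can be read as ตา|กลม or ตาก|ลม; all four terms are kept.
trie = build_trie(["ตา", "ตาก", "กลม", "ลม"])
terms = all_terms(trie, "ตากลม")
# terms == [(0, 2, 'ตา'), (0, 3, 'ตาก'), (2, 5, 'กลม'), (3, 5, 'ลม')]
```

Indexing all four terms with their offsets means a query for either reading of the ambiguous span can match, which is the property the abstract attributes to Safe segmentation.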
