PicAChoo


Abstract

Although documents can contain hundreds of thousands of unique words, only a small number of those words are genuinely useful for intelligent services. For this reason, feature extraction has become an important issue in fields such as information retrieval, text mining, and pattern recognition. Numerous supporting tools for feature extraction are available, but most of them treat text as a simple literal. Text, however, is not just a literal; it is a semantically significant unit with linguistic characteristics. We therefore need customized extraction methods that take the characteristics of the source documents into account. PicAChoo stands for 'Pick And Choose'. It provides an environment for building feature extraction methods that use the structure of sentences and the part-of-speech information of words. In addition, we propose composing different extraction methods dynamically, without hard-coding.
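The abstract does not describe PicAChoo's actual interface, so the following is only a minimal sketch of the two ideas it names: selecting features by part-of-speech tag, and composing extraction steps by name rather than hard-coding them. Every name here (`REGISTRY`, `register`, `compose`, `extract_features`, and the three step names) is hypothetical, and the input is assumed to be already tagged with Penn Treebank style tags.

```python
from typing import Callable, Dict, List, Tuple

Token = Tuple[str, str]  # (word, part-of-speech tag), assumed pre-tagged
Extractor = Callable[[List[Token]], List[Token]]

# Hypothetical registry of named extraction steps; not PicAChoo's real API.
REGISTRY: Dict[str, Extractor] = {}


def register(name: str) -> Callable[[Extractor], Extractor]:
    """Decorator that stores an extraction step under a string name."""
    def decorator(fn: Extractor) -> Extractor:
        REGISTRY[name] = fn
        return fn
    return decorator


@register("nouns_only")
def nouns_only(tokens: List[Token]) -> List[Token]:
    # Keep tokens whose tag starts with "NN" (noun tags in the assumed tag set).
    return [(w, t) for w, t in tokens if t.startswith("NN")]


@register("lowercase")
def lowercase(tokens: List[Token]) -> List[Token]:
    # Normalize case so "XML" and "xml" count as the same feature.
    return [(w.lower(), t) for w, t in tokens]


@register("drop_short")
def drop_short(tokens: List[Token]) -> List[Token]:
    # Discard words of two characters or fewer.
    return [(w, t) for w, t in tokens if len(w) > 2]


def compose(names: List[str]) -> Extractor:
    """Build a pipeline from step names, e.g. read from a config file."""
    steps = [REGISTRY[n] for n in names]  # raises KeyError on an unknown name

    def pipeline(tokens: List[Token]) -> List[Token]:
        for step in steps:
            tokens = step(tokens)
        return tokens

    return pipeline


def extract_features(tokens: List[Token], names: List[str]) -> List[str]:
    """Run the named steps in order and return the surviving words."""
    return [w for w, _ in compose(names)(tokens)]


tagged = [("The", "DT"), ("parser", "NN"), ("extracts", "VBZ"),
          ("noun", "NN"), ("phrases", "NNS"), ("for", "IN"), ("IR", "NNP")]

features = extract_features(tagged, ["nouns_only", "lowercase", "drop_short"])
# features == ["parser", "noun", "phrases"]
```

Because the pipeline is assembled from a list of strings, changing the extraction method means changing that list (for example in a configuration file) rather than editing the extraction code itself.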
