Journal: Engineering Applications of Artificial Intelligence

Machine learning of syntactic parse trees for search and classification of text


Abstract

We build an open-source toolkit that implements deterministic learning to support search and text classification tasks. We extend the mechanism of logical generalization to syntactic parse trees and attempt to detect weak semantic signals from them. Generalization of syntactic parse trees, used as a syntactic similarity measure, is defined as the set of maximum common sub-trees and is performed at the level of paragraphs, sentences, phrases, and individual words. We analyze the semantic features of this similarity measure and compare it with the semantics of traditional anti-unification of terms. Nearest-neighbor machine learning is then applied to relate a sentence to a semantic class. By using a syntactic parse tree-based similarity measure instead of bag-of-words and keyword-frequency approaches, we expect to detect a weak semantic signal that would otherwise be unobservable. The proposed approach is evaluated in four distinct domains where a lack of semantic information makes classification of sentences rather difficult. We describe a toolkit, part of the Apache Software Foundation project OpenNLP, designed to aid search engineers in tasks requiring text relevance assessment.
