N-Gram: a Method of Conflating Terms An Approach to Text Categorization and Question Answering Systems in the Arabic language

Riyad Al-Shalabi; Ghassan Kannan; Marwan S. Abualrub; Khalid Mohammad Nahar Mohammad Al-Modallal

Abstract

Our main application program walks through the implementation of the N-Gram technique for Question Answering Systems. The goal of the program is to find a paragraph in an Arabic document that can serve as an answer to a question. The implementation uses the Prolog language. The overall idea is to couple an information retrieval system with a shallow approach to natural language processing. The essential first step in accomplishing this task is the categorization of texts: the search must be guided toward only the related categories, such as science, medicine, social problems, society, history, and other vital categories. Our paper addresses this vital step, which must be handled as a separate task. We know that this task is already completed in a typical English corpus, for example in the TREC-8 context. We describe the categorization of documents in detail and also give an overview of advanced topics in this domain. The user asks a question in unstructured language but with a careful choice of words, since document categorization is based on word occurrence information. To process the user's question we rely mainly on N-grams, but to raise the chance of successful matches we remove some known suffixes, numbers, English words, and other items, which are called Stop-Words. This process of removing words is called normalization. For simplicity, we assume that the collection of targeted documents is identified ahead of time and stored as a text file. The remaining words of the question are further processed by the body of our program, which uses N-Grams to compute the similarity between a word and the words in a paragraph of a selected document. Based on the similarity results, we may assign a value. Depending on the values for each word, a selected paragraph may be returned as an answer.
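The pipeline described in the abstract (normalize the question, compare words by N-gram similarity, score paragraphs) can be illustrated with a minimal sketch. This is not the authors' Prolog implementation. The abstract does not state the N-gram size, the similarity measure, the threshold, or the stop-word list, so the sketch assumes character bigrams, the Dice coefficient, a threshold of 0.6, and a tiny hypothetical stop-word set. Suffix removal is omitted, and all function names are illustrative.

```python
import re

# Hypothetical stop-word list for illustration; the paper's actual list is not given.
STOP_WORDS = {"في", "من", "على", "إلى", "عن", "هل", "ما"}


def normalize(text):
    """Tokenize, then drop numbers, Latin-script words, and stop-words."""
    kept = []
    for tok in re.findall(r"\w+", text):
        if tok.isdigit() or re.search(r"[A-Za-z]", tok) or tok in STOP_WORDS:
            continue
        kept.append(tok)
    return kept


def ngrams(word, n=2):
    """Return the set of character n-grams of a word (n=2 is an assumption)."""
    if len(word) < n:
        return {word}
    return {word[i:i + n] for i in range(len(word) - n + 1)}


def dice(a, b, n=2):
    """Dice coefficient over n-gram sets: 2|A & B| / (|A| + |B|)."""
    ga, gb = ngrams(a, n), ngrams(b, n)
    return 2 * len(ga & gb) / (len(ga) + len(gb))


def score_paragraph(question, paragraph, threshold=0.6):
    """Count question words whose best match in the paragraph reaches the threshold."""
    q_words = normalize(question)
    p_words = normalize(paragraph)
    if not p_words:
        return 0
    return sum(1 for q in q_words if max(dice(q, p) for p in p_words) >= threshold)


def best_paragraph(question, paragraphs):
    """Return the highest-scoring paragraph, or None if no word matches."""
    scored = [(score_paragraph(question, p), p) for p in paragraphs]
    top_score, top = max(scored, key=lambda pair: pair[0])
    return top if top_score > 0 else None
```

Under these assumptions, the words "الكتاب" and "كتاب" share three of their bigrams and are treated as a match, even though no stemming is applied. This is the term-conflation effect that the title refers to. A real system would need a tuned threshold and a full stop-word list.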

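Bibliographic details

Source
International journal of applied science & computations | 2005, Issue 2 | 15 pages
Authors
Riyad Al-Shalabi; Ghassan Kannan; Marwan S. Abualrub; Khalid Mohammad Nahar Mohammad Al-Modallal;
Indexing information
Original format: PDF
Text language: English
Chinese Library Classification: General natural sciences
Keywords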
N-Gram; Text Categorization; Question Answering Systems; Information Retrieval; Natural Language Processing;
