A Machine Learning Approach for an Indonesian-English Cross Language Question Answering System

Ayu PURWARIANTI; Masatoshi TSUCHIYA; Seiichi NAKAGAWA

Abstract

We have built a CLQA (Cross Language Question Answering) system for a source language with limited data resources (e.g. Indonesian) using a machine learning approach. The CLQA system consists of four modules: a question analyzer, a keyword translator, a passage retriever and an answer finder. We used machine learning in two modules: the question classifier (part of the question analyzer) and the answer finder. In the question classifier, we classify the EAT (Expected Answer Type) of a question using an SVM (Support Vector Machine). The features for the classification module are basically the output of our shallow question parsing module. To improve the classification score, we use statistical information extracted from our Indonesian corpus. In the answer finder module, we depart from the common approach, in which the answer is located by matching the named entities of the word corpus against the EAT of the question; instead, we locate the answer by text chunking the word corpus. The features for the SVM-based text chunking process consist of question features, word corpus features and similarity scores between the word corpus and the question keywords. In this way, we eliminate the named entity tagging process for the target document. In the keyword translator module, we use an Indonesian-English dictionary to translate Indonesian keywords into English. We also use some simple patterns to transform borrowed English words. The keywords are then combined in boolean queries in order to retrieve relevant passages using IDF scores. We first conducted an experiment using 2,837 questions (about 10% of which are used as test data) obtained from 18 Indonesian college students. We next conducted a similar experiment on the NTCIR (NII Test Collection for IR Systems) 2005 CLQA task by translating the English questions into Indonesian.
Compared with the Japanese-English and Chinese-English CLQA results in NTCIR 2005, our system outperforms all others except for one resource-rich system that employs three dictionaries. Further, a rough comparison with two other Indonesian-English CLQA systems showed that our system achieved a higher accuracy score.
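
The keyword translation and passage retrieval steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the dictionary entries, the two suffix patterns for borrowed words, the strict AND between keyword groups, and the smoothed IDF formula are all assumptions chosen for illustration, and every function name is hypothetical.

```python
import math
import re

# Toy bilingual dictionary (hypothetical entries, not the authors' resource).
DICTIONARY = {
    "presiden": ["president"],
    "negara": ["country", "state"],
}

# Illustrative suffix rewrites for borrowed English words (assumed patterns).
BORROWED_PATTERNS = [(r"si$", "tion"), (r"tas$", "ty")]


def translate_keyword(word):
    """Return candidate English translations for one Indonesian keyword."""
    word = word.lower()
    if word in DICTIONARY:
        return list(DICTIONARY[word])
    for pattern, replacement in BORROWED_PATTERNS:
        if re.search(pattern, word):
            return [re.sub(pattern, replacement, word)]
    return [word]  # fall back to the untranslated form (e.g. proper names)


def build_query(keywords):
    """One OR-group of translations per keyword; groups are ANDed at match time."""
    return [translate_keyword(k) for k in keywords]


def tokenize(text):
    """Lowercase a passage and return its set of alphabetic tokens."""
    return set(re.findall(r"[a-z]+", text.lower()))


def idf(term, passages):
    """Smoothed inverse document frequency over the passage collection."""
    df = sum(1 for p in passages if term in tokenize(p))
    return math.log((1 + len(passages)) / (1 + df)) + 1.0


def retrieve(query, passages):
    """Keep passages matching every OR-group, ranked by summed IDF of matched terms."""
    scored = []
    for p in passages:
        tokens = tokenize(p)
        matched = [[t for t in group if t in tokens] for group in query]
        if all(matched):  # every keyword group has at least one hit
            score = sum(idf(t, passages) for group in matched for t in group)
            scored.append((score, p))
    return [p for score, p in sorted(scored, key=lambda item: -item[0])]


passages = [
    "The president released new information.",
    "The president visited the university.",
    "Information is power.",
]
query = build_query(["presiden", "informasi"])
results = retrieve(query, passages)
```

In this sketch, `presiden` is resolved through the dictionary while `informasi` falls through to the suffix pattern, and only the passage containing a translation of both keywords is returned. A real system would also need to handle multi-word translations and decide how to relax the query when no passage satisfies every keyword group.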

Bibliographic record

Source: IEICE Transactions on Information and Systems, 2007, No. 11, pp. 1841-1852 (12 pages)
Authors: Ayu PURWARIANTI; Masatoshi TSUCHIYA; Seiichi NAKAGAWA
Affiliation: Electronic and Information Department, Toyohashi University of Technology, Toyohashi-shi, 441-8580 Japan
Indexed in: Science Citation Index (SCI); Engineering Index (EI)
Original format: PDF
Language: English
Chinese Library Classification: Radio electronics, telecommunications technology
Keywords: cross language question answering; Indonesian-English CLQA; limited resource language; machine learning
Date added to database: 2024-01-25 20:04:51
