Identifying similarity in text: Multi-lingual analysis for summarization.

Abstract

Early work in the computational treatment of natural language focused on summarization and machine translation. My research concentrates on summarizing documents written in different languages. This thesis presents my work on multi-lingual text similarity, which enables the identification of short units of text (usually sentences) that contain similar information even though they are written in different languages. I present SimFinderML, a framework for multi-lingual text similarity computation that makes it easy to experiment with similarity parameters and to add support for other languages. The system is examined and evaluated in depth using Arabic and English data. I also apply the concept of multi-lingual text similarity to summarization in two different systems. The first improves the readability of English summaries of Arabic text by replacing machine-translated Arabic sentences with highly similar English sentences when possible. The second is a novel summarization system that supports comparative analysis of Arabic and English documents in two ways. First, given Arabic and English documents that describe the same event, SimFinderML clusters sentences to present information that is supported by both the Arabic and English documents. Second, the system analyzes how the Arabic and English documents differ by presenting information that is supported by documents in only one language. This novel form of summarization is a first step toward analyzing differences in perspective in news reported in different languages.
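
The abstract does not say how SimFinderML computes similarity or forms clusters, so the following Python sketch is only a rough illustration of the two comparative outputs it describes: information supported by both languages and information found in only one. It assumes the Arabic sentences have already been machine-translated into English, and it substitutes a plain bag-of-words cosine similarity with single-link clustering for whatever features the thesis actually uses. The names `tokens`, `cosine`, `cluster`, `split_by_support`, and the `threshold` value are hypothetical.

```python
import math
import re
from collections import Counter


def tokens(sentence):
    """Lowercase word tokens; a stand-in for richer linguistic features."""
    return re.findall(r"[a-z0-9]+", sentence.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def cluster(sentences, threshold=0.5):
    """Single-link clustering over (language, text) pairs.

    Assumption: every text is already in English (for example,
    machine-translated Arabic), so one tokenizer serves both languages.
    """
    vectors = [Counter(tokens(text)) for _, text in sentences]
    parent = list(range(len(sentences)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(sentences)):
        for j in range(i + 1, len(sentences)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i, sentence in enumerate(sentences):
        groups.setdefault(find(i), []).append(sentence)
    return list(groups.values())


def split_by_support(clusters):
    """Separate clusters backed by both languages from single-language ones."""
    shared, exclusive = [], {}
    for members in clusters:
        langs = {lang for lang, _ in members}
        if len(langs) > 1:
            shared.append(members)
        else:
            exclusive.setdefault(next(iter(langs)), []).append(members)
    return shared, exclusive
```

Calling `split_by_support(cluster(sentences))` on a list of `("en", ...)` and `("ar", ...)` pairs returns the clusters that contain both languages, plus a dictionary mapping each language to the clusters that only it supports. A real system would need a cross-lingual similarity measure rather than exact token overlap.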

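Bibliographic record

  • Author: Evans, David Kirk
  • Author affiliation: Columbia University
  • Degree-granting institution: Columbia University
  • Subject: Computer Science
  • Degree: Ph.D.
  • Year: 2005
  • Pages: 168 p.
  • Original format: PDF
  • Language of text: English
  • Chinese Library Classification: Automation technology, computer technology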
