Identifying similarity in text: Multi-lingual analysis for summarization.

Abstract

Early work in the computational treatment of natural language focused on summarization and machine translation. In my research I have concentrated on the summarization of documents in different languages. This thesis presents my work on multi-lingual text similarity, which enables the identification of short units of text (usually sentences) that contain similar information even though they are written in different languages. I present SimFinderML, a framework for multi-lingual text similarity computation that makes it easy to experiment with similarity parameters and to add support for other languages. The system is examined and evaluated in depth using Arabic and English data. I also apply multi-lingual text similarity to summarization in two different systems. The first improves the readability of English summaries of Arabic text by replacing machine-translated Arabic sentences with highly similar English sentences when possible. The second is a novel summarization system that supports comparative analysis of Arabic and English documents in two ways. First, given Arabic and English documents that describe the same event, SimFinderML clusters sentences to present information that is supported by both the Arabic and the English documents. Second, the system analyzes how the Arabic and English documents differ by presenting information that is supported by documents in only one language. This novel form of summarization is a first step toward analyzing differences in perspective between news reported in different languages.
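
The comparative analysis described above (information supported by both languages versus information found in only one) can be illustrated with a toy sketch. This is not SimFinderML's actual method, which the abstract does not specify. It assumes the Arabic sentences have already been rendered as English glosses (for example by machine translation), and it swaps in plain Jaccard token overlap as the similarity measure. The names `STOPWORDS`, `tokens`, `jaccard`, `compare_documents`, the threshold of 0.5, and the example sentences are all hypothetical.

```python
import re
from typing import List, Set, Tuple

# Hypothetical, deliberately tiny stopword list for the illustration.
STOPWORDS = {"the", "a", "an", "in", "of", "and", "to", "was", "were", "on"}


def tokens(sentence: str) -> Set[str]:
    """Lowercase the sentence and keep alphabetic tokens that are not stopwords."""
    return {w for w in re.findall(r"[a-z]+", sentence.lower()) if w not in STOPWORDS}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two token sets; 0.0 if either set is empty."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def compare_documents(
    english: List[str], arabic_glossed: List[str], threshold: float = 0.5
) -> Tuple[List[Tuple[str, str]], List[str], List[str]]:
    """Split sentences into cross-language pairs and language-exclusive sentences.

    Returns (shared pairs, English-only sentences, Arabic-only sentences).
    """
    shared = []
    matched_en, matched_ar = set(), set()
    for i, en in enumerate(english):
        for j, ar in enumerate(arabic_glossed):
            if jaccard(tokens(en), tokens(ar)) >= threshold:
                shared.append((en, ar))
                matched_en.add(i)
                matched_ar.add(j)
    only_en = [s for i, s in enumerate(english) if i not in matched_en]
    only_ar = [s for j, s in enumerate(arabic_glossed) if j not in matched_ar]
    return shared, only_en, only_ar


# Made-up example sentences; the second list stands in for glossed Arabic text.
english_doc = [
    "An earthquake struck the city on Monday.",
    "Rescue teams arrived from abroad.",
]
arabic_doc_glossed = [
    "Earthquake struck city Monday.",
    "Officials criticized the slow response.",
]

shared, only_en, only_ar = compare_documents(english_doc, arabic_doc_glossed)
print("Supported by both:", shared)
print("English only:", only_en)
print("Arabic only:", only_ar)
```

In this sketch the first sentence of each document is paired as shared information, while the remaining sentences are reported as exclusive to one language. A real system would need a far more robust cross-lingual similarity measure than token overlap on glosses.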

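Bibliographic record

Author: Evans, David Kirk
Author affiliation: Columbia University
Degree-granting institution: Columbia University
Discipline: Computer Science
Degree: Ph.D.
Year: 2005
Pages: 168 p.
Total pages: 168
Original format: PDF
Language of text: English
Chinese Library Classification: Automation technology, computer technology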

Similar literature

1. Text Similarity Computation Model for Identifying Rumor Based on Bayesian Network in Microblog [J]. Li Chengcheng, Liu Fengming, Li Pu. The International Arab Journal of Information Technology. 2020, No. 5
2. Software development for identifying Persian text similarity [J]. Elham Mahdipour, Rahele Shojaeian Razavi, Zahra Gheibi. International Journal of Intelligent Information Systems. 2014, No. 6a1
3. Identifying risks areas related to medication administrations - text mining analysis using free-text descriptions of incident reports [J]. Marja Härkänen, Jussi Paananen, Trevor Murrells. BMC Health Services Research. 2019, No. 1
4. Semantic Similarity Measurements for Multi-lingual Short Texts Using Wikipedia [C]. Nakamura T., Shirakawa M., Hara T. IEEE/WIC/ACM International Joint Conferences on Web Intelligence and Intelligent Agent Technologies. 2014
5. Identifying the gist of conversational text: Automatic keyword extraction and summarization. [D]. Liu, Fei. 2011
6. Identifying risks areas related to medication administrations - text mining analysis using free-text descriptions of incident reports [O]. Marja Härkänen, Jussi Paananen, Trevor Murrells. 2019
7. Text categorization and similarity analysis: similarity measure, literature review [O]. Fowke Michael, Hinze Annika, Heese Ralf. 2013
