Content Modelling Intelligence System Based On Automatic Text Summarization

Sanjan S Malagi; Rachana Radhakrishnan; Monisha R.; Keerthana S.; D V Ashoka

Abstract

In the era of big data, textual information is growing rapidly and is available in many different languages. Time constraints often prevent readers from consuming everything that is available, and in a fast-paced world it is difficult to read all of this text in full. This makes text summarization necessary: a summary makes the material easier to absorb while preserving its substance and keeping it understandable. Many text summarization approaches have been proposed over the years for English and some other European languages, but surprisingly few exist for the regional languages of India. This paper presents a study of extractive text summarization methods for multiple Indian and international languages such as Hindi, Kannada, Telugu, Marathi, German, and French. It proposes a system that uses Optical Character Recognition (OCR) to extract text from an uploaded image. The main purpose of the OCR component is to create editable records from existing documents or image files. The OCR component also performs sentence detection to preserve the document's structure. The paper further presents a strategy for automatic sentence extraction using the TextRank algorithm. This approach assigns scores to sentences by weighting features such as term frequency, word occurrences, and noun weights and phrases. The results indicate that the approach achieves higher accuracy than existing methods while maintaining coherence, and that the system also provides text-to-speech output and translation from one language to another.
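The sentence-extraction step mentioned in the abstract can be illustrated with a minimal TextRank sketch. This is not the authors' implementation: it uses the plain word-overlap similarity from the original TextRank formulation rather than the term-frequency and noun-weight features the abstract mentions, and the function names (`split_sentences`, `similarity`, `textrank`, `summarize`), the regex-based tokenizer, and the parameter defaults are all assumptions made for illustration. The naive splitter and tokenizer would likely need language-specific replacements for Indian scripts.

```python
import math
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def similarity(tokens_a, tokens_b):
    """Shared-word count normalised by the log of each sentence's vocabulary size."""
    words_a, words_b = set(tokens_a), set(tokens_b)
    if not words_a or not words_b:
        return 0.0
    denom = math.log(len(words_a)) + math.log(len(words_b))
    if denom == 0:  # both sentences contain a single distinct word
        return 0.0
    return len(words_a & words_b) / denom


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by running weighted PageRank on the similarity graph."""
    tokens = [re.findall(r"\w+", s.lower()) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1 - damping)
            + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def summarize(text, k=2):
    """Return the k highest-scoring sentences in their original order."""
    sentences = split_sentences(text)
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]
```

Calling `summarize(text, k=2)` on a short passage returns the two sentences most strongly connected to the rest of the text. In a pipeline like the one the abstract describes, `text` would be the output of the OCR step.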

