ACM Transactions on Asian Language Information Processing

Understanding Document Semantics from Summaries: A Case Study on Hindi Texts


Abstract

The summary of a document contains the words that actually contribute to the document's semantics. Latent Semantic Analysis (LSA) is a mathematical model used to understand document semantics by deriving a semantic structure from patterns of word correlations in the document. When LSA is used to capture semantics from summaries, it performs quite well despite being completely independent of any external sources of semantics. However, LSA can be remodeled to enhance its capability to analyze correlations within texts. Taking advantage of the model's language independence, this article presents two stages of LSA remodeling to understand document semantics in the Indian context, specifically from Hindi text summaries. The first stage of remodeling provides supplementary information, such as document category and domain information. The second stage uses a supervised term weighting measure in the process. The remodeled LSA's performance is empirically evaluated in a document classification application by comparing its classification accuracies to those of plain LSA. The remodeling improves the performance of LSA by 4.7% to 6.2% over the plain model. The results suggest that summaries of documents efficiently capture the semantic structure of documents and are an alternative to full-length documents for understanding document semantics.
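As a rough illustration of plain LSA only (not the remodeled variant the article proposes), the sketch below builds a term-document matrix from a few toy summaries, reduces it with a truncated singular value decomposition, and compares documents in the resulting latent space. Everything here is an assumption made for illustration: the English placeholder tokens stand in for Hindi summary text, raw term counts stand in for whatever weighting the article uses, and the function names are hypothetical.

```python
import numpy as np

def term_document_matrix(docs):
    """Build a raw term-frequency matrix with shape (terms, documents)."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    A = np.zeros((len(vocab), len(docs)))
    for j, d in enumerate(docs):
        for w in d.split():
            A[index[w], j] += 1.0
    return A, vocab

def lsa_document_vectors(A, k):
    """Project documents into a k-dimensional latent space via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep the k largest singular values; each row of the result is one document.
    return (np.diag(s[:k]) @ Vt[:k, :]).T

def cosine(u, v):
    """Cosine similarity between two latent-space vectors."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Toy summaries (hypothetical data): two about sport, two about politics.
summaries = [
    "cricket match team score",
    "team score win cricket",
    "election vote party minister",
    "party minister vote government",
]

A, vocab = term_document_matrix(summaries)
doc_vecs = lsa_document_vectors(A, k=2)

same_topic = cosine(doc_vecs[0], doc_vecs[1])   # both sport summaries
cross_topic = cosine(doc_vecs[0], doc_vecs[2])  # sport vs. politics
print(f"same topic: {same_topic:.3f}, cross topic: {cross_topic:.3f}")
```

In this toy setting the two sport summaries land close together in the latent space, while a sport summary and a politics summary are nearly orthogonal. A classifier of the kind the abstract evaluates could then be trained on such document vectors, though the article's actual pipeline may differ.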
