...
International Journal of Embedded and Real-Time Communication Systems

Hierarchical Interpretable Topical Embeddings for Exploratory Search and Real-Time Document Tracking

Abstract

Real-time monitoring of scientific papers and technological news requires fast processing of complex search requests aimed at acquiring thematically relevant information. To address this need, the authors develop an exploratory search engine based on probabilistic hierarchical topic modeling. The topic model produces a low-dimensional, sparse, interpretable vector representation of a text (a topical embedding), which is used to rank documents by their similarity to the query. The authors explore several ways of comparing topical vectors, including searching with thematically homogeneous text segments. Topical hierarchies are built using the regularized EM algorithm from the BigARTM project. Topic-based search achieves higher precision and recall than other approaches (TF-IDF, fastText, LSTM, BERT), and even than human assessors, who spend up to an hour completing the same search task. The authors also find that blending hierarchical topic vectors with pretrained neural embeddings is a promising way to enrich both models, raising precision and recall above 90%.
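The ranking step described in the abstract (score each document by how similar its topical embedding is to the query's, optionally blended with a pretrained neural embedding) can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: topical embeddings are assumed to be sparse dicts mapping a topic id to p(topic | document), similarity is assumed to be cosine, and "blending" is modeled as a weighted sum of the topical and neural cosine scores with a hypothetical weight `alpha`. The names `sparse_cosine`, `dense_cosine`, and `rank`, and the toy topic ids, are illustrative only.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

# Assumed representation: sparse topical embedding, topic id -> p(topic | document).
TopicVec = Dict[str, float]


def sparse_cosine(a: TopicVec, b: TopicVec) -> float:
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def dense_cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two dense vectors (e.g. pretrained neural embeddings)."""
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(y * y for y in v))
    return dot / (nu * nv) if nu and nv else 0.0


def rank(
    query_topics: TopicVec,
    docs_topics: Dict[str, TopicVec],
    query_dense: Optional[Sequence[float]] = None,
    docs_dense: Optional[Dict[str, Sequence[float]]] = None,
    alpha: float = 1.0,
) -> List[Tuple[str, float]]:
    """Rank documents by alpha * topical cosine + (1 - alpha) * neural cosine.

    alpha = 1.0 uses the topic model alone; the blending weight is hypothetical.
    """
    scores = []
    for doc_id, topics in docs_topics.items():
        score = alpha * sparse_cosine(query_topics, topics)
        if alpha < 1.0 and query_dense is not None and docs_dense is not None:
            score += (1.0 - alpha) * dense_cosine(query_dense, docs_dense[doc_id])
        scores.append((doc_id, score))
    # Highest similarity first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Toy data for illustration only.
    query = {"ml/topic_models": 0.7, "ml/search": 0.3}
    docs = {
        "d1": {"ml/topic_models": 0.6, "ml/search": 0.4},
        "d2": {"bio/genomics": 0.9, "ml/search": 0.1},
    }
    print(rank(query, docs))  # d1 ranks first
```

With `alpha=1.0` only the topic model is used; lowering `alpha` lets the neural similarity reorder the results. For 0 ≤ `alpha` ≤ 1, a weighted sum of two cosines equals the cosine between concatenations of the two L2-normalized vectors scaled by sqrt(alpha) and sqrt(1 - alpha), so the same blend could in principle be served from a single vector index.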

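The abstract states that the topic hierarchies are trained with the regularized EM algorithm from BigARTM. BigARTM is a separate library with its own API, which is not reproduced here; the sketch below only illustrates the general shape of an ARTM-style EM iteration for a single, flat level of topics. It assumes the additive-regularization M-step form φ_wt ∝ (n_wt + φ_wt ∂R/∂φ_wt)_+ (and likewise for θ_td), with the simplest possible regularizer R = τ Σ ln φ, whose contribution is a constant shift τ of the expected counts: τ > 0 smooths, τ < 0 sparsifies, and τ = 0 reduces to plain PLSA. The function `artm_em` and the parameters `tau_phi` and `tau_theta` are hypothetical, and building the hierarchy itself is not shown.

```python
import random
from typing import Dict, List, Tuple

Doc = Dict[str, int]  # bag of words: word -> count in the document


def artm_em(
    docs: List[Doc],
    num_topics: int,
    iters: int = 30,
    tau_phi: float = 0.0,
    tau_theta: float = 0.0,
    seed: int = 0,
) -> Tuple[Dict[str, List[float]], List[List[float]]]:
    """Fit phi[w][t] = p(w | t) and theta[d][t] = p(t | d) by regularized EM.

    Hypothetical regularizer: R = tau_phi * sum(ln phi) + tau_theta * sum(ln theta),
    which adds a constant tau to the expected counts in the M-step.
    """
    rng = random.Random(seed)
    T = num_topics
    vocab = sorted({w for doc in docs for w in doc})

    # Random positive initialization of phi; each topic column sums to 1.
    phi = {w: [rng.random() + 1e-3 for _ in range(T)] for w in vocab}
    for t in range(T):
        s = sum(phi[w][t] for w in vocab)
        for w in vocab:
            phi[w][t] /= s
    theta = [[1.0 / T] * T for _ in docs]

    for _ in range(iters):
        n_wt = {w: [0.0] * T for w in vocab}
        n_td = [[0.0] * T for _ in docs]
        # E-step: split each count n_dw over topics with p(t | d, w) ∝ phi_wt * theta_td.
        for d, doc in enumerate(docs):
            for w, n_dw in doc.items():
                p = [phi[w][t] * theta[d][t] for t in range(T)]
                z = sum(p)
                if z == 0.0:
                    continue  # word already zeroed out in every topic
                for t in range(T):
                    share = n_dw * p[t] / z
                    n_wt[w][t] += share
                    n_td[d][t] += share
        # M-step: add the regularizer shift, keep the positive part, renormalize.
        for t in range(T):
            col = {w: max(n_wt[w][t] + tau_phi, 0.0) for w in vocab}
            s = sum(col.values())
            for w in vocab:
                phi[w][t] = col[w] / s if s > 0 else 0.0
        for d in range(len(docs)):
            row = [max(n_td[d][t] + tau_theta, 0.0) for t in range(T)]
            s = sum(row)
            # Fall back to uniform if sparsing removed every topic of the document.
            theta[d] = [x / s for x in row] if s > 0 else [1.0 / T] * T
    return phi, theta
```

A negative `tau_theta` drives weak topics in a document's θ row to exactly zero, which is one way to obtain sparse topical embeddings of the kind the abstract refers to; a negative `tau_phi` does the same for rare words within a topic.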