Quality evaluation of topics identification algorithms.

Abstract

The need for effective text retrieval tools, such as search engines, is omnipresent in the corporate marketplace and the defence industry alike. The task of indexing large quantities of text from various sources, such as news and social media, is too large to be accomplished by humans alone. Automatically identifying keywords, or topics, from unstructured text is therefore an important challenge. Extensive computational experiments were conducted using three topic identification methods: the Retrieval Activation and Decay (ReAD) algorithm, the Priming Activation Indexing (PAI) algorithm and the Term Frequency-Inverse Document Frequency (TFIDF) method. These experiments were conducted with a subset of the well-known Reuters financial dataset. Their purpose was to identify the parameters that would return higher quality topics according to several well-known topic quality evaluation measures: F1, precision, recall and Normalized Mutual Information (NMI). Two novel evaluation measures were also proposed: Simple Match Five (SM5) and Expanded Match Five (EM5). Results were generated using the parameters that returned high quality topics according to the different computational measures. An online survey with volunteer evaluators was then conducted to validate these results. The parameters that yielded higher topic quality were inconsistent from one type of measure to the next. For the chosen parameters, human evaluation found that TFIDF produced higher quality topics than PAI, and that PAI produced higher quality topics than ReAD. Neither the proposed measures nor the established F1 measure was found to be an adequate indicator of topic quality.

Keywords: Topics Identification, Topics Evaluation, Topics Quality.
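As a rough illustration of the precision, recall and F1 measures named in the abstract, the sketch below scores a set of extracted topics against a set of reference topics using the standard set-based definitions. The function name, the case-insensitive exact matching and the sample topic lists are assumptions made for illustration; the abstract does not describe how the thesis matched topics or which reference labels it used.

```python
def precision_recall_f1(predicted, reference):
    """Score predicted topics against reference topics (set-based definitions).

    Matching is exact and case-insensitive, which is an assumption of this
    sketch rather than the matching rule used in the thesis.
    """
    predicted = {t.lower() for t in predicted}
    reference = {t.lower() for t in reference}
    matched = predicted & reference

    # Precision: share of predicted topics that are correct.
    precision = len(matched) / len(predicted) if predicted else 0.0
    # Recall: share of reference topics that were found.
    recall = len(matched) / len(reference) if reference else 0.0
    # F1: harmonic mean of precision and recall (0 when both are 0).
    if precision + recall == 0:
        f1 = 0.0
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical topics for one news article.
predicted_topics = ["oil", "opec", "prices", "bank"]
reference_topics = ["oil", "opec", "crude"]

p, r, f1 = precision_recall_f1(predicted_topics, reference_topics)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# prints: precision=0.50 recall=0.67 F1=0.57
```

Two of the four predicted topics appear in the reference set, so precision is 2/4, recall is 2/3, and F1 is their harmonic mean, 4/7. A score of this kind only rewards exact overlap with the reference labels, which is one plausible reason such measures can disagree with human judgments of topic quality.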

Bibliographic details

Author: Decarie, Francois Andre Martin
Author affiliation: Royal Military College of Canada (Canada)
Degree-granting institution: Royal Military College of Canada (Canada)
Subject: Computer science; Information science
Degree: M.Sc.
Year: 2013
Pages: 150 p.
Total pages: 150
Original format: PDF
Language of text: English
