Effective String Processing and Matching for Author Disambiguation

Wei-Sheng Chin; Yong Zhuang; Yu-Chin Juan; Felix Wu; Hsiao-Yu Tung; Tong Yu; Jui-Pin Wang; Cheng-Xia Chang; Chun-Pai Yang; Wei-Cheng Chang; Kuan-Hao Huang; Tzu-Ming Kuo; Shan-Wei Lin; Young-San Lin; Yu-Chen Lu; Yu-Chuan Su; Cheng-Kuang Wei; Tu-Chun Yin; Chun-Liang Li; Ting-Wei Lin; Cheng-Hao Tsai; Shou-De Lin; Hsuan-Tien Lin; Chih-Jen Lin

Abstract

Track 2 of KDD Cup 2013 aims at determining duplicated authors in a data set from Microsoft Academic Search. This type of problem appears in many large-scale applications that compile information from different sources. This paper describes our solution developed at National Taiwan University to win the first prize of the competition. We propose an effective name matching framework and realize two implementations. An important strategy in our approach is to consider Chinese and non-Chinese names separately because of their different naming conventions. Post-processing, including merging the results of two predictions, further boosts the performance. Our approach achieves an F1-score of 0.99202 on the private leaderboard and 0.99195 on the public leaderboard.
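The abstract does not say how the two predictions are merged. As a minimal sketch only, the Python below assumes each prediction maps an author id to the set of ids judged to be the same person, and assumes the merge is a transitive union of those groups. The function name `merge_predictions`, the data layout, and the union rule are all hypothetical illustrations, not the authors' method.

```python
from collections import defaultdict


def merge_predictions(pred_a, pred_b):
    """Union two duplicate-author predictions with a union-find structure.

    Assumption (not from the paper): a prediction is a dict mapping an
    author id to the set of ids predicted to be the same person, and two
    ids end up together if either prediction links them.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    def union(x, y):
        root_x, root_y = find(x), find(y)
        if root_x != root_y:
            parent[root_y] = root_x

    for pred in (pred_a, pred_b):
        for author, duplicates in pred.items():
            find(author)  # register authors that have no duplicates
            for other in duplicates:
                union(author, other)

    # Collect every id under the representative of its group.
    groups = defaultdict(set)
    for author in list(parent):
        groups[find(author)].add(author)
    return {author: sorted(groups[find(author)]) for author in list(parent)}


# Toy input: prediction A links 1 and 2, prediction B links 2 and 3.
pred_a = {1: {1, 2}, 2: {1, 2}, 3: {3}, 4: {4}}
pred_b = {1: {1}, 2: {2, 3}, 3: {2, 3}, 4: {4}}
merged = merge_predictions(pred_a, pred_b)
```

In this toy input, ids 1, 2, and 3 fall into one group because the links chain across the two predictions, while id 4 stays alone. A union of this kind can only add links, so it tends to raise recall at some cost in precision. Whether the winning solution merged its predictions this way is not stated in the abstract.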

Bibliographic details

Source: Journal of Machine Learning Research, 2014, Issue 9, 28 pages
Original format: PDF
Chinese Library Classification: Computing technology, computer technology
