Published in: International Conference on Data Mining Workshops

A Machine Learning Approach to Detecting Start Reading Location of eBooks

Abstract

Machine learning and natural language processing (NLP) have aided the development of new and improved user experience features in many applications. We address the problem of automatically identifying the "Start Reading Location" (SRL) of eBooks, i.e., the logical beginning of the main content. This improves the eBook reading experience by taking users directly to the logical start location, without requiring them to flip through several front-matter sections such as "Dedication" and "About the Author". Automatic identification of the SRL is difficult because many eBooks do not adhere to any well-defined convention for section naming, formatting, or layout. We formulate SRL detection as a classification problem and address it with detailed rule-based and NLP-based classification schemes. Our models are used in production for Kindle eBooks and have led to a 400% increase in coverage (the number of books with an SRL stamped) compared with what could be achieved earlier through an entirely manual process, while maintaining a high accuracy of 95%.
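
The abstract does not describe the actual rules or features used, so the following is only a minimal sketch of what a rule-based scheme of this kind might look like. It assumes a book is available as an ordered list of section titles, and it treats the first section whose title does not look like front matter as the SRL. The pattern list, the function names `is_front_matter` and `find_start_reading_location`, and the input format are all hypothetical and are not taken from the paper.

```python
import re

# Hypothetical front-matter title patterns (an assumption for illustration;
# the paper's real rules are not given in the abstract).
FRONT_MATTER_PATTERNS = [
    r"^title page$",
    r"^copyright",
    r"^dedication$",
    r"^about the author$",
    r"^(table of )?contents$",
    r"^acknowledge?ments$",
]
_FRONT_MATTER_RE = re.compile("|".join(FRONT_MATTER_PATTERNS), re.IGNORECASE)


def is_front_matter(title):
    """Return True if the section title matches a known front-matter pattern."""
    return bool(_FRONT_MATTER_RE.search(title.strip()))


def find_start_reading_location(section_titles):
    """Return the index of the first section not classified as front matter,
    or None if every section looks like front matter."""
    for index, title in enumerate(section_titles):
        if not is_front_matter(title):
            return index
    return None


sections = ["Title Page", "Copyright", "Dedication", "About the Author",
            "Chapter 1", "Chapter 2"]
print(find_start_reading_location(sections))  # 4
```

A title-only rule like this would fail on books with unconventional section names, which is the difficulty the abstract points to and a plausible reason for combining rules with NLP-based classification.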
