Machine Learning and NLP (Natural Language Processing) have aided the development of new and improved user experience features in many applications. We address the problem of automatically identifying the "Start Reading Location" (SRL) of eBooks, i.e., the location of the logical beginning of the main content. This improves the eBook reading experience by taking users automatically to the logical start location, without requiring them to flip through several front-matter sections such as "Dedication" and "About the Author". Automatic identification of the SRL is complex because many eBooks do not adhere to any well-defined convention for section naming, formatting, or layout. We formulate SRL identification as a classification problem and address it with detailed rule-based and NLP-based classification schemes. Our models are used in production for Kindle eBooks and have led to a 400% increase in coverage (the number of books with an SRL stamped) compared to what could be achieved earlier through an entirely manual process, while maintaining a high accuracy of 95%.