Conference: Global Conference for Advancement in Technology

Categorization of Multilingual Text on Languages of Indic Script

Abstract

The social media revolution on the web has opened up a new aspect of language mixing and the language processing that follows from it. Many messengers, applications and social websites support multiple languages for posting messages and comments. Language detection and identification is a supervised machine learning task that maps a text onto a single language from among a set of trained languages. People often mix languages in their communications, and identifying the native language within such mixed data is important because it conveys useful information. The present work explores a supervised learning method, the K-Nearest Neighbor (KNN) technique, together with a character-level n-gram method, for categorizing English, Bengali, Assamese, Marathi and Hindi web documents.
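As a rough illustration of how character-level n-grams and KNN can be combined for language identification, the following is a minimal sketch only. The abstract does not give the n-gram size, the value of k, the similarity measure, or the training data, so the bigram size, k = 3, cosine similarity, the tiny TRAINING list, and all function names here are assumptions made for this example, not the authors' method.

```python
from collections import Counter
import math


def char_ngrams(text, n=2):
    """Count overlapping character n-grams in a lowercased string."""
    text = text.lower()
    return Counter(text[i:i + n] for i in range(len(text) - n + 1))


def cosine(a, b):
    """Cosine similarity between two sparse n-gram count vectors."""
    dot = sum(count * b[gram] for gram, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def knn_predict(text, training, k=3, n=2):
    """Label `text` by majority vote among its k most similar training samples."""
    query = char_ngrams(text, n)
    scored = sorted(
        ((cosine(query, char_ngrams(sample, n)), label) for sample, label in training),
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set (assumption): three short samples per language.
TRAINING = [
    ("the weather is nice today", "English"),
    ("i am going to the market", "English"),
    ("this is a good book", "English"),
    ("आज मौसम अच्छा है", "Hindi"),
    ("मैं बाजार जा रहा हूँ", "Hindi"),
    ("यह एक अच्छी किताब है", "Hindi"),
    ("আজ আবহাওয়া ভালো", "Bengali"),
    ("আমি বাজারে যাচ্ছি", "Bengali"),
    ("এটি একটি ভালো বই", "Bengali"),
]

if __name__ == "__main__":
    for sentence in ["this is a nice market", "यह मौसम अच्छा है", "আমি ভালো বই"]:
        print(sentence, "->", knn_predict(sentence, TRAINING))
```

Languages written in different scripts are easy to separate at the character level because they share almost no n-grams. Pairs that share a script, such as Hindi and Marathi (Devanagari) or Bengali and Assamese, would presumably need longer n-grams and far more training text than this toy set provides.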